Shard splitting for immediate performance boost?


Robert Brown
Hi,

I have an index of 60m docs split across 2 shards (each with a replica).

When load testing queries (picking random keywords I know exist), and
randomly requesting facets too, 95% of my responses are under 0.5s.
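For context, a percentile figure like that comes from a nearest-rank calculation over the recorded response times — a minimal sketch (the sample latencies below are made up):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at position ceil(pct% * n) in sorted order."""
    ranked = sorted(samples)
    k = max(math.ceil(pct / 100.0 * len(ranked)) - 1, 0)
    return ranked[k]

# Ten made-up response times in seconds: mostly fast, with a slow tail.
latencies = [0.12, 0.20, 0.31, 0.40, 0.45, 0.48, 0.50, 0.60, 1.40, 1.90]
p95 = percentile(latencies, 95)  # the slow tail, 1.9s here
```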

However, during some ad-hoc manual tests, I sometimes see searches
taking between 1 and 2 seconds.

Should I expect a simple shard split to improve speed immediately,
even with the 2 new shards still sitting on the original servers?

I will move them to their own dedicated hosts eventually, but I just
want to understand what I should expect during the process.
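For reference, a split is driven through the Collections API; a hedged sketch, assuming a SolrCloud collection named `products` with a shard `shard1` (both names hypothetical). The two sub-shards land on the parent's node until you explicitly move them:

```shell
# Trigger an asynchronous split of shard1; the sub-shards initially
# live on the same node as the parent shard.
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=products&shard=shard1&async=split-1"

# Poll the status of the async request until it completes.
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1"
```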

Thanks,
Rob


Re: Shard splitting for immediate performance boost?

Shawn Heisey
On 3/19/2016 11:12 AM, Robert Brown wrote:

> I have an index of 60m docs split across 2 shards (each with a replica).
>
> When load testing queries (picking random keywords I know exist), and
> randomly requesting facets too, 95% of my responses are under 0.5s.
>
> However, during some random manual tests, sometimes I see searches
> taking between 1-2 seconds.
>
> Should I expect a simple shard split to assist with the speed
> immediately?  Even with the 2 new shards still being on the original
> servers?
>
> Will move them to their own dedicated hosts, but just want to
> understand what I should expect during the process.

Maybe.  It depends on why the responses are slow in the first place.

If your queries are completely CPU-bound, then splitting into more
shards and either putting those shards on additional machines or taking
advantage of idle CPUs will make performance better.  Note that if your
query rate is extremely high, you should only have one shard replica on
each server -- all your CPU power will be needed for handling query
volume, so none of your CPUs will be idle.

Most of the time, Solr installations are actually I/O bound, because
there's not enough unused RAM to effectively cache the index.  If this
is what's happening and you don't add memory (which you can do by adding
machines and adding/removing replicas to move them), then you'll make
performance worse by splitting into more shards.
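Shawn's RAM point reduces to quick arithmetic — a rough sketch of the rule of thumb (not a Solr API; all numbers below are hypothetical): the OS page cache only gets what the JVM heap and other processes leave behind, and ideally the whole on-disk index fits in it.

```python
def cache_headroom_gb(total_ram_gb, jvm_heap_gb, index_size_gb):
    """Rough headroom: RAM left for the OS page cache after the JVM heap,
    minus the on-disk index size. Negative means the index cannot be
    fully cached and queries will hit the disk."""
    return (total_ram_gb - jvm_heap_gb) - index_size_gb

# A 64 GB box with an 8 GB heap and a 70 GB index: -14 GB of headroom,
# so queries are likely I/O bound. Splitting shards on the same box
# does not change this arithmetic at all.
headroom = cache_headroom_gb(64, 8, 70)
```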

Thanks,
Shawn


Re: Shard splitting for immediate performance boost?

Erick Erickson
Be _very_ cautious when you're looking at these timings. Random
spikes are often due to opening a new searcher (assuming
you're indexing as you query) and are eminently tunable by
autowarming. Obviously you can't fire the same query again and again,
but if you collect a set of "bad" queries and, say, measure them after
Solr has been running for a while (without any indexing going on) and
the times are radically better, then autowarming is where I'd look first.
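Autowarming of this sort lives in solrconfig.xml — a hedged fragment (the cache sizes and the warming query are made-up values, to be tuned against your own set of slow queries):

```xml
<!-- Re-populate the filter cache from the old searcher's entries on commit. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="64"/>

<!-- Fire representative queries against each new searcher before it serves traffic. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular keyword</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```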

Second, what are you measuring? Time until the client displays the
results? There's a bunch of other things going on there, it might be
network issues. The QTime is a rough measure of how long Solr is
taking, although it doesn't include the time spent assembling the return
packet.
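To make the distinction concrete, here is a minimal sketch of pulling QTime out of a Solr JSON response (the response body below is a trimmed, made-up example); wall-clock time measured at the client will always be larger:

```python
import json

# A trimmed, made-up Solr response (wt=json). QTime is Solr's internal
# query time in milliseconds; it excludes network transfer and the time
# spent assembling and writing the response packet.
raw = '{"responseHeader":{"status":0,"QTime":137},"response":{"numFound":60000000,"docs":[]}}'

body = json.loads(raw)
qtime_ms = body["responseHeader"]["QTime"]

# In a real client you would also time the full round trip, e.g. with
# time.monotonic() around the HTTP call, and compare that wall-clock
# figure against qtime_ms to spot network or serialization overhead.
```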

Third, "randomly requesting facets" is something of a red flag. Make
really sure the facets are realistic. Fields with high cardinality make
Solr work harder. For instance, say you have a date field with
millisecond resolution: I'd bet that faceting on that field is not something
you'll ever support. NOTE: I'm talking about just setting
facet.field=datefield here; a range facet on the same field is totally
reasonable. Really, I'm saying to ensure that your queries are realistic
before jumping into sharding.
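To illustrate the difference, parameter sketches only (the collection and field names are hypothetical):

```shell
# Term faceting on a millisecond-resolution date field: one bucket per
# distinct timestamp, potentially millions of buckets -- rarely useful.
curl "http://localhost:8983/solr/products/select?q=*:*&facet=true&facet.field=datefield"

# A range facet over the same field: a handful of fixed-width buckets
# (the gap value "+1DAY" is URL-encoded as %2B1DAY).
curl "http://localhost:8983/solr/products/select?q=*:*&facet=true\
&facet.range=datefield\
&facet.range.start=NOW/DAY-30DAYS\
&facet.range.end=NOW/DAY\
&facet.range.gap=%2B1DAY"
```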

Fourth, garbage collection (especially "stop the world" GCs) won't be helped
by just splitting into shards.
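GC pauses are easy to rule in or out with GC logging — a hedged sketch of the JVM flags for the Java 7/8 generation that 2016-era Solr typically ran on (the log path is made up):

```shell
# Add to the Solr start parameters (e.g. GC_LOG_OPTS in solr.in.sh), then
# look for long "Total time for which application threads were stopped" entries.
-Xloggc:/var/solr/logs/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
```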

And the list goes on and on. Really what both Shawn and I are saying is
that you really need to identify _what's_ slowing you down before trying
a solution like sharding. And you need to be able to quantify that, rather
than "well, sometimes when I put stuff in it seems slow", or you'll spend a
large amount of time chasing the wrong thing (at least I have).

30M docs per shard is well within a reasonable range, although the
complexity of your docs may push that number up or down. You haven't told
us much about how much memory you have on your machines, how much
of it you're allocating to the Solr heap, and the like, so it's hard to say
much beyond generalities....

Best,
Erick

On Sat, Mar 19, 2016 at 10:41 AM, Shawn Heisey <[hidden email]> wrote:


Re: Shard splitting for immediate performance boost?

Robert Brown
Thanks Erick,

I have another index with the same infrastructure setup, but only 10m
docs, and I never see these slow-downs there; that's why my first instinct
was to look at creating more shards.

I'll definitely make a point of investigating further with all the
things you and Shawn mentioned, though time is unfortunately against me.

Cheers,
Rob



On 19/03/16 19:11, Erick Erickson wrote:



Re: Shard splitting for immediate performance boost?

Erick Erickson
Well, I do tend to go on....

As Shawn mentioned, memory is usually the most
precious resource, and splitting into more shards (assuming
they're in separate JVMs, and preferably on separate
machines) will certainly relieve some of that pressure.

My only caution there is that splitting into more shards may
mask some underlying configuration issue
by throwing more hardware at the problem. So you _may_
be able to keep the same topology if you uncover and fix those issues.

That said, depending on how much your hardware costs,
it may not be worth the engineering effort....

Best,
Erick

On Sat, Mar 19, 2016 at 12:55 PM, Robert Brown <[hidden email]> wrote:
