Solr multi core query too slow


Solr multi core query too slow

Anshuman Singh
I have a Solr cloud setup (Solr 7.4) with a collection "test" having two
shards on two different nodes. There are 4M records equally distributed
across the shards.

If I query the collection like below, it is slow.
http://localhost:8983/solr/test/select?q=*:*&rows=100000
QTime: 6930

If I query a particular shard like below, it is also slow.
http://localhost:8983/solr/test_shard1_replica_n2/select?q=*:*&rows=100000&shards=shard2
QTime: 5494
*Notice shard2 in the shards parameter and shard1 in the core being queried.*

But this is faster:
http://localhost:8983/solr/test_shard1_replica_n2/select?q=*:*&rows=100000&shards=shard1
QTime: 57

This is also faster:
http://localhost:8983/solr/test_shard2_replica_n4/select?q=*:*&rows=100000&shards=shard2
QTime: 71

I don't think it is the network, as I performed similar tests on a single-node
setup as well. If you query a particular core and the corresponding
logical shard, it is much faster than querying a different shard or core.

Why is this the behaviour? How can I make the first two queries as fast as
the last two?

Re: Solr multi core query too slow

Erick Erickson
First of all, asking for that many rows will spend a lot of time
gathering the document fields. Assuming you have stored fields,
returning 100K rows requires:
1> the aggregator node getting the candidate 100000 docs from each shard

2> the aggregator node sorting those 100000 docs from each shard into the true top 100000 based on the sort criteria (score by default)

3> the aggregator node going back to the shards and asking each of them for the subset of that 100000 that is resident on that shard

4> the aggregator node assembling the final docs to be sent to the client and sending them.
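The sort-and-merge in step 2 can be sketched in a few lines (a toy illustration of the aggregator's job, not Solr's actual code):

```python
import heapq

def merge_shard_results(shard_results, rows):
    """Merge per-shard result lists (each already sorted by descending score)
    into the true top-`rows` hits, the way an aggregator node must."""
    # Each shard returns up to `rows` (score, doc_id) pairs, so the
    # aggregator handles rows * num_shards candidates in the worst case.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(merged)[:rows]

shard1 = [(0.9, "d1"), (0.5, "d3")]
shard2 = [(0.8, "d2"), (0.4, "d4")]
print(merge_shard_results([shard1, shard2], 3))
# → [(0.9, 'd1'), (0.8, 'd2'), (0.5, 'd3')]
```

Note that with rows=100000 and two shards, the aggregator is sorting through up to 200000 candidates before it can even start fetching stored fields.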

So my guess is that when you fire requests at a particular replica that has to get them from the other shard’s replica on another host, the network back-and-forth is killing your perf. It’s not that your network is having problems, just that you’re pushing a lot of data back and forth in your poorly-performing cases.

So first of all, specifying 100K rows is an anti-pattern. Outside of streaming, Solr is built on the presumption that you’re after the top few rows (< 100, say). The times vary a lot depending on whether you need to read stored fields BTW.

Second, I suspect your test is bogus. If you run the tests in the order you gave, the first one will read the necessary data from disk and probably have it in the OS disk cache for the second and subsequent. And/or you’re getting results from your queryResultCache (although you’d have to have a big one). Specifying the exact same query when trying to time things is usually a mistake.

If your use-case requires 100K rows, you should be using streaming or cursorMark. While that won’t make the end-to-end time shorter, it won’t put such a strain on the system.
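For reference, a cursorMark loop has this shape (a sketch only; `fetch_page` here is a stand-in for the real HTTP call to /select, which must include a sort on the uniqueKey field, e.g. sort=id+asc):

```python
def iterate_with_cursor(fetch_page):
    """Deep-page through results Solr-style: start at cursorMark='*' and
    stop when the returned nextCursorMark equals the one we sent."""
    cursor = "*"
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # same mark back means no more results
            return
        cursor = next_cursor

# Toy stand-in for the HTTP request; a real fetch_page would GET
# /select?q=*:*&sort=id+asc&rows=100&cursorMark=<cursor> and return
# (response["response"]["docs"], response["nextCursorMark"]).
DATA = list(range(10))
def fake_fetch(cursor, page_size=4):
    start = 0 if cursor == "*" else int(cursor)
    page = DATA[start:start + page_size]
    return page, str(start + len(page)) if page else cursor

print(list(iterate_with_cursor(fake_fetch)))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```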

Best,
Erick

> On May 27, 2020, at 10:38 AM, Anshuman Singh <[hidden email]> wrote:


Re: Solr multi core query too slow

Anshuman Singh
Thanks for your reply, Erick. You helped me in improving my understanding
of how Solr distributed requests work internally.

Actually my ultimate goal is to improve search performance in one of our
test environments, where queries are taking up to 60 seconds to execute.
*We want to fetch at least the top 100 rows in under 5 seconds.*

Right now, we have 7 Collections across 10 Solr nodes, each Collection
having approx 2B records equally distributed across 20 shards with rf 2.
Each replica/core is ~40GB in size. The number of users is very small
(<10). We are using HDDs and each host has 128 GB RAM. The Solr JVM heap size
is 24GB. In the actual production environment, we are planning for 100 such
machines and we will be ingesting ~2B records on a daily basis. We will
retain data for up to 3 months.

I followed your suggestion of not querying more than 100 rows, and this is
my observation: I ran queries with the debugQuery param and found that the
query response time depends on the worst-performing shard, as some
shards take longer to execute the query than others.

Here are my questions:

   1.  Is decreasing the number of shards going to help us, as there will be
   fewer shards to query?
   2.  Is increasing the number of replicas going to help us, as there will be
   load balancing?
   3.  How many records should we keep in each Collection or in each
   replica/core? Will we face performance issues if the core size becomes too
   big?

Any other suggestions are appreciated.

On Wed, May 27, 2020 at 9:23 PM Erick Erickson <[hidden email]>
wrote:


Re: Solr multi core query too slow

Erick Erickson
Right, you’re running into the “laggard” problem: you can’t get the overall
result back until every shard has responded. There’s an interesting
parameter, “shards.info=true”, that will give you some information about
the time taken by the sub-search on each shard.

But given your numbers, I think your root problem is that
your hardware is overmatched. In total you have 14B documents, correct?
And a replication factor of 2, meaning each of your 10 machines has 2.8
billion docs in 128G total memory. Another way to look at it is that you
have 7 x 20 x 2 = 280 replicas, each with a 40G index. So each node hosts
28 replicas and handles over a terabyte of index in aggregate. At first
blush, you’ve overloaded your hardware. My guess here is that one node or
the other has to do a lot of swapping/gc/whatever quite regularly when
you query. Given that you’re on HDDs, this can be quite expensive.
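Checking that arithmetic directly against the numbers in the previous message:

```python
# Cluster figures as stated: 7 collections x 20 shards x rf 2,
# ~2B docs and ~40GB per replica, spread over 10 nodes with 128GB RAM each.
collections, shards, rf = 7, 20, 2
docs_per_collection = 2e9
nodes = 10
replica_size_gb = 40
ram_gb = 128

replicas = collections * shards * rf                              # 280 replicas total
replicas_per_node = replicas / nodes                              # 28 per node
index_gb_per_node = replicas_per_node * replica_size_gb           # 1120 GB of index per node
docs_per_node = docs_per_collection * collections * rf / nodes    # 2.8 billion docs per node

# Fraction of each node's index that can possibly be cached in RAM,
# even before the 24GB JVM heap takes its share:
print(replicas, replicas_per_node, index_gb_per_node, docs_per_node,
      round(ram_gb / index_gb_per_node, 3))
```

So at best roughly a tenth of the index on a node can be resident in memory at any moment, which is why the page cache churns.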

I think you can simplify your analysis problem a lot by concentrating on
a single machine, load it up and analyze it heavily. Monitor I/O, analyze
GC and CPU utilization. My first guess is you’ll see heavy I/O. Once the index
is read into memory, a well-sized Solr installation won’t see much I/O,
_especially_ if you simplify the query to only ask for, say, some docValues
field or rows=0. I think that your OS is swapping significant segments of the
Lucene index in and out to satisfy your queries.

GC is always a place to look. You should have GC logs available to see if
you spend lots of CPU cycles in GC. I doubt that GC tuning will fix your
performance issues, but it’s something to look at.

A quick-n-dirty way to see if it’s swapping as I suspect is to monitor
CPU utilization. A red flag is if it’s low and your queries _still_ take
a long time. That’s a “smoking gun” that you’re swapping. While
indicative, that’s not definitive since I’ve also seen CPU pegged because
of GC, so if CPUs are running hot, you have to dig deeper...


So to answer your questions:

1> I doubt decreasing the number of shards will help. I think you
     simply have too many docs per node; changing the number of
     shards isn’t going to change that.

2> I strongly suspect that increasing replicas will make the
     problem _worse_, since it’ll just add more docs per node.

3> It Depends (tm). I wrote “the sizing blog” a long time ago because
     this question comes up so often and is impossible to answer in the
     abstract. Well, it’s possible in theory, but in practice there are just
     too many variables. What you need to do to fully answer this in your
     situation with your data is set up a node and “test it to destruction”.

     https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

    I want to emphasize that you _must_ have realistic queries when
    you test as in that blog, and you _must_ have a bunch of them
    in order to not get fooled by hitting, say, your queryResultCache. I
    had one client who “stress tested” with the same query and was
    getting 3ms response times because, after the first one, they never
    needed to do any searching at all, everything was in caches. Needless
    to say that didn’t hold up when they used a realistic query mix.

Best,
Erick

> On May 29, 2020, at 4:53 AM, Anshuman Singh <[hidden email]> wrote:


Re: Solr multi core query too slow

Anshuman Singh
Thanks again, Erick, for pointing us in the right direction.

Yes, I am seeing heavy disk I/O while querying. I queried a single
collection. A query for 10 rows can cause 100-150 MB of disk reads on each
node, while a query for 1000 rows causes disk reads in the range of 2-7 GB
per node.

Is this normal? I didn't quite get what is happening behind the scenes. I
mean, just 1000 rows causing up to 2-7 GB of disk reads? It may be
something basic, but it would be helpful if you could shed some light on it.

It seems like disk I/O is the bottleneck here. With the amount of data we
are dealing with, is increasing the number of hosts the only option, or
have we missed something in configuring Solr?

On Fri, May 29, 2020 at 5:13 PM Erick Erickson <[hidden email]>
wrote:


Re: Solr multi core query too slow

Erick Erickson
Best guess is that your indexes are too big for your memory.

I think your focus on the number of rows is misleading you; you’ll
see why in a moment.

Lucene indexes are essentially accessed randomly, there’s
very little locality. Here’s an excellent article explaining
how Lucene uses memory:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

But the important point is that once physical memory is full, in order
to access any other bit of the index that bit must be read into OS memory. So
what happens is that the block of OS memory for the oldest page is replaced
with the next bit needed.

In your case, because you have such large indexes relative to your physical
memory, you’re having to re-read indexes into memory from disk quite often.

Then you query collection2 and guess what? The
search you just ran on collection1 may have replaced many of the pages
of the index in OS memory space, causing them to have to be read from
disk again. And round and round you go.
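That eviction cycle can be modeled with a tiny LRU cache (an illustration only; the real OS page cache is more sophisticated):

```python
from collections import OrderedDict

class PageCache:
    """Minimal LRU model of the OS page cache."""
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.disk_reads = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: just refresh LRU order
        else:
            self.disk_reads += 1                # miss: must read from disk
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the oldest page

# Cache big enough for one collection's pages, but not two.
cache = PageCache(capacity_pages=100)
for _ in range(3):              # alternate queries between two collections
    for page in range(100):
        cache.access(("collection1", page))
    for page in range(100):     # collection2's pages evict collection1's
        cache.access(("collection2", page))
print(cache.disk_reads)         # → 600: every single access misses
```

With a capacity of 200 pages the same access pattern would miss only on the first touch of each page, which is the difference between an undersized and a well-sized node in miniature.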

As for the difference between 100 and 1,000 rows, that’s likely a red
herring. I claim that if you reduce rows to zero and make sure to vary your
queries (collections, terms and fields) you’ll see similar characteristics on _some_
queries. In fact I’ll make a bold prediction that if you do some worst-case
testing where you have:
1> many terms that are not close together lexically for each query
2> many fields per query
3> round-robin the collection queried (i.e. run a query against collection1, then 2, then 3…. last_collection, rinse, repeat)

<1> and <2> force lots of parts of the index to be required for each query.
<3> ages out pages from OS memory space.

you’ll be able to generate the kind of I/O you’re seeing at will. More rows will
certainly make the problem worse, but that’s probably not the main driver unless
your documents are enormous.
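A worst-case query generator along the lines of <1>-<3> might look like this (a sketch; the field and collection names are made up, not taken from the actual schema):

```python
import random

FIELDS = ["title", "body", "author", "tags"]           # hypothetical field names
COLLECTIONS = [f"collection{i}" for i in range(1, 8)]  # 7 collections, as in the setup

def worst_case_queries(n, seed=0):
    """Yield (collection, params) pairs following the recipe above:
    lexically scattered terms <1>, several fields per query <2>,
    and collections round-robined <3>."""
    rng = random.Random(seed)
    for i in range(n):
        terms = [f"term{rng.randrange(100000)}" for _ in range(4)]
        fields = rng.sample(FIELDS, 3)
        collection = COLLECTIONS[i % len(COLLECTIONS)]
        q = " OR ".join(f"{f}:{t}" for f, t in zip(fields, terms))
        yield collection, {"q": q, "rows": 0}   # rows=0: measure search, not fetch

for coll, params in worst_case_queries(3):
    print(coll, params)
```

Firing each generated query at the corresponding collection (and never repeating a query) should reproduce the heavy I/O on demand.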

You ask if this is normal. Well, it’s certainly not _desirable_, but it’s what has to happen
to search at all when the hardware is undersized. For fast searches, one of the main
goals of tuning your hardware and application is exactly to prevent this kind of I/O, which
means that, almost all the time, reading the relevant portions of the index into memory
to get the result set is NOT required.

A couple of caveats here:

1> when a searcher is opened you expect to see spikes in I/O as the index is read
into memory the first time. This happens either because of autowarming or when the
first few queries are run.

2> You expect to see _some_ I/O when you return documents, the stored fields may
be read off disk. That’s usually low-level noise though.

You ask what can be done… Sure, there are a number of ways you can make the
index smaller; each one requires that you give up some functionality. For instance,
position information is required for, say, phrase searches. If you don’t need to do
phrase searches on a field, turning off indexed position information will make
the index smaller, more of it can fit into memory at once, and your situation will
be better. I don’t think this is a silver bullet in your case, but it’s worth looking at. Take
a quick look at your index files and you can get an idea of which options are taking up
space by correlating them with the explanation here:

https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/codecs/lucene70/package-summary.html#package_description

Do note that the *.fdt and *.fdx files hold the stored=true data; they are
not necessary for searches and only come into play when docs are returned. That’s
the I/O I mentioned above about when you return docs.

But other than that, you need more physical RAM for your nodes or more nodes.
Again, you can get a feel for how much by following the outline in the sizing blog.

Best,
Erick

> On May 30, 2020, at 12:14 PM, Anshuman Singh <[hidden email]> wrote:
>
> Thanks again, Erick, for pointing us in the right direction.
>
> Yes, I am seeing heavy disk I/O while querying. I queried a single
> collection. A query for 10 rows can cause 100-150 MB disk read on each
> node. While querying for a 1000 rows, disk read is in range of 2-7 GB per
> node.
>
> Is this normal? I didn't quite get what is happening behind the scenes. I
> mean just 1000 rows causing up to 2-7 GB of disk read? Now it may be
> something basic but it would be helpful if you put some light on it.
>
> It seems like disk I/O is the bottleneck here. With the amount of data we
> are dealing with, is increasing the number of hosts the only option or we
> may have missed on configuring Solr properly?
>
> On Fri, May 29, 2020 at 5:13 PM Erick Erickson <[hidden email]>
> wrote:
>
>> Right, you’re running into the “laggard” problem, you can’t get the overall
>> result back until every shard has responded. There’s an interesting
>> parameter “shards.info=true” will give you some information about
>> the time taken by the sub-search on each shard.
>>
>> But given your numbers, I think your root problem is that
>> your hardware is overmatched. In total, you have 14B documents, correct?
>> and a replication factor of 2. Meaning each of your 10 machines has 2.8
>> billion docs in 128G total memory. Another way to look at it is that you
>> have 7 x 20 x 2 = 280 replicas each with a 40G index. So each node has
>> 28 replicas/node and handles over a terabyte of index in aggregate. At
>> first
>> blush, you’ve overloaded your hardware. My guess here is that one node or
>> the other has to do a lot of swapping/gc/whatever quite regularly when
>> you query. Given that you’re on HDDs, this can be quite expensive.
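The aggregate-load arithmetic above can be checked directly; every input below is a number quoted in this thread:

```python
# Spelling out the per-node load estimate. All inputs come from the thread;
# nothing here is measured independently.

collections = 7
shards_per_collection = 20
replication_factor = 2
nodes = 10
replica_size_gb = 40
ram_gb = 128

replicas_total = collections * shards_per_collection * replication_factor
replicas_per_node = replicas_total // nodes
index_per_node_gb = replicas_per_node * replica_size_gb

print(f"total replicas: {replicas_total}")
print(f"replicas per node: {replicas_per_node}")
print(f"index per node: {index_per_node_gb} GB vs {ram_gb} GB RAM")
```

Each node ends up responsible for roughly ten times as much index as it has physical memory, which is why heavy paging on HDDs is the prime suspect.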
>>
>> I think you can simplify your analysis problem a lot by concentrating on
>> a single machine, load it up and analyze it heavily. Monitor I/O, analyze
>> GC and CPU utilization. My first guess is you’ll see heavy I/O. Once the
>> index
>> is read into memory, a well-sized Solr installation won’t see much I/O,
>> _especially_ if you simplify the query to only ask for, say, some docValues
>> field or rows=0. I think that your OS is swapping significant segments of
>> the
>> Lucene index in and out to satisfy your queries.
>>
>> GC is always a place to look. You should have GC logs available to see if
>> you spend lots of CPU cycles in GC. I doubt that GC tuning will fix your
>> performance issues, but it’s something to look at.
>>
>> A quick-n-dirty way to see if it’s swapping as I suspect is to monitor
>> CPU utilization. A red flag is if it’s low and your queries _still_ take
>> a long time. That’s a “smoking gun” that you’re swapping. While
>> indicative, that’s not definitive since I’ve also seen CPU pegged because
>> of GC, so if CPUs are running hot, you have to dig deeper...
>>
>>
>> So to answer your questions:
>>
>> 1> I doubt decreasing the number of shards will help. I think you
>>     simply have too many docs per node; changing the number of
>>     shards isn’t going to change that.
>>
>> 2> I strongly suspect that increasing replicas will make the
>>     problem _worse_ since it’ll just add more docs per node.
>>
>> 3> It Depends (tm). I wrote “the sizing blog” a long time ago because
>>     this question comes up so often and is impossible to answer in the
>>     abstract. Well, it’s possible in theory, but in practice there are
>> just
>>     too many variables. What you need to do to fully answer this in your
>>     situation with your data is set up a node and “test it to
>> destruction”.
>>
>>
>> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>>    I want to emphasize that you _must_ have realistic queries when
>>    you test as in that blog, and you _must_ have a bunch of them
>>    in order to not get fooled by hitting, say, your queryResultCache. I
>>    had one client who “stress tested” with the same query and was
>>    getting 3ms response times because, after the first one, they never
>>    needed to do any searching at all, everything was in caches. Needless
>>    to say that didn’t hold up when they used a realistic query mix.
>>
>> Best,
>> Erick
>>
>>> On May 29, 2020, at 4:53 AM, Anshuman Singh <[hidden email]>
>> wrote:
>>>
>>> Thanks for your reply, Erick. You helped me in improving my understanding
>>> of how Solr distributed requests work internally.
>>>
>>> Actually my ultimate goal is to improve search performance in one of our
>>> test environments, where queries are taking up to 60 seconds to execute.
>>> *We want to fetch at least the top 100 rows within a few seconds (< 5
>>> seconds).*
>>>
>>> Right now, we have 7 Collections across 10 Solr nodes, each Collection
>>> having approx 2B records equally distributed across 20 shards with rf 2.
>>> Each replica/core is ~40GB in size. The number of users is very small
>>> (<10). We are using HDDs and each host has 128 GB RAM. The Solr JVM heap
>>> size is 24GB. In the actual production environment, we are planning for
>>> 100 such machines, and we will be ingesting ~2B records on a daily basis.
>>> We will retain up to 3 months of data.
>>>
>>> I followed your suggestion of not querying more than 100 rows, and this is
>>> my observation: I ran queries with the debugQuery param and found that the
>>> query response time depends on the worst-performing shard, as some of the
>>> shards take longer to execute the query than others.
>>>
>>> Here are my questions:
>>>
>>>  1.  Is decreasing the number of shards going to help us, as there will
>>>  be fewer shards to query?
>>>  2.  Is increasing the number of replicas going to help us, as there will
>>>  be load balancing?
>>>  3.  How many records should we keep in each Collection or in each
>>>  replica/core? Will we face performance issues if the core size becomes
>>>  too big?
>>>
>>> Any other suggestions are appreciated.
>>>
>>> On Wed, May 27, 2020 at 9:23 PM Erick Erickson <[hidden email]>
>>> wrote:
>>>
>>>> First of all, asking for that many rows will spend a lot of time
>>>> gathering the document fields. Assuming you have stored fields,
>>>> each doc requires
>>>> 1> the aggregator node getting the candidate 100000 docs from each shard
>>>>
>>>> 2> The aggregator node sorting those 100000 docs from each shard into the
>>>> true top 100000 based on the sort criteria (score by default)
>>>>
>>>> 3> the aggregator node going back to the shards and asking them for those
>>>> docs of that 100000 that are resident on that shard
>>>>
>>>> 4> the aggregator node assembling the final docs to be sent to the client
>>>> and sending them.
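Steps 1-4 above can be sketched as a two-phase scatter-gather. This is a simplified in-memory model of the protocol, not Solr's actual code; the shard contents and scores are entirely fabricated:

```python
import heapq

# Simplified two-phase distributed query, mirroring steps 1-4 above.
# Shards, scores, and stored fields are faked in-memory; this models the
# protocol, not Solr internals.

shards = {
    "shard1": {i: {"id": i, "score": (i * 7) % 100} for i in range(0, 10)},
    "shard2": {i: {"id": i, "score": (i * 13) % 100} for i in range(10, 20)},
}

def phase1(shard, rows):
    """Step 1: each shard returns (score, doc_id) for its top `rows` docs."""
    docs = shards[shard].values()
    top = heapq.nlargest(rows, docs, key=lambda d: d["score"])
    return [(d["score"], d["id"], shard) for d in top]

def phase2(shard, ids):
    """Step 3: the aggregator fetches stored fields for the global winners."""
    return [shards[shard][i] for i in ids]

def distributed_query(rows):
    # Step 2: merge all shard candidates into the true global top `rows`.
    candidates = [hit for s in shards for hit in phase1(s, rows)]
    winners = heapq.nlargest(rows, candidates)
    # Steps 3-4: go back per shard for stored fields, then assemble.
    wanted = {}
    for score, doc_id, shard in winners:
        wanted.setdefault(shard, []).append(doc_id)
    docs = [d for shard, ids in wanted.items() for d in phase2(shard, ids)]
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:rows]

result = distributed_query(3)
print([d["id"] for d in result])  # → [15, 14, 13]
```

Note how the candidate count in phase 1 is shards × rows: that multiplier is exactly why large `rows` values are so expensive in a distributed query.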
>>>>
>>>> So my guess is that when you fire requests at a particular replica that
>>>> has to get them from the other shard’s replica on another host, the network
>>>> back-and-forth is killing your perf. It’s not that your network is having
>>>> problems, just that you’re pushing a lot of data back and forth in your
>>>> poorly-performing cases.
>>>>
>>>> So first of all, specifying 100K rows is an anti-pattern. Outside of
>>>> streaming, Solr is built on the presumption that you’re after the top few
>>>> rows (< 100, say). The times vary a lot depending on whether you need to
>>>> read stored fields BTW.
>>>>
>>>> Second, I suspect your test is bogus. If you run the tests in the order
>>>> you gave, the first one will read the necessary data from disk and probably
>>>> have it in the OS disk cache for the second and subsequent. And/or you’re
>>>> getting results from your queryResultCache (although you’d have to have a
>>>> big one). Specifying the exact same query when trying to time things is
>>>> usually a mistake.
>>>>
>>>> If your use-case requires 100K rows, you should be using streaming or
>>>> cursorMark. While that won’t make the end-to-end time shorter, it won’t
>>>> put such a strain on the system.
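A cursorMark paging loop looks roughly like the sketch below. The `cursorMark`/`nextCursorMark` parameter names are Solr's real ones; the `fetch` function is a stub standing in for the actual HTTP request to /select (which must also sort on a uniqueKey tiebreaker for cursors to work):

```python
# Sketch of a cursorMark deep-paging loop. `fetch` stubs out the real HTTP
# call to Solr's /select handler; a real client would send q, sort (with a
# uniqueKey tiebreak), rows, and cursorMark, and read back the docs plus
# nextCursorMark from the JSON response.

ALL_DOCS = [{"id": i} for i in range(10)]  # fake index contents
PAGE = 4                                   # rows per request

def fetch(cursor):
    """Stub: return (docs, next_cursor) like a /select response would."""
    start = 0 if cursor == "*" else int(cursor)
    page = ALL_DOCS[start:start + PAGE]
    return page, str(start + len(page))

collected = []
cursor = "*"                 # "*" means "start from the beginning"
while True:
    docs, next_cursor = fetch(cursor)
    collected.extend(docs)
    if not docs or next_cursor == cursor:
        break                # Solr signals the end by repeating the cursor
    cursor = next_cursor

print(len(collected))  # → 10
```

Unlike start/rows deep paging, each request here is cheap for the shards: no node ever has to collect and sort 100K candidates at once.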
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>> On May 27, 2020, at 10:38 AM, Anshuman Singh <
>> [hidden email]>
>>>> wrote:
>>>>
>>>>
>>
>>