Solr cluster tuning

Solr cluster tuning

Vidhya Kailash
We are currently using Solr Cloud version 7.4 with the SolrJ API to fetch
data from collections. We recently deployed our code to production and
noticed that response times are higher when the number of incoming requests
is low.

But strangely, if we bombard the system with more and more requests, we get
much better response times.

My suspicion is that the client is closing connections sooner when requests
arrive slowly and later when they arrive quickly.

We tried tuning by passing a custom HttpClient to SolrJ and also by updating
the HttpShardHandlerFactory settings. For example, we set:
maxThreadIdleTime = 60000
socketTimeout = 180000

I am wondering what other tuning we can do to make this perform the same
irrespective of the number of requests.

Thanks!

Vidhya
RE: Solr cluster tuning

Davis, Daniel (NIH/NLM) [C]
Usually, slow responses are due to I/O waits while reading data off the disk. So, to me, that seems the more likely explanation: as you bombard the server with queries, you pull more and more of the data needed to answer them into memory.

To verify this, I'd bombard your server with queries to warm it up, and then repeat your test with the queries coming in slowly or quickly.
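
A minimal sketch of that warm-then-compare test, assuming a caller-supplied `run_query` function that issues one request against your collection. The function names, counts, and intervals here are illustrative placeholders, not anything from SolrJ or Solr itself.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(run_query: Callable[[], object],
                      count: int,
                      interval_s: float) -> List[float]:
    """Call run_query `count` times, pausing `interval_s` seconds after each
    call, and return the per-call latency in milliseconds."""
    latencies_ms = []
    for _ in range(count):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        time.sleep(interval_s)
    return latencies_ms


def compare_rates(run_query: Callable[[], object],
                  warmup: int = 200,
                  count: int = 50,
                  slow_interval_s: float = 1.0,
                  fast_interval_s: float = 0.0) -> Dict[str, float]:
    """Warm the server first, then run the same query slowly and quickly
    and report the median latency of each run."""
    # Warm-up pass: results are discarded, it only primes caches.
    measure_latencies(run_query, warmup, 0.0)
    slow = measure_latencies(run_query, count, slow_interval_s)
    fast = measure_latencies(run_query, count, fast_interval_s)
    return {
        "slow_median_ms": statistics.median(slow),
        "fast_median_ms": statistics.median(fast),
    }
```

Here `run_query` could wrap a single HTTP request to the collection's select handler. If the slow run is still noticeably slower after the warm-up pass, cold caches alone probably do not explain the difference.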

If the difference still holds, then either something other than Solr is running on that server and taking memory from Solr, or your index is somewhat too big for your server. Linux likes to overcommit memory - try setting vm.swappiness to something low, like 10, rather than the default 60. Look for anything on the Solr server that may be competing with it for I/O resources and causing its pages to swap out.
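
As a rough illustration of the swappiness check, the sketch below reads the current value from `/proc/sys/vm/swappiness` on a Linux host and suggests the `sysctl` command to lower it. The helper names and the target of 10 are assumptions taken from the suggestion above, not a required setting.

```python
from pathlib import Path

# Standard location of the setting on Linux; absent on other systems.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file (a single integer)."""
    return int(text.strip())


def swappiness_advice(current: int, target: int = 10) -> str:
    """Return a short note on whether the value should be lowered."""
    if current <= target:
        return f"vm.swappiness={current} is already at or below {target}"
    return (f"vm.swappiness={current}; "
            f"consider: sysctl -w vm.swappiness={target}")


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        value = parse_swappiness(SWAPPINESS_PATH.read_text())
        print(swappiness_advice(value))
    else:
        print("No swappiness file found; this host is probably not Linux.")
```

A change made with `sysctl -w` does not survive a reboot; it would also have to go into the system's sysctl configuration to persist.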

Also, look at the size of your index data.

This is general advice for dealing with inverted indexes - some of the Solr engineers on this list may have more specific ideas, such as merging activity or other background tasks running when the query load is lighter. I wouldn't know how to check for these things, but I would think they wouldn't affect query response time that badly.

-----Original Message-----
From: Vidhya Kailash <[hidden email]>
Sent: Wednesday, October 24, 2018 4:22 PM
To: [hidden email]
Subject: Solr cluster tuning

Re: Solr cluster tuning

Erick Erickson
To add to Daniel's comments: Are you indexing at the same time? Say
your autocommit time is 10 seconds. For the sake of argument let's say
it takes 15 queries to warm your searcher. Let's further say that the
average time for those 15 queries is 500ms each and once the searcher
is warmed the average time drops to 100ms. If you fire a large number
of queries in those 10 seconds, you'll have an average close to 100ms.

OTOH, if you only fire 15 queries over that 10 seconds, the average
would be 500ms.
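
The arithmetic above can be written out as a small sketch. The 15 warm-up queries, 500ms and 100ms are the for-the-sake-of-argument figures from this message, not measurements, and the function name is made up for illustration.

```python
def average_latency_ms(total_queries: int,
                       warmup_queries: int = 15,
                       cold_ms: float = 500.0,
                       warm_ms: float = 100.0) -> float:
    """Average latency over one commit interval, assuming the first
    `warmup_queries` requests hit a cold searcher and the rest a warm one."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    cold = min(total_queries, warmup_queries)
    warm = total_queries - cold
    return (cold * cold_ms + warm * warm_ms) / total_queries


if __name__ == "__main__":
    for n in (15, 30, 1000):
        print(n, average_latency_ms(n))
```

With these assumed figures, 15 queries per interval average 500ms, 30 average 300ms, and 1000 average 106ms, so the same server looks faster simply because the cold queries make up a smaller share of the total.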

My guess is your autowarm counts for filterCache and queryResultCache
are the default 0 and if you set them to, say, 20 each, much of your
problem would disappear. Ditto if you stopped indexing. Both point to
the searchers having to pull data into memory from disk and/or rebuild
caches.
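
One way to try those autowarm counts without hand-editing solrconfig.xml is the Config API. The sketch below only builds the JSON request bodies; it assumes the `set-property` command and the `query.filterCache.autowarmCount` / `query.queryResultCache.autowarmCount` property names apply to your Solr version, so check them against the reference guide before use. The helper name is hypothetical.

```python
import json
from typing import List

# Property names assumed from the Solr Config API's editable properties;
# verify them for your Solr version.
AUTOWARM_PROPERTIES = (
    "query.filterCache.autowarmCount",
    "query.queryResultCache.autowarmCount",
)


def autowarm_payloads(count: int = 20) -> List[str]:
    """Build one JSON body per cache, each intended to be POSTed to
    /solr/<collection>/config with Content-Type: application/json."""
    if count < 0:
        raise ValueError("count must not be negative")
    return [json.dumps({"set-property": {name: count}})
            for name in AUTOWARM_PROPERTIES]


if __name__ == "__main__":
    for body in autowarm_payloads(20):
        print(body)
```

The same values can instead be set as the autowarmCount attribute on the cache definitions in solrconfig.xml. Either way, autowarming only matters when new searchers are being opened, for example after commits.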

Best,
Erick
On Wed, Oct 24, 2018 at 1:37 PM Davis, Daniel (NIH/NLM) [C]
<[hidden email]> wrote:

Re: Solr cluster tuning

Vidhya Kailash
Thank you, Erick and Daniel, for your prompt responses. We were trying a few things (moving to G1GC, and trimming fields that need not be indexed and stored), hence the late response.

First of all, I thought I would give an overview of the environment. We have a four-node Solr Cloud cluster. We have 2 indexes, which are spread across 4 shards and have 2 replicas. Each node has a total of 30GB of memory (all dedicated to running Solr Cloud alone), of which 15GB is allocated to the JVM and the rest is left for the OS to manage. All the indexes together take up just 1.4GB on disk. We are running version 7.4 with a dedicated ZooKeeper cluster.

Something of concern I see in the Solr Admin UI is the use of that memory.

This is what I see by running top:


Is there a general calculation for how much memory to leave for OS caching for an index of 2GB?
To answer Erick's question: no, we are not indexing at the same time. In fact, we stopped indexing just to test the theory and don't see any improvement. I don't think I need to worry about autocommit then, right?
Daniel, we did try what you suggested (that is, warm up the cache and then run a slow and a fast test), and the slow test still yields slower response times.
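
For the memory question, a back-of-the-envelope sketch using the figures given above (30GB per node, 15GB JVM heap, 1.4GB of index in total). It follows the common rule of thumb that whatever is left after the heap is what the OS can use to cache index files; the function name is illustrative, and memory used by the OS and other processes is ignored.

```python
def page_cache_headroom_gb(total_ram_gb: float,
                           jvm_heap_gb: float,
                           index_size_gb: float) -> float:
    """RAM left over after the JVM heap and a fully cached index.
    A positive result suggests the whole index can sit in the OS page cache."""
    return total_ram_gb - jvm_heap_gb - index_size_gb


if __name__ == "__main__":
    headroom = page_cache_headroom_gb(30.0, 15.0, 1.4)
    print(f"headroom after caching the index: {headroom:.1f} GB")
```

With these numbers the index is far smaller than the memory left outside the heap, which suggests that page-cache pressure is unlikely to be the limiting factor on these nodes.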


Any thoughts, anyone? I much appreciate your responses.


thanks
Vidhya


On Wed, Oct 24, 2018 at 6:40 PM Erick Erickson <[hidden email]> wrote:

