Need urgent help -- High cpu on solr

yaswanthcse
I am using Solr 8.2 with ZooKeeper 3.4 and have configured a 5-node
SolrCloud cluster with around 100 collections, each holding ~20k documents.

These nodes are VMs with 6 CPU cores (2 cores per socket). All of a sudden
we are seeing CPU spikes, which brought down some nodes (they show in the
GONE state in SolrCloud, and we also saw latency when trying to log in to
those nodes over SSH).

Memory: 32GB total, of which 20GB is allotted to the JVM heap in the Solr config.

 <filterCache class="solr.FastLRUCache"
              size="50000"
              initialSize="512"
              autowarmCount="128"/>
 <queryResultCache class="solr.LRUCache"
                   size="50000"
                   initialSize="512"
                   autowarmCount="128"/>
 <documentCache class="solr.LRUCache"
                size="50000"
                initialSize="512"
                autowarmCount="128"/>
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
 <queryResultWindowSize>100</queryResultWindowSize>
 <enableLazyFieldLoading>true</enableLazyFieldLoading>
 <useColdSearcher>false</useColdSearcher>
 <maxWarmingSearchers>4</maxWarmingSearchers>

These are just the defaults that shipped with the Solr package.

One data point: these nodes get very frequent search traffic. Do I need to
consider increasing the sizes above to bring the CPU down and get a more
stable SolrCloud?

--
Thanks & Regards,
Yaswanth Kumar Konathala.
[hidden email]

Re: Need urgent help -- High cpu on solr

Zisis T.
The values you have for the caches and maxWarmingSearchers do not look
like the defaults. The default cache sizes are 512 for the most part, and
maxWarmingSearchers defaults to 2 (if yours is set higher, limit it to 2).
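
For reference, the stock Solr 8 _default configset looks roughly like this
(a sketch from memory; verify against the configset of your exact version):

 <!-- a sketch of the stock Solr 8 defaults; verify against your version -->
 <filterCache class="solr.FastLRUCache"
              size="512"
              initialSize="512"
              autowarmCount="0"/>
 <queryResultCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>
 <documentCache class="solr.LRUCache"
                size="512"
                initialSize="512"
                autowarmCount="0"/>
 <maxWarmingSearchers>2</maxWarmingSearchers>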

Sudden CPU spikes probably indicate GC issues. The number of documents you
have is small; are they huge documents? The number of collections is OK in
general, but since they are crammed into 5 Solr nodes the memory requirements
might be higher, especially if the filter cache and the other caches get
populated with 50K entries.

I'd first go through the GC activity to make sure that this is not causing
the issue. The fact that you lose some Solr servers is also an indicator of
large GC pauses, which can create problems when Solr communicates with
ZooKeeper.
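
For example, to get a quick read on pause times you could grep the GC logs
(a sketch assuming a default Solr 8 service install where logs land in
/var/solr/logs; adjust the path to your SOLR_LOGS_DIR):

 # A sketch assuming the default Solr 8 service log location; adjust the
 # path to your SOLR_LOGS_DIR. With JDK 9+ unified GC logging, each pause
 # line ends with its duration, e.g. "... Pause Full (Allocation Failure) ... 2034.567ms"
 grep -h "Pause Full" /var/solr/logs/solr_gc.log*
 # Show the ten longest pauses of any kind:
 grep -h "Pause" /var/solr/logs/solr_gc.log* | grep -oE "[0-9]+\.[0-9]+ms" | sort -n | tail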



Re: Need urgent help -- High cpu on solr

Erick Erickson
Zisis makes good points. One other thing: I'd look to
see whether the CPU spikes coincide with commits. But GC
is where I'd look first.

Continuing on with the theme of caches, yours are far too large
at first glance. The default is, indeed, size=512. Every time
you open a new searcher, you'll be executing 128 queries
to autowarm the filterCache and another 128 for the queryResultCache;
autowarming alone might account for the load. I'd reduce
the sizes back to 512 with an autowarm count nearer 16,
and monitor the cache hit ratio. There's little or no benefit
in squeezing the last few percent from the hit ratio. If your
hit ratio is small even with the settings you have, then your caches
aren't doing you much good anyway, so I'd make them much smaller.
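
Concretely, something like this in solrconfig.xml (a sketch of the sizes
suggested above; tune against your measured hit ratios):

 <!-- a sketch of the reduced sizes suggested above; tune against measured hit ratios -->
 <filterCache class="solr.FastLRUCache"
              size="512"
              initialSize="512"
              autowarmCount="16"/>
 <queryResultCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="16"/>
 <documentCache class="solr.LRUCache"
                size="512"
                initialSize="512"
                autowarmCount="0"/> <!-- documentCache cannot be usefully autowarmed -->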

You haven't told us how often your indexes are
updated, which will be a significant CPU hit due to
your autowarming.

Once you’re done with that, I’d then try reducing the heap. Most
of the actual searching is done in Lucene via MMapDirectory,
which resides in the OS memory space. See:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Finally, if it is GC, consider G1GC if you’re not using that
already.
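
Both knobs live in solr.in.sh; a sketch (the 8g figure below is only an
illustrative starting point, not a sizing recommendation for your workload):

 # In solr.in.sh -- a sketch. 8g is an illustrative starting point, not a
 # recommendation; leave the remainder of the 32GB to the OS page cache so
 # MMapDirectory can work from memory-mapped index files.
 SOLR_HEAP="8g"
 # G1 settings; tune the pause target against your own GC logs.
 GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"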

Best,
Erick



Re: Need urgent help -- High cpu on solr

Rahul Goswami
In addition to the insightful pointers from Zisis and Erick, I would like to
mention an approach, described in the link below, that I generally use to
pinpoint exactly which threads are causing a CPU spike. Knowing this, you can
see which aspect of Solr (search threads, GC, update threads, etc.) is
taking the CPU and develop a mitigation strategy accordingly (e.g., if it's
a GC thread, try tuning the GC parameters or switching to G1GC). It just
helps take the guesswork out of the many possible causes. Of course, the
suggestions received earlier are best practices and should be taken into
consideration regardless.

https://backstage.forgerock.com/knowledge/kb/article/a39551500

The hex number the author talks about in the link above is the native
thread id.
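
In short, the workflow is something like this (a sketch; the angle-bracket
values are placeholders you fill in):

 # A sketch; <solr-pid> and <tid> are placeholders for your own values.
 top -H -p <solr-pid>                              # per-thread view; note the TID of the hot thread
 printf '%x\n' <tid>                               # convert that decimal TID to hex
 jstack <solr-pid> | grep -A20 'nid=0x<hex-tid>'   # locate the matching Java stack trace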

Best,
Rahul

