Solr using all available CPU and becoming unresponsive

Solr using all available CPU and becoming unresponsive

Jeremy Smith
Hello all,
     We have been struggling with an issue where Solr will intermittently use all available CPU and become unresponsive.  It remains in this state until we restart.  Solr then stays stable for some time, usually a few hours to a few days, before this happens again.  We've tried adjusting the caches and adding memory to both the VM and the JVM, but we haven't been able to solve the issue yet.

Here is some info about our server:
Solr:
  Solr 7.3.1, running on Java 1.8
  Running in cloud mode, but there's only one core

Host:
  CentOS7
  8 CPU, 56GB RAM
  The only other processes running on this VM are two zookeepers, one for this Solr instance, one for another Solr instance

Solr Config:
 - One Core
 - 36 million documents (maxDoc), 28 million (numDocs)
 - ~15GB
 - 10-20 Requests/second
 - The schema is fairly large (~100 fields) and we allow faceting and searching on many, but not all, of the fields
 - Data are imported once per minute through the DataImportHandler, with a hard commit at the end.  We usually index ~100-500 documents per minute, with many of these being updates to existing documents.

Cache settings:
    <filterCache class="solr.FastLRUCache"
                 size="256"
                 initialSize="256"
                 autowarmCount="8"
                 showItems="64"/>

    <queryResultCache class="solr.LRUCache"
                      size="256"
                      initialSize="256"
                      autowarmCount="0"/>

    <documentCache class="solr.LRUCache"
                   size="1024"
                   initialSize="1024"
                   autowarmCount="0"/>

For the filterCache, we have tried sizes as low as 128, which caused our CPU usage to go up and didn't solve our issue.  autowarmCount used to be much higher, but we have reduced it to try to address this issue.


The behavior we see:

Solr normally uses ~3-6GB of heap and we usually have ~20GB of free memory.  Occasionally, though, Solr is not able to free up memory and the heap usage climbs.  Analyzing the GC logs shows a sharp incline in usage, with the GC (the default CMS) working hard to free memory but not accomplishing much.  Eventually it fills up the heap, maxes out the CPUs, and never recovers.  We have tried to analyze the logs to see if there are particular queries causing issues or if there are network issues to ZooKeeper, but we haven't been able to find any patterns.  After the issues start, we often see session timeouts to ZooKeeper, but it doesn't appear that they are the cause.
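In case it's useful to anyone reproducing the analysis, here is a minimal sketch of how one might pull the whole-heap before/after figures out of a Java 8 GC log and flag the low-reclaim collections described above (assumes the `-XX:+PrintGCDetails` "before->after(total)" format; the function name and threshold are just placeholders of mine):

```python
import re

# Heap occupancy pattern "123K->45K(678K)" as printed by Java 8 GC logs
# with -XX:+PrintGCDetails. The last match on a line is the whole-heap
# figure; earlier matches cover individual generations (e.g. ParNew).
HEAP = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")

def low_reclaim_events(lines, threshold=0.05):
    """Yield (before_kb, after_kb, total_kb) for collections that freed
    less than `threshold` of the occupied heap -- the 'GC working hard
    but not accomplishing much' pattern."""
    for line in lines:
        matches = HEAP.findall(line)
        if not matches:
            continue
        before, after, total = map(int, matches[-1])
        if before and (before - after) / before < threshold:
            yield before, after, total
```

A run of these events clustering right before the CPU spike would support the theory that the heap is filling with live (unreclaimable) objects rather than garbage.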



Does anyone have any recommendations on things to try or metrics to look into or configuration issues I may be overlooking?

Thanks,
Jeremy

Re: Solr using all available CPU and becoming unresponsive

Michael Gibney
Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith <[hidden email]> wrote:


Re: Solr using all available CPU and becoming unresponsive

Jeremy Smith
Thanks Michael,
     SOLR-13336 seems intriguing.  I'm not a Solr expert, but I believe these are the relevant sections from our schema definition:

    <fieldType name="specimenId" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="ml_text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Our other fieldTypes don't have any analyzers attached to them.


If SOLR-13336 is the cause of the issue, is the best remedy to upgrade to Solr 8?  It doesn't look like the fix was backported to 7.x.

Our schema has some issues arising from not fully understanding Solr and just copying existing structures from the defaults.  In this case, stopwords.txt is completely empty and synonyms.txt is just the default synonyms.txt, which doesn't seem useful for us at all.  Could I just take out the StopFilterFactory and SynonymGraphFilterFactory from the query section (and maybe the StopFilterFactory from the index section as well)?
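For reference, here's what I think the simplified fieldType would look like with those filters removed (just a sketch on my part; I assume we'd need to reindex after changing the index-time chain):

```xml
<fieldType name="ml_text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <!-- With the empty stopwords and default synonyms dropped, the index-
       and query-time chains are identical, so a single analyzer suffices -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```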

Thanks again,
Jeremy

________________________________
From: Michael Gibney <[hidden email]>
Sent: Monday, January 11, 2021 8:30 PM
To: [hidden email] <[hidden email]>
Subject: Re: Solr using all available CPU and becoming unresponsive


Re: Solr using all available CPU and becoming unresponsive

Michael Gibney
Ahh ok. If those are your only fieldType definitions, and most of your
config is copied from the default, then SOLR-13336 is unlikely to be the
culprit. Looking at more general options, off the top of my head:
1. make sure you haven't allocated all physical memory to heap (leave a
decent amount for OS page cache)
2. disable swap, if you can (this is esp. important if using network
storage as swap). There are potential downsides to this (so proceed with
caution); but if part of your heap gets swapped out (and it almost
certainly will, with a sufficiently large heap) full GCs lead to a swap
storm that compounds the problem. (fwiw, this is probably the first thing
I'd recommend looking into and trying, because it's so easy, and can in
some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
-a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
eliminate swapping in the way that's needed to achieve the desired goal in
this case. Again, exercise caution in doing this, discuss, research, etc.).
Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
well:
https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
-- the note there about "lowering swappiness" being an acceptable
alternative contradicts my experience, but I suppose ymmv?
3. if you're faceting on fields -- especially high-cardinality fields (many
values) -- make sure that you have `docValues=true, uninvertible=false`
configured (to ensure that you're not building large on-heap data
structures when there's an alternative that doesn't require it).
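To make (3) concrete, a facet field configured that way might look something
like this in the schema (the field name here is a placeholder, and you should
double-check that your 7.x version supports the `uninvertible` attribute):

```xml
<!-- Placeholder facet field: docValues serves faceting from column-oriented
     storage kept off-heap by the OS page cache; uninvertible="false" turns
     accidental on-heap uninversion into an error rather than a memory hog -->
<field name="facet_example" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true" uninvertible="false"/>
```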

These are all recommendations that are explained in more detail by others
elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
upgrading if you have the (human) bandwidth to do so. Good luck!

Michael

On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith <[hidden email]> wrote:


Re: Solr using all available CPU and becoming unresponsive

Charlie Hull
Hi Jeremy,

You might find our recent blog on Debugging Solr Performance Issues useful:
https://opensourceconnections.com/blog/2021/01/05/a-solr-performance-debugging-toolkit/
-- also check out Savan Das' blog, which is linked within.

Best

Charlie

On 12/01/2021 14:53, Michael Gibney wrote:


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
<www.o19s.com>
Founding member of The Search Network <https://thesearchnetwork.com/>
and co-author of Searching the Enterprise
<https://opensourceconnections.com/about-us/books-resources/>
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828