> Ahh ok. If those are your only fieldType definitions, and most of your
> config is copied from the default, then SOLR-13336 is unlikely to be the
> culprit. Looking at more general options, off the top of my head:
> 1. make sure you haven't allocated all physical memory to heap (leave a
> decent amount for OS page cache)
> 2. disable swap, if you can (this is esp. important if using network
> storage as swap). There are potential downsides to this (so proceed with
> caution); but if part of your heap gets swapped out (and it almost
> certainly will, with a sufficiently large heap) full GCs lead to a swap
> storm that compounds the problem. (fwiw, this is probably the first thing
> I'd recommend looking into and trying, because it's so easy, and can in
> some cases yield a dramatic improvement. N.b., I'm talking about `swapoff
> -a`, not `sysctl -w vm.swappiness=0` -- I find that the latter does *not*
> eliminate swapping in the way that's needed to achieve the desired goal in
> this case. Again, exercise caution in doing this, discuss, research, etc.).
> Related documentation was added in 8.5, but absolutely applies to 7.3.1 as
> well:
> https://lucene.apache.org/solr/guide/8_7/taking-solr-to-production.html#avoid-swapping-nix-operating-systems
> -- the note there about "lowering swappiness" being an acceptable
> alternative contradicts my experience, but I suppose ymmv?
> 3. if you're faceting on fields -- especially high-cardinality fields (many
> values) -- make sure that you have `docValues=true, uninvertible=false`
> configured (to ensure that you're not building large on-heap data
> structures when there's an alternative that doesn't require it).
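
A minimal sketch of what item 3 could look like in the schema, assuming a
hypothetical facet field named `category` of type `string` (neither name
comes from the schema quoted in this thread). One caveat: the
`uninvertible` attribute appears to have been introduced in Solr 7.6, so
on 7.3.1 probably only the `docValues` part applies.

```xml
<!-- Hypothetical field for illustration; use the real field name and type. -->
<!-- docValues="true" keeps the faceting structures off-heap (on disk / page cache). -->
<!-- uninvertible="false" (assumed to need Solr 7.6+) forbids the on-heap fallback. -->
<field name="category" type="string" indexed="true" stored="true"
       docValues="true" uninvertible="false"/>
```

Enabling docValues on an existing field requires reindexing the documents
before the change takes effect.
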
>
> These are all recommendations that are explained in more detail by others
> elsewhere; I think they should all apply to 7.3.1; fwiw, I would recommend
> upgrading if you have the (human) bandwidth to do so. Good luck!
>
> Michael
>
> On Tue, Jan 12, 2021 at 8:39 AM Jeremy Smith <[hidden email]> wrote:
>
>> Thanks Michael,
>> SOLR-13336 seems intriguing. I'm not a solr expert, but I believe
>> these are the relevant sections from our schema definition:
>>
>> <fieldType name="specimenId" class="solr.TextField"
>> positionIncrementGap="100">
>> <analyzer type="index">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> </analyzer>
>> </fieldType>
>> <fieldType name="ml_text_general" class="solr.TextField"
>> positionIncrementGap="100" multiValued="false">
>> <analyzer type="index">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" />
>> <filter class="solr.LowerCaseFilterFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" />
>> <filter class="solr.SynonymGraphFilterFactory"
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> </analyzer>
>> </fieldType>
>>
>> Our other fieldTypes don't have any analyzers attached to them.
>>
>>
>> If SOLR-13336 is the cause of the issue, is the best remedy to upgrade to
>> solr 8? It doesn't look like the fix was backported to 7.x.
>>
>> Our schema has some issues arising from not fully understanding Solr and
>> just copying existing structures from the defaults. In this case,
>> stopwords.txt is completely empty and synonyms.txt is just the default
>> synonyms.txt, which seems not useful at all for us. Could I just take out
>> the StopFilterFactory and SynonymGraphFilterFactory from the query section
>> (and maybe the StopFilterFactory from the index section as well)?
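
For illustration, a sketch of what `ml_text_general` might look like with
those filters removed, assuming nothing else in the type changes. With an
empty stopwords.txt the stop filter should already be a no-op, so dropping
it is not expected to change the tokens produced; dropping the synonym
filter would change query behavior only where a default synonym was
actually matching.

```xml
<!-- Sketch only: the type quoted above, minus StopFilterFactory and -->
<!-- SynonymGraphFilterFactory. Attribute values are copied unchanged. -->
<fieldType name="ml_text_general" class="solr.TextField"
           positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```
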
>>
>> Thanks again,
>> Jeremy
>>
>> ________________________________
>> From: Michael Gibney <[hidden email]>
>> Sent: Monday, January 11, 2021 8:30 PM
>> To: [hidden email] <[hidden email]>
>> Subject: Re: Solr using all available CPU and becoming unresponsive
>>
>> Hi Jeremy,
>> Can you share your analysis chain configs? (SOLR-13336 can manifest in a
>> similar way, and would affect 7.3.1 with a susceptible config, given the
>> right (wrong?) input ...)
>> Michael
>>
>> On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith <[hidden email]> wrote:
>>
>>> Hello all,
>>> We have been struggling with an issue where solr will intermittently
>>> use all available CPU and become unresponsive. It will remain in this
>>> state until we restart. Solr will remain stable for some time, usually a
>>> few hours to a few days, before this happens again. We've tried adjusting
>>> the caches and adding memory to both the VM and JVM, but we haven't been
>>> able to solve the issue yet.
>>>
>>> Here is some info about our server:
>>> Solr:
>>> Solr 7.3.1, running on Java 1.8
>>> Running in cloud mode, but there's only one core
>>>
>>> Host:
>>> CentOS7
>>> 8 CPU, 56GB RAM
>>> The only other processes running on this VM are two zookeepers, one for
>>> this Solr instance, one for another Solr instance
>>>
>>> Solr Config:
>>> - One Core
>>> - 36 Million documents (Max Doc), 28 million (Num Docs)
>>> - ~15GB
>>> - 10-20 Requests/second
>>> - The schema is fairly large (~100 fields) and we allow faceting and
>>> searching on many, but not all, of the fields
>>> - Data are imported once per minute through the DataImportHandler, with a
>>> hard commit at the end. We usually index ~100-500 documents per minute,
>>> with many of these being updates to existing documents.
>>>
>>> Cache settings:
>>> <filterCache class="solr.FastLRUCache"
>>> size="256"
>>> initialSize="256"
>>> autowarmCount="8"
>>> showItems="64"/>
>>>
>>> <queryResultCache class="solr.LRUCache"
>>> size="256"
>>> initialSize="256"
>>> autowarmCount="0"/>
>>>
>>> <documentCache class="solr.LRUCache"
>>> size="1024"
>>> initialSize="1024"
>>> autowarmCount="0"/>
>>>
>>> For the filterCache, we have tried sizes as low as 128, which caused our
>>> CPU usage to go up and didn't solve our issue. autowarmCount used to be
>>> much higher, but we have reduced it to try to address this issue.
>>>
>>>
>>> The behavior we see:
>>>
>>> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
>>> memory. Occasionally, though, solr is not able to free up memory and the
>>> heap usage climbs. Analyzing the GC logs shows a sharp incline of usage
>>> with the GC (the default CMS) working hard to free memory, but not
>>> accomplishing much. Eventually, it fills up the heap, maxes out the CPUs,
>>> and never recovers. We have tried to analyze the logs to see if there are
>>> particular queries causing issues or if there are network issues to
>>> zookeeper, but we haven't been able to find any patterns. After the issues
>>> start, we often see session timeouts to zookeeper, but it doesn't appear
>>> that they are the cause.
>>>
>>>
>>>
>>> Does anyone have any recommendations on things to try or metrics to look
>>> into or configuration issues I may be overlooking?
>>>
>>> Thanks,
>>> Jeremy
>>>
>>>