Determining Solr heap requirements and analyzing memory usage


Determining Solr heap requirements and analyzing memory usage

Brian Ecker
Hello,

We are currently running into a situation where Solr (version 7.4) is slowly using up all available memory allocated to the heap, eventually hitting an OutOfMemoryError. We have tried increasing the heap size and also tuning the GC settings, but this does not seem to solve the issue. What we see is a slow increase in G1 Old Gen heap utilization until it eventually takes all of the heap space and causes instances to crash. Previously we tried running each instance with 10GB of heap space allocated. We then tried running with 20GB of heap space, and we ran into the same issue. I have attached a histogram of the heap, captured from an instance that was using nearly all of its 10GB allocation. What I’m trying to determine is (1) How much heap does this setup need before it stabilizes and stops crashing with OOM errors, (2) can this requirement somehow be reduced so that we can use less memory, and (3) from the heap histogram, what is actually using memory (lots of primitive type arrays and data structures, but what part of Solr is using those)?

I am aware that distributing the index would reduce the requirements for each shard, but we’d like to avoid that for as long as possible due to the operational difficulties involved. As far as I can tell, very few of the conditions listed in the Java Heap section of https://wiki.apache.org/solr/SolrPerformanceProblems actually apply to our instance. We don’t have a very large index, we never update in production (only query), the documents don’t seem very large (~4KB each), we don’t use faceting, caches are reasonably small (~3GB max), RAMBufferSizeMB is 100MB, we don’t use RAMDirectoryFactory (as far as I can tell), and we don’t use sort parameters. The Solr instance is used for a full-text complete-as-you-type use case. The typical query looks something like the following (field names anonymized):

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1 (single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5 &fl=score,myfield1,myfield2,myfield3:myfield3.ar&bf=product(def(myfield3.ar,0),1)&rows=200&df=dummy&spellcheck=on&spellcheck.dictionary=spellchecker.es&spellcheck.dictionary=spellchecker.und&spellcheck.q=baril&spellcheck.accuracy=0.5&spellcheck.count=1&fq=+myfield1:(100 OR 200 OR 500)&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar

I have attached various screenshots showing details from top on a running Solr instance, GC logs, solrconfig.xml, and also a heap histogram sampled with Java Mission Control. I also provide various additional details below about how the instances are set up and configured.

Operational summary:
We run multiple Solr instances, each acting as a completely independent node. They are not a cluster and are not set up using SolrCloud. Each replica contains the entire index. These replicas run in Kubernetes on GCP.

GC Settings:
-XX:+UnlockExperimentalVMOptions -Xlog:gc*,heap=info -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=50 -XX:InitiatingHeapOccupancyPercent=40 -XX:-G1UseAdaptiveIHOP

Index summary:
* ~2,100,000 documents
* Total size: 9.09 GB
* Average document size = 9.09 GB / 2,100,000 docs = 4.32 KB/doc
* 215 fields per document
    * 77 are stored.
    * 137 are multivalued
* Makes use of many spell checkers for different languages (see solrconfig.xml)
* Most fields include some sort of tokenization and analysis. Example config:

  <fieldType name="myfield" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="40"/>
    </analyzer>


    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    </analyzer>
  </fieldType>
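The effect of the index-time chain above can be sketched roughly as follows. This is a simplified approximation for illustration, not Solr's actual analysis code: the KeywordTokenizer and LowerCaseFilter collapse to a single lowercased token, ASCII folding is approximated here by simply dropping non-matching characters, and the EdgeNGramFilter emits every prefix from 1 to 40 characters:

```python
import re

def analyze_index(value, min_gram=1, max_gram=40):
    """Rough approximation of the index-time analyzer chain above."""
    token = value.lower()                    # KeywordTokenizer + LowerCaseFilter
    token = re.sub(r"[^a-z0-9]", "", token)  # PatternReplaceFilter (ASCII folding
                                             # approximated by dropping such chars)
    # EdgeNGramFilter: one term per prefix length, min_gram..max_gram
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

print(analyze_index("Baril"))  # ['b', 'ba', 'bar', 'bari', 'baril']
```

Each short field value fans out into one indexed term per prefix length, which is what makes complete-as-you-type matching a simple term lookup at query time.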

Please let me know if there is any additional information required.


Attachment: solrconfig-anonymized.xml (18K)

Re: Determining Solr heap requirements and analyzing memory usage

Shawn Heisey-2
On 4/23/2019 6:34 AM, Brian Ecker wrote:
> What I’m trying to determine is (1) How much heap does
> this setup need before it stabilizes and stops crashing with OOM errors,
> (2) can this requirement somehow be reduced so that we can use less
> memory, and (3) from the heap histogram, what is actually using memory
> (lots of primitive type arrays and data structures, but what part of
> Solr is using those)?

Exactly one attachment made it through:  The file named
solrconfig-anonymized.xml.  Attachments can't be used to share files
because the mailing list software is going to eat them and we won't see
them.  You'll need to use a file sharing website.  Dropbox is often a
good choice.

We won't be able to tell anything about what's using all the memory from
a histogram.  We would need an actual heap dump from Java.  This file
will be huge -- if you have a 10GB heap, and that heap is full, the file
will likely be larger than 10GB.

There is no way for us to know how much heap you need.  With a large
amount of information about your setup, we can make a guess, but that
guess will probably be wrong.  Info we'll need to make a start:

*) How many documents is this Solr instance handling?  You find this out
by looking at every core and adding up all the "maxDoc" numbers.

*) How much disk space is the index data taking?  This could be found
either by getting a disk usage value for the solr home, or looking at
every core and adding up the size of each one.

*) What kind of queries are you running?  Anything with facets, or
grouping?  Are you using a lot of sort fields?

*) What kind of data is in each document, and how large is that data?

Your cache sizes are reasonable.  So you can't reduce heap requirements
by much by reducing cache sizes.

Here's some info about what takes a lot of heap and ideas for reducing
the requirements:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

That page also reiterates what I said above:  It's unlikely that anybody
will be able to tell you exactly how much heap you need at a minimum.
We can make guesses, but those guesses might be wrong.

Thanks,
Shawn

Re: Determining Solr heap requirements and analyzing memory usage

Brian Ecker
Thanks for your response. See below please for detailed responses.

On Tue, Apr 23, 2019 at 6:04 PM Shawn Heisey <[hidden email]> wrote:

> On 4/23/2019 6:34 AM, Brian Ecker wrote:
> > What I’m trying to determine is (1) How much heap does
> > this setup need before it stabilizes and stops crashing with OOM errors,
> > (2) can this requirement somehow be reduced so that we can use less
> > memory, and (3) from the heap histogram, what is actually using memory
> > (lots of primitive type arrays and data structures, but what part of
> > Solr is using those)?
>
> Exactly one attachment made it through:  The file named
> solrconfig-anonymized.xml.  Attachments can't be used to share files
> because the mailing list software is going to eat them and we won't see
> them.  You'll need to use a file sharing website.  Dropbox is often a
> good choice.
>

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).

>
> We won't be able to tell anything about what's using all the memory from
> a histogram.  We would need an actual heap dump from Java.  This file
> will be huge -- if you have a 10GB heap, and that heap is full, the file
> will likely be larger than 10GB.


I'll work on getting the heap dump, but would it also be sufficient to use
say a 5GB dump from when it's half full and then extrapolate to the
contents of the heap when it's full? That way the dump would be a bit
easier to work with.

>
> There is no way for us to know how much heap you need.  With a large
> amount of information about your setup, we can make a guess, but that
> guess will probably be wrong.  Info we'll need to make a start:
>

I believe I already provided most of this information in my original post,
as I understand that it's not trivial to make this assessment accurately.
I'll reiterate below, but please see the original post too, because I tried
to provide as much detail as possible.

>
> *) How many documents is this Solr instance handling?  You find this out
> by looking at every core and adding up all the "maxDoc" numbers.
>

There are around 2,100,000 documents.

>
> *) How much disk space is the index data taking?  This could be found
> either by getting a disk usage value for the solr home, or looking at
> every core and adding up the size of each one.
>

The data takes around 9GB on disk.

>
> *) What kind of queries are you running?  Anything with facets, or
> grouping?  Are you using a lot of sort fields?


No facets or grouping and no sort fields. The application performs a
full-text search complete-as-you-type function. Much of this is done using
prefix analyzers and edge ngrams. We also make heavy use of spellchecking.
An example of one of the queries produced is the following:

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1
(single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar&bf=product(def(myfield3.ar
,0),1)&rows=200&df=dummy&spellcheck=on&spellcheck.dictionary=spellchecker.es&spellcheck.dictionary=spellchecker.und&spellcheck.q=baril&spellcheck.accuracy=0.5&spellcheck.count=1&fq=+myfield1:(100
OR 200 OR 500)&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar


> *) What kind of data is in each document, and how large is that data?
>

The data contained is mostly 1-5 words of text in various fields and in
various languages. We apply different tokenizers and some language specific
analyzers for different fields, but almost every field is tokenized. There
are 215 fields in total, 77 of which are stored. Based on the index size on
disk and the number of documents, I guess that gives 4.32 KB/doc on
average.

>
> Your cache sizes are reasonable.  So you can't reduce heap requirements
> by much by reducing cache sizes.
>
> Here's some info about what takes a lot of heap and ideas for reducing
> the requirements:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Thank you, but I've seen that page already and that's part of why I'm
confused, as I believe most of those points that usually take a lot of heap
don't seem to apply to my setup.

>
>
> That page also reiterates what I said above:  It's unlikely that anybody
> will be able to tell you exactly how much heap you need at a minimum.
> We can make guesses, but those guesses might be wrong.
>
> Thanks,
> Shawn
>

Re: Determining Solr heap requirements and analyzing memory usage

Shawn Heisey-2
On 4/23/2019 11:48 AM, Brian Ecker wrote:
> I see. The other files I meant to attach were the GC log (
> https://pastebin.com/raw/qeuQwsyd), the heap histogram (
> https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
> http://oi64.tinypic.com/21r0bk.jpg).

I have no idea what to do with the histogram.  I doubt it's all that
useful anyway, as it doesn't have any information about which parts of
the system are using the most memory.

The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
To get anything useful out of a GC log, it would probably need to cover
hours of runtime.

But if you are experiencing OutOfMemoryError, then either you have run
into something where a memory leak exists, or there's something about
your index or your queries that needs more heap than you have allocated.
Memory leaks are not super common in Solr, but they have happened.

Tuning GC will never help OOME problems.

The screenshot looks like it matches the info below.

> I'll work on getting the heap dump, but would it also be sufficient to use
> say a 5GB dump from when it's half full and then extrapolate to the
> contents of the heap when it's full? That way the dump would be a bit
> easier to work with.

That might be useful.  The only way to know for sure is to take a look
at it to see if the part of the code using lots of heap is detectable.

> There are around 2,100,000 documents.
<snip>
> The data takes around 9GB on disk.

Ordinarily, I would expect that level of data to not need a whole lot of
heap.  10GB would be more than I would think necessary, but if your
queries are big consumers of memory, I could be wrong.  I ran indexes
with 30 million documents taking up 50GB of disk space on an 8GB heap.
I probably could have gone lower with no problems.

I have absolutely no idea what kind of requirements the spellcheck
feature has.  I've never used that beyond a few test queries.  If the
query information you sent is complete, I wouldn't expect the
non-spellcheck parts to require a whole lot of heap.  So perhaps
spellcheck is the culprit here.  Somebody else will need to comment on that.

Thanks,
Shawn

Re: Determining Solr heap requirements and analyzing memory usage

Brian Ecker
Just to update here in order to help others that might run into similar
issues in the future, the problem is resolved. The issue was caused by the
queryResultCache. This was very easy to determine by analyzing a heap dump.
In our setup we had the following config:

<queryResultCache class="solr.FastLRUCache" maxRamMB="3072"
autowarmCount="0"/>

In reality this maxRamMB="3072" limit was not honored as expected, and this
cache was using *way* more memory (about 6-8 times that amount). See the following
screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
the left window that ramBytes, the internal calculation of how much memory
Solr currently thinks this cache is using, is 1894333464B (1894MB). Now
notice that the highlighted line, the ConcurrentLRUCache used internally by
the FastLRUCache representing the queryResultCache, is actually using
12212779160B (12212MB). On further investigation, I realized that this
cache is a map from a query with all its associated objects as the key, to
a very simple object containing an array of document (integer) ids as the
value.

Looking into the lucene-solr source, I found the following line for the
calculation of ramBytesUsed
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
Surprisingly, the query objects used as keys in the queryResultCache do not
implement Accountable as far as I can tell, and this lines up very well
with our observation of memory usage because in the heap dump we can also
see that the keys in the cache are using substantially more memory than the
values and completely account for the additional memory usage. It was quite
surprising to me that the keys were given a default value of 192B as
specified in LRUCache.DEFAULT_RAM_BYTES_USED because I can't actually
imagine a case where the keys in the queryResultCache would be so small. I
imagine that in almost all cases the keys would actually be larger than the
values for the queryResultCache, but that's probably not true for all
usages of a FastLRUCache.

We solved our memory usage issue by drastically reducing the maxRamMB value
and calculating the actual max usage as maxRamMB * 8. It would be quite
useful to have this detail at least documented somewhere.
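To make the discrepancy concrete, here is a back-of-envelope sketch. The entry count and sizes below are hypothetical illustration values, not measurements from our heap dump; the model simply assumes values are sized correctly while each key is charged the 192-byte fallback instead of its real footprint:

```python
DEFAULT_RAM_BYTES_USED = 192  # fallback charge for a key that isn't Accountable

def reported_vs_actual(num_entries, actual_key_bytes, value_bytes):
    # Simplified model: values measured correctly, keys fall back to the default.
    reported = num_entries * (DEFAULT_RAM_BYTES_USED + value_bytes)
    actual = num_entries * (actual_key_bytes + value_bytes)
    return reported, actual

# Hypothetical: 1M cached entries, ~11KB per query key, ~800B per doc-id value
reported, actual = reported_vs_actual(1_000_000, 11_000, 800)
print(f"reported={reported}B, actual={actual}B, ratio={actual / reported:.1f}")
```

With numbers in this hypothetical range the cache's self-reported size undercounts real usage by roughly an order of magnitude, which matches the 6-8x discrepancy we observed.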

-Brian


Re: Determining Solr heap requirements and analyzing memory usage

Erick Erickson
Brian:

Many thanks for letting us know what you found. I’ll attach this to SOLR-13003, which is about this exact issue but doesn’t contain this information. This is a great help.
