Solr heap Old generation grows and it is not recovered by G1GC

Odysci
Hi,

I have a SolrCloud setup with a 12GB heap, and I've been trying to optimize it
to avoid OOM errors. My index has about 30 million docs and about 80GB
total, with 2 shards and 2 replicas.

In my testing setup I submit multiple queries to Solr (same node),
sequentially, with no overlap between the documents returned by each
query (so docs do not need to be kept in cache).

When the queries return a smallish number of docs (say, below 1000), the
heap behavior seems "normal". Monitoring the GC log, I see that the young
generation grows and then, when GC kicks in, drops considerably, while the
old generation grows just a bit.

However, at some point I have a query that returns over 300K docs (for a
total size of approximately 1GB). At this very point the old generation
grows by almost 2GB, and it stays high for the rest of the run.
Even as new queries are executed, the old generation size does not go down,
despite multiple GCs afterwards.

Can anyone shed some light on this behavior?

I'm using the following GC options:
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=4m \
-XX:MaxGCPauseMillis=250 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"
Thanks
Reinaldo
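
For reference, here is a rough sketch of what the GC settings above imply numerically. It assumes standard G1 behavior: the heap is divided into fixed-size regions, an object of at least half a region is treated as "humongous" and allocated directly in old-generation regions, and InitiatingHeapOccupancyPercent is the starting occupancy threshold for concurrent marking (the JVM may adapt it at runtime). The function and key names are only illustrative.

```python
# Back-of-the-envelope numbers for the G1 settings quoted in the post.
# Assumptions: the heap is 12 GB (as stated) and standard G1 rules apply.

MB = 1024 * 1024
GB = 1024 * MB


def g1_summary(heap_bytes, region_bytes, ihop_percent):
    """Return figures derived from heap size, region size and IHOP."""
    return {
        # Number of equally sized regions the heap is split into.
        "regions": heap_bytes // region_bytes,
        # Objects at least this large are "humongous" and skip the young gen.
        "humongous_threshold_bytes": region_bytes // 2,
        # Heap occupancy at which concurrent marking is initially triggered.
        "ihop_bytes": heap_bytes * ihop_percent // 100,
    }


summary = g1_summary(heap_bytes=12 * GB, region_bytes=4 * MB, ihop_percent=75)
print(summary["regions"])                          # 3072
print(summary["humongous_threshold_bytes"] // MB)  # 2
print(summary["ihop_bytes"] // GB)                 # 9
```

With 4 MB regions, any single allocation of about 2 MB or more (a large array, for instance) would land directly in the old generation. Whether that is what happens with the 300K-doc query is only a guess; a heap dump or more detailed GC logging would be needed to confirm it.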

Re: Solr heap Old generation grows and it is not recovered by G1GC

kamaci
Hi Reinaldo,

Which version of Solr do you use and could you share your cache settings?

Also, have you checked this page:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

Kind Regards,
Furkan KAMACI

On Thu, Jun 25, 2020 at 11:09 PM Odysci <[hidden email]> wrote:

> [...]

Re: Solr heap Old generation grows and it is not recovered by G1GC

Odysci
Hi Furkan,

I'm using Solr 8.3.1 (with openjdk version "11.0.7"), with the following
cache settings:

    <filterCache class="solr.CaffeineCache"
                 size="8192"
                 initialSize="512"
                 maxRamMB="512"
                 autowarmCount="128"/>

    <queryResultCache class="solr.CaffeineCache"
                      size="8192"
                      initialSize="1024"
                      maxRamMB="256"
                      autowarmCount="128"/>

    <documentCache class="solr.CaffeineCache"
                   size="16384"
                   initialSize="1024"
                   maxRamMB="1280"
                   autowarmCount="0"/>

    <fieldValueCache class="solr.CaffeineCache"
                     size="64"
                     autowarmCount="128"
                     showItems="32" />
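
To put the maxRamMB values above in perspective, here is a small sketch that adds them up and compares the total with the 12GB heap. It assumes the limits apply separately to every core hosted on a node; how many cores a node hosts is not stated here, so `cores_on_node` is just a parameter.

```python
# Sum of the maxRamMB limits from the cache settings in this post, per core.
# Assumption: each core on the node gets its own set of caches and limits.

CACHE_MAX_RAM_MB = {
    "filterCache": 512,
    "queryResultCache": 256,
    "documentCache": 1280,
}
HEAP_MB = 12 * 1024  # 12 GB heap from the original post


def cache_budget_mb(cores_on_node):
    """Total configured maxRamMB across all cores hosted by one node."""
    return sum(CACHE_MAX_RAM_MB.values()) * cores_on_node


for cores in (1, 2, 4):
    total = cache_budget_mb(cores)
    print(f"{cores} core(s): {total} MB ({total / HEAP_MB:.0%} of heap)")
```

This only adds up the configured ceilings. It says nothing about whether the caches actually stay within them.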


Thanks
Reinaldo

On Thu, Jun 25, 2020 at 7:45 PM Furkan KAMACI <[hidden email]>
wrote:

> [...]

Re: Solr heap Old generation grows and it is not recovered by G1GC

Zisis T.
I have faced similar issues, and the culprit was the filterCache when using
maxRamMB. More specifically, on a sharded Solr cluster with lots of faceting
during search (which makes use of the filterCache in a distributed setting),
I noticed that the maxRamMB value was not respected. I had a value of 300MB
set, but at some point I saw an instance a couple of GB in size in a heap
dump. What I found was that the keys of the Map (BooleanQuery or something,
if I recall correctly) did not implement the Accountable interface, so they
were NOT taken into account when calculating the cache's size. But all that
was on a 7.5 cluster using FastLRUCache.
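
The effect described above can be illustrated with a toy model. This is not Solr's code: `Sized`, `Unsized` and `RamBoundedCache` are hypothetical classes, and the byte counts are made up. It only shows how a RAM limit can be overshot when some objects cannot report their size and are therefore counted as zero.

```python
# Toy illustration (NOT Solr's implementation) of a RAM-bounded cache that
# undercounts entries whose keys cannot report their own size.


class Sized:
    """Hypothetical object that can report its footprint."""

    def __init__(self, nbytes):
        self.nbytes = nbytes

    def ram_bytes_used(self):
        return self.nbytes


class Unsized:
    """Hypothetical object that occupies memory but cannot report it."""

    def __init__(self, nbytes):
        self.nbytes = nbytes  # real footprint, invisible to the cache


class RamBoundedCache:
    def __init__(self, max_ram_bytes):
        self.max_ram_bytes = max_ram_bytes
        self.entries = {}
        self.accounted_bytes = 0

    @staticmethod
    def _reported_size(obj):
        # Objects without ram_bytes_used() contribute 0 to the accounting.
        report = getattr(obj, "ram_bytes_used", None)
        return report() if callable(report) else 0

    def put(self, key, value):
        cost = self._reported_size(key) + self._reported_size(value)
        if self.accounted_bytes + cost > self.max_ram_bytes:
            return False  # a real cache would evict; the toy just refuses
        self.entries[key] = value
        self.accounted_bytes += cost
        return True

    def actual_bytes(self):
        """True footprint, counting keys and values alike."""
        return sum(k.nbytes + v.nbytes for k, v in self.entries.items())


cache = RamBoundedCache(max_ram_bytes=300)
for _ in range(10):
    cache.put(Unsized(1000), Sized(10))  # big key, small value

print(cache.accounted_bytes)  # 100: within the 300-byte limit
print(cache.actual_bytes())   # 10100: far above it
```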

There's also https://issues.apache.org/jira/browse/SOLR-12743, about a cache
memory leak, which does not seem to have been fixed yet, although the triggers
of this leak are not clear. I've witnessed this as well on a
7.5 cluster, with multiple (>10) filter cache objects for a single core, each
holding from a few MB to GBs.

Try to get a heap dump from your cluster; the truth is almost always hidden
there.
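
One way to take such a dump is the JDK's `jmap` tool. The sketch below only builds the command line; the PID and output path are placeholders, and the `live` option (which, to my understanding, forces a full GC first so that only reachable objects are dumped) is optional.

```python
# Minimal sketch: build (and optionally run) a jmap heap-dump command.
# The PID and file path used below are placeholders, not real values.
import subprocess


def heap_dump_command(pid, out_file, live_only=True):
    """Return the jmap argument list for a binary heap dump of `pid`."""
    options = ["format=b", f"file={out_file}"]
    if live_only:
        options.insert(0, "live")  # dump only reachable objects
    return ["jmap", "-dump:" + ",".join(options), str(pid)]


def take_heap_dump(pid, out_file):
    """Run jmap; requires a JDK on the PATH and access to the Solr process."""
    subprocess.run(heap_dump_command(pid, out_file), check=True)


print(" ".join(heap_dump_command(12345, "/tmp/solr-heap.hprof")))
```

The resulting .hprof file can then be opened in a heap analyzer such as jvisualvm.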

One workaround that seems to alleviate the problem is to check your running
Solr cluster to see how many cache entries actually give you a
good hit ratio, and then get rid of the maxRamMB attribute. Play only with the
size.
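
A sketch of how that sizing exercise might look, assuming you have noted the cumulative `hits` and `lookups` counters of a cache while running the same workload at several `size` settings. The numbers below are invented for illustration, and `smallest_adequate_size` is a hypothetical helper, not part of Solr.

```python
# Pick the smallest cache size whose hit ratio is close to the best observed.
# All measurements here are made-up sample values.


def hit_ratio(hits, lookups):
    """Fraction of lookups served from the cache; 0.0 if there were none."""
    return hits / lookups if lookups else 0.0


def smallest_adequate_size(measurements, tolerance=0.02):
    """measurements: iterable of (cache_size, hits, lookups) tuples.

    Returns the smallest size whose hit ratio is within `tolerance`
    of the best hit ratio seen across all measurements.
    """
    ratios = [(size, hit_ratio(h, n)) for size, h, n in measurements]
    best = max(ratio for _, ratio in ratios)
    return min(size for size, ratio in ratios if ratio >= best - tolerance)


samples = [
    (512, 6000, 10000),   # 60% hit ratio
    (2048, 8900, 10000),  # 89% hit ratio
    (8192, 9000, 10000),  # 90% hit ratio
]
print(smallest_adequate_size(samples))  # 2048
```

In this invented example, going from 2048 to 8192 entries buys only one percentage point of hit ratio, so the smaller size would be the better trade-off for heap usage.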




Re: Solr heap Old generation grows and it is not recovered by G1GC

Odysci
Thanks.
The heap dump indicated that most of the space was occupied by the caches
(filterCache and documentCache in my case).
I followed your suggestion of removing the maxRamMB limit on filterCache
and documentCache and decreasing the number of entries allowed.
It had a significant impact on the used heap size, so I guess I have
to find the sweet spot between hit ratio and size.
Still, the old generation does not seem to shrink significantly even if I
force a full GC (using jvisualvm).

Any other suggestions are welcome!
Thanks

Reinaldo

On Fri, Jun 26, 2020 at 5:05 AM Zisis T. <[hidden email]> wrote:

> [...]

Re: Solr heap Old generation grows and it is not recovered by G1GC

Zisis T.
Hi Reinaldo,

Glad that helped. I've had several sleepless nights with Solr clusters
failing spectacularly in production because of this, and I still cannot say
that the problem has completely gone away.

Did you check in the heap dump if you have cache memory leaks as described
in https://issues.apache.org/jira/browse/SOLR-12743?

Say you have 4 cache instances (filterCache, documentCache, etc.) per core and
5 Solr cores: you should not see more than 20 CaffeineCache
instances in your dump.
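
The check above is simple arithmetic; a sketch, with hypothetical function names, in case you want to script it against instance counts read from a heap dump:

```python
# Compare the number of cache instances seen in a heap dump with the number
# expected from the configuration. Function names are illustrative only.


def expected_cache_instances(caches_per_core, cores):
    """Upper bound on cache instances if nothing is leaking."""
    return caches_per_core * cores


def excess_cache_instances(observed, caches_per_core, cores):
    """How many instances exceed the expected count (0 if none)."""
    return max(0, observed - expected_cache_instances(caches_per_core, cores))


print(expected_cache_instances(4, 5))    # 20
print(excess_cache_instances(57, 4, 5))  # 37
```

A small, short-lived excess may not be a leak: as far as I know, a new searcher that is being warmed holds its own set of caches until the old searcher is released. A count that keeps growing is the more worrying sign.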

Unfortunately, I still cannot determine what exactly triggers this memory
leak, although since I removed the maxRamMB setting I've not seen similar
behavior in production for more than a month now.

The weird thing is that I was running on Solr 7.5.0 for quite some time
without any issues and it was at some point in time that those problems
started appearing...




Re: Solr heap Old generation grows and it is not recovered by G1GC

Odysci
Hi,

Just summarizing:
I've experimented with different sizes for filterCache and documentCache,
after removing all maxRamMB settings. Now the heap seems to behave as expected:
it grows, then GC (not a full one) kicks in multiple times and keeps
the used heap under control. Eventually a full GC may kick in and the size
goes down a little more.

Previously, when I had maxRamMB specified, the heap would grow considerably
(for a search returning about 300K docs) and after that it would not go
down again (and those docs were never again requested). This did not work
well.

I looked at the heap dump and saw all the caches (filter, document; one of
each type per core), so if you have multiple shards you may have to be very
careful not to increase the cache sizes too much, because they apply to each
core.

I still think there is something strange when a search returns a large
number of docs - the G1GC didn't seem to handle that very well in some
cases (when maxRamMB was specified), but that may be the symptom and not
the cause.
Thanks for the help.

Reinaldo

On Sat, Jun 27, 2020 at 4:29 AM Zisis T. <[hidden email]> wrote:

> [...]