Troubleshooting java heap out-of-memory

Troubleshooting java heap out-of-memory

jrodenburg
I've read through the list entries here, the Lucene list, and the wiki docs,
but haven't been able to resolve a major pain point for us.  We've been trying
to determine what could cause us to hit this in our environment, and I'm
hoping more eyes on the issue can help.

Our scenario: 150MB index, 140000 documents, read/write servers in place
using standard replication.  Running Tomcat 5.5.17 on Redhat Enterprise
Linux 4.  Java configured to start with -Xmx1024m.  We encounter java heap
out-of-memory issues on the read server at staggered times, but usually once
every 48 hours.  Search request load is roughly 2 searches every 3 seconds,
with some spikes here or there.  We are using facets: 3 are based on type
integer, one is based on type string.  We are using sorts: 1 is based on
type sint, 2 are based on type date.  Caching is disabled.  Our Solr bits are
from September 2006.

Is there anything in that configuration that we should interrogate?

thanks,
j

Re: Troubleshooting java heap out-of-memory

jrodenburg
Hoping I can get a better response with a more directed question:

With facet queries and the fields used, what qualifies as a "large" number
of values?  The wiki uses U.S. states as an example, so the number of unique
values = 50.  More to the point, is there an algorithm that I can use to
estimate the cache consumption rate for facet queries?

-- j





Re: Troubleshooting java heap out-of-memory

Mike Klaas
On 4/2/07, Jeff Rodenburg <[hidden email]> wrote:
> Hoping I can get a better response with a more directed question:

I haven't answered your original question as it seems that general
java memory debugging techniques would be the most useful thing here.

> With facet queries and the fields used, what qualifies as a "large" number
> of values?  The wiki uses U.S. states as an example, so the number of unique
> values = 50.  More to the point, is there an algorithm that I can use to
> estimate the cache consumption rate for facet queries?

The cache consumption rate is one entry per unique value in all
faceted fields, excluding fields that have faceting satisfied via
FieldCache (single-valued fields with exactly one token per document).

The size of each cached filter is (num docs / 8) bytes, i.e. one bit per
document, unless the number of matching docs is less than the useHashSet
threshold in solrconfig.xml, in which case a smaller hash-based set is used.
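
To put rough numbers on that rule, here is a minimal Python sketch. It is not Solr code: the function names, the 4-bytes-per-id figure for the hash-based set, and the threshold value of 3000 are assumptions made for illustration only, and the real per-entry overhead will be higher.

```python
import math

def bitset_filter_bytes(max_doc):
    """One bit per document id, rounded up to whole bytes."""
    return math.ceil(max_doc / 8)

def hash_filter_bytes(matching_docs, bytes_per_id=4):
    """Assumed lower bound for a hash-based set: one int per matching doc."""
    return matching_docs * bytes_per_id

def filter_bytes(max_doc, matching_docs, hash_threshold):
    """Pick the representation the way the rule above describes."""
    if matching_docs < hash_threshold:
        return hash_filter_bytes(matching_docs)
    return bitset_filter_bytes(max_doc)

def worst_case_cache_bytes(max_doc, unique_values):
    """Upper bound: every unique facet value cached as a full bitset."""
    return unique_values * bitset_filter_bytes(max_doc)

# 140000 documents (from the original post), 50 unique values (the wiki's
# U.S. states example); 3000 is a made-up threshold.
print(bitset_filter_bytes(140000))
print(worst_case_cache_bytes(140000, 50))
print(filter_bytes(140000, matching_docs=10, hash_threshold=3000))
```

Under these assumptions a full bitset for this index is under 20 KB, so the worst case only becomes large when the number of unique facet values is large.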

Sorting requires FieldCache population, which consists of an integer
per document plus the sum of the lengths of the unique values in the
field (less for pure int/float fields, but I'm not sure if Solr's sint
qualifies).
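
The sort estimate can be sketched the same way. This Python snippet only restates the arithmetic above; it assumes 4 bytes per per-document integer and 2 bytes per character (Java strings), ignores object overhead, and uses made-up date strings as the unique values.

```python
def sort_field_cache_bytes(max_doc, unique_values, bytes_per_int=4, bytes_per_char=2):
    """Integer per document plus the summed length of the unique values."""
    per_doc_ints = max_doc * bytes_per_int
    value_chars = sum(len(v) for v in unique_values) * bytes_per_char
    return per_doc_ints + value_chars

# Hypothetical values for a date sort field; only max_doc comes from the thread.
example_values = ["2007-04-01", "2007-04-02"]
print(sort_field_cache_bytes(140000, example_values))
```

For an index of this size the per-document array dominates unless the field has a very large number of long unique values.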

Neither faceting nor sorting should consume more memory once their
data structures have been built, so it would be odd to see an OOM after 48
hours if they were the cause.

-Mike

Re: Troubleshooting java heap out-of-memory

Yonik Seeley-2
In reply to this post by jrodenburg
On 4/1/07, Jeff Rodenburg <[hidden email]> wrote:
> Our scenario: 150MB index, 140000 documents, read/write servers in place
> using standard replication.  Running Tomcat 5.5.17 on Redhat Enterprise
> Linux 4.  Java configured to start with -Xmx1024m.  We encounter java heap
> out-of-memory issues on the read server at staggered times, but usually once
> every 48 hours.

Could you do a grep through your server logs for "WARNING", to
eliminate the possibility of multiple overlapping searchers causing
the OOM issue?
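
If grep isn't handy, the same check can be sketched in a few lines of Python. The sample lines below are placeholders (only the OutOfMemoryError text appears in this thread), and the log file name in the comment is an assumption about a default Tomcat setup.

```python
from collections import Counter

def matching_lines(lines, needle="WARNING"):
    """Return the lines that contain the given marker."""
    return [line.rstrip("\n") for line in lines if needle in line]

def count_by_level(lines, levels=("WARNING", "SEVERE")):
    """Count how many lines mention each log level."""
    counts = Counter()
    for line in lines:
        for level in levels:
            if level in line:
                counts[level] += 1
    return counts

# Placeholder log lines for illustration.
sample = [
    "INFO: placeholder info line\n",
    "WARNING: placeholder warning line\n",
    "SEVERE: java.lang.OutOfMemoryError: Java heap space\n",
]
print(matching_lines(sample))
print(count_by_level(sample))

# Against a real file (path is an assumption):
# with open("logs/catalina.out", errors="replace") as fh:
#     print(count_by_level(fh))
```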

Are you doing incremental updates?  If so, try lowering your
mergeFactor for the index, or optimize more frequently.  As an index
is incrementally updated, old docs are marked as deleted and new docs
are added.  This leaves "holes" in the document id space which can
increase memory usage.  Both BitSet filters and FieldCache entry sizes
are proportional to maxDoc (the maximum internal docid in
the index).

You can see maxDoc from the statistics page... there might be a correlation.

-Yonik

Re: Troubleshooting java heap out-of-memory

jrodenburg
On 4/2/07, Yonik Seeley <[hidden email]> wrote:

>
> On 4/1/07, Jeff Rodenburg <[hidden email]> wrote:
> > Our scenario: 150MB index, 140000 documents, read/write servers in place
> > using standard replication.  Running Tomcat 5.5.17 on Redhat Enterprise
> > Linux 4.  Java configured to start with -Xmx1024m.  We encounter java heap
> > out-of-memory issues on the read server at staggered times, but usually once
> > every 48 hours.
>
> Could you do a grep through your server logs for "WARNING", to
> eliminate the possibility of multiple overlapping searchers causing
> the OOM issue?


We're not seeing warnings for overlapping searchers prior to the OOM
events.  Only "SEVERE" -- java.lang.OutOfMemoryError: Java heap space.

> Are you doing incremental updates?  If so, try lowering your
> mergeFactor for the index, or optimize more frequently.  As an index
> is incrementally updated, old docs are marked as deleted and new docs
> are added.  This leaves "holes" in the document id space which can
> increase memory usage.  Both BitSet filters and FieldCache entry sizes
> are proportionally related to maxDoc (the maximum internal docid in
> the index).
>
> You can see maxDoc from the statistics page... there might be a
> correlation.


We are doing incremental updates, and we optimize quite a bit.  mergeFactor
presently set to 10.
maxDoc count = 144156
numDocs count = 144145
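
A quick arithmetic check on those two counts, as a small Python sketch (the one-bit-per-document bitset size follows the rule described earlier in the thread; nothing here is Solr code):

```python
import math

max_doc = 144156   # from the statistics page
num_docs = 144145  # from the statistics page

deleted = max_doc - num_docs            # "holes" in the docid space
deleted_fraction = deleted / max_doc    # share of maxDoc that is deleted
bitset_bytes = math.ceil(max_doc / 8)   # one bit per docid

print(f"deleted docs: {deleted}")
print(f"deleted fraction: {deleted_fraction:.4%}")
print(f"bytes per full bitset filter: {bitset_bytes}")
```

With only a handful of deleted documents, maxDoc and numDocs are practically identical here, so holes in the docid space do not look like a significant factor.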

Re: Troubleshooting java heap out-of-memory

Chris Hostetter-3
In reply to this post by jrodenburg

: values = 50.  More to the point, is there an algorithm that I can use to
: estimate the cache consumption rate for facet queries?

I'm confused ... I thought you said in your original mail that you had
all the caching disabled? (except for FieldCache, which is so low-level in
Lucene that it's always used)

Are the fields you are faceting on multiValued or single-valued?


-Hoss


Re: Troubleshooting java heap out-of-memory

jrodenburg
In reply to this post by Mike Klaas
Thanks for the pointers, Mike.  I'm trying to determine the math to resolve
some strange numbers we're seeing.  Here's the top dozen lines from a jmap
analysis on a heap dump:

Size        Count     Class description
---------------------------------------------------------
428246064   1792204   int[]
93175176    3213131   char[]
77195040    3216460   java.lang.String
67479112    3945      long[]
53073888    1658559   java.util.LinkedHashMap$Entry
39668352    1652848   org.apache.solr.search.HashDocSet
28195280    27131     byte[]
27165456    1697841   org.apache.lucene.index.Term
27024016    1689001   org.apache.lucene.search.TermQuery
22265920    695810    org.apache.lucene.document.Field
4931568     5974      java.lang.Object[]
4366768     77978     org.apache.lucene.store.FSIndexInput

I see the HashDocSet numbers (count=1.65 million), assume they have
references to the int arrays (count=1.79 million)  and wonder how I could
have so many of those in memory.  A few more data tidbits:

- Facet field Id1 = type int, unique values = 2710
- Facet field Id2 = type int, unique values = 65
- Facet field Id3 = type string, unique values = 15179
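
For anyone who wants to redo the math, here is a small Python sketch that parses the histogram rows above and derives per-instance averages. It assumes the sizes are shallow sizes as jmap reports them, and it only restates numbers already posted in this message.

```python
# Rows copied from the jmap histogram above: size in bytes, count, class.
HISTOGRAM = """\
428246064   1792204   int[]
93175176    3213131   char[]
77195040    3216460   java.lang.String
67479112    3945      long[]
53073888    1658559   java.util.LinkedHashMap$Entry
39668352    1652848   org.apache.solr.search.HashDocSet
28195280    27131     byte[]
27165456    1697841   org.apache.lucene.index.Term
27024016    1689001   org.apache.lucene.search.TermQuery
22265920    695810    org.apache.lucene.document.Field
4931568     5974      java.lang.Object[]
4366768     77978     org.apache.lucene.store.FSIndexInput
"""

def parse_histogram(text):
    """Map class name -> (total bytes, instance count)."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        size, count, name = line.split(None, 2)
        rows[name.strip()] = (int(size), int(count))
    return rows

rows = parse_histogram(HISTOGRAM)
listed_total = sum(size for size, _ in rows.values())

for name, (size, count) in rows.items():
    print(f"{name:40s} {size / count:10.1f} bytes/instance "
          f"{size / listed_total:6.1%} of listed bytes")

# Unique values across the three facet fields listed above.
facet_unique_values = 2710 + 65 + 15179
hashdocsets_per_value = rows["org.apache.solr.search.HashDocSet"][1] / facet_unique_values
print(f"unique facet values: {facet_unique_values}")
print(f"HashDocSet instances per unique facet value: {hashdocsets_per_value:.1f}")
```

The int[] arrays average a few hundred bytes each, and there are far more HashDocSet instances than unique facet values, which is the part that doesn't add up for me.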

Thanks for the extra eyes on this, much appreciated.

-- j




Re: Troubleshooting java heap out-of-memory

Yonik Seeley-2
In reply to this post by jrodenburg
On 4/2/07, Jeff Rodenburg <[hidden email]> wrote:
> We are doing incremental updates, and we optimize quite a bit.  mergeFactor
> presently set to 10.
> maxDoc count = 144156
> numDocs count = 144145

What version of Solr are you using?  Another potential OOM (multiple
threads generating the same FieldCache entry) was fixed in later
versions of Lucene included with Solr.

-Yonik

Re: Troubleshooting java heap out-of-memory

jrodenburg
In reply to this post by Chris Hostetter-3
Sorry for the confusion.  We do have caching disabled.  I was asking the
question because I wasn't certain if the configurable cache settings applied
throughout, or if the FieldCache in Lucene still came into play.

The two integer-based facets are single valued per document.  The
string-based facet is multiValued.




Re: Troubleshooting java heap out-of-memory

jrodenburg
In reply to this post by Yonik Seeley-2
Major version is 1.0.  The bits are from a nightly build from early
September 2006.

We do have plans to upgrade Solr soon.


Re: Troubleshooting java heap out-of-memory

jrodenburg
In reply to this post by Yonik Seeley-2
Yonik - is this the JIRA entry you're referring to?

http://issues.apache.org/jira/browse/LUCENE-754




Re: Troubleshooting java heap out-of-memory

Yonik Seeley-2
On 4/2/07, Jeff Rodenburg <[hidden email]> wrote:
> Yonik - is this the JIRA entry you're referring to?
>
> http://issues.apache.org/jira/browse/LUCENE-754

Yes.  But from the heap dump you provided, that doesn't look like the issue.

-Yonik

Re: Troubleshooting java heap out-of-memory

jrodenburg
Thanks Yonik.  I'm not a Java developer by trade, but the objects mentioned
in the heap dump along with the objects mentioned in the JIRA issue sound
eerily familiar to me.

Last year, we encountered a memory leak on the Lucene.Net project, stemming
from the FieldCache class having references left open after a Searcher had
been closed.  In troubleshooting that memory leak, we compared the
Lucene.Net code to the Java version, wondering why there might be a
difference.  In C#, there's no such thing as a WeakHashMap (which is what was
used in Java Lucene at the time, if memory serves).  We resolved our
issue by forcing a close of the FieldCache objects whenever their referencing
Searcher was closed.  In the process of troubleshooting that issue, the memory
dump of the C# heap showed a lot of the same types of objects, and the
scenario was the same as what we've been experiencing (searchers closing and
refreshing for new changes).
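
To illustrate the two approaches, here is a rough Python analogy, not the Lucene or Lucene.Net code. Python's weakref.WeakKeyDictionary stands in for Java's WeakHashMap, and the Reader and FieldCacheSketch classes are hypothetical names invented for this sketch.

```python
import gc
import weakref

class Reader:
    """Hypothetical stand-in for an index reader / searcher."""
    def __init__(self, name):
        self.name = name

class FieldCacheSketch:
    """Per-reader cache keyed weakly, so entries vanish with their reader."""
    def __init__(self):
        self._by_reader = weakref.WeakKeyDictionary()

    def get(self, reader, field, loader):
        per_reader = self._by_reader.setdefault(reader, {})
        if field not in per_reader:
            per_reader[field] = loader(reader, field)
        return per_reader[field]

    def close(self, reader):
        """Explicit eviction, the workaround when weak keys are unavailable."""
        self._by_reader.pop(reader, None)

    def __len__(self):
        return len(self._by_reader)

cache = FieldCacheSketch()

# Weak-key behaviour: dropping the last reference to the reader frees its entry.
r1 = Reader("searcher-1")
cache.get(r1, "price", lambda reader, field: [1, 2, 3])
del r1
gc.collect()
print(len(cache))

# Explicit close: the entry is removed even though the reader is still referenced.
r2 = Reader("searcher-2")
cache.get(r2, "price", lambda reader, field: [4, 5, 6])
cache.close(r2)
print(len(cache))
```

Note that the weak-key approach only helps if nothing else, including the cached values themselves, keeps a strong reference to the reader.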

We're going to investigate repetition of the error in our current test
environment, then run the latest Solr bits (with the patched Lucene version)
with the same scenario and see if the condition improves.

Thanks to all for your support on this issue.

cheers,
j
