process dies with OOM after processing 10k docs


process dies with OOM after processing 10k docs

jmuguruza
Hi,

I am having a memory issue with Lucene 2.4. I am starting a process
with 128 MB of RAM; this process handles incoming requests from others
and indexes objects in a number of Lucene indexes.

My Lucene docs all have 6 fields:
- one is small: Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES
- the rest are: Field.Store.YES, Field.Index.UN_TOKENIZED,
Field.TermVector.NO; two of them can be quite large.

I set field.setOmitNorms(true) for all fields,
and I also set writer.setRAMBufferSizeMB(30).

I have archived around 10k Lucene docs, evenly distributed across
around 100 indexes. I keep a cache of at most 60 indexes open; when a new
one has to be opened for writing, another one is closed.
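
The close-one-when-a-new-one-opens policy described above can be sketched with a LinkedHashMap in access order. This is a minimal illustration, not code from the thread: the name WriterCache is made up, and Closeable stands in for IndexWriter so the sketch is self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a "max N open indexes" cache: a LinkedHashMap
// in access order that closes the least recently used writer when the
// cap is exceeded. Closeable stands in for IndexWriter here.
class WriterCache<W extends Closeable> {
    private final Map<String, W> cache;

    WriterCache(final int maxOpen) {
        // accessOrder = true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<String, W>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, W> eldest) {
                if (size() > maxOpen) {
                    try {
                        eldest.getValue().close(); // release its RAM buffer
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                    return true; // evict the closed entry
                }
                return false;
            }
        };
    }

    void put(String indexName, W writer) { cache.put(indexName, writer); }
    W get(String indexName) { return cache.get(indexName); }
    int size() { return cache.size(); }
}
```

Note that even with such a cache, each closed-and-evicted writer frees its buffer only on close; the writers still open can each hold a full RAM buffer at once.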
I started the process with -XX:+HeapDumpOnOutOfMemoryError so I
get an hprof file dumped when I hit an OOM.
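
For reference, such a launch would look roughly like this; the main class name and dump path are placeholders, not from the thread:

```shell
# -Xmx caps the heap at 128 MB; the heap dump flags write an hprof file
# when an OutOfMemoryError occurs. MyIndexer and /var/dumps are placeholders.
java -Xmx128m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps \
     MyIndexer
```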

I open the hprof file with VisualVM and I can see the following:

class                               instances    size
FreqProxTermsWriter$PostingList     572672       20616192
String                              186154       4467698
char[]                              184689       30216520

There are lock files left in the index directories, since the process
dies with the OOM. There are no .prx files, but there are .frq files (I
cannot be 100% sure about that last point, but I think so; I don't have
access to the server right now).

I also add each doc into a MemoryIndex and run a number of
memoryindex.search(lucBooleanQuery) calls.
I don't close these MemoryIndex instances or anything; they just go out
of scope. I thought that would be enough to dispose of them.

Obviously something in my code is wrong. Does anyone see anything
suspicious here? Any idea where I should begin looking in my code?

thanks a lot
javi

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: process dies with OOM after processing 10k docs

Ian Lea
Can you not just give the process some more memory? 128 MB seems very
low for what you are doing.


--
Ian.


On Mon, Dec 15, 2008 at 6:28 PM, jm <[hidden email]> wrote:



Re: process dies with OOM after processing 10k docs

jmuguruza
Yes, I have tested with up to 512 MB; although I don't have the hprof
dump files from those tests, they also hit the OOM. I was just wondering
whether having so many instances of FreqProxTermsWriter$PostingList
around is a clear indicator of something I am not releasing.

javier

On Tue, Dec 16, 2008 at 11:24 AM, Ian Lea <[hidden email]> wrote:



Re: process dies with OOM after processing 10k docs

Ian Lea
I'm no expert on Lucene internals, but maybe the posting lists just
happen to be what is around when your program hits the OOM error. It
seems more likely that you are getting the OOM because of all the
caches and other things you are doing. I suggest giving it more memory,
caching fewer indexes, or trying it without the MemoryIndex.



--
Ian.


On Tue, Dec 16, 2008 at 10:37 AM, jm <[hidden email]> wrote:



Re: process dies with OOM after processing 10k docs

Michael McCandless-2
In reply to this post by jmuguruza

That class is what's used to buffer the added docs in IndexWriter.
The heap dump seems to indicate you've got ~55 MB worth of buffered
docs pending. Since you allow a 30 MB RAM buffer for each writer, and
it seems like you allow up to 60 writers to be open at once, in the
worst case you need to allow for at least 60 * 30 = 1800 MB of RAM
consumption just for the writers?
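
The worst-case arithmetic here can be checked directly; the inputs (60 open writers, a 30 MB buffer each) come from earlier in the thread, while the class and method names below are only illustrative:

```java
// Worst-case indexing RAM: every cached writer fills its buffer at once.
class WorstCaseRam {
    static int worstCaseMB(int openWriters, int ramBufferMB) {
        return openWriters * ramBufferMB;
    }

    public static void main(String[] args) {
        int writers = 60;   // max open indexes kept in the cache
        int bufferMB = 30;  // setRAMBufferSizeMB(30) per writer
        System.out.println(worstCaseMB(writers, bufferMB) + " MB"); // prints "1800 MB"
    }
}
```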

Mike



Re: process dies with OOM after processing 10k docs

jmuguruza
Right... I was forgetting that the 30 MB flush-by-RAM trigger is PER
writer. I'll run some tests to verify this and fix it accordingly.
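
One possible fix (a hypothetical sketch, not what the thread settled on) is to derive the per-writer buffer from a global RAM budget divided by the maximum number of concurrently open writers; the names below are made up for illustration:

```java
// Hypothetical: split a global indexing RAM budget evenly across the
// maximum number of concurrently open writers, so that even if every
// writer's buffer fills at once, total usage stays within budget.
class BufferBudget {
    static int perWriterBufferMB(int totalBudgetMB, int maxOpenWriters) {
        // Floor division; clamp to at least 1 MB so tiny budgets still work.
        return Math.max(1, totalBudgetMB / maxOpenWriters);
    }

    public static void main(String[] args) {
        // e.g. a 120 MB budget across 60 writers
        System.out.println(perWriterBufferMB(120, 60)); // prints 2
    }
}
```

The resulting value would be passed to each writer's setRAMBufferSizeMB; alternatively, caching fewer writers allows a larger buffer per writer for the same budget.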

Thanks!!

On Tue, Dec 16, 2008 at 12:06 PM, Michael McCandless
<[hidden email]> wrote:
