Out of memory error

SK R
Hi,
   I got an out-of-memory exception while indexing huge documents (~1GB) in
one thread and optimizing 2 to 3 other indexes in different threads.
The max JVM heap size is 512MB. I'm using Lucene 2.3.0.

   Please suggest a way to avoid this exception.

Regards
  RSK

Re: Out of memory error

Erick Erickson
ummmm index smaller documents? <G>

You cannot expect to index a 1G doc with 512M of memory in the JVM.
The first thing I'd try is upping your JVM memory to the max your machine
will accept.
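
For reference, the maximum heap is set with the JVM's -Xmx option (for
example, java -Xmx1024m ...). A minimal sketch for confirming what limit
the running JVM actually received; the class name HeapCheck is made up for
illustration:

```java
class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx1024m HeapCheck" to see the effect of the flag.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```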

Make sure you flush your IndexWriter before attempting to index this
document.

But I would not be surprised if this failed to solve the problem. What's in
this massive document? Would it be possible to break it up into
smaller segments and index many sub-documents for this massive doc?
I also wonder what problem you're trying to solve by indexing this doc.
Is it a log file? I can't imagine a text document that big. That's like a
100 volume encyclopedia, and I can't help but wonder whether your users
would be better served by indexing it in pieces.
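
A rough sketch of the "index it in pieces" idea, assuming the document is
plain text read from a stream. DocumentChunker, forEachChunk and the chunk
size are hypothetical names and values, not Lucene API; in a real indexer
the handler would build one Lucene Document per chunk and add it to the
IndexWriter. Only one chunk is held in memory at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class DocumentChunker {
    /**
     * Reads the stream in pieces of at most maxChars characters and passes
     * each piece to the handler. Returns the number of pieces produced.
     * Note: this splits on a fixed character count, so a word can be cut in
     * two at a boundary; a real splitter would break on whitespace or on
     * natural sections of the document.
     */
    static int forEachChunk(Reader in, int maxChars, Consumer<String> handler) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        char[] buf = new char[maxChars];
        int filled = 0;
        int count = 0;
        try {
            int n;
            // Keep filling the buffer until it is full or the stream ends.
            while ((n = in.read(buf, filled, maxChars - filled)) != -1) {
                filled += n;
                if (filled == maxChars) {
                    handler.accept(new String(buf, 0, filled));
                    count++;
                    filled = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit whatever is left over as a final, shorter piece.
        if (filled > 0) {
            handler.accept(new String(buf, 0, filled));
            count++;
        }
        return count;
    }
}
```

Each piece can also carry a field naming the parent document and its
position, so search hits can be mapped back to the original file.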

Best
Erick

Re: Out of memory error

SK R
Hi,
   Thanks for your help, Erick.

   I changed my code to flush the writer before adding a document, which
helps to reduce memory usage.
   Reducing mergeFactor and maxBufferedDocs also helped me avoid this OOM
error (even though the index size is ~1GB).

But please clarify the doubts below.

> Make sure you flush your IndexWriter before attempting to index this
> document.

 - Is it good to call writer.flush() before adding every document to the
writer? Doesn't it affect the performance of indexing or search? Is it
similar to setting maxBufferedDocs=1?

    Also, please tell me which of these is relatively better (takes less
time and memory):
        (i) create 4 indexes of 250MB each and merge them into a single
index using writer.addIndexes(..)
        (ii) create a 1GB index and optimize it

Thanks & Regards
RSK



Re: Out of memory error

Erick Erickson
See below:

On Feb 5, 2008 9:41 AM, SK R <[hidden email]> wrote:

>  - Is it good to call writer.flush() before adding every document to the
> writer? Doesn't it affect the performance of indexing or search? Is it
> similar to setting maxBufferedDocs=1?
>

No, this is not a good idea. I'd expect this to slow down indexing
significantly. What I was assuming is that you'd have something like:

if (incoming document is huge) flush index writer

just to free up all the memory you can.
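
In Java, that check might look roughly like the sketch below. The Writer
interface is a hypothetical stand-in for the IndexWriter calls involved
(flush() and addDocument(...) in Lucene 2.3), and the size threshold is an
assumption to tune against your heap, not a recommended value.

```java
class HugeDocFlushPolicy {
    /** Hypothetical stand-in for the two IndexWriter operations used here. */
    interface Writer {
        void flush();
        void add(String doc);
    }

    private final Writer writer;
    private final long hugeThresholdChars;

    HugeDocFlushPolicy(Writer writer, long hugeThresholdChars) {
        this.writer = writer;
        this.hugeThresholdChars = hugeThresholdChars;
    }

    /** Flushes buffered documents only when the incoming one is huge. */
    void add(String doc) {
        if (doc.length() >= hugeThresholdChars) {
            // Release memory held by previously buffered documents first.
            writer.flush();
        }
        writer.add(doc);
    }
}
```

Small documents still go through the normal buffered path, so the cost of
the extra flush is paid only for the rare oversized document.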


>
>    Also, please tell me which of these is relatively better (takes less
> time and memory):
>        (i) create 4 indexes of 250MB each and merge them into a single
> index using writer.addIndexes(..)
>        (ii) create a 1GB index and optimize it
>

Don't know. You have to measure your particular situation. There's some
discussion (search the archives) about using several threads to speed up
indexing. Also, there's the wiki page, see

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The first bullet point is important here. Do you really need to improve
indexing speed? How long does it take and how often do you build it?

But perhaps I misread your original post. I *thought* you were talking
about indexing a 1G *document*. The size of the index shouldn't matter as
far as an OOM error is concerned. But now that I re-read your original
post, I should have also suggested that you optimize in different
processes than you index, since the implication is that they are separate
indexes anyway.

Best
Erick

