OutOfMemory when indexing


OutOfMemory when indexing

Stanislav Jordanov
Hi guys,
Building a huge index (about 500,000 docs totaling about 10 megs of plain
text) we've run into the following problem:
Most of the time the IndexWriter process consumes a fairly small amount
of memory (about 32 megs).
However, as the index grows, memory usage sporadically bursts
to levels of (say) 1000 megs and then falls back to its previous level.
The problem is that unless the process is started with an option like
-Xmx1000m, these bursts cause an OutOfMemoryError which terminates
the indexing process.

My question is - is there a way to avoid it?

Regards
Stanislav

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: OutOfMemory when indexing

Harald Stowasser
Stanislav Jordanov wrote:

> Hi guys,
> [...]
> My question is - is there a way to avoid it?

1.
I start my program with:
java -Xms256M -Xmx512M -jar Suchmaschine.jar &

Together with the iterative subroutines described below, this now
protects me from OutOfMemoryError.

2.
Release your references as soon as possible, e.g. "term = null;".
This helps the garbage collector.

3.
Maybe you should watch totalMemory() and freeMemory() from
Runtime.getRuntime(). That will help you find the memory hog.

4.
I had this problem when deleting documents from the index. I used a
subroutine that deleted a single document per call.
It ran much better once I turned it into an "iterative" subroutine
like this:

  public int deleteMany(String keywords)
  {
    int anzahl = 0;  // count of deleted documents
    try
    {
      openReader();
      String[] temp = keywords.split(",");
      for (int i = 0; i < temp.length; i++)
      {
        Term term = new Term("keyword", temp[i]);
        anzahl += mReader.delete(term);  // IndexReader.delete(Term) returns the delete count
        term = null;  // release the reference for the garbage collector
        // To find the memory hog, log Runtime.getRuntime().totalMemory(),
        // freeMemory() and maxMemory() here.
      }
      close();
    } catch (Exception e) {
      cIdowa.error("Could not delete documents: " + keywords
            + ". Because: " + e.getMessage() + "\n" + e.toString());
    }
    return anzahl;
  }
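Tip 3 (watching the Runtime memory figures) can be sketched as follows. This is a generic JVM snippet, not tied to Lucene; the class and method names are my own:

```java
// Minimal sketch: snapshot the JVM memory figures so you can log them
// around suspect operations and spot where memory is being eaten.
public class MemoryWatch {
    /** Returns {totalMemory, freeMemory, maxMemory} in bytes. */
    public static long[] snapshot() {
        Runtime r = Runtime.getRuntime();
        return new long[] { r.totalMemory(), r.freeMemory(), r.maxMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("total=" + s[0] + " free=" + s[1] + " max=" + s[2]);
    }
}
```

Print a snapshot before and after each batch of deletions or additions; a steadily shrinking free/total gap points at the culprit.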





Re: OutOfMemory when indexing

Markus Wiederkehr
In reply to this post by Stanislav Jordanov
I am not an expert, but maybe the occasional bursts of high memory usage
happen because Lucene is merging multiple index segments together.

Maybe it would help if you set maxMergeDocs to 10,000 or something. In
your case that would mean that the minimum number of index segments
would be 50.
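The arithmetic behind that estimate is just a ceiling division; the helper name below is mine, not part of Lucene:

```java
// With maxMergeDocs capping a segment at 10,000 docs, 500,000 docs can
// never fit in fewer than ceil(500,000 / 10,000) = 50 segments.
public class SegmentMath {
    /** Minimum number of segments = ceil(numDocs / maxMergeDocs). */
    public static long minSegments(long numDocs, long maxMergeDocs) {
        return (numDocs + maxMergeDocs - 1) / maxMergeDocs;
    }
}
```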

But again, this may be completely wrong...

Markus

On 6/13/05, Stanislav Jordanov <[hidden email]> wrote:

> Hi guys,
> [...]
> My question is - is there a way to avoid it?



Re: OutOfMemory when indexing

Gusenbauer Stefan
In reply to this post by Harald Stowasser
Harald Stowasser wrote:

> [...]
A few weeks ago I had a similar problem; here is what it was and how I
solved it:
I index docs and store every parsed document in an ArrayList. This
worked for small directories with a small number of files, but once
things grow you're in trouble.
My solution: whenever I am about to run out of memory, I "save" the
documents - I open the IndexWriter, write every document from the
ArrayList to the index, then set the ArrayList and some other stuff to
null and try to invoke the garbage collector. Then I do some
reinitializing and continue indexing.
Looks easy, but it wasn't. How do you check whether you are about to run
out of memory? The Runtime class and its methods for querying free
memory were very unreliable.
Therefore I moved to Java 1.5 and implemented a memory notification
listener, which is supported by the java.lang.management package. There
you can set a threshold at which you want to be notified; after the
notification I perform a "save".
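A sketch of such a listener using only java.lang.management (Java 5+). The class name, the fraction parameter, and the callback wiring are my own choices, not Stefan's actual code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class LowMemoryNotifier {
    /**
     * Arms a usage threshold at `fraction` of max on every heap pool
     * that supports one, and registers `onLowMemory` to run when the
     * threshold is crossed. Returns the number of pools armed.
     */
    public static int arm(final double fraction, final Runnable onLowMemory) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    pool.setUsageThreshold((long) (max * fraction));
                    armed++;
                }
            }
        }
        // The MemoryMXBean emits MEMORY_THRESHOLD_EXCEEDED notifications.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                    onLowMemory.run();  // e.g. flush the buffered documents to the index
                }
            }
        }, null, null);
        return armed;
    }
}
```

In this pattern the callback would flush the ArrayList to the index and null out the buffers, as Stefan describes.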

Hope this helps,
Stefan



Re: OutOfMemory when indexing

Stanislav Jordanov

Gusenbauer Stefan wrote:

> [...]
Thank you Stefan,
unfortunately, our situation is a bit different - we are not caching
parsed docs in any way.
When a document is parsed, it is indexed immediately.
So in our case it is not the accumulation of documents waiting to be
indexed that causes the OutOfMemoryError.
I believe it is a "pure Lucene" issue: at some point, adding the next
doc to the index triggers (perhaps) a merge of segments, and memory
consumption rises drastically.

Regards
Stanislav




Re: OutOfMemory when indexing

Stanislav Jordanov
In reply to this post by Markus Wiederkehr
Thanks for the advice,
I think this may be a solution.
In case you've experimented with this setting, could you please tell me
what the side effects of limiting segment size are?
Will it cause searches to run slower?

Markus Wiederkehr wrote:

> [...]
