
Contentions observed in lucene execution

RameshIyerV
Hi All,

I need some help in analyzing some contentions we observe in the Lucene execution.

We are supporting the Sterling 9.0 fulfillment application and it uses Lucene 2.4 for catalog search functionality.

---The Issue----
This system has been live in production since Nov 2012, and only recently (mid-June 2013) has our application started forming stuck threads during Lucene invocations, which causes it to crash.

This occurs 2-3 times a week; on other days we see spikes of very slow performance at the exact places that cause the stuck threads.

---The research---
We have validated that neither the data nor the usage has grown between Jan 2012 and now.

We took snapshots of the code execution (through VisualVM), and for slow-running threads we validated that too much time is spent at certain spots (these very same spots appear in the stack traces of the stuck threads).
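A minimal sketch of how the same check could be automated from inside the application JVM, using only the JDK's `Thread.getAllStackTraces()`. The class name `StuckThreadCounter` is hypothetical, and the frame names in `main` are copied from the traces listed in this post. It only sees threads of the JVM it runs in, so it would have to be invoked from within the application (for example from a scheduled task).

```java
/** Counts live threads whose current stack contains a given frame. */
final class StuckThreadCounter {

    /** Returns how many threads have a frame matching className.methodName. */
    static int countThreadsIn(String className, String methodName) {
        int count = 0;
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Frames taken from the stack traces in this post.
        System.out.println("in RandomAccessFile.readBytes: "
                + countThreadsIn("java.io.RandomAccessFile", "readBytes"));
        System.out.println("in FieldCacheImpl$Cache.get: "
                + countThreadsIn("org.apache.lucene.search.FieldCacheImpl$Cache", "get"));
    }
}
```

Logging these counts once a minute would show whether the pile-up builds gradually or appears all at once, which helps separate IO starvation from a one-time cache build.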

---Help needed---
It would really help if you could guide me on what kinds of contention (heap, IO, data, CPU, JVM params) can cause such behavior.


---Lucene Invocation contentions observed---
(We find stuck threads & slowness at the following spots, ordered by severity [high to low])
1. java.io.RandomAccessFile.readBytes(Native Method)
        java.io.RandomAccessFile.read(RandomAccessFile.java:338)
        org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
        org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)

2. org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
        org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:167)
        org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:373)
        org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
        org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

3. org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
        org.apache.lucene.index.TermBuffer.read(TermBuffer.java:65)
        org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)
        org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:389)
        org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:71)
        org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:351)

4. java.io.RandomAccessFile.seek(Native Method)
        org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)
        org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)

Re: Contentions observed in lucene execution

Michael McCandless-2
Lucene 2.4.x is quite ancient by now ...

FSDirectory.FSIndexInput is single-threaded in seeking/reading bytes,
which I think explains your 1 and 4.  Try using MMapDirectory, if you
are using a 64 bit JVM or if your index is tiny.  Newer Lucene
versions also have NIOFSDirectory, which is thread-friendly on Unix
(but not on Windows due to a JVM bug).
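To illustrate the difference, here is a minimal JDK-only sketch (not Lucene's actual code; the class and method names are hypothetical, and it uses Java 7+ file APIs). `readLocked` shows the seek-then-read pattern on a shared `RandomAccessFile`, which has to be guarded by a lock so that concurrent readers do not move each other's file pointer. `readPositional` shows the positional `FileChannel.read(ByteBuffer, long)` call, which carries the offset with each request and so needs no lock on the Java side.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class SharedFileReads {

    /** Seek-then-read on a shared handle: both calls must happen atomically, so readers serialize. */
    static int readLocked(RandomAccessFile file, long pos, byte[] dst) {
        synchronized (file) {
            try {
                file.seek(pos);
                return file.read(dst, 0, dst.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Positional read: the offset travels with the call, so there is no shared file pointer. */
    static int readPositional(FileChannel channel, long pos, byte[] dst) {
        try {
            return channel.read(ByteBuffer.wrap(dst), pos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temp file, reads bytes 4..7 both ways, and reports whether they agree. */
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("shared-reads", ".bin");
            try {
                Files.write(tmp, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
                byte[] a = new byte[4];
                byte[] b = new byte[4];
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                     FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    readLocked(raf, 4, a);
                    readPositional(ch, 4, b);
                }
                return Arrays.equals(a, b) && a[0] == 4 && a[3] == 7;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both strategies agree: " + demo());
    }
}
```

With many search threads and a cold OS cache, every thread calling the locked variant waits behind whichever thread is currently blocked on disk, which is consistent with stuck threads showing up in `seek` and `readBytes`.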

For 2 and 3, creating a FieldCache entry is also single threaded, but
this is a one-time event on the first search to the IndexReader
requiring that entry.  Lucene 4.x adds doc values which are much more
efficient to init at search time.
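A rough sketch of that one-time behavior, under the assumption that the cache builds each entry once and makes concurrent callers for the same key wait. This is not Lucene's code; `OneTimeCache`, `createValue`, and the field name `sortField` are hypothetical stand-ins, and the 200 ms sleep simulates the pass over every term of the field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class OneTimeCache {

    /** Per-key holder; threads lock on it while the value is being built. */
    private static final class Placeholder {
        Object value;
    }

    private final Map<String, Placeholder> entries = new HashMap<String, Placeholder>();
    final AtomicInteger builds = new AtomicInteger();

    /** Stand-in for the expensive enumeration of all terms of a field. */
    private Object createValue(String field) {
        builds.incrementAndGet();
        try {
            Thread.sleep(200); // simulate slow term enumeration
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "index-for-" + field;
    }

    Object get(String field) {
        Placeholder holder;
        synchronized (entries) { // short lock: find or register the holder
            holder = entries.get(field);
            if (holder == null) {
                holder = new Placeholder();
                entries.put(field, holder);
            }
        }
        synchronized (holder) { // later callers wait here until the first one finishes
            if (holder.value == null) {
                holder.value = createValue(field);
            }
            return holder.value;
        }
    }

    /** Runs concurrent lookups of one field and returns how many builds happened. */
    static int demo(int threads) {
        final OneTimeCache cache = new OneTimeCache();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    cache.get("sortField");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cache.builds.get();
    }

    public static void main(String[] args) {
        System.out.println("builds for 8 concurrent lookups: " + demo(8));
    }
}
```

All callers but one spend the whole build time blocked, so if the build itself is slow (for example because its reads miss the OS cache), every search thread that needs the entry appears stuck until it completes.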

But, what changed in your app?  Perhaps there's less RAM available to
the OS for caching IO pages (this could explain 1 and 4)?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 18, 2013 at 6:46 AM, RameshIyerV <[hidden email]> wrote:

> --
> View this message in context: http://lucene.472066.n3.nabble.com/Contentions-observed-in-lucene-execution-tp4078796.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

Re: Contentions observed in lucene execution

RameshIyerV
Thanks Mike,

This is running on the HotSpot VM (on Unix) & the JVM has 24 GB of max heap.

Moving to Lucene 4.x is not an easy option because Lucene ships as part of the Sterling 9.0 product, and if we upgrade part of the product we would have to give up support.

One of my coworkers had the same suspicion (that we are running out of heap, although I am not so sure, because the warnings appear in the logs after hundreds of stuck threads), so we plan to bump the heap up to a max of 34 GB and then watch closely. I'll post if that does not work. Thanks for taking a look.

- Ramesh.

RE: Contentions observed in lucene execution

Uwe Schindler
Hi,

You should not use too much heap space for the Java VM, because Lucene relies heavily on the file system cache. If your machine has, say, 32 GB of RAM, don't use more than 1/4 (8 GB) as heap for the Java application. Ideally, use only as much heap (plus some overhead) as you need to avoid OOMs (this depends on your index size and application). Lucene generally uses little heap space unless very large field caches are used.
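The rule of thumb can be written down as simple arithmetic. This is only a sketch: `HeapBudget` and its method names are hypothetical, and it ignores non-heap JVM memory (thread stacks, code cache, direct buffers), so the real amount left for the file system cache is somewhat smaller.

```java
final class HeapBudget {

    static final long GB = 1024L * 1024L * 1024L;

    /** Upper bound on heap under the "no more than 1/4 of RAM" rule of thumb. */
    static long maxHeapBytes(long physicalRamBytes) {
        return physicalRamBytes / 4;
    }

    /** RAM left for the OS file system cache once all JVM heaps are subtracted. */
    static long leftForFileCache(long physicalRamBytes, long totalHeapBytes) {
        return Math.max(0L, physicalRamBytes - totalHeapBytes);
    }

    public static void main(String[] args) {
        long ram = 32 * GB; // the 32 GB machine from the example
        long heap = maxHeapBytes(ram);
        System.out.println("max heap (GB): " + heap / GB);
        System.out.println("left for file cache (GB): " + leftForFileCache(ram, heap) / GB);
        // The configured limit of the JVM running this code, for comparison.
        System.out.println("this JVM's max heap (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Whether the remainder is enough depends on the index size: ideally the whole index fits in the file system cache alongside everything else running on the box.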

For the concurrency problem, as Mike already said, try using MMapDirectory instead of FSDirectory (it is supported on Lucene 2.4, too, although it is not yet as efficient there and does not support unmapping). For more information, see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
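For readers unfamiliar with memory mapping, here is a minimal JDK-only sketch of the underlying mechanism (it is not Lucene's MMapDirectory code; `MappedReads` and its methods are hypothetical names, and it uses Java 7+ file APIs). The file is mapped once, and each reader works on its own `duplicate()` of the buffer, so no file pointer or lock is shared between threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class MappedReads {

    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static MappedByteBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads len bytes at pos through a private view of the shared mapping. */
    static byte[] read(ByteBuffer shared, int pos, int len) {
        ByteBuffer view = shared.duplicate(); // own position and limit, same mapped bytes
        view.position(pos);
        byte[] dst = new byte[len];
        view.get(dst);
        return dst;
    }

    /** Maps a small temp file and checks that bytes 4..7 come back as written. */
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("mapped-reads", ".bin");
            // The JDK has no public unmap call, so defer cleanup to JVM exit.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            MappedByteBuffer mapped = map(tmp);
            return Arrays.equals(read(mapped, 4, 4), new byte[] {4, 5, 6, 7});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped read matches: " + demo());
    }
}
```

The mapped pages live in the OS file system cache rather than on the Java heap, which is why this approach benefits from a smaller heap and a 64-bit address space.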

RandomAccessFile (which is used by FSDirectory) does not behave well in multi-threaded environments, because the underlying file descriptor can only be used by one thread at a time.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: RameshIyerV [mailto:[hidden email]]
> Sent: Monday, July 22, 2013 11:15 AM
> To: [hidden email]
> Subject: Re: Contentions observed in lucene execution
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Contentions-observed-in-lucene-execution-tp4078796p4079410.html

RE: Contentions observed in lucene execution

RameshIyerV
Uwe, we do not have the flexibility to change the implementation to use MMapDirectory; the Lucene implementation is core to the Sterling 9.0 product (and ships with it).

On heap: the box has 250+ GB of RAM, and our applications (2) are using close to 68 GB.

Also, the application internally uses Ehcache to cache the Lucene index files, and we have configured Ehcache to store all the elements in memory. We did this to minimize disk IO.

However, it still seems to be doing RandomAccessFile reads (used by FSDirectory). Do you know what it is attempting to read?