debugging long commits


debugging long commits

bwhitman
We have a distributed setup that has been experiencing glacially slow commit
times on only some of the shards (10s on a good shard, 263s on a slow
shard). Each shard for this index has about 10GB of Lucene index data, and
the documents are segregated by an md5 hash, so the distribution of
document/data types should be roughly equal across all shards. I've turned
off our postcommit hooks to isolate the problem, so it's not a snapshot run
amok or anything. I also moved the indexes over to new machines, and the
same indexes that were slow in production are also slow on the test
machines.

During the slow commit, the jetty process sits at 100% CPU / 50% RAM on an
8GB quad-core machine. The slow commit happens every time after I add at
least one document. (If I don't add any documents, the commit is immediate.)

What can I do to look into this problem?
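As a first sanity check, the md5-based segregation described above can be modeled in isolation. This is a hypothetical Python sketch (the thread doesn't show the actual hashing code; shard_for and the 8-shard count are my assumptions) of how an md5-mod scheme spreads documents:

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Hypothetical scheme: bucket by the md5 digest of the doc id, mod shard count.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Count how 100,000 synthetic doc ids land across 8 shards.
counts = [0] * 8
for i in range(100_000):
    counts[shard_for(f"doc-{i}", 8)] += 1
print(counts)  # each count should land near 12,500 if the hash spreads evenly
```

If the counts come out even, uneven data volume is unlikely to explain why only some shards commit slowly, which points instead at per-shard index state.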

Re: debugging long commits

bwhitman
Not sure if these help. Here are the stack trace and jmap -histo output,
taken during a long (bad) commit:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.0-b16 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00002aabf9954400 nid=0x5e1c runnable
[0x0000000000000000..0x0000000042048d20]
   java.lang.Thread.State: RUNNABLE

"Lucene Merge Thread #0" daemon prio=10 tid=0x00002aabf9c7f400 nid=0x5e00
runnable [0x0000000041f47000..0x0000000041f47ba0]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.RAMOutputStream.flush(RAMOutputStream.java:147)
at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:56)
at org.apache.lucene.index.MultiLevelSkipListWriter.writeSkip(MultiLevelSkipListWriter.java:146)
at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:588)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)

"pool-2-thread-1" prio=10 tid=0x00002aabf9b94c00 nid=0x5dc3 waiting on
condition [0x0000000041a42000..0x0000000041a42aa0]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00002aaabecd12d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.DelayQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

"DestroyJavaVM" prio=10 tid=0x00002aabfc006c00 nid=0x5da3 waiting on
condition [0x0000000000000000..0x000000004022ad10]
   java.lang.Thread.State: RUNNABLE

"Timer-2" daemon prio=10 tid=0x00002aabfc007c00 nid=0x5dc1 in Object.wait()
[0x0000000041e46000..0x0000000041e46d20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabef91338> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabef91338> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)

"Checkpointer" daemon prio=10 tid=0x00002aabf995b800 nid=0x5dc0 in
Object.wait() [0x0000000041d45000..0x0000000041d45da0]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebf9700> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152)
- locked <0x00002aaabebf9700> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)

"Cleaner-1" daemon prio=10 tid=0x00002aabf995b000 nid=0x5dbf in
Object.wait() [0x0000000041c44000..0x0000000041c44c20]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebee158> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152)
- locked <0x00002aaabebee158> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)

"INCompressor" daemon prio=10 tid=0x00002aabf9a3cc00 nid=0x5dbe in
Object.wait() [0x0000000041b43000..0x0000000041b43ca0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeef3500> (a java.lang.Object)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)
- locked <0x00002aaabeef3500> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)

"pool-1-thread-1" prio=10 tid=0x00002aabfa015800 nid=0x5dbc waiting on
condition [0x0000000041941000..0x0000000041941ba0]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00002aaabef91a08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

"Timer-1" daemon prio=10 tid=0x00002aabfa0f1000 nid=0x5dbb in Object.wait()
[0x0000000041840000..0x0000000041840a20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabef19a48> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabef19a48> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)

"btpool0-9 - Acceptor0 SelectChannelConnector@0.0.0.0:80" prio=10
tid=0x00002aabf9807800 nid=0x5dba runnable
[0x000000004173f000..0x000000004173faa0]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00002aaabebcb2a8> (a sun.nio.ch.Util$1)
- locked <0x00002aaabebcb290> (a java.util.Collections$UnmodifiableSet)
- locked <0x00002aaabebc8b00> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

"btpool0-8 - Acceptor1 SelectChannelConnector@0.0.0.0:80" prio=10
tid=0x00002aabf980d000 nid=0x5db9 runnable
[0x000000004163e000..0x000000004163ed20]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00002aaabebcb5f8> (a sun.nio.ch.Util$1)
- locked <0x00002aaabebcb5e0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00002aaabebc4d30> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

"btpool0-7" prio=10 tid=0x00002aabf9594800 nid=0x5db8 in Object.wait()
[0x000000004153c000..0x000000004153dda0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4554)
- locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2873)
- locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1718)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1674)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1648)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:175)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:359)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:205)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

"btpool0-6" prio=10 tid=0x00002aabf98fdc00 nid=0x5db7 in Object.wait()
[0x000000004143c000..0x000000004143cc20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf15c7b0> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf15c7b0> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-5" prio=10 tid=0x00002aabf9b10000 nid=0x5db6 in Object.wait()
[0x000000004133b000..0x000000004133bca0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160ab8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160ab8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-4" prio=10 tid=0x00002aabf9b97400 nid=0x5db5 in Object.wait()
[0x000000004123a000..0x000000004123ab20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160d08> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160d08> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-3" prio=10 tid=0x00002aabf9b96000 nid=0x5db4 in Object.wait()
[0x0000000041139000..0x0000000041139ba0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160f58> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160f58> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-2" prio=10 tid=0x00002aabf9a07000 nid=0x5db3 in Object.wait()
[0x0000000041038000..0x0000000041038a20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf1611a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf1611a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-1" prio=10 tid=0x00002aabf9324400 nid=0x5db2 in Object.wait()
[0x0000000040f37000..0x0000000040f37aa0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf1613f8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf1613f8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"btpool0-0" prio=10 tid=0x00002aabf9a83800 nid=0x5db1 in Object.wait()
[0x0000000040e36000..0x0000000040e36d20]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf161648> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf161648> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"Timer-0" daemon prio=10 tid=0x00002aabf9af6000 nid=0x5db0 in Object.wait()
[0x0000000040d35000..0x0000000040d35da0]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebacf80> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabebacf80> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)

"Low Memory Detector" daemon prio=10 tid=0x00002aabf933fc00 nid=0x5dac
runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00002aabf933d000 nid=0x5dab waiting
on condition [0x0000000000000000..0x0000000040a31360]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00002aabf9339000 nid=0x5daa waiting
on condition [0x0000000000000000..0x00000000409303e0]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00002aabf9337400 nid=0x5da9 waiting
on condition [0x0000000000000000..0x00000000408306b0]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00002aabf9314400 nid=0x5da8 in
Object.wait() [0x000000004072f000..0x000000004072faa0]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x00002aabf9312800 nid=0x5da7 in
Object.wait() [0x000000004062e000..0x000000004062ed20]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00002aabf917a000 nid=0x5da6 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c800 nid=0x5da4
runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x5da5
runnable

"VM Periodic Task Thread" prio=10 tid=0x00002aabf9342000 nid=0x5dad waiting
on condition

JNI global references: 971

Heap
 PSYoungGen      total 1395264K, used 965841K [0x00002aab8d4b0000,
0x00002aabf7f50000, 0x00002aabf7f50000)
  eden space 1030400K, 93% used
[0x00002aab8d4b0000,0x00002aabc83d4788,0x00002aabcc2f0000)
  from space 364864K, 0% used
[0x00002aabe1b00000,0x00002aabe1b10000,0x00002aabf7f50000)
  to   space 352320K, 0% used
[0x00002aabcc2f0000,0x00002aabcc2f0000,0x00002aabe1b00000)
 PSOldGen        total 3495296K, used 642758K [0x00002aaab7f50000,
0x00002aab8d4b0000, 0x00002aab8d4b0000)
  object space 3495296K, 18% used
[0x00002aaab7f50000,0x00002aaadf301a78,0x00002aab8d4b0000)
 PSPermGen       total 21248K, used 19258K [0x00002aaaaff50000,
0x00002aaab1410000, 0x00002aaab7f50000)
  object space 21248K, 90% used
[0x00002aaaaff50000,0x00002aaab121e8d8,0x00002aaab1410000)

 num     #instances         #bytes  class name
----------------------------------------------
   1:       6459678      491568792  [C
   2:       6456059      258242360  java.lang.String
   3:       6282264      251290560  org.apache.lucene.index.TermInfo
   4:       6282189      201030048  org.apache.lucene.index.Term
   5:         70220       39109632  [I
   6:          6082       25264288  [B
   7:           149       20355504  [J
   8:           133       20354208  [Lorg.apache.lucene.index.Term;
   9:           133       20354208  [Lorg.apache.lucene.index.TermInfo;
  10:        160230        8972880  java.nio.HeapByteBuffer
  11:        160218        8972208  java.nio.HeapCharBuffer
  12:        160210        8971760  org.apache.lucene.index.FieldsReader$FieldForMerge
  13:         30440        4095480  <constMethodKlass>
  14:         30440        3660128  <methodKlass>
  15:          2605        3026184  <constantPoolKlass>
  16:         22065        3025120  [Ljava.lang.Object;
  17:          1297        2411792  [Ljava.util.HashMap$Entry;
  18:         48691        2309696  <symbolKlass>
  19:          2604        1981728  <instanceKlassKlass>
  20:          2194        1889888  <constantPoolCacheKlass>
  21:         27444        1317312  java.util.HashMap$Entry
  22:         24954         998160  java.util.AbstractList$Itr
  23:         18834         753360  org.apache.lucene.index.FieldInfo
  24:          2846         523664  java.lang.Class
  25:         13021         520840  java.util.ArrayList
  26:         12471         399072  org.apache.lucene.document.Document
  27:          3895         372216  [[I
  28:          3904         309592  [S
  29:           534         249632  <methodDataKlass>
  30:          3451         220864  org.apache.lucene.index.SegmentReader$Norm
  31:          1547         136136  org.apache.lucene.store.FSDirectory$FSIndexInput
  32:           213         120984  <objArrayKlassKlass>
  33:           737         112024  java.lang.reflect.Method
  34:          1575         100800  java.lang.ref.Finalizer
  35:          1345          86080  org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor
  36:          1188          76032  java.util.HashMap
...
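The histogram above is dominated by TermInfo and Term instances (about 6.3 million each), which lines up with a large term-dictionary merge being in flight. A small helper (my own sketch, not part of any JDK or Solr tool) can parse jmap -histo rows, which makes dumps like this easier to compare across the fast and slow shards:

```python
import re

# A few rows copied from the histogram above.
HISTO = """\
   1:       6459678      491568792  [C
   2:       6456059      258242360  java.lang.String
   3:       6282264      251290560  org.apache.lucene.index.TermInfo
"""

def parse_histo(histo: str):
    """Parse `jmap -histo` rows into (class name, instances, bytes) tuples."""
    rows = []
    for line in histo.splitlines():
        m = re.match(r"\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)", line)
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    return rows

rows = parse_histo(HISTO)
total_mb = sum(b for _, _, b in rows) / 1e6
print(f"{len(rows)} classes, {total_mb:.0f} MB")  # 3 classes, 1001 MB
```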





Re: debugging long commits

bwhitman
I think I'm getting close with this (sorry for the self-replies).

I tried an optimize (which we never do); it took 30 minutes and logged this repeatedly:

Exception in thread "Lucene Merge Thread #4" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.util.BitVector.get(BitVector.java:91)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:633)
at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:585)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
Jan 2, 2009 6:05:49 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982 _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926 into _1agy [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcesso
...


But then it finished. And now commits are OK again.

Does anyone know what the merge exception means, and whether I lost anything?
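For what it's worth, the ArrayIndexOutOfBoundsException in BitVector.get above suggests the segment's deleted-docs bit vector was shorter than the doc count the merger expected. A toy Python analogy of that mismatch (not Lucene's code; ToySegment and its fields are made up for illustration):

```python
class ToySegment:
    """Toy stand-in for a segment with a deleted-docs bit vector."""
    def __init__(self, doc_count: int, bitvector_size: int):
        self.doc_count = doc_count
        # Normally bitvector_size == doc_count; a mismatch models corruption.
        self.deleted = [False] * bitvector_size

    def live_docs(self):
        # A merge-style scan checks the deletion bit for every doc id.
        return [d for d in range(self.doc_count) if not self.deleted[d]]

ok = ToySegment(doc_count=100, bitvector_size=100)
print(len(ok.live_docs()))  # 100

broken = ToySegment(doc_count=100, bitvector_size=50)  # out-of-sync vector
try:
    broken.live_docs()
    failed = False
except IndexError:
    failed = True
print("scan failed on the mismatched segment:", failed)
```

In the toy version the scan blows up partway through, which is roughly what the merge thread hit at doc 34950.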

Re: debugging long commits

Mike Klaas

Hi Brian,

You might want to follow up on the Lucene list ([hidden email]).
Something was causing problems with the merging, so you ended up with too
many segments (hence the slow commits). I doubt that you lost anything;
usually the merge doesn't modify the index until it is complete. But I am
not familiar enough with this code in Lucene to be sure.

-Mike
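Mike's point that a stalled merge lets segments pile up (and that piles of segments make closing the writer at commit time slow) can be shown with a toy model. This is my own sketch of a logarithmic merge scheme, not Lucene's actual merge policy:

```python
def segments_after_commits(commits: int, merge_factor: int = 10,
                           merging_works: bool = True) -> int:
    """Toy model: each commit flushes one new segment; a working scheduler
    collapses any `merge_factor` same-sized segments into one bigger one."""
    levels = {}  # size class -> number of segments in that class
    for _ in range(commits):
        levels[0] = levels.get(0, 0) + 1
        if merging_works:
            lvl = 0
            while levels.get(lvl, 0) >= merge_factor:
                levels[lvl] -= merge_factor
                levels[lvl + 1] = levels.get(lvl + 1, 0) + 1
                lvl += 1
    return sum(levels.values())

print(segments_after_commits(1000))                       # 1: merges keep it tiny
print(segments_after_commits(1000, merging_works=False))  # 1000: every flush lingers
```

A close that has to wait out merges across hundreds of accumulated segments looks exactly like the finishMerges wait shown in the earlier thread dump, and explains why the one successful optimize brought commit times back down.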
