Hard commits blocked | non-solrcloud v6.6.2

Hard commits blocked | non-solrcloud v6.6.2

mmb1234

I am seeing that after some time hard commits in all my Solr cores stop, and
each core's searcher has an "opened at" timestamp from hours ago even though
the cores are still ingesting data successfully (index size increasing
continuously).

http://localhost:8983/solr/#/xxxxx-core/plugins?type=core&entry=searcher
"openedAt: 2018-02-08T01:52:24.950Z"

Is there something I am doing incorrectly in my config?

I set up my solrconfig.xml without <updateLog> for my "bulk indexing" use
cases.

solrconfig.xml:
  <indexConfig>
    <lockType>none</lockType>
    <ramBufferSizeMB>200</ramBufferSizeMB>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">1</int>
      <int name="maxMergeCount">5</int>
    </mergeScheduler>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:10000}</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
      <maxDocs>${solr.autoSoftCommit.maxDocs:-1}</maxDocs>
      <openSearcher>false</openSearcher>
    </autoSoftCommit>
  </updateHandler>

Thread dump:
"commitScheduler-20-thread-1" #391 prio=5 os_prio=0 tid=0x00007ef194011000 nid=0x43a in Object.wait() [0x00007ec99533f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doStall(ConcurrentMergeScheduler.java:616)
        - eliminated <0x000000027005a0f0> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:602)
        - locked <0x000000027005a0f0> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:524)
        - locked <0x000000027005a0f0> (a org.apache.lucene.index.ConcurrentMergeScheduler)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2083)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:487)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:291)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:276)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:235)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1980)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1926)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:675)
        - locked <0x000000026f20bb88> (a java.lang.Object)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:217)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)




Re: Hard commits blocked | non-solrcloud v6.6.2

Erick Erickson
This is very odd. Do you by any chance have custom code in place
that's not closing searchers properly? If you take a heap dump, how
many searchers do you have open?

If you issue a manual commit
(http://blah/solr/core/update?commit=true), what happens?
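
(For reference, the same kind of manual hard commit can also be issued from
SolrJ. The sketch below is only illustrative; the base URL and core name
"xxxxx-core" are placeholders, not values from this thread.)

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ManualCommit {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and core name; point this at the real core.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/xxxxx-core").build()) {
                // Sends an explicit hard commit (roughly what hitting
                // /update?commit=true does) and waits for it to finish.
                // If the commit path is blocked, this call will hang too.
                client.commit();
            }
        }
    }

If that call hangs the same way the HTTP request does, the problem is in the
commit path itself rather than in the client.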

Best,
Erick


Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234
> If you issue a manual commit
> (http://blah/solr/core/update?commit=true) what happens?

That call never returned to the client browser.

So I also tried a core reload and captured it in the thread dump. That too
never returned.

"qtp310656974-1022" #1022 prio=5 os_prio=0 tid=0x00007ef254010000 nid=0x6bf
runnable [0x00007f0138394000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000271e49b00> (a
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
        at
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:176)
        at
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:227)
        at org.apache.solr.core.SolrCore.reload(SolrCore.java:644)
        - locked <0x0000000271f08ac8> (a java.lang.Object)
        at
org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1181)
        at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:111)
        at
org.apache.solr.handler.admin.CoreAdminOperation$$Lambda$95/234250762.execute(Unknown
Source)
        at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
        at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.jav\
a:303)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)





Re: Hard commits blocked | non-solrcloud v6.6.2

Shawn Heisey-2
In reply to this post by mmb1234
On 2/7/2018 9:01 PM, mmb1234 wrote:
> I am seeing that after some time hard commits in all my Solr cores stop, and
> each core's searcher has an "opened at" timestamp from hours ago even though
> the cores are still ingesting data successfully (index size increasing
> continuously).
>
> http://localhost:8983/solr/#/xxxxx-core/plugins?type=core&entry=searcher
> "openedAt: 2018-02-08T01:52:24.950Z"

Like Erick, I think this is really really strange.  Your config looks
like it could result in some issues with opening searchers, but I
wouldn't expect it to leave you with a searcher that's hours old when
indexing is happening continuously. 

>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:10000}</maxTime>
>       <openSearcher>true</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>       <maxDocs>${solr.autoSoftCommit.maxDocs:-1}</maxDocs>
>       <openSearcher>false</openSearcher>
>     </autoSoftCommit>

Setting openSearcher to false on autoSoftCommit makes no sense.  But for
autoCommit, it SHOULD be false.  For change visibility, it's better to
use soft commits than hard.

Something like this is what I recommend that you use (one minute for
autoCommit, two minutes for autoSoftCommit):

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>120000</maxTime>
    </autoSoftCommit>

If you really want commits to happen more frequently than I have
described, then you could probably go as low as 30 seconds on
autoSoftCommit.  But this will cause Solr to do more work, so I don't
recommend it unless it's actually required for your use case.  I would
definitely not go any lower than 30 seconds on autoSoftCommit unless
*all* cache warming is disabled.

In your reply to Erick's message, you included *ONE* thread from a
thread dump.  A different thread was included in the initial message,
but it was still only one thread.  A thread dump that includes exactly
one thread isn't useful.  The only thing we can say from that is that
the method is waiting on another thread to release a lock before it can
proceed ... but we can't tell WHAT it is waiting on.  The entire thread
dump is required.

Due to the size of an entire thread dump, and the fact that attachments
usually don't make it to the list, you're going to need to use a paste
website or a file-sharing website to provide that information.

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234

> Setting openSearcher to false on autoSoftCommit makes no sense.

That was my mistake in my solrconfig.xml. Thank you for identifying it. I
have corrected it.

I then removed my custom <mergeScheduler> element from my solrconfig.xml, and
both the hard commit and /solr/admin/core hang issues seemed to go away for a
couple of hours.

Then I increased the ingestion load and the issue came back. Taking a thread
dump, I now see lots of threads like the one below, and commits are hanging
again. The full thread dump is at
https://github.com/mohsinbeg/datadump/blob/master/threadDump-7pjql_1.zip

"commitScheduler-24-thread-1" #187 prio=5 os_prio=0 tid=0x00007f7c9401d000
nid=0x369 runnable [0x00007f7c1a664000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000026b4317f0> (a
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
        at
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:176)
        at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:121)
        at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:618)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:217)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)





Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234

Shawn, Erick,

Were you able to look at the thread dump?
https://github.com/mohsinbeg/datadump/blob/master/threadDump-7pjql_1.zip

Or is there additional data I can provide?






Re: Hard commits blocked | non-solrcloud v6.6.2

Shawn Heisey-2
In reply to this post by mmb1234
On 2/8/2018 5:50 PM, mmb1234 wrote:
> I then removed my custom <mergeScheduler> element from my solrconfig.xml, and
> both the hard commit and /solr/admin/core hang issues seemed to go away for a
> couple of hours.

The mergeScheduler config you had is not likely to be causing any
issues.  It's a good config.  But removing it shouldn't do all that much
for the things we've been discussing either.  It would be fine to put it
back in, or leave it out.

> Then I increased the ingestion load and the issue came back. Taking a thread
> dump, I now see lots of threads like the one below, and commits are hanging
> again. The full thread dump is at
> https://github.com/mohsinbeg/datadump/blob/master/threadDump-7pjql_1.zip

There are over six thousand threads (which is a lot) that are in a
WAITING state while trying to add documents to the index.  But I cannot
tell what they are waiting for -- the hex ID values that the threads are
waiting for are not mentioned as being held by any of the threads in the
dump.

At the end of this message is an example of a stacktrace for one of
those threads.  It doesn't appear to be the commits that are the problem
here, but rather something keeping the indexing from happening ...
except that you have said the index size is increasing, so I don't
really know what's going on.

There are some other details I would like to see.

I would like to see a "top" display.  Run the top utility (not htop or
any other variant), press shift-M to sort by memory, grab a screenshot
of that display, and put it somewhere we can retrieve it.

Also, can you let Solr run for a while, preferably while recreating the
problem, and then share the GC log that Solr creates?

Thanks,
Shawn


==========

"qtp310656974-50242" #50242 prio=5 os_prio=0 tid=0x00007e2120b35800
nid=0xaf95 runnable [0x00007f866155b000]
    java.lang.Thread.State: TIMED_WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for  <0x0000000268c4e590> (a
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
         at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
         at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
         at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
         at
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
         at
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:176)
         at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:121)
         at
org.apache.solr.update.DirectUpdateHandler2.allowDuplicateUpdate(DirectUpdateHandler2.java:269)
         at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
         at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
         at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1005)
         at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
com.vmware.ops.data.solr.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:342)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
com.vmware.ops.data.solr.processor.UnknownFieldNameMutatingUpdateProcessorFactory$1.processAdd(UnknownFieldNameMutatingUpdateProcessorFactory.java:91)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
com.vmware.ops.data.solr.processor.CopyTextFieldProcessorFactory$CopyTextFieldUpdateProcessor.processAdd(CopyTextFieldProcessorFactory.java:58)
         at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
         at
com.vmware.ops.data.solr.processor.TimestampExtractorUpdateProcessor.processAdd(UpdateTimestampProcessorFactory.java:67)
         at
com.vmware.ops.data.solr.handler.IngestionHandler.handleRequestBody(IngestionHandler.java:340)
         at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
         at
com.vmware.ops.data.solr.handler.AggregateIngestionHandler.handleRequestBody(AggregateIngestionHandler.java:157)
         at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
         at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
         at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
         at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
         at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
         at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
         at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
         at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
         at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
         at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
         at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
         at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
         at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
         at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
         at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
         at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
         at org.eclipse.jetty.server.Server.handle(Server.java:534)
         at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
         at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
         at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
         at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
         at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
         at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
         at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
         at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
         at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
         at java.lang.Thread.run(Thread.java:748)

==========

Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234
I ran /solr/58f449cec94a2c75-core-248/admin/luke at 7:05 PM PST.

It showed "lastModified: 2018-02-10T02:25:08.231Z", indicating commits have
been blocked for about 41 minutes. The hard commit interval is set to 10
seconds in solrconfig.xml.

Other cores are also now blocked.

https://jstack.review analysis of the thread dump says "Potential Deadlock"
for three cores.

0x000000027afca8e0
ConcurrentMergeScheduler
Possible deadlock
Held by commitScheduler-26-thread-1 awaiting notification , holding [
0x000000027afca8e0 ]

0x000000027fd19080
ConcurrentMergeScheduler
Possible deadlock
Held by commitScheduler-19-thread-1 awaiting notification , holding [
0x000000027fd19080 0x00000002615ada20 ]

0x0000000280da09c8
ConcurrentMergeScheduler
Possible deadlock
Held by commitScheduler-24-thread-1 awaiting notification , holding [
0x0000000280da09c8 0x000000028e0cbf10 ]

Thread dump and top output from that condition are available at
https://github.com/mohsinbeg/datadump/tree/master/solr58f449cec94a2c75_core_248/solr58f449cec94a2c75_core_248


848 thread(s) have this kind of stack:
-------
at java.lang.Object.wait(Native Method)
at org.apache.lucene.index.ConcurrentMergeScheduler.doStall(ConcurrentMergeScheduler.java:616)
at org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:602)
...
at org.apache.solr.update.DirectUpdateHandler2.allowDuplicateUpdate(DirectUpdateHandler2.java:276)
...




Re: Hard commits blocked | non-solrcloud v6.6.2

Shawn Heisey-2
On 2/9/2018 8:20 PM, mmb1234 wrote:

> I ran /solr/58f449cec94a2c75-core-248/admin/luke at 7:05 PM PST.
>
> It showed "lastModified: 2018-02-10T02:25:08.231Z", indicating commits have
> been blocked for about 41 minutes. The hard commit interval is set to 10
> seconds in solrconfig.xml.
>
> Other cores are also now blocked.
>
> https://jstack.review analysis of the thread dump says "Potential Deadlock"
> for three cores.

The deadlock is not detected by analysis on the fastthread.io website. 
I don't know if that's because jstack.review is a better tool or not.

I do notice that you are using custom Solr components from Amazon and
VMWare.  There is another user writing a thread on this list right now
who is also experiencing a lot of blocked threads. That user is ALSO
running custom components in Solr.  I suspect there may be bugs in the
custom components.

> Thread dump and top output from that condition are available at
>
> https://github.com/mohsinbeg/datadump/tree/master/solr58f449cec94a2c75_core_248/solr58f449cec94a2c75_core_248
Those screenshots were of the "htop" program.  I very specifically said
to NOT use htop -- it must be the actual "top" program.

Thanks,
Shawn


Re: Hard commits blocked | non-solrcloud v6.6.2

Erick Erickson
If you flood the system with updates, eventually you run out of merge
threads, and then updates block until a merge finishes. It may be that at
this point you get into some deadlocked state; updates should slow down,
but not deadlock.

How are you updating? Are you stringing together a zillion clients (or
threads)? Are you _not_ waiting for a return before sending more docs?
If updates are blocking due to merging threads and your clients are
waiting for the response from Solr before sending more docs, it should
be self-limiting.


Some things you can do to index quickly:
1> batch updates together. I usually send packets of 1,000 updates;
some experimentation will help you decide a good number (a rough SolrJ
sketch follows below).
2> make very sure you aren't committing from your clients.
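
(A rough SolrJ sketch of point 1, batching documents and waiting for each
response before sending more. This is only an illustration; the base URL,
core name "xxxxx-core", field names, and document contents are made-up
placeholders, not anything from this thread.)

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder URL, core name, and field names.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/xxxxx-core").build()) {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = 0; i < 100000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);
                    doc.addField("payload_s", "value-" + i);
                    batch.add(doc);

                    // Send ~1,000 docs at a time and wait for the response
                    // before sending more, so indexing stays self-limiting.
                    if (batch.size() == 1000) {
                        client.add(batch);  // no explicit commit; autoCommit handles it
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch);
                }
            }
        }
    }

Because client.add() blocks until Solr responds, a slow merge naturally slows
the client down instead of piling up thousands of in-flight requests.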

There's some discussion of rate-limiting Solr to avoid this kind of
thing, but no definitive plans.

Best,
Erick


Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234
Hi Shawn, Erick,

> updates should slow down but not deadlock.
The net effect is the same. As the CLOSE_WAITs increase, the JVM ultimately
stops accepting new socket connections, at which point `kill <solrpid>` is
the only option.

This means that if the replication handler is invoked (which sets the deletion
policy), the number of blocked threads rises even faster and the system fails
even faster.

Each Solr POST is a blocking call, hence the CLOSE_WAITs. Each gzipped POST
body is a JSON array of 100 JSON objects (1 JSON doc = 1 Solr doc).

All custom AbstractSolrEventListener listeners were disabled so they do not
process any post-commit events. Those threads are in a WAITING state, which
is OK.

I then ran /solr/58f449cec94a2c75-core-256/admin/luke at 10:30 PM PST.

It showed "lastModified: 2018-02-11T04:46:54.540Z", indicating commits have
been blocked for about 2 hours. The hard commit interval is set to 10 seconds
in solrconfig.xml.

Other cores have also been blocked for a while.

Thread dump and top output from that condition are at
https://github.com/mohsinbeg/datadump/tree/master/solr58f449cec94a2c75_core_256

The netstat CLOSE_WAIT sockets correlate with DirectUpdateHandler2 /
UpdateRequestProcessor.processAdd() requests.

solr [ /tmp ]$ sudo netstat -ptan | awk '{print $6 " " $7 }' | sort | uniq
-c; TZ=PST8PDT date;
   7728 CLOSE_WAIT -
     65 ESTABLISHED -
      1 FIN_WAIT2 -
      1 Foreign Address
      6 LISTEN -
     36 TIME_WAIT -
      1 established)
Sat Feb 10 22:27:07 PST 2018

http://fastthread.io shows about 6,700 threads in TIMED_WAITING.

https://jstack.review shows 6,584 threads with this stack:
at java.lang.Object.wait(Native Method)
at org.apache.lucene.index.ConcurrentMergeScheduler.doStall(ConcurrentMergeScheduler.java:616)
at org.apache.lucene.index.ConcurrentMergeScheduler.maybeStall(ConcurrentMergeScheduler.java:602)
...
at org.apache.solr.update.DirectUpdateHandler2.allowDuplicateUpdate(DirectUpdateHandler2.java:276)
...
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
...


The only `top` available on Photon OS is the procps-ng one
(https://github.com/vmware/photon/blob/1.0/SPECS/procps-ng/procps-ng.spec).
Those screenshots are attached.




Re: Hard commits blocked | non-solrcloud v6.6.2

Shawn Heisey
On 2/10/2018 11:58 PM, mmb1234 wrote:
> The only `top` available on Photon OS is the procps-ng one
> (https://github.com/vmware/photon/blob/1.0/SPECS/procps-ng/procps-ng.spec).
> Those screenshots are attached.

Attachments rarely make it to the mailing list.  I don't see any
attachments.

This is an example of what I am looking for:

https://www.dropbox.com/s/85h1weuopa643za/bigindy5-top-small-shr.png?dl=0

This screenshot is from a machine running one Solr instance with a 28GB
heap, and over 700GB of index data.  These values can be deduced from
the screenshot.  The other java processes are home-grown software
related to Solr, not using very many resources.

The machine has 64GB of RAM.  This is a dev server.  If I were to see
these numbers from somebody's production install, I would conclude that
the machine had insufficient memory for good performance: 64GB of
total memory, with 28GB of that used by Solr's heap, leaves about 36GB to
cache over 700GB of index data.  It's not enough.  This machine rarely
sees any query traffic.  The queries that *do* get sent there do not perform
very well.  Because it's a dev server, it just needs to work; good
performance is unnecessary.

Thanks,
Shawn


Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234

> https://github.com/mohsinbeg/datadump/tree/master/solr58f449cec94a2c75_core_256

I uploaded the output at the above link.

The OS has no swap configured. There are other processes on the host, but
they use <1GB of memory and <5% CPU cumulatively, and none run inside the
Docker container, as `top` shows. The Solr JVM heap is set to 30GB; JVM
monitoring shows the heap never went over 10GB.

Also, there were no queries at that time. A few hours earlier I did run quite
a few heavy-hitting distributed paging and JSON facet queries.

The machine has 32 vCPUs and 240GB of memory.

Please let me know if vmstat and/or iostat would be helpful. I didn't save
the GC logs at that time but can capture them if you like.




Re: Hard commits blocked | non-solrcloud v6.6.2

Emir Arnautović
I didn't have time to look at the full thread dump, but I noticed one thing
in the pasted stack trace:

AddSchemaFieldsUpdateProcessor.processAdd

Is it possible that you are making a lot of changes to your schema?

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




Re: Hard commits blocked | non-solrcloud v6.6.2

mmb1234

The solrconfig.xml settings below resolved the TIMED_WAITING in
ConcurrentMergeScheduler.doStall(). Thanks to Shawn and Erick for their
pointers.

  <indexConfig>
    ...
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
       <int name="maxMergeAtOnce">30</int>
       <int name="maxMergeAtOnceExplicit">100</int>
       <double name="segmentsPerTier">30.0</double>
    </mergePolicyFactory>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
       <int name="maxMergeCount">18</int>
       <int name="maxThreadCount">6</int>
    </mergeScheduler>
    <ramBufferSizeMB>300</ramBufferSizeMB>
    ...
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:10000}</maxTime>
      <maxDocs>${solr.autoSoftCommit.maxDocs:-1}</maxDocs>
      <openSearcher>true</openSearcher>
    </autoSoftCommit>
  </updateHandler>


