Solr 7.1 nodes shutting down

Joe Obernberger
Hi All - I'm having an issue that seems to be related to the machine being
under high CPU load.  Occasionally a node will fall out of the SolrCloud
cluster.  When that happens, it is using 200% CPU and shows the following
exception:

2018-08-10 15:36:43.416 INFO  (qtp1908316405-203450) [c:models s:shard3 r:core_node17 x:models_shard3_replica_n14] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
org.eclipse.jetty.io.EofException: Closed
         at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:659)
         at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
         at org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
         at java.io.OutputStream.write(OutputStream.java:116)
         at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
         at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
         at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
         at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
         at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
         at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:154)
         at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:96)
         at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:73)
         at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
         at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:789)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.Server.handle(Server.java:530)
         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
         at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
         at java.lang.Thread.run(Thread.java:748)

This is followed by a number of exceptions, such as:

2018-08-10 19:15:58.989 ERROR (qtp1908316405-209211) [c:UNCLASS s:shard23 r:core_node47 x:UNCLASS_shard23_replica_n44] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ClusterState says we are the leader (http://triton:9100/solr/UNCLASS_shard23_replica_n44), but locally we don't think so. Request came from null
         at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:571)
         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:324)
         at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:259)
         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:614)
         at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
         at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
         at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
         at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
         at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
         at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
         at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
         at org.eclipse.jetty.server.Server.handle(Server.java:530)
         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
         at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
         at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
         at java.lang.Thread.run(Thread.java:748)

and:

2018-08-10 19:14:10.401 ERROR (qtp1908316405-209211) [c:UNCLASS s:shard23 r:core_node47 x:UNCLASS_shard23_replica_n44] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: UNCLASS slice: shard23 saw state=DocCollection(UNCLASS//collections/UNCLASS/state.json/3828)={

Any ideas on what to try?  I've been trying to figure this out for a
couple of days now, but it's very intermittent.

Thank you!

-Joe

Re: Solr 7.1 nodes shutting down

Shawn Heisey-2
On 8/10/2018 1:20 PM, Joe Obernberger wrote:

> Hi All - having an issue that seems to be related to the machine being
> under a high CPU load.  Occasionally a node will fall out of the solr
> cloud cluster.  It will be using 200% CPU and show the following
> exception:
>
> 2018-08-10 15:36:43.416 INFO  (qtp1908316405-203450) [c:models
> s:shard3 r:core_node17 x:models_shard3_replica_n14]
> o.a.s.s.HttpSolrCall Unable to write response, client closed
> connection or we are shutting down
> org.eclipse.jetty.io.EofException: Closed

EofException means that the TCP connection was closed.  The timeout that
can cause such a disconnection is typically configured for either 50 or
60 seconds, so something *extreme* must have happened for that timeout
to be exceeded.
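One way to test whether a process is actually freezing for tens of seconds is a "hiccup" style stall detector: a thread sleeps for a short fixed interval and records how much longer than requested each sleep really took. The sketch below is a minimal illustration, not a Solr or Jetty tool; the class name `StallDetector`, the 100 ms interval, and the 50-second threshold (the lower of the two typical timeouts mentioned above) are assumptions. It only observes stalls in the JVM it runs in, so it would have to run inside the Solr JVM to see Solr's own GC pauses; run standalone on the same overloaded machine, it would only capture CPU starvation.

```java
import java.util.concurrent.TimeUnit;

// Minimal stall ("hiccup") detector sketch; hypothetical, not part of Solr.
public class StallDetector {
    // Assumed sampling interval for this sketch.
    static final long INTERVAL_MS = 100;
    // Assumed idle timeout: the lower of the two typical values (50 s).
    static final long TIMEOUT_MS = 50_000;

    /** How far (ms) an observed sleep overshot the requested interval; never negative. */
    static long overrunMs(long elapsedNanos, long intervalMs) {
        return Math.max(0, TimeUnit.NANOSECONDS.toMillis(elapsedNanos) - intervalMs);
    }

    /** True if a single stall was long enough to trip an idle timeout of timeoutMs. */
    static boolean exceedsTimeout(long stallMs, long timeoutMs) {
        return stallMs >= timeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long worst = 0;
        for (int i = 0; i < 600; i++) { // roughly one minute of samples
            long start = System.nanoTime();
            Thread.sleep(INTERVAL_MS);
            worst = Math.max(worst, overrunMs(System.nanoTime() - start, INTERVAL_MS));
        }
        System.out.println("worst stall: " + worst + " ms");
        System.out.println("long enough to trip a " + TIMEOUT_MS + " ms timeout? "
                + exceedsTimeout(worst, TIMEOUT_MS));
    }
}
```

On a healthy machine the worst stall is typically small; a value approaching the timeout would support the theory that the process froze long enough for the connection to be dropped.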

With no real information to go on, I would guess that you're having
extreme GC pauses, probably because your heap is too small.
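To check the GC-pause guess, the standard `java.lang.management` MXBeans report accumulated collection counts and times per garbage collector, plus current heap usage. The sketch below reads them for the JVM it runs in; the class and method names are hypothetical. The same MXBeans are exposed over JMX, so a tool such as jconsole attached to the Solr process shows the same figures. A total GC time that jumps by tens of seconds between samples, or heap usage that stays near the maximum, would be consistent with the heap being too small.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: snapshot of GC activity and heap usage for the current JVM.
public class GcSnapshot {
    /** Sum of accumulated collection time (ms) across all collectors in this JVM. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of the maximum heap currently in use, or -1 if no maximum is defined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        double used = heapUsedFraction();
        System.out.println("total GC time: " + totalGcTimeMs() + " ms");
        System.out.println("heap used: "
                + (used < 0 ? "n/a (no max)" : String.format("%.1f%%", used * 100)));
    }
}
```

Sampling these values periodically and comparing the deltas is more informative than a single snapshot, since the counters are cumulative since JVM start.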

If that's not it, figuring out the problem is going to be an involved
process that could take a while.  You might want to hang out in the IRC
channel for a more interactive chat.

Thanks,
Shawn