Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Yogendra Kumar Soni
I have 130 million documents, and each document has a unique document ID. I
want to build a suggester on the document ID. Building the suggest dictionary
fails with 130 million documents; in testing it succeeded with 50 million
documents.

8 nodes with a 50 GB heap for each node and 600 GB of RAM in total.

Heap usage is around 10-12 GB per node.

The build runs for around 50-60 minutes and then fails, and different shards
fail on different tries.

Solr version 7.3.0

OS: Linux

runtime: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_172
25.172-b11

Suggester configuration in solrconfig.xml:

<searchComponent name="suggestdn" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestdn</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">docid</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="exactMatchFirst">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggestdn" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestpn</str>
    <bool name="distrib">true</bool>
    <str name="shards.qt">/suggestdn</str>
  </lst>
  <arr name="components">
    <str>suggestdn</str>
  </arr>
</requestHandler>

I am getting the following stack trace:

HttpSolrCall
null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Timeout occured
while waiting response from server at:
http://localhost:11180/solr/shard5_replica_n8
null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Timeout occured
while waiting response from server at:
http://localhost:11180/solr/hard5_replica_n8
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:410)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:530)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout
occured while waiting response from server at:
http://10.1.1.189:14080/solr/alexandria-standard_shard5_replica_n8
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:118)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
    ... 12 more
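
The trace shows the node that received the request timing out in HttpShardHandler while it waits for a shard to answer, so the build is running as a distributed request. One thing that could be tried, purely as a sketch, is to ask each core to build its own dictionary with distrib=false, so that no node has to wait on another. The core URLs below are hypothetical placeholders, and the dictionary name is assumed to be the `name` of the suggester from the configuration above.

```python
from urllib.parse import urlencode


def build_urls(core_base_urls, handler="/suggestdn", dictionary="suggestdn"):
    """Return one non-distributed suggester build URL per core.

    core_base_urls: hypothetical base URLs of individual cores (replicas).
    handler/dictionary: assumed to match the names in solrconfig.xml.
    """
    params = urlencode({
        "suggest": "true",
        "suggest.build": "true",           # trigger the dictionary build
        "suggest.dictionary": dictionary,
        "distrib": "false",                # stay on this core only
    })
    return [f"{base.rstrip('/')}{handler}?{params}" for base in core_base_urls]


if __name__ == "__main__":
    # Placeholder core URLs, for illustration only.
    cores = [
        "http://host1:8983/solr/mycoll_shard1_replica_n1",
        "http://host2:8983/solr/mycoll_shard2_replica_n2",
    ]
    for url in build_urls(cores):
        print(url)
```

Each printed URL would then be requested with a client whose read timeout is long enough for a build of this size.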


*Thanks and Regards,*
*Yogendra Kumar Soni*

Re: Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Alessandro Benedetti
Hi Yogendra,
you mentioned you are using SolrCloud.
In SolrCloud an investigation cannot be limited to a single Solr log: since you
see a timeout, I would recommend checking the logs of both nodes involved.

When you say "heap usage is around 10 GB - 12 GB per node", do you refer
to the effective usage by the Solr JVM or to the allocated heap?
Are you monitoring the memory utilisation of your Solr nodes?
Are garbage collection cycles behaving correctly?
When a timeout occurs, something went wrong in the communication between
the Solr nodes.
It could be the network, but in your case it may be a stop-the-world pause
caused by GC.
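
One way to check the stop-the-world hypothesis is to scan the GC log of each node for long pauses around the time of the failure. The following is a minimal sketch, assuming a Java 8 GC log written with -XX:+PrintGCApplicationStoppedTime; the sample lines are made up for illustration and the one-second threshold is an arbitrary choice.

```python
import re

# Line format printed by Java 8 with -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return (pause_seconds, line) for every pause at or above the threshold."""
    found = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_seconds:
                found.append((pause, line.rstrip("\n")))
    return found


if __name__ == "__main__":
    # Illustrative sample lines, not taken from a real log.
    sample = [
        "2018-06-04T10:00:01.000+0000: 100.123: Total time for which "
        "application threads were stopped: 0.0004520 seconds, "
        "Stopping threads took: 0.0000310 seconds",
        "2018-06-04T10:42:17.000+0000: 2636.500: Total time for which "
        "application threads were stopped: 12.5000000 seconds, "
        "Stopping threads took: 0.0001200 seconds",
    ]
    for pause, line in long_pauses(sample):
        print(f"{pause:.1f}s pause: {line}")
```

To use it on a real node, pass an open file object for that node's GC log instead of the sample list. A pause longer than the inter-node socket timeout would be consistent with the "Read timed out" in the stack trace.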




-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Yogendra Kumar Soni
>>In SolrCloud an investigation does not isolate to a single Solr log : you
>>see a timeout, i would recommend to check both the nodes involved.

I sent the log of the node to which I sent the request. I need to check the
other nodes' logs.

>>Are you monitoring the memory utilisation for your Solr nodes ?

I monitored it from the admin UI and could not find any clue at the time of
failure.

>>When you say : " heap usage is around 10 GB - 12 GB per node.", do you refer
>>to the effective usage by the Solr JVM or the allocated heap ?

Heap usage varies from 5 GB to 12 GB. Initially it was 5 GB, then it increased
gradually to 12 GB and decreased to 5 GB again (maybe because of garbage
collection).
Maximum heap usage is 10-12 GB; the allocated heap is 50 GB.

>>Are Garbage Collection cycles behaving correctly ?
>>When a timeout occurs, something bad happened in the communication between
>>the Solr nodes.
>>It could be network, but in your case it may be some Stop World situation
>>caused by GC.

I need to analyze GC pauses. Any suggestions on how I can monitor resource
usage and GC pauses effectively?

On Mon, Jun 4, 2018 at 3:27 PM, Alessandro Benedetti <[hidden email]>
wrote:

> Hi Yogendra,
> you mentioned you are using SolrCloud.
> In SolrCloud an investigation does not isolate to a single Solr log : you
> see a timeout, i would recommend to check both the nodes involved.
>
> When you say : " heap usage is around 10 GB - 12 GB per node.", do you
> refer
> to the effective usage by the Solr JVM or the allocated heap ?
> Are you monitoring the memory utilisation for your Solr nodes ?
> Are Garbage Collection cycles behaving correctly ?
> When a timeout occurs, something bad happened in the communication between
> the Solr nodes.
> It could be network, but in your case it may be some Stop World situation
> caused by GC.



--
*Thanks and Regards,*
*Yogendra Kumar Soni*

Re: Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Erick Erickson
bq. I have 130 million documents and each document has unique document id. I
want to build suggester on document id.

Why do it this way? I'm supposing you want to have someone start typing in
the doc ID and then do autocomplete on it. For such a simple operation, it
would be far easier, and pretty certainly fast enough, to just use the Terms
component and specify terms.prefix. See:
https://lucene.apache.org/solr/guide/6_6/the-terms-component.html

This would not require any build step, would be as up-to-date as your last
commit, would not consume the additional resources a suggester would, and
would work if you shard.
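
As a rough illustration of the Terms component approach, the sketch below builds a prefix request with terms.fl and terms.prefix. The host, collection name, and the presence of a /terms handler are assumptions, not details from this thread, and a sharded collection may need further parameters described in the Terms component documentation.

```python
from urllib.parse import urlencode


def terms_prefix_url(collection_url, field, prefix, limit=10):
    """Build a Terms component request listing terms that start with prefix.

    collection_url: hypothetical base URL of the collection.
    field: the indexed field to read terms from (docid in this thread).
    """
    params = urlencode({
        "terms": "true",
        "terms.fl": field,        # field whose terms are listed
        "terms.prefix": prefix,   # what the user has typed so far
        "terms.limit": limit,     # number of completions to return
        "wt": "json",
    })
    return f"{collection_url.rstrip('/')}/terms?{params}"


if __name__ == "__main__":
    # Placeholder host and collection, for illustration only.
    print(terms_prefix_url("http://localhost:8983/solr/mycollection", "docid", "AB12"))
```

Because the terms are read straight from the index, no separate dictionary has to be built or kept in memory.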

Best,
Erick

On Mon, Jun 4, 2018 at 4:23 AM, Yogendra Kumar Soni
<[hidden email]> wrote:

> I sent  log of node to which i  sent the request. need to check other nodes
> log
>>>In SolrCloud an investigation does not isolate to a single Solr log : you
>>>see a timeout, i would recommend to check both the nodes involved.
>
>
> monitored from admin UI, could not find any clue at the time of failure.
>
>>>Are you monitoring the memory utilisation for your Solr nodes ?
>
>
>>>When you say : " heap usage is around 10 GB - 12 GB per node.", do you
> refer
>     to the effective usage by the Solr JVM or the allocated heap ?
>
>
> heap usage varies from 5 gb to 12 gb . Initially it was 5 gb then increased
> to 12 gb gradually and decreasing to 5 gb again. (may be because of garbage
> collection)
> 10-12 GB maximum  heap uses, allocated is 50 GB.
>
>>>Are Garbage Collection cycles behaving correctly ?
>>>When a timeout occurs, something bad happened in the communication between
>>>the Solr nodes.
>
> Need to  analyze GC pause. Any suggestion how i can monitor resource usage
> and GC pause effectively.
>>>It could be network, but in your case it may be some Stop World situation
>>>caused by GC.

Re: Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Walter Underwood
Yes, why are you doing this? A suggester is designed to have a smaller set of terms than the entire index.

I would never expect a 130 million term suggester to work. I’m astonished that it works with 50 million terms.

We typically have about 50 thousand terms in a suggester.

Also, you haven’t said which kind of suggester you have configured. Some of them are in memory.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Jun 4, 2018, at 9:09 AM, Erick Erickson <[hidden email]> wrote:
>
> bq. I have 130 million documents and each document has unique document id. I
> want to build suggester on document id.
>
> Why do it this way? I'm supposing you want to have someone start
> typing in the doc ID
> then do autocomplete on it. For such a simple operation, it would be
> far easier and
> pretty certainly fast enough to just use the Terms component and specify
> terms.prefix. See:
> https://lucene.apache.org/solr/guide/6_6/the-terms-component.html
>
> This would not require any build step, would be as up-to-date as your
> last commit,
> would not consume the additional resources a suggester would work if
> you shard.....
>
> Best,
> Erick

Re: Solr 7.3 suggest dictionary building fails in cloud mode with large number of rows

Alessandro Benedetti
In addition to what Erick and Walter correctly mentioned :

"heap usage varies from 5 gb to 12 gb . Initially it was 5 gb then increased
to 12 gb gradually and decreasing to 5 gb again. (may be because of garbage
collection)
10-12 GB maximum  heap uses, allocated is 50 GB. "

Did I read that right?
Is the 50 GB allocated to the physical/virtual machine where Solr is running,
or to the Solr JVM?
The former is fine; the latter is considered bad practice unless you
really need all that heap for your Solr process (which is extremely
unlikely).

You need to leave memory for the OS memory mapping (which is heavily used by
Solr).
With such a big heap, your GC may indeed end up in long pauses.
It is recommended to allocate as little heap as possible to the Solr process
(according to your requirements).
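
To make the trade-off concrete, here is a small worked calculation using the figures from this thread (600 GB of RAM, 8 nodes, 50 GB heap, 12 GB peak usage). It assumes the RAM is split evenly across the nodes, which the thread does not confirm, and the 16 GB alternative heap is a hypothetical value chosen only to leave some headroom above the observed peak.

```python
def memory_left_for_os(total_ram_gb, nodes, heap_gb):
    """RAM per node that remains for the OS page cache after the JVM heap.

    Assumes total_ram_gb is shared evenly by all nodes (an assumption).
    """
    per_node_ram = total_ram_gb / nodes
    return per_node_ram - heap_gb


if __name__ == "__main__":
    # Current setup from the thread: 50 GB heap per node.
    print(memory_left_for_os(600, 8, 50))  # 25.0
    # Hypothetical smaller heap, a little above the 12 GB observed peak.
    print(memory_left_for_os(600, 8, 16))  # 59.0
```

Under these assumptions, shrinking the heap would more than double the memory available for memory-mapped index files on each node.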

Regards



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io