[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud



Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748047#comment-13748047 ]

Kevin Osborn commented on SOLR-5081:
------------------------------------

I may have this issue as well. I am posting batches of 1000 documents through SolrJ. I have autoCommit set to 15000 ms with openSearcher=false, and autoSoftCommit set to 30000 ms. During my initial testing, I was able to recreate the hang after just a couple of updates. I then raised the process's open-file limit from 4096 to 15000. This seemed to help, but only to a point.
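For reference, the commit settings described above correspond to a solrconfig.xml fragment like the following. The values are taken from this comment; the element names are standard Solr update-handler configuration:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 15 s: flushes the transaction log to disk
       but does not open a new searcher (openSearcher=false) -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every 30 s: makes newly indexed documents visible to queries -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>
  </autoSoftCommit>
</updateHandler>
```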

If all my updates arrive at once, indexing seems to succeed, but if there are pauses between updates, problems appear. I have also only seen this error when there is more than one node in my SolrCloud cluster.

I also took a look at netstat. There seemed to be a lot of connections between my two nodes. Could the frequency of my updates be overwhelming the connection from the leader to the replica?
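A quick sketch of the kind of netstat check described here, for anyone trying to reproduce. The replica host/port (10.0.0.2:8983) is a placeholder; substitute your own nodes:

```shell
# Count established connections from this node to the replica
# (10.0.0.2:8983 is a hypothetical replica address - replace with yours)
netstat -an 2>/dev/null | grep ESTABLISHED | grep '10.0.0.2:8983' | wc -l

# Breakdown of all TCP connections by state; large piles of
# CLOSE_WAIT or TIME_WAIT can indicate a connection-handling problem
netstat -an 2>/dev/null | awk '/tcp/ {print $6}' | sort | uniq -c | sort -rn
```

On newer Linux systems, `ss -tan` gives the same information and is usually faster.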

Deletes also fail, but queries still seem to work.

Restarting the nodes fixes the problem.
               

> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18-node SolrCloud cluster, I can deadlock Solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.
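The open-file count mentioned in the quoted report can be checked directly against the running process while reproducing. A minimal sketch for a Linux host; the PID assignment below is a placeholder (on a real system, find the Solr JVM's PID with something like `jps -l`):

```shell
# Placeholder: substitute the actual Solr JVM PID here
SOLR_PID=$$

# Current number of open file descriptors for that process
ls /proc/$SOLR_PID/fd | wc -l

# The per-process limit actually in effect (may differ from your shell's ulimit)
grep 'open files' /proc/$SOLR_PID/limits
```

Checking `/proc/<pid>/limits` is worthwhile because the limit applied to a daemonized process can differ from what `ulimit -n` reports in an interactive shell.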

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]