Advice: solrCloud + DIH

roySolr
Hello,

I need some advice on my SolrCloud cluster and the DIH. I have a cluster of 3 cloud servers. Every server runs a Solr instance and a ZooKeeper instance, and I start Solr with the -DzkHost parameter. It works great; I send updates as XML with curl, like this:

curl http://ip:SOLRport/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">223232</field><field name="content">test</field></doc></add>'

The index holds 2 million docs. Now I want an extra field, content2. I added it to my schema and uploaded the configuration to the cluster again with -Dbootstrap_confdir and -Dcollection.configName. It has been replicated to the whole cluster.

Now I need to re-index to add the field to every doc. I have a database with all the data and want to use the DIH full-import (this is how I did it in previous Solr versions). When I run it, it indexes about 3 docs/s (really slow). When I run Solr standalone (not SolrCloud), it indexes 600 docs/s.

What's the best way to do a full re-index with SolrCloud? Does SolrCloud support DIH?

Thanks

Re: Advice: solrCloud + DIH

Mark Miller-3

On Mar 14, 2013, at 9:22 AM, roySolr <[hidden email]> wrote:

> Hello,
>
>  When i run this it goes with 3 doc/s(Really
> slow). When i run solr alone(not solrcloud) it goes 600 docs/sec.
>
> What's the best way to do a full re-index with solrcloud? Does solrcloud
> support DIH?
>
> Thanks
>

SolrCloud supports DIH, but not fully and happily. DIH is set up to work pretty nicely with non-SolrCloud, where it loads pretty quickly. With SolrCloud a few things can happen. One is that you might be running DIH on a replica rather than a leader (and which node is the leader can change without your consent); in that case all docs go to another node and then come back. SolrCloud also really works best with multiple indexing threads, and DIH will only use one, to my knowledge.
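To illustrate the multi-threaded alternative to DIH, here is a minimal Python sketch that sends batched XML updates from several threads. It is only an illustration, not something from this thread: the function names, the batch size, and the worker count are assumptions, and the update URL is whatever your own /solr/update endpoint is. Nothing here issues a commit.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Render a list of {field: value} dicts as one Solr <add> XML message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_xml(update_url, xml):
    """POST one XML update message, like the curl command in the first post."""
    request = urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_parallel(docs, update_url, batch_size=500, workers=4, send=post_xml):
    """Send batches from several threads; returns one result per batch, in order."""
    batches = [build_add_xml(batch) for batch in chunked(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda xml: send(update_url, xml), batches))
```

The `send` parameter exists only so the batching can be tried without a running Solr; by default each batch is posted with `post_xml`.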

Still, at 3 docs/s, something sounds wrong. That's too slow.

- Mark


Re: Advice: solrCloud + DIH

rulinma
In reply to this post by roySolr
3 docs/s is low. In my tests with 4 nodes I get more than 1000 docs/s at 4 KB per doc with SolrCloud. Every leader has a replica.

I am tuning it to reach 3000 docs/s. 3 docs/s is too slow.
By the way, I use multiple threads to insert.

Thanks!

Re: Advice: solrCloud + DIH

roySolr
In reply to this post by Mark Miller-3
Thanks for the support so far,

I was running the data import on a replica! Now I start it on the leader and it runs at 590 docs/s. I think all docs were going to another node and then coming back.

Is there a way to find out which node is the leader? If there is, I can detect the leader with a script and start the DIH every night on the right server.
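One possible approach, sketched in Python: SolrCloud keeps its cluster state as JSON in ZooKeeper (clusterstate.json), and a script can look for the replica flagged as leader. The sketch assumes the layout collection -> "shards" -> shard -> "replicas", with the leader marked by "leader": "true" and its address in "base_url"; check this against your own clusterstate.json, since the layout differs between Solr versions. The sample data, the function names, and the "dataimport" handler path are illustrative assumptions, and fetching the JSON from ZooKeeper is left out.

```python
import json

# Hypothetical sample in the assumed clusterstate.json layout.
SAMPLE_STATE = json.loads("""
{"collection1": {"shards": {"shard1": {"replicas": {
  "core_node1": {"base_url": "http://10.0.0.1:8983/solr", "state": "active"},
  "core_node2": {"base_url": "http://10.0.0.2:8983/solr", "state": "active",
                 "leader": "true"}
}}}}}
""")


def find_leaders(cluster_state, collection):
    """Return {shard name: base_url of the replica flagged as leader}."""
    leaders = {}
    for shard_name, shard in cluster_state[collection]["shards"].items():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true":
                leaders[shard_name] = replica["base_url"]
    return leaders


def full_import_url(base_url, handler="dataimport"):
    """Build the DIH full-import URL; the handler path depends on solrconfig.xml."""
    return "%s/%s?command=full-import" % (base_url.rstrip("/"), handler)


leaders = find_leaders(SAMPLE_STATE, "collection1")
print(full_import_url(leaders["shard1"]))
```

A nightly script could request the resulting URL on the leader. Since the leader can change, the lookup would have to be repeated before every import.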

Roy


Re: Advice: solrCloud + DIH

rulinma
Yes, you can find that out; you need to understand how the shards are partitioned.

Re: Advice: solrCloud + DIH

Rollin.R.Ma (lab.sh04.Newegg) 41099
My result is 2000 docs/s, which is close to embedded Solr. It can be tuned further.


> Yes, you can find that out; you need to understand how the shards are partitioned.



--
View this message in context: http://lucene.472066.n3.nabble.com/Advice-solrCloud-DIH-tp4047339p4047673.html
Sent from the Solr - User mailing list archive at Nabble.com.