Data Import Handler with Solr Source behind Load Balancer

Zimmermann, Thomas
We have a Solr v7 instance that imports data through a Data Import Handler (DIH) whose Solr data source is running Solr v4. When the DIH points directly at a single server in the v4 source, all documents are read and written correctly to the v7 core. When it points at the load balancer DNS entry, the DIH status JSON reports that all documents were read and none were skipped, and everything looks fine, but the v7 core ends up missing ~20% of the documents. This has happened multiple times in multiple environments.

Any thoughts on whether this might be a bug in the underlying DIH code? I'll also pass it along to the server admins on our side for input.

Re: Data Import Handler with Solr Source behind Load Balancer

Emir Arnautović
Hi Thomas,
Is this SolrCloud or Solr master-slave? Do you update the index while the import is running? If you are using master-slave, did you check that all the instances behind the LB are in sync?
My guess is that DIH uses cursors to read data from the other Solr. If there are multiple Solr instances behind the LB, there might be differences between their indexes that result in different documents being returned for the same cursor mark. Are numDocs and maxDoc the same on the new instance after the import?
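
To make the guess above concrete, here is a minimal simulation sketch, not DIH code. It assumes a cursor that simply means "ids greater than the last one seen", a round-robin load balancer, and one replica that lags behind the other. The names fetch_page and import_all and the sample ids are hypothetical, chosen only for illustration.

```python
from itertools import cycle


def fetch_page(replica_ids, cursor, rows):
    """Return up to `rows` ids after `cursor` (sorted by id) and the next cursor."""
    page = sorted(i for i in replica_ids if cursor is None or i > cursor)[:rows]
    next_cursor = page[-1] if page else cursor
    return page, next_cursor


def import_all(replicas, rows):
    """Page through the source, sending each request to the next replica in turn."""
    balancer = cycle(replicas)  # hypothetical round-robin load balancer
    cursor, imported = None, set()
    while True:
        page, cursor = fetch_page(next(balancer), cursor, rows)
        if not page:
            break
        imported.update(page)
    return imported


# Assumed data: one complete replica and one that has not yet received ids 3, 4 and 8.
full = set(range(1, 11))
lagging = full - {3, 4, 8}

direct = import_all([full], rows=2)            # single server, no load balancer
via_lb = import_all([full, lagging], rows=2)   # requests alternate between replicas

print(sorted(full - direct))   # -> []
print(sorted(full - via_lb))   # -> [3, 4]
```

In this toy model the import through the balancer silently drops ids 3 and 4: the page that should have contained them was served by the lagging replica, and the cursor then moved past them. Whether the real import behaves this way depends on how the source instances are replicated and how DIH pages through them, so treat this as an illustration of the hypothesis rather than a diagnosis.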

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 12 Sep 2018, at 05:53, Zimmermann, Thomas <[hidden email]> wrote:
>
> We have a Solr v7 Instance sourcing data from a Data Import Handler with a Solr data source running Solr v4. When it hits a single server in that instance directly, all documents are read and written correctly to the v7. When we hit the load balancer DNS entry, the resulting data import handler json states that it read all the documents and skipped none, and all looks fine, but the result set is missing ~20% of the documents in the v7 core. This has happened multiple times on multiple environments.
>
> Any thoughts on whether this might be a bug in the underlying DIH code? I'll also pass it along to the server admins on our side for input.