Delete by query in SOLR 6.3

RAKESH KOTE
Hi,

We are using Solr 6.3 in cloud mode and have created two collections in a single Solr cluster consisting of 20 shards with 3 replicas each (20 x 3 = 60 instances overall). The first collection has close to 2.5 billion records and the second has 350 million. Both collections use the same instances, each of which has 4 cores and 26 GB of RAM (10-12 GB assigned to the heap and 14 GB left for the OS). On each instance, the first collection's index is close to 50 GB and the second collection's index is close to 5 GB. We use the default solrconfig values, with autoCommit and softCommit both set to 5 minutes. The cluster is backed by 3 ZooKeeper nodes.

We can reach 5000 updates/s, and we index the data with SolrJ (Java 1.8). We also periodically delete documents from each collection using the SolrJ delete-by-query method (the delete query uses a non-id field). Updates run without much trouble, but deletes take a considerable amount of time (close to 20 seconds on average, with some taking more than 4-5 minutes), which slows down the whole application. We don't issue an explicit commit after a delete; we let autoCommit take care of it every 5 minutes. Since we are not committing, we are wondering why deletes take so much longer than updates, which are very fast and finish in less than 50-100 ms. Could you please let us know the reason, or how deletes differ from updates in Solr?

With warm regards,
RK.

Re: Delete by query in SOLR 6.3

Emir Arnautović
Hi Rakesh,
Since Solr has to maintain eventual consistency across all replicas, it has to block updates while a DBQ (delete by query) is running. Here is a blog post with a high-level explanation of the issue: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html

Instead, you should query for the IDs of the matching documents and then delete them by ID, which avoids the issues caused by DBQ.
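
A minimal sketch of that query-then-delete-by-ID pattern follows. It is not taken from the thread: `IndexClient`, `Page`, `InMemoryIndex` and `deleteMatching` are hypothetical names, and the in-memory class only stands in for a real collection so the example runs on its own. With SolrJ, `findIds` would presumably wrap a cursorMark-paged `SolrClient.query` that returns only the unique key, and `deleteByIds` would wrap `SolrClient.deleteById`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class QueryThenDeleteById {

    /** One page of matching IDs plus the cursor to send with the next request. */
    static final class Page {
        final List<String> ids;
        final String nextCursor;

        Page(List<String> ids, String nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    /** Hypothetical abstraction over the search client (an assumption, not a SolrJ type). */
    interface IndexClient {
        Page findIds(String field, String value, String cursor, int rows);

        void deleteByIds(List<String> ids);
    }

    static final String CURSOR_START = "*";

    /** Pages through the IDs matching field=value and deletes each page by ID. */
    static int deleteMatching(IndexClient client, String field, String value, int rows) {
        int deleted = 0;
        String cursor = CURSOR_START;
        while (true) {
            Page page = client.findIds(field, value, cursor, rows);
            if (!page.ids.isEmpty()) {
                client.deleteByIds(page.ids);
                deleted += page.ids.size();
            }
            // Cursor convention: an unchanged cursor means there are no more results.
            if (page.nextCursor.equals(cursor)) {
                return deleted;
            }
            cursor = page.nextCursor;
        }
    }

    /** In-memory stand-in for a collection: documents sorted by ID, each a field/value map. */
    static final class InMemoryIndex implements IndexClient {
        final TreeMap<String, Map<String, String>> docs = new TreeMap<>();

        void add(String id, String field, String value) {
            docs.computeIfAbsent(id, k -> new HashMap<>()).put(field, value);
        }

        @Override
        public Page findIds(String field, String value, String cursor, int rows) {
            // Resume strictly after the last ID returned by the previous page.
            Map<String, Map<String, String>> tail =
                    CURSOR_START.equals(cursor) ? docs : docs.tailMap(cursor, false);
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, Map<String, String>> e : tail.entrySet()) {
                if (ids.size() == rows) {
                    break;
                }
                if (value.equals(e.getValue().get(field))) {
                    ids.add(e.getKey());
                }
            }
            String next = ids.isEmpty() ? cursor : ids.get(ids.size() - 1);
            return new Page(ids, next);
        }

        @Override
        public void deleteByIds(List<String> ids) {
            for (String id : ids) {
                docs.remove(id);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        for (int i = 0; i < 10; i++) {
            index.add(String.format("doc-%02d", i), "status", i % 2 == 0 ? "expired" : "active");
        }
        int deleted = deleteMatching(index, "status", "expired", 2);
        System.out.println("deleted=" + deleted + ", remaining=" + index.docs.size());
        // prints "deleted=5, remaining=5"
    }
}
```

Paging with a cursor over the sorted IDs, rather than re-running the first page, matters here because deletes that have not been committed yet are still visible to searches, so a plain "first N matches" query could keep returning the same documents until the next commit.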

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 15 Nov 2018, at 06:09, RAKESH KOTE <[hidden email]> wrote: