how to improve indexing using autocommit

how to improve indexing using autocommit

babloorawat
Hi,

We perform Solr indexing on a daily schedule:
full import: once a day
delta import: every 3 hours.
We have around 40,000 docs to index.

A full import takes around 1 hour 45 minutes, and we need to optimize it.
I am wondering if anyone can help me figure out how to make full use of
auto-commit.
Since the full import removes previously indexed documents, we have set the
auto-commit property as:
    <autoCommit>
       <maxTime>6300000</maxTime> <!-- commit after full indexing -->
       <openSearcher>true</openSearcher>
     </autoCommit>

How can I make use of autocommit without removing previously indexed
documents until the full import finishes, or is there an alternative approach?

Regards
Babloo.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: how to improve indexing using autocommit

Erick Erickson
First of all, taking that long to index that few documents is surprising. My guess is that data acquisition is the bottleneck here and Solr is just idling along.

But the best way I know of is to use aliases. You have two aliases, call them “query” and “index”, that originally point to collection1 and collection2 respectively.

Your customer-facing app uses “query”, and your indexing app uses “index”.

Now you do your full import to the “index” collection, which is aliased to collection2. When it’s done you point your “query” alias to collection2 and your “index” alias to collection1. Rinse. Repeat.
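The swap above can be done with the Collections API's CREATEALIAS action, which re-points an existing alias atomically. A rough sketch (the host, port, alias names, and collection names follow the example above; substitute your own):

```shell
# After the full import into collection2 finishes, point the
# customer-facing "query" alias at the freshly built collection...
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=query&collections=collection2"

# ...and point the "index" alias at the now-stale collection,
# ready for the next full import.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=index&collections=collection1"
```

Because CREATEALIAS replaces the alias in one step, searches never see a partially built index. (These commands require a running SolrCloud cluster.)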

Alternatively, you include a batch number in each document and add an “fq” clause that selects the current batch. You increment the batch number each run, and when the run is done your app changes the fq clause to the most recently indexed batch. Then you can delete all the docs in the old batch. To make that work, your <uniqueKey> should include the batch number to keep the batches separate.
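A minimal sketch of the batch approach with curl — the field name `batch_id`, the batch numbers, and the collection name are made up for illustration:

```shell
# Queries restrict to the most recently completed batch via fq.
curl "http://localhost:8983/solr/collection1/select?q=*:*&fq=batch_id:42"

# Once batch 42 is live, remove the previous batch's documents
# with a delete-by-query.
curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary "<delete><query>batch_id:41</query></delete>"
```

(These commands assume a running Solr instance with a `batch_id` field in the schema.)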

Another alternative is to not open new searchers. Set <openSearcher>false</openSearcher> in your autocommit configuration, and be sure to set your autoSoftCommit maxTime to -1 in that case too. Then, at the end of your indexing run, do a hard commit externally (i.e. http://node:port/solr/collection/update?commit=true).
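In solrconfig.xml, that configuration might look like the following (the 60-second hard-commit interval is illustrative; the two settings are the ones described above):

```xml
<!-- Hard commits for durability, but keep serving the old searcher -->
<autoCommit>
  <maxTime>60000</maxTime>            <!-- e.g. every 60 seconds -->
  <openSearcher>false</openSearcher>  <!-- new docs stay invisible -->
</autoCommit>

<!-- Disable soft commits so nothing becomes visible until the
     external hard commit at the end of the indexing run -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```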

I’d strongly recommend you reduce your autocommit interval. To support Real Time Get, Solr keeps some in-memory structures that aren’t flushed until a commit happens.


Best,
Erick

> On Oct 3, 2019, at 4:08 AM, babloorawat <[hidden email]> wrote: