SOLR Cloud - Full index replication

SOLR Cloud - Full index replication

Doss
We are running a 3-node SolrCloud setup (64GB RAM / 8 CPUs / 12GB heap) on version
7.x, with 3 indexes/collections on each node. The index size is about
250GB. We use NRT with a 5-second soft commit and a 10-minute hard commit. Sometimes,
on one of the nodes, we see a full index replication start running. Is there any
configuration that forces Solr to replicate the full index, for example when a node
sees a difference of 100/200 updates from the leader? Thanks.

Re: SOLR Cloud - Full index replication

Erick Erickson
No. There's a "peer sync" that will try to update from the leader's
transaction log if (and only if) the replica has fallen behind. By
"fallen behind" I mean it was unable to accept any updates for
some period of time. The default peer sync size is 100 docs;
you can make it larger, see numRecordsToKeep here:
http://lucene.apache.org/solr/guide/7_6/updatehandlers-in-solrconfig.html
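
As a rough illustration of the peer sync window described above, here is a toy Python model. It is not Solr's recovery code: the function name is hypothetical, and the rule that anything beyond the window falls back to full replication is a simplifying assumption.

```python
def recovery_strategy(missed_updates: int, num_records_to_keep: int = 100) -> str:
    """Toy model (an assumption, not Solr's implementation) of how a replica catches up."""
    if missed_updates < 0:
        raise ValueError("missed_updates must be non-negative")
    if missed_updates == 0:
        # The replica never fell behind, so there is nothing to sync.
        return "in sync"
    if missed_updates <= num_records_to_keep:
        # The gap fits inside the window of updates kept in the transaction log.
        return "peer sync"
    # The gap is larger than the window, so the whole index has to be copied.
    return "full replication"
```

In this model a replica that missed 500 updates needs a full copy with the default window of 100, but could peer sync if numRecordsToKeep were raised to 1000.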

Some observations though:
12G heap for 250G of index on disk _may_ work, but I'd be looking at
the GC characteristics, particularly stop-the-world pauses.

Your hard commit interval looks too long. I'd shorten it to < 1 minute
with openSearcher=false. See:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
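
One possible way to apply such a change is through Solr's Config API. The sketch below only builds the request. The 15-second hard commit is an assumed value under the suggested 1 minute, the 5-second soft commit is the value from the original question, and the host and collection name are placeholders.

```python
import json
import urllib.request


def autocommit_payload(hard_commit_ms: int = 15000, soft_commit_ms: int = 5000) -> dict:
    """Build a Config API "set-property" command for the commit settings."""
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }


def build_request(base_url: str, collection: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST to <base_url>/solr/<collection>/config."""
    return urllib.request.Request(
        url=f"{base_url}/solr/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and collection name; nothing is sent until urlopen() is called.
req = build_request("http://localhost:8983", "mycollection", autocommit_payload())
# urllib.request.urlopen(req)  # uncomment to apply against a real cluster
```

The same values can instead be set in the autoCommit and autoSoftCommit sections of solrconfig.xml.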

I'd concentrate on _why_ the replica goes into recovery in the first
place. You say you're on 7x, which one? Starting in 7.3 the recovery
logic was pretty thoroughly reworked, so _which_ 7x version is
important to know.

The Solr logs should give you some idea of _why_ the replica
goes into recovery. Concentrate on the log of the replica that
goes into recovery and the corresponding leader's log.

Best,
Erick

On Sat, Dec 29, 2018 at 6:23 PM Doss <[hidden email]> wrote:
>
> we are using 3 node solr (64GB ram/8cpu/12GB heap)cloud setup with version
> 7.X. we have 3 indexes/collection on each node. index size were about
> 250GB. NRT with 5sec soft /10min hard commit. Sometimes in any one node we
> are seeing full index replication started running..  is there any
> configuration which forces solr to replicate full , like 100/200 updates
> difference if a node sees with the leader ? - Thanks.

Re: SOLR Cloud - Full index replication

Doss
Thanks Erick!

We are using SOLR version 7.0.1.

Are there any disadvantages if we increase the peer sync size to 1000?

We have analysed the GC logs but we have not seen long GC pauses so far.

We tried to find the reason for the full sync but found nothing informative.
However, we have seen many log entries that read "No registered leader was found
after waiting for 4000ms", followed by this full index replication.

Thanks,
Doss.


On Sun, Dec 30, 2018 at 8:49 AM Erick Erickson <[hidden email]>
wrote:

> No. There's a "peer sync" that will try to update from the leader's
> transaction log if (and only if) the replica has fallen behind. By
> "fallen behind" I mean it was unable to accept any updates for
> some period of time. The default peer sync size is 100 docs,
> you can make it larger see numRecordsToKeep here:
> http://lucene.apache.org/solr/guide/7_6/updatehandlers-in-solrconfig.html
>
> Some observations though:
> 12G heap for 250G of index on disk _may_ work, but I'd be looking at
> the GC characteristics, particularly stop-the-world pauses.
>
> Your hard commit interval looks too long. I'd shorten it to < 1 minute
> with openSearcher=false. See:
>
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> I'd concentrate on _why_ the replica goes into recovery in the first
> place. You say you're on 7x, which one? Starting in 7.3 the recovery
> logic was pretty thoroughly reworked, so _which_ 7x version is
> important to know.
>
> The Solr logs should give you some idea of _why_ the replica
> goes into recovery, concentrate on the replica that goes into
> recovery and the corresponding leader's log.
>
> Best,
> Erick

Re: SOLR Cloud - Full index replication

Erick Erickson
No particular downside to increasing numRecordsToKeep except
there is some additional disk space required and a bit of
bookkeeping.

Frankly, though, that's a bandaid at best. There should be more
information in the logs about _why_ they go into recovery.

If you're indexing while nodes are down that would certainly
explain it. But if nodes are going into recovery when everything
is up and running, there should be _some_ messages in the
logs as to why.
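
A minimal sketch of how one might pull candidate lines out of a replica's or leader's log. The search terms are assumptions: the quoted message is the one reported in this thread, and "recover" is only a generic keyword, not a documented Solr log format. The sample lines are synthetic.

```python
import re
from typing import Iterable, List

# Assumed search terms; extend them with whatever shows up in your own logs.
PATTERNS = [
    re.compile(r"No registered leader was found", re.IGNORECASE),
    re.compile(r"recover", re.IGNORECASE),
]


def find_recovery_clues(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any pattern, in their original order."""
    return [line.rstrip("\n") for line in lines if any(p.search(line) for p in PATTERNS)]


# Synthetic sample lines for illustration only; real Solr log lines look different.
sample = [
    "sample: request handled normally\n",
    "sample: No registered leader was found after waiting for 4000ms\n",
    "sample: replica starting recovery\n",
]
clues = find_recovery_clues(sample)
```

For a real log, pass an open file object instead of the sample list, and run it against both the recovering replica's log and the leader's log for the same time window.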

Best,
Erick

On Sun, Dec 30, 2018 at 9:42 PM Doss <[hidden email]> wrote:

>
> Thanks Erick!
>
> We are using SOLR version 7.0.1.
>
> is there any disadvantages if we increase  peer sync size to 1000 ?
>
> We have analysed the GC logs but we have not seen long GC pauses so far.
>
> We tried to find the reason for the full sync, but noting more informative,
> but we have seen too many logs which reads "No registered leader was found
> after waiting for 4000ms" followed by this full index.
>
> Thanks,
> Doss.