Full index replication upon service restart

Full index replication upon service restart

Rahul Goswami
Hello Solr gurus,

I have a scenario where, on a Solr cluster restart, the replica nodes go
into full index replication for about 7 hours. Both replica nodes are
restarted around the same time for maintenance. Also, during normal operation,
if one node goes down for whatever reason, it again does index replication
upon restart. In certain instances, some replicas just fail to recover.

*SolrCloud 7.2.1 cluster configuration:*
============================
16 shards - replication factor=2

Per server configuration:
======================
32GB machine - 16GB heap space for Solr
Index size : 3TB per server

autoCommit (openSearcher=false) of 3 minutes
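
For reference, the relevant updateHandler section of our solrconfig.xml looks
roughly like the following (the values shown are illustrative of the setup
described above, not copied verbatim from our config):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>180000</maxTime>           <!-- 3 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- autoSoftCommit interval is one of the things I'm asking about below -->
</updateHandler>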

We have a heavy indexing load of about 10,000 documents every 150 seconds.
Not so heavy query load.

Reading through some of the threads on a similar topic, I suspect it is the
disparity in the number of updates (>100) between the replicas that is
causing this (courtesy of our indexing load). One of the suggestions I saw
was using numRecordsToKeep.
However, as Erick mentioned in one of those threads, that's a band-aid
measure, and I am trying to eliminate some of the fundamental issues that
might exist.
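
For context, my understanding is that numRecordsToKeep is configured on the
updateLog in solrconfig.xml, roughly like this (the numbers below are just
example values; I believe the default for numRecordsToKeep is 100):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">500</int>
  <int name="maxNumLogsToKeep">20</int>
</updateLog>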

1) Is the heap too small for that index size? If yes, what would be a
recommended max heap size?
2) Is there a general guideline to estimate the required max heap based on
index size on disk?
3) What would be a recommended autoCommit and autoSoftCommit interval?
4) Are there any configurations that would help improve the restart time and
avoid full replication?
5) Does Solr retain "numRecordsToKeep" number of documents in the tlog *per
replica*?
6) The reasons for the PeerSync failures in the logs below are not completely
clear to me. Can someone please elaborate?

*PeerSync fails with*:

Failure type 1:
-----------------
2019-02-04 20:43:50.018 INFO
(recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
[c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
org.apache.solr.update.PeerSync Fingerprint comparison: 1

2019-02-04 20:43:50.018 INFO
(recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
[c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
org.apache.solr.update.PeerSync Other fingerprint:
{maxVersionSpecified=1624579878580912128,
maxVersionEncountered=1624579893816721408, maxInHash=1624579878580912128,
versionsHash=-8308981502886241345, numVersions=32966082, numDocs=32966165,
maxDoc=1828452}, Our fingerprint: {maxVersionSpecified=1624579878580912128,
maxVersionEncountered=1624579975760838656, maxInHash=1624579878580912128,
versionsHash=4017509388564167234, numVersions=32966066, numDocs=32966165,
maxDoc=1828452}

2019-02-04 20:43:50.018 INFO
(recoveryExecutor-4-thread-2-processing-n:indexnode1:20000_solr
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
[c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
org.apache.solr.update.PeerSync PeerSync:
core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42 url=
http://indexnode1:8983/solr DONE. sync failed

2019-02-04 20:43:50.018 INFO
(recoveryExecutor-4-thread-2-processing-n:indexnode1:8983_solr
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
[c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
org.apache.solr.cloud.RecoveryStrategy PeerSync Recovery was not successful
- trying replication.


Failure type 2:
------------------
2019-02-02 20:26:56.256 WARN
(recoveryExecutor-4-thread-11-processing-n:indexnode1:20000_solr
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46
s:shard12 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node49)
[c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard12 r:core_node49
x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46]
org.apache.solr.update.PeerSync PeerSync:
core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard12_replica_n46 url=
http://indexnode1:20000/solr too many updates received since start -
startingUpdates no longer overlaps with our currentUpdates


Thanks,
Rahul

Re: Full index replication upon service restart

Erick Erickson
bq. We have a heavy indexing load of about 10,000 documents every 150 seconds.
Not so heavy query load.

It's unlikely that changing numRecordsToKeep will help all that much if your
maintenance window is very large. Rather, that number would have to be _very_
high.

7 hours is huge. How big are your indexes on disk? You're essentially going
to get a full copy from the leader for each replica, so network bandwidth may
be the bottleneck. Plus, every doc that gets indexed to the leader during the
sync will be stored away in the replica's tlog (not limited by
numRecordsToKeep) and replayed after the full index replication is
accomplished.

Much of the retry logic for replication has been improved starting with Solr
7.3 and, in particular, Solr 7.5. That might address the replicas that just
fail to ever replicate, but it won't help with replicas needing a full sync
in the first place.

That said, by far the simplest thing would be to stop indexing during your
maintenance window if at all possible.

Best,
Erick

Re: Full index replication upon service restart

Rahul Goswami
Thanks for the response, Erick. To answer your question about index size on
disk, it is 3 TB on every node. As mentioned, it's a 32 GB machine and I
allocated 24 GB to the Java heap.

Further monitoring the recovery, I see that when the follower node is
recovering, the leader node (which is NOT recovering) almost freezes, with
100% CPU usage and 80%+ memory usage. The follower node's memory usage is
80%+ but its CPU is very healthy. Also, the follower node's log is filled
with updates forwarded from the leader ("...PRE_UPDATE FINISH
{update.distrib=FROMLEADER&distrib.from=...") and replication starts much
later.
There have been instances when complete recovery took 10+ hours. We have
upgraded to a 4 Gbps NIC between the nodes to see if it helps.

Also, a few follow-up questions:

1) Is there a configuration which would start throttling update requests if
the replica falls behind a certain number of updates, so as to not trigger
an index replication later? If not, would it be a worthy enhancement? (A
rough sketch of the kind of thing I have in mind follows after this list.)
2) What would be a recommended hard commit interval for this kind of setup?
3) What are some of the improvements in 7.5 with respect to recovery as
compared to 7.2.1?
4) What do the PeerSync failure log lines (quoted in my original message
above) mean? This would help me better understand the reasons for PeerSync
failure and maybe devise some alert mechanism to start throttling update
requests from the application program if feasible.
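
To make question 1 concrete, here is a rough client-side sketch (SolrJ; the
class and method names are my own, not an existing Solr API) of the kind of
gate I have in mind: the indexing application would pause its feed while any
replica of the collection is still recovering.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

/** Hypothetical helper: hold back the indexing feed while any replica is recovering. */
public class IndexingGate {

    private final CloudSolrClient client;
    private final String collection;

    public IndexingGate(CloudSolrClient client, String collection) {
        this.client = client;
        this.collection = collection;
    }

    /** True only when every replica of the collection reports ACTIVE. */
    public boolean allReplicasActive() {
        DocCollection coll = client.getZkStateReader()
                                   .getClusterState()
                                   .getCollection(collection);
        for (Replica replica : coll.getReplicas()) {
            if (replica.getState() != Replica.State.ACTIVE) {
                return false;
            }
        }
        return true;
    }

    /** Simple polling wait; the indexing client would call this before each batch. */
    public void awaitHealthy(long pollMillis) throws InterruptedException {
        while (!allReplicasActive()) {
            Thread.sleep(pollMillis);
        }
    }
}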

Regards,
Rahul

Re: Full index replication upon service restart

Erick Erickson
bq. To answer your question about index size on
disk, it is 3 TB on every node. As mentioned it's a 32 GB machine and I
allocated 24GB to Java heap.

This is massively undersized in terms of RAM, in my experience. You're
trying to cram 3TB of index into 32GB of memory. Frankly, I don't think
there's much you can do to increase stability in this situation; too many
things are going on. In particular, you're indexing during node restart.

That means that
1> you'll almost inevitably get a full sync on start given your update
     rate.
2> while you're doing the full sync, all new updates are sent to the
      recovering replica and put in the tlog.
3> When the initial replication is done, the documents sent to the
     tlog while recovering are indexed. This is 7 hours of accumulated
     updates.
4> If much goes wrong in this situation, then you're talking another full
     sync.
5> rinse, repeat.

There are no magic tweaks here. You really have to rethink your
architecture. I'm actually surprised that your queries are performant.
I expect you're getting a _lot_ of I/O; that is, the relevant parts of your
index are swapping in and out of the OS memory space. A _lot_.
Or you're only using a _very_ small part of your index.

Sorry to be so negative, but this is not a situation that's amenable to
a quick fix.

Best,
Erick



