[jira] [Commented] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124804#comment-16124804 ]

Erick Erickson commented on SOLR-11069:
---------------------------------------

I'm dithering back and forth about this. I suspect that we're conflating a couple of issues. There's definitely a problem with bootstrapping (I'll attach a patch in a minute). It may well be that LASTPROCESSEDVERSION is not actually a problem: at least in some testing (with the attached patch), the fact that it is -1 when buffering is enabled seems to be OK.

I propose we use the patch as a starting point to see if this LASTPROCESSEDVERSION is a problem or not.

1> When buffering is enabled, tlogs accrue indefinitely, which was the original intent. From Renaud:

The original goal of the buffer on CDCR is indeed to keep the tlogs indefinitely until the buffer is deactivated (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462#CrossDataCenterReplication(CDCR)-TheBufferElement). This was useful, for example, during maintenance operations, to ensure that the source cluster keeps all the tlogs until the target cluster is properly initialised. In this scenario, one activates the buffer on the source. The source then stores all the tlogs (and does not purge them). Once the target cluster is initialised and has registered a tlog pointer on the source, one can deactivate the buffer on the source, and the tlogs will start to be purged once they have been read by the target cluster.
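A minimal Java sketch of the retention rule Renaud describes, assuming each tlog can be summarised by the highest update version it contains and each registered target log reader by the last version it has processed. The class and method names (`CdcrBufferRetentionSketch`, `purgeableTlogs`) are hypothetical and are not taken from Solr's CDCR code. Keeping every tlog when no reader is registered is also an assumption, chosen to match the "wait for the target to register a log reader" behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the CDCR buffer retention rule; not Solr's implementation.
final class CdcrBufferRetentionSketch {

    /**
     * Returns the max versions of the tlogs that could be purged.
     *
     * @param tlogMaxVersions highest update version contained in each tlog (assumed summary)
     * @param bufferEnabled   whether the source-side buffer is active
     * @param readerVersions  last version processed by each registered target log reader
     */
    static List<Long> purgeableTlogs(long[] tlogMaxVersions, boolean bufferEnabled, long[] readerVersions) {
        List<Long> purgeable = new ArrayList<>();
        // Buffer active, or no target has registered a log reader yet: keep every tlog.
        if (bufferEnabled || readerVersions.length == 0) {
            return purgeable;
        }
        // A tlog is safe to drop only once the slowest reader has read past all of it.
        long slowest = Long.MAX_VALUE;
        for (long v : readerVersions) {
            slowest = Math.min(slowest, v);
        }
        for (long maxVersion : tlogMaxVersions) {
            if (maxVersion <= slowest) {
                purgeable.add(maxVersion);
            }
        }
        return purgeable;
    }

    public static void main(String[] args) {
        long[] tlogs = {100L, 200L, 300L};
        long[] readers = {250L};
        System.out.println(purgeableTlogs(tlogs, true, readers));  // []
        System.out.println(purgeableTlogs(tlogs, false, readers)); // [100, 200]
    }
}
```

With the buffer on, nothing is purged regardless of how far the reader has progressed; once it is switched off, only the tlogs entirely behind the slowest reader become eligible.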

But additionally he had this to say:
Regarding the issue of LPV = -1, I am a bit surprised, as this sentinel value should be used only when the source cluster does not have any log pointers, i.e., no target cluster has been configured and initialised with this source cluster. In this case it indicates that there is no registered log reader, and that we should not remove any tlogs if the buffer is enabled (as we have to wait for the target to register a log reader and log pointer).

And enabling buffering definitely causes LASTPROCESSEDVERSION to return -1. However, with the patch, LPV immediately goes back to a reasonable value as soon as buffering is disabled, and the tlogs get cleaned up, etc., without bootstrapping. So I do wonder whether the -1 value is simply overloaded in this case to also mean "don't purge tlogs".
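To make the suspected overloading concrete, here is a hypothetical Java sketch (the names are invented for illustration and are not Solr's API) contrasting a single `long` return, where -1 stands for both "no registered reader" and "buffering, don't purge", with an explicit status that keeps the two reasons apart. Treating LPV as the minimum over the readers' processed versions is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration of an overloaded sentinel; names are not from Solr.
final class LpvOverloadSketch {

    /** Overloaded style: -1 means either "no reader registered" or "buffering is on". */
    static long overloadedLpv(boolean bufferEnabled, long[] readerVersions) {
        if (bufferEnabled || readerVersions.length == 0) {
            return -1L;
        }
        // Assumed definition: the slowest registered reader's last processed version.
        return Arrays.stream(readerVersions).min().getAsLong();
    }

    /** Disentangled style: the reason there is no usable version is explicit. */
    enum Status { OK, NO_READER, BUFFERING }

    static Status status(boolean bufferEnabled, long[] readerVersions) {
        // Precedence here is an arbitrary choice: a missing reader is reported first.
        if (readerVersions.length == 0) {
            return Status.NO_READER;
        }
        if (bufferEnabled) {
            return Status.BUFFERING;
        }
        return Status.OK;
    }

    public static void main(String[] args) {
        long[] readers = {120L, 95L};
        System.out.println(overloadedLpv(true, readers));  // -1
        System.out.println(overloadedLpv(false, readers)); // 95
        System.out.println(status(true, readers));         // BUFFERING
    }
}
```

In the overloaded form a caller seeing -1 cannot tell whether a target is missing or whether tlogs are simply being held back by the buffer; the explicit status keeps that distinction.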

We need to disentangle a couple of things. I'll attach a patch in a few minutes that might help.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>
>                 Key: SOLR-11069
>                 URL: https://issues.apache.org/jira/browse/SOLR-11069
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: CDCR
>    Affects Versions: 7.0
>            Reporter: Amrit Sarkar
>            Assignee: Erick Erickson
>
> The {{LASTPROCESSEDVERSION}} (a.k.a. LPV) action for CDCR breaks down because the buffer log is poorly initialised and maintained on the core nodes of either the source or the target cluster.
> If the buffer is enabled for the cores of either the source or the target cluster, it returns {{-1}}, *irrespective of the number of tlog entries read by the {{leader}}* node of each shard of the respective collection of the respective cluster. Once the buffer is disabled, it starts reporting the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work as expected, i.e. it provides an incorrect seek position for the {{non-leader}} nodes to advance to. I am not sure whether this is intended behavior for sync, but it surely doesn't feel right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
