[jira] [Updated] (SOLR-11069) LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-11069:
    Attachment: SOLR-11069.patch

Figuring out the LPV issue is hard because bootstrapping had a problem: at the end of the process, the core is reloaded. That means the code that checks the state of the replication then returns "notfound", which causes another bootstrap command to be sent.

So this patch moves the relevant objects to (Default)SolrCoreState, where they're preserved across core reloads. With this proof-of-concept (PoC) patch I can get bootstrapping to occur, enable/disable buffering, bring the target up and down, etc. The fact that LPV is -1 when buffering is enabled doesn't seem to be a problem.
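To make the reload problem concrete, here is a minimal, hypothetical Java model (not Solr code). `SharedCoreState` stands in for the role (Default)SolrCoreState plays in the patch: one instance is handed from the old core to the reloaded one. `Core`, `finishBootstrap`, and the status strings are illustrative assumptions; the post does not name the actual objects that were moved. State kept on the core instance disappears on reload, so a status check reports "notfound", while state kept on the shared holder survives.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReloadStateSketch {

    // Hypothetical stand-in for (Default)SolrCoreState: shared by successive
    // core instances, so anything stored here outlives a core reload.
    static final class SharedCoreState {
        final Map<String, String> bootstrapStatus = new ConcurrentHashMap<>();
    }

    // Hypothetical stand-in for a core; a reload creates a brand-new instance.
    static final class Core {
        final SharedCoreState shared;
        // Per-instance state: lost when the core is reloaded.
        final Map<String, String> localBootstrapStatus = new ConcurrentHashMap<>();

        Core(SharedCoreState shared) {
            this.shared = shared;
        }

        // New core object, but the same shared state is carried over.
        Core reload() {
            return new Core(shared);
        }

        // Record completion in both places so the two can be compared.
        void finishBootstrap(String id) {
            localBootstrapStatus.put(id, "completed");
            shared.bootstrapStatus.put(id, "completed");
        }

        String statusFromLocal(String id) {
            return localBootstrapStatus.getOrDefault(id, "notfound");
        }

        String statusFromShared(String id) {
            return shared.bootstrapStatus.getOrDefault(id, "notfound");
        }
    }

    public static void main(String[] args) {
        Core core = new Core(new SharedCoreState());
        core.finishBootstrap("bootstrap-1");
        // Bootstrapping ends with a core reload.
        Core reloaded = core.reload();

        String local = reloaded.statusFromLocal("bootstrap-1");
        String shared = reloaded.statusFromShared("bootstrap-1");
        System.out.println("per-core state after reload: " + local);
        System.out.println("shared state after reload: " + shared);
        if ("notfound".equals(local)) {
            // This is the situation that triggers a second bootstrap command.
            System.out.println("a checker using per-core state would re-send bootstrap");
        }
    }
}
```

Running `main` prints "notfound" for the per-core lookup and "completed" for the shared lookup, which mirrors the behaviour described above before and after the patch.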

So if others can give this a whirl and confirm their testing is OK with it, then maybe the LPV issue is not an issue after all.

Mostly I'm throwing this out for others to consider. What do people think about putting the additional objects in SolrCoreState? Putting the objects there was quick; I'm interested in seeing whether my results hold for others. If so, we can decide whether this is the right way to go.

Haven't run precommit, haven't run the full test suite. Did run CdcrBootstrapTest. Also, the CDCR docs need to be updated.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>                 Key: SOLR-11069
>                 URL: https://issues.apache.org/jira/browse/SOLR-11069
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: CDCR
>    Affects Versions: 7.0
>            Reporter: Amrit Sarkar
>            Assignee: Erick Erickson
>         Attachments: SOLR-11069.patch
> The {{LASTPROCESSEDVERSION}} (a.k.a. LPV) action for CDCR breaks down due to a poorly initialised and maintained buffer log on the core nodes of either the source or the target cluster.
> If the buffer is enabled for the cores of either the source or the target cluster, it returns {{-1}}, *irrespective of the number of entries in the tlog read by the {{leader}}* node of each shard of the respective collection in the respective cluster. Once buffering is disabled, it starts reporting the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work as expected, i.e. it provides an incorrect seek position for the {{non-leader}} nodes to advance to. I am not sure whether this is intended behavior for sync, but it certainly doesn't feel right.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]