[jira] [Commented] (SOLR-11718) Deprecate CDCR Buffer APIs


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323498#comment-16323498 ]

Amrit Sarkar commented on SOLR-11718:

Modified patch with Varun's recommendation: {{SOLR-11718-v3.patch}}. Improved documentation and tests.

One test method, {{CdcrReplicationHandlerTest}}::{{testReplicationWithBufferedUpdates}}, is currently failing with:

  [beaster] [00:04:50.322] FAILURE  353s | CdcrReplicationHandlerTest.testReplicationWithBufferedUpdates <<<
  [beaster]    > Throwable #1: java.lang.AssertionError: There are still nodes recoverying - waited for 330 seconds
  [beaster]    > at __randomizedtesting.SeedInfo.seed([25F2AEF0CD93CBA3:F6FBFEEE88005734]:0)
  [beaster]    > at org.junit.Assert.fail(Assert.java:93)
  [beaster]    > at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:185)
  [beaster]    > at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:140)
  [beaster]    > at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:135)
  [beaster]    > at org.apache.solr.cloud.cdcr.BaseCdcrDistributedZkTest.waitForRecoveriesToFinish(BaseCdcrDistributedZkTest.java:522)
  [beaster]    > at org.apache.solr.cloud.cdcr.BaseCdcrDistributedZkTest.restartServer(BaseCdcrDistributedZkTest.java:563)
  [beaster]    > at org.apache.solr.cloud.cdcr.CdcrReplicationHandlerTest.testReplicationWithBufferedUpdates(CdcrReplicationHandlerTest.java:228)

This method tests that when the leader is still receiving updates, a restarted follower buffers those updates and then replays them while recovering. With buffering disabled, the follower stays in recovery and never becomes active: indexing never stops, so the follower is always some number of documents behind the leader. Typically in this situation we would wait for indexing to complete and then restart the follower, so that it fetches the index from the leader and becomes active.
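
To illustrate the failure mode, here is a toy model in Java. It is not Solr code and does not use any Solr API: the class, method, and parameter names are all hypothetical, and recovery is simplified to "copy the leader's index as it looked when the recovery round began". Under that assumption, a follower without buffering only becomes active in a round during which the leader receives no new documents.

```java
public class FollowerRecoverySketch {

    /**
     * Toy model (an assumption, not Solr's real recovery logic).
     * Returns the recovery round in which the follower catches up with the
     * leader, or -1 if it is still behind after maxRounds.
     */
    static int roundsUntilActive(long leaderDocs, long docsIndexedPerRound,
                                 int indexingStopsAfterRound, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            // The follower copies the leader's index as of the start of the round.
            long followerDocs = leaderDocs;
            // With buffering disabled, updates that arrive during the copy
            // reach only the leader.
            if (round <= indexingStopsAfterRound) {
                leaderDocs += docsIndexedPerRound;
            }
            // The follower can become active only if nothing was missed.
            if (followerDocs == leaderDocs) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Indexing never stops within the observed window: never active.
        System.out.println(roundsUntilActive(1000, 50, Integer.MAX_VALUE, 100));
        // Indexing stops after round 3: the next round catches up.
        System.out.println(roundsUntilActive(1000, 50, 3, 100));
    }
}
```

In this model the first call never reaches the active state within the window, which mirrors the timeout in the test, while the second call catches up in the first round after indexing stops.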

I am still working on a smarter test for this that fits the current design, but it seems this scenario is no longer valid. Looking forward to thoughts and recommendations.

> Deprecate CDCR Buffer APIs
> --------------------------
>                 Key: SOLR-11718
>                 URL: https://issues.apache.org/jira/browse/SOLR-11718
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: CDCR
>    Affects Versions: 7.1
>            Reporter: Amrit Sarkar
>             Fix For: master (8.0), 7.3
>         Attachments: SOLR-11718-v3.patch, SOLR-11718.patch, SOLR-11718.patch
> Kindly see the discussion on SOLR-11652.
> Today, according to the current CDCR documentation page, buffering is "disabled" by default on both source and target. We don't see any purpose served by CDCR buffering, and it is quite an overhead: when enabled, it can take a lot of heap space (tlog pointers) and causes tlogs to be retained on disk forever. Also, even if we disable the buffer through the API on the source, if it was enabled at startup, tlogs are never purged on the leader nodes of the source's shards; see SOLR-11652.

This message was sent by Atlassian JIRA
