[jira] [Comment Edited] (SOLR-10285) Skip LEADER messages when there are leader only shards


Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638 ]

Cao Manh Dat edited comment on SOLR-10285 at 10/3/17 4:28 AM:
--------------------------------------------------------------

Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I did some searching and found that:
- ElectionContext is the only place that uses OverseerAction.LEADER (once to unset the leader and once to set it).
- The STATE_PROP used in the second case is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" does not apply, right?

The only problem that can occur during an upgrade is:
1. A replica (repA) is currently the leader.
2. The overseer is very busy.
3. repA performs the unset-leader operation (which is delayed because the overseer is very busy).
4. repA is stopped in the middle of the election process (so the set-leader operation is never executed).
5. repA starts with the new code and sees that it is still the leader (the unset operation from step 3 has not been executed yet), so it skips the set-leader operation.

I think this case is extremely rare, and even if it happens (it can be fixed with the FORCELEADER API), sysadmins must first deal with the overseer being overwhelmed by the number of operations.
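To make the ordering in the five steps easier to follow, here is a minimal toy simulation. It is only a sketch: {{LeaderRaceSketch}}, the FIFO queue, and the "skip if already leader" flag are hypothetical stand-ins for illustration, not Solr classes or the actual patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the upgrade race; none of these names are Solr classes. */
public class LeaderRaceSketch {
    /** Stand-in for the shard leader recorded in cluster state; null means no leader. */
    static String leaderInState;

    /** Stand-in for the overseer's FIFO work queue: a replica name sets the leader, "" unsets it. */
    static final Deque<String> overseerQueue = new ArrayDeque<>();

    /** Assumed new behaviour: skip the set-leader message when state already names this replica. */
    static void publishLeader(String replica, boolean skipIfAlreadyLeader) {
        if (skipIfAlreadyLeader && replica.equals(leaderInState)) {
            return; // the message looks redundant, so nothing is enqueued
        }
        overseerQueue.add(replica);
    }

    /** The overseer finally catches up and applies every queued message in order. */
    static void drainQueue() {
        while (!overseerQueue.isEmpty()) {
            String msg = overseerQueue.poll();
            leaderInState = msg.isEmpty() ? null : msg;
        }
    }

    /** Replays steps 1-5 and returns the leader recorded once the queue is drained. */
    public static String simulate(boolean skipIfAlreadyLeader) {
        leaderInState = "repA";                      // step 1: repA is the leader
        overseerQueue.clear();
        overseerQueue.add("");                       // steps 2-3: unset is queued but not yet applied
        // step 4: repA is stopped before it can enqueue the set-leader message
        publishLeader("repA", skipIfAlreadyLeader);  // step 5: repA restarts and re-runs the election
        drainQueue();
        return leaderInState;
    }

    public static void main(String[] args) {
        System.out.println("without skip: leader = " + simulate(false));
        System.out.println("with skip:    leader = " + simulate(true));
    }
}
```

In this model the run without the skip ends with repA recorded as leader, while the run with the skip ends with no leader, because the stale unset message is the last one applied.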




was (Author: caomanhdat):
Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I did some searching and found that:
- ElectionContext is the only place that uses OverseerAction.LEADER (once to unset the leader and once to set it).
- The STATE_PROP used in the second case is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" does not apply, right?

The only problem that can occur during an upgrade is:
1. A replica (repA) is currently the leader.
2. The overseer is very busy.
3. repA performs the unset-leader operation (which is delayed because the overseer is very busy).
4. repA is stopped in the middle of the election process (so the set-leader operation is never executed).
5. repA starts with the new code and sees that it is still the leader (the unset operation from step 3 has not been executed yet), so it skips the set-leader operation.

I think this case is extremely rare, and even if it happens, sysadmins must first deal with the overseer being overwhelmed by the number of operations.



> Skip LEADER messages when there are leader only shards
> ------------------------------------------------------
>
>                 Key: SOLR-10285
>                 URL: https://issues.apache.org/jira/browse/SOLR-10285
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Varun Thacker
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-10285.patch, SOLR-10285.patch, SOLR-10285.patch
>
>
> For shards that have only one replica (the leader), we know it doesn't need to recover from anyone. We should short-circuit the recovery process in this case.
> The motivation is that we will generate fewer state events and be able to mark these replicas as active again without them needing to go into the 'recovering' state.
> We already short-circuit when you set {{-Dsolrcloud.skip.autorecovery=true}}, but that sys prop was meant for tests only. The goal of this Jira is to extend this so that the code also short-circuits when the core knows it is the only replica in the shard.
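The short-circuit condition described in the issue could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: {{RecoveryShortCircuitSketch}} and {{canSkipRecovery}} are hypothetical names, and the shard is modelled as a plain list of core names rather than Solr's cluster state. Only the {{solrcloud.skip.autorecovery}} property name comes from the issue text.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the proposed check; not Solr code. */
public class RecoveryShortCircuitSketch {

    /**
     * Returns true when recovery can be skipped: either the test-only flag is set,
     * or this core is the only replica in its shard and so has nobody to recover from.
     */
    public static boolean canSkipRecovery(String coreName,
                                          List<String> replicasInShard,
                                          boolean skipAutoRecovery) {
        if (skipAutoRecovery) {
            return true; // existing behaviour, intended for tests only
        }
        return replicasInShard.size() == 1 && replicasInShard.get(0).equals(coreName);
    }

    public static void main(String[] args) {
        // Reads -Dsolrcloud.skip.autorecovery=true if it was passed to the JVM.
        boolean flag = Boolean.getBoolean("solrcloud.skip.autorecovery");
        System.out.println(canSkipRecovery("core1", Arrays.asList("core1"), flag));
        System.out.println(canSkipRecovery("core1", Arrays.asList("core1", "core2"), flag));
    }
}
```

Keeping the flag as a parameter rather than reading it inside the method makes the single-replica branch easy to exercise on its own.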



