[jira] [Commented] (SOLR-13163) 'searchRate' trigger: belowNodeOp=DELETENODE can result in loss of leader


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749034#comment-16749034 ]

ASF subversion and git services commented on SOLR-13163:
--------------------------------------------------------

Commit 6882f43b96c6dc0fec7a1677d6687fa83f5a1669 in lucene-solr's branch refs/heads/branch_7x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6882f43 ]

SOLR-13140: harden SearchRateTriggerIntegrationTest by using more absolute rate thresholds and latches to track when all events have been processed so we don't need to 'guess' about sleep calls

This commit also disables testDeleteNode pending an AwaitsFix on SOLR-13163

(cherry picked from commit 15e5ca999ff7e912653db897781b21642d5307f0)
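
The latch technique mentioned in the commit message can be illustrated with a minimal sketch. This is not the code from SearchRateTriggerIntegrationTest: the class name LatchSketch, the method awaitEvents and the worker threads standing in for trigger event listeners are all hypothetical. The idea is only that the test blocks until every expected event has been counted, or a timeout expires, instead of sleeping for a guessed interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from the Solr code base.
class LatchSketch {

    /**
     * Waits until `expected` events have been processed or the timeout expires.
     * `fired` simulates how many events actually arrive.
     * Returns true only if all expected events were seen in time.
     */
    static boolean awaitEvents(int expected, int fired, long timeoutMs) {
        CountDownLatch processed = new CountDownLatch(expected);

        // Each thread stands in for a listener callback that reports one processed event.
        for (int i = 0; i < fired; i++) {
            new Thread(processed::countDown).start();
        }

        try {
            // Returns as soon as the count reaches zero; no fixed sleep is needed.
            return processed.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all events seen: " + awaitEvents(3, 3, 5000));
        System.out.println("one event missing: " + awaitEvents(3, 2, 200));
    }
}
```

A test written this way finishes as soon as the last event is recorded, and a missing event shows up as an explicit timeout rather than as a flaky assertion after an arbitrary sleep.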


> 'searchRate' trigger: belowNodeOp=DELETENODE can result in loss of leader
> -------------------------------------------------------------------------
>
>                 Key: SOLR-13163
>                 URL: https://issues.apache.org/jira/browse/SOLR-13163
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Priority: Major
>
> While working on SOLR-13140 I discovered that configuring a very high belowNodeRate in {{SearchRateTriggerIntegrationTest.testDeleteNode}} can cause all nodes -- even the node hosting the shard leader -- to be the target of DELETENODE ops.
> This indicates at least one serious bug in the code (we should never allow the leader to be deleted), but it also raises other questions about situations that are not adequately tested:
> * Even if the code isn't specifically protecting the leader, why isn't minReplicas protecting at least one replica?
> * What would happen if multiple replicas co-existed on the same node? What if the leader was one of the replicas that shared a node with another replica?
> * What would happen if the nodes with a low search rate for this target collection also hosted replicas of other collections in the cluster? Would those replicas protect the nodes from being the target of DELETENODE ops?
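
The safeguards the issue asks about can be sketched as a filter over candidate nodes. This is a hypothetical model, not Solr's implementation: the DeleteNodeGuard class, the Replica type and the safeToDelete method are invented for illustration. It considers a single shard and assumes that minReplicas means "at least this many replicas must remain after the deletions", which may not match the trigger's exact semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the expected safeguards; not taken from Solr.
class DeleteNodeGuard {

    /** One replica of a single shard, identified by the node that hosts it. */
    static final class Replica {
        final String node;
        final boolean leader;

        Replica(String node, boolean leader) {
            this.node = node;
            this.leader = leader;
        }
    }

    /**
     * Returns the candidate nodes that could be deleted without removing the
     * leader and without leaving fewer than minReplicas replicas of the shard.
     */
    static List<String> safeToDelete(List<String> candidates,
                                     List<Replica> shardReplicas,
                                     int minReplicas) {
        List<String> approved = new ArrayList<>();
        Set<String> removed = new HashSet<>();

        for (String node : candidates) {
            boolean hostsLeader = false;
            int remaining = 0;

            for (Replica r : shardReplicas) {
                if (r.node.equals(node)) {
                    // Replicas on this node would disappear with it.
                    if (r.leader) {
                        hostsLeader = true;
                    }
                } else if (!removed.contains(r.node)) {
                    // Replica survives this deletion and all approved ones.
                    remaining++;
                }
            }

            if (!hostsLeader && remaining >= minReplicas) {
                approved.add(node);
                removed.add(node);
            }
        }
        return approved;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("node1", true),
                new Replica("node2", false),
                new Replica("node3", false));
        List<String> candidates = Arrays.asList("node1", "node2", "node3");

        System.out.println(safeToDelete(candidates, replicas, 2));
    }
}
```

In this model, a request that targets every node is reduced to the subset that keeps the leader and the configured replica floor intact, which is the behavior the reporter expected from the trigger.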



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
