Shard splitting and replica placement strategy

Shard splitting and replica placement strategy

Shai Erera
Hi

I wanted to try out the (relatively) new replica placement strategy and how it plays with shard splitting. So I set up a 4-node cluster and created a collection with 1 shard and 2 replicas (each created on a different node).

When I issue a SPLITSHARD command (without any rules set on the collection), the split finishes successfully and the state of the cluster is:

n1: s1_r1 (INACTIVE), s1_0_r1, s1_1_r1
n2: s1_r2 (INACTIVE), s1_0_r2
n3: s1_1_r2
n4: empty

So far, as expected: since the shard split occurred on n1, the two sub-shards were created there, and then Solr placed the missing replicas on nodes 2 and 3. Also, the source shard s1 was set to INACTIVE, and I did not delete it (in the test).

Then I tried the same experiment, curious whether, if I set the right rule, one of the sub-shards' replicas would move to the 4th node, so that I'd end up with a "balanced" cluster. So I created the collection with the rule "shard:**,replica:<2,node:*", which, per the ref guide, should mean I end up with no more than one replica per shard on any node. Per my understanding, I should end up with either 2 nodes each holding one replica of each shard, 3 nodes holding a mixture of replicas, or 4 nodes each holding exactly one replica.
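To double-check my reading of that rule, here is a tiny sketch (plain Python, not Solr code; the node and shard names are just placeholders) that enumerates the placements the rule should allow for 2 sub-shards with 2 replicas each on 4 nodes:

```python
from itertools import combinations

# My reading of "replica:<2,node:*": for every shard, each node may hold
# fewer than 2 (i.e. at most 1) of that shard's replicas. So each shard's
# 2 replicas must land on 2 distinct nodes.
nodes = ["n1", "n2", "n3", "n4"]
replicas_per_shard = 2

# Every choice of 2 distinct nodes per shard satisfies the rule.
placements = [
    {"s1_0": s0_nodes, "s1_1": s1_nodes}
    for s0_nodes in combinations(nodes, replicas_per_shard)
    for s1_nodes in combinations(nodes, replicas_per_shard)
]
print(len(placements))  # C(4,2) * C(4,2) = 36 rule-compliant placements

# The fully "balanced" outcomes use all 4 nodes, one replica per node,
# i.e. the two shards' node pairs are disjoint.
balanced = [p for p in placements
            if len(set(p["s1_0"]) | set(p["s1_1"])) == 4]
print(len(balanced))  # 6 such placements
```

So the rule permits the balanced layout I'm hoping for, along with the 2-node and 3-node mixtures described above; it just doesn't force the balanced one.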

However, while observing the cluster status I noticed that the two newly created sub-shards are marked ACTIVE and leader, while the two other replicas are marked DOWN. Turning on INFO logging, I found this:

Caused by: java.lang.NullPointerException
    at org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:168)
    at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:130)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:252)
    at org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:203)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:174)
    at org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:135)
    at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:211)
    at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:179)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2204)
    at org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1212)

I also tried the rule "replica:<2,node:*", which yielded the same NPE. I'm running 5.4.1, and I couldn't find whether this was already fixed in 5.5.0/master. So the question is: is this a bug, or did I misconfigure the rule?

And as a side question, is there any rule I can configure so that the split shards are distributed evenly across the cluster? Or will SPLITSHARD currently always place the created sub-shards on the origin node, leaving it my responsibility to move them elsewhere?
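For now, my fallback plan is to rebalance by hand via the Collections API (ADDREPLICA, then DELETEREPLICA). A rough sketch of the requests I have in mind; the host, collection, and core names here are made up, and nothing is actually sent:

```python
from urllib.parse import urlencode

def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL (no request is sent here)."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"

base = "http://localhost:8983/solr"  # placeholder host/port

# Add a replica of sub-shard shard1_0 on the empty node n4 ...
add = collections_api_url(base, "ADDREPLICA",
                          collection="mycoll", shard="shard1_0",
                          node="n4:8983_solr")

# ... then delete the redundant replica from the overloaded node.
delete = collections_api_url(base, "DELETEREPLICA",
                             collection="mycoll", shard="shard1_0",
                             replica="core_node3")

print(add)
print(delete)
```

That works, but it's exactly the kind of manual bookkeeping I was hoping the placement rules would handle for me during the split.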

Shai

Re: Shard splitting and replica placement strategy

Noble Paul നോബിള്‍  नोब्ळ्
Whatever it is, there should be no NPE. Could be a bug.




--
-----------------------------------------------------
Noble Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Shard splitting and replica placement strategy

Shai Erera

Thanks Noble, I'll try to reproduce in a test then. Does the rule I've set sound right to you though?




Re: Shard splitting and replica placement strategy

Shai Erera
Opened https://issues.apache.org/jira/browse/SOLR-8728 with a test which reproduces the exception.
