Autoscaling and inactive shards

Jan Høydahl / Cominvent
Hi

I'm trying to have Autoscaling move a shard to another node after a manual shard split.
We have two nodes: one hosts shard1 and the other is empty.

After SPLITSHARD you have

* shard1 (inactive)
* shard1_0
* shard1_1
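
For reference, a minimal sketch of the Collections API calls involved: SPLITSHARD to produce the layout above, and DELETESHARD to remove the inactive parent afterwards. The base URL and the collection name `mycollection` are assumptions for illustration only; the snippet just builds the URLs and does not contact a cluster.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed base URL of a Solr node
COLLECTION = "mycollection"               # hypothetical collection name


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, **params}
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


# Split shard1 into shard1_0 and shard1_1; the parent becomes inactive.
split_url = collections_api_url("SPLITSHARD", collection=COLLECTION, shard="shard1")

# Remove the inactive parent shard once the sub-shards are serving.
delete_url = collections_api_url("DELETESHARD", collection=COLLECTION, shard="shard1")

print(split_url)
print(delete_url)
```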

For autoscaling we have the {"minimize" : "cores"} cluster preference active. Because of that, I expected Autoscaling to suggest moving e.g. shard1_1 to the other (empty) node, but it doesn't. I then created a rule just to test, {"cores": "<2", "node": "#ANY"}, but still got no suggestions. Only after I deleted the inactive shard1 did it suggest moving one of the two remaining shards to the other node.
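
To make the setup concrete, here is a sketch of the two autoscaling write commands that would carry the preference and the test rule. It assumes the Solr 7.x autoscaling API, where `set-cluster-preferences` and `set-cluster-policy` are posted as JSON; the paths shown are my assumption of the v1 endpoints and may differ per version. Nothing is sent, the payloads are only printed.

```python
import json

# Assumed v1 paths of the Solr 7.x autoscaling API (not taken from the thread).
AUTOSCALING_WRITE_PATH = "/solr/admin/autoscaling"
SUGGESTIONS_PATH = "/solr/admin/autoscaling/suggestions"

# Cluster preference: prefer nodes with fewer cores when placing replicas.
preferences_cmd = {"set-cluster-preferences": [{"minimize": "cores"}]}

# Test rule: every node must hold fewer than 2 cores.
policy_cmd = {"set-cluster-policy": [{"cores": "<2", "node": "#ANY"}]}

for cmd in (preferences_cmd, policy_cmd):
    # Each command would be POSTed as a JSON body to AUTOSCALING_WRITE_PATH.
    print("POST", AUTOSCALING_WRITE_PATH, json.dumps(cmd))

# Suggestions would then be read with a GET on SUGGESTIONS_PATH.
print("GET", SUGGESTIONS_PATH)
```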

So my two questions are
1. Is it by design that inactive shards "count" wrt #cores?
   I understand that it consumes disk but it is not active otherwise,
   so one could argue that it should not be counted in core/replica rules?
2. Why is there no suggestion to move a shard due to the "minimize cores" preference itself?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Re: Autoscaling and inactive shards

Shalin Shekhar Mangar
Hi Jan,

Comments inline:

On Tue, Jun 12, 2018 at 2:19 AM Jan Høydahl <[hidden email]> wrote:

> So my two questions are
> 1. Is it by design that inactive shards "count" wrt #cores?
>    I understand that it consumes disk but it is not active otherwise,
>    so one could argue that it should not be counted in core/replica rules?
>

Today, inactive slices also count towards the number of cores -- though
technically correct, it is probably an oversight.


> 2. Why is there no suggestion to move a shard due to the "minimize cores"
> preference itself?
>

The /autoscaling/suggestions endpoint only returns suggestions when there
are policy violations. Preferences such as minimize:cores act more as a
sort order, so they are never really violated. After you add the rule, the
framework still cannot produce a suggestion that satisfies it: even if
shard1_1 is moved to node2, node1 still holds shard1 and shard1_0, which
breaks the "<2" limit. So the system ends up suggesting nothing. You should
get a suggestion if you add a third node to the cluster, though.
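
The reasoning above can be checked with a toy model. This is only a sketch, not the Solr policy engine: it assumes one core per shard and simply asks whether any placement of the cores keeps every node within the limit implied by "cores": "<2" (at most one core per node). The function name is made up for the illustration.

```python
from itertools import product


def rule_satisfiable(num_nodes, num_cores, max_cores_per_node):
    """Return True if some placement keeps every node within the limit."""
    for placement in product(range(num_nodes), repeat=num_cores):
        counts = [placement.count(node) for node in range(num_nodes)]
        if all(count <= max_cores_per_node for count in counts):
            return True
    return False


# shard1 (inactive), shard1_0 and shard1_1 on two nodes: no valid placement.
two_nodes_with_inactive = rule_satisfiable(num_nodes=2, num_cores=3, max_cores_per_node=1)

# Same three cores with a third node added: a valid placement exists.
three_nodes_with_inactive = rule_satisfiable(num_nodes=3, num_cores=3, max_cores_per_node=1)

# Inactive shard1 deleted, two nodes: a valid placement exists.
two_nodes_without_inactive = rule_satisfiable(num_nodes=2, num_cores=2, max_cores_per_node=1)

print(two_nodes_with_inactive, three_nodes_with_inactive, two_nodes_without_inactive)
```

This matches what was observed: no suggestion while the inactive shard1 still counts, and a suggestion once it is deleted or a third node is available.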

Also see SOLR-11997 <https://issues.apache.org/jira/browse/SOLR-11997> which
will tell users that a suggestion could not be returned because we cannot
satisfy the policy. There are a slew of other improvements to suggestions
planned that will return suggestions even when there are no policy
violations.



--
Regards,
Shalin Shekhar Mangar.

Re: Autoscaling and inactive shards

Jan Høydahl / Cominvent
OK, I get the meaning of preferences now.

Would there be a way to write a generic rule that would suggest moving shards to obtain balance, without specifying absolute core counts? I.e. if you have three nodes
A: 3 cores
B: 5 cores
C: 3 cores

Then that rule would suggest two moves to end up with 4 cores on all three (unless that would violate disk space or load limits)?
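
As an illustration of the kind of balancing meant here, a small sketch (not an existing Solr rule; the function name is hypothetical) that keeps moving one core from the busiest node to the least busy one until no two nodes differ by more than one core. Note that the counts quoted above add up to 11, so an exactly even 4/4/4 layout is not reachable; the sketch ends at 4/4/3 after a single move. Disk space and load limits are ignored.

```python
def balancing_moves(cores_per_node):
    """Greedily even out core counts; return the moves and the final counts."""
    counts = dict(cores_per_node)
    moves = []
    while True:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        # Stop once the spread is at most one core: no move can improve it.
        if counts[busiest] - counts[idlest] <= 1:
            return moves, counts
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))


moves, final_counts = balancing_moves({"A": 3, "B": 5, "C": 3})
print(moves)
print(final_counts)
```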

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Re: Autoscaling and inactive shards

Shalin Shekhar Mangar
Yes, I believe Noble is working on this. See
https://issues.apache.org/jira/browse/SOLR-11985

--
Regards,
Shalin Shekhar Mangar.

Re: Autoscaling and inactive shards

Andrzej Białecki
If I’m not mistaken, the weird accounting of “inactive” shard cores is also caused by the fact that the individual cores that make up the replicas of the inactive shard are still loaded, so they still affect the number of active cores. If that’s the case, then we should probably fix this so that cores from inactive (but still present) shards are not loaded.

Re: Autoscaling and inactive shards

Jan Høydahl / Cominvent
Is there still a valid reason to keep the inactive shards around?
If shard splitting is robust, couldn't the split operation delete the inactive shard once the new shards are successfully loaded, just as happens during an automated segment merge?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Re: Autoscaling and inactive shards

Andrzej Białecki


> On 18 Jun 2018, at 14:02, Jan Høydahl <[hidden email]> wrote:
>
> Is there still a valid reason to keep the inactive shards around?
> If shard splitting is robust, could not the split operation delete the inactive shard once the new shards are successfully loaded, just like what happens during an automated merge of segments?
>


Shard splitting is not robust :) There are some interesting partial failure scenarios in SplitShardCmd that still need fixing - most likely a complete rewrite of SplitShardCmd is required to improve error handling, perhaps also to use a more efficient index splitting algorithm.

Until this is done, shard splitting leaves the original shard in place for a while, and InactiveShardPlanAction then removes it once its TTL has expired (the default is 2 days).
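
Roughly, the TTL check amounts to something like the sketch below. This is not the actual InactiveShardPlanAction code; the function name and the exact comparison are assumptions, and only the 2-day default comes from the description above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(days=2)  # default TTL mentioned above


def inactive_shard_expired(inactive_since, now, ttl=DEFAULT_TTL):
    """Return True once a shard has been inactive for at least the TTL."""
    return now - inactive_since >= ttl


# Hypothetical timestamps: the parent shard went inactive on 18 June at noon.
went_inactive = datetime(2018, 6, 18, 12, 0, tzinfo=timezone.utc)
one_day_later = went_inactive + timedelta(days=1)
three_days_later = went_inactive + timedelta(days=3)

print(inactive_shard_expired(went_inactive, one_day_later))
print(inactive_shard_expired(went_inactive, three_days_later))
```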
