Do we need the MODIFYCOLLECTION Api?

Varun Thacker-4
Today the MODIFYCOLLECTION API supports modifying the following properties:
  1. maxShardsPerNode
  2. rule
  3. snitch
  4. policy
  5. collection.configName
  6. autoAddReplicas
  7. replicationFactor
1-4 seem like something we should get rid of, since we have the AutoScaling Policy framework?

5> Can anyone point out the use-case for this?

6> autoAddReplicas can be changed both as a clusterprop and via the modify-collection API? Hmm. Which one is supposed to win?

7> We need to allow a user to change replicationFactor. But how does this help? We have nrtReplicas / pullReplicas / tlogReplicas, so changing this just sounds confusing. Or should we allow changing all replica types?
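
For reference, here is a minimal sketch in Python of what a v1 MODIFYCOLLECTION request looks like. The base URL and the collection name are placeholders, and the helper name modify_collection_url is made up for illustration. The allow-list simply mirrors the seven properties listed above.

```python
from urllib.parse import urlencode

# The seven properties listed above. This sketch rejects anything else.
MODIFIABLE = {
    "maxShardsPerNode",
    "rule",
    "snitch",
    "policy",
    "collection.configName",
    "autoAddReplicas",
    "replicationFactor",
}


def modify_collection_url(base_url, collection, props):
    """Build the URL for a MODIFYCOLLECTION call (hypothetical helper).

    base_url   -- e.g. "http://localhost:8983/solr" (assumed local install)
    collection -- name of the collection to modify
    props      -- dict of property name -> new value
    """
    unknown = sorted(set(props) - MODIFIABLE)
    if unknown:
        raise ValueError("not modifiable: " + ", ".join(unknown))
    params = {"action": "MODIFYCOLLECTION", "collection": collection}
    params.update(props)
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Example: change replicationFactor on a collection called "xyz".
print(modify_collection_url("http://localhost:8983/solr", "xyz",
                            {"replicationFactor": 3}))
```

Properties 1-4 and 6 go through exactly the same call, which is part of why it is unclear which API should own them.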

Re: Do we need the MODIFYCOLLECTION Api?

Jan Høydahl / Cominvent
Do we have a v2 API for CREATE and MODIFYCOLLECTION? E.g.

POST http://localhost:8983/api/c
{ modify-collection: { replicationFactor: 3 } }

Perhaps we should focus on a decent v2 API and deprecate the old confusing one?

Regarding replicationFactor / nrtReplicas / pullReplicas / tlogReplicas, my wish is that replicationFactor keeps on living as it does today, only setting nrtReplicas, and is mutually exclusive with any of the three others. So if you have a collection with tlogReplicas defined, then modifying "replicationFactor" should throw an error. But if you only ever care about NRT replicas, then you can keep using replicationFactor as before?
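
Read literally, the rule proposed above could be sketched as follows. This is an illustration only, not Solr's actual behaviour: the function name and the plain-dict representation of collection properties are hypothetical, and treating a non-zero tlogReplicas or pullReplicas as the conflict is just one interpretation of "mutually exclusive".

```python
def set_replication_factor(props, replication_factor):
    """Apply the proposed rule to a dict of collection properties.

    replicationFactor only ever sets nrtReplicas, and is rejected when the
    collection already uses TLOG or PULL replicas (assumed interpretation).
    Returns a new dict and leaves the input untouched.
    """
    conflicting = [k for k in ("tlogReplicas", "pullReplicas") if props.get(k)]
    if conflicting:
        raise ValueError(
            "replicationFactor cannot be modified when "
            + ", ".join(conflicting)
            + " is defined; set the replica types explicitly instead"
        )
    updated = dict(props)
    updated["replicationFactor"] = replication_factor
    updated["nrtReplicas"] = replication_factor
    return updated


# An NRT-only collection keeps working as before.
print(set_replication_factor({"nrtReplicas": 2}, 3))

# A collection with TLOG replicas is rejected.
try:
    set_replication_factor({"nrtReplicas": 1, "tlogReplicas": 2}, 3)
except ValueError as err:
    print("rejected:", err)
```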

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: Do we need the MODIFYCOLLECTION Api?

david.w.smiley@gmail.com
In reply to this post by Varun Thacker-4
+1 to get rid of #1, #2, #3, #7.

Maybe I'm mistaken but I thought "policy" was a part of the auto scaling framework?

Maybe the capability for autoAddReplicas should be considered an aspect of the auto scaling framework instead of a collection setting, and thus we could remove it here?

I think the ability to modify collection.configName seems useful, albeit rarely used in practice. Perhaps you want to try out a bunch of changes and want to easily roll back. You could create a config with those modifications, try it out, and if you don't like the results then point your config back to the original. Although in practice it may not always be possible to just switch configs, since a reindex may be required.

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker

Re: Do we need the MODIFYCOLLECTION Api?

Varun Thacker-4


On Fri, Jun 15, 2018 at 2:44 PM, David Smiley <[hidden email]> wrote:
> +1 to get rid of #1, #2, #3, #7.
>
> Maybe I'm mistaken but I thought "policy" was a part of the auto scaling framework?

Yeah. And http://lucene.apache.org/solr/guide/solrcloud-autoscaling-api.html#create-and-modify-cluster-policies seems like the way to modify it. So I wonder why MODIFYCOLLECTION should support it?
Maybe Noble, AB, or Shalin could confirm?

> Maybe the capability for autoAddReplicas should be considered an aspect of the auto scaling framework instead of a collection setting, and thus we could remove it here?

Yeah, I'd love for that to happen. It's even tied to triggers etc., so it seems like it should be enabled/disabled via the autoscaling API.

> I think the ability to modify collection.configName seems useful, albeit rarely used in practice. Perhaps you want to try out a bunch of changes and want to easily roll back. You could create a config with those modifications, try it out, and if you don't like the results then point your config back to the original. Although in practice it may not always be possible to just switch configs, since a reindex may be required.

Right, and then basically we are giving users a way to shoot themselves in the foot :)



Re: Do we need the MODIFYCOLLECTION Api?

Varun Thacker-4
In reply to this post by Jan Høydahl / Cominvent
Hi Jan,

I agree with how you're thinking of replicationFactor as basically equivalent to nrtReplicas. Let's not change that.

So is #7 the only real use for this API?


Re: Do we need the MODIFYCOLLECTION Api?

Erick Erickson
re: collection.configName

> Right and then basically we are giving a way for users to shoot
> themselves in the foot :)

They can also delete their index files....

Seriously though, what if I have a bunch of collections sharing a
configset and then I need to specialize only one by _adding_ fields? I'd
like to copy the configset to a new one and then point my collection
at it. And with the UninvertingMergePolicy, adding docValues (DV) would
be one such specialization.

I've also seen time-series collections (let's say 30 days) where you
_cannot_ reindex. But you want to modify your schema anyway. People
have:
1> defined a new field that's a variant of the old field
2> had their indexing program index to _both_ for 30 days
3> changed the app to use the new field
4> changed the indexing program to stop indexing to the old field

Sure, the metadata for the field is still carried along but that's not
a problem for a few fields.
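
The four steps above amount to a dual-write window. Here is a rough sketch, assuming a 30-day retention period. The field names title_s and title_txt, the helper names, and the dates are invented for illustration and do not come from any real schema.

```python
from datetime import date, timedelta

OLD_FIELD = "title_s"    # hypothetical existing field
NEW_FIELD = "title_txt"  # hypothetical variant defined in step 1


def build_doc(doc_id, value, today, dual_write_start, retention_days=30):
    """Indexing side: write both fields during the window (step 2),
    then only the new field once the window has passed (step 4)."""
    doc = {"id": doc_id, NEW_FIELD: value}
    if today < dual_write_start + timedelta(days=retention_days):
        doc[OLD_FIELD] = value
    return doc


def query_field(today, dual_write_start, retention_days=30):
    """Application side: switch to the new field (step 3) only when every
    document still inside the retention window carries it."""
    if today < dual_write_start + timedelta(days=retention_days):
        return OLD_FIELD
    return NEW_FIELD


start = date(2018, 6, 15)
print(build_doc("1", "hello", date(2018, 6, 20), start))  # both fields
print(build_doc("2", "hello", date(2018, 7, 15), start))  # new field only
print(query_field(date(2018, 7, 15), start))
```

In this sketch the application and the indexer flip on the same day. In practice you would switch the application first and stop writing the old field a little later, as in steps 3 and 4.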

Point is it's dangerous to go changing your configset for an existing
collection, sure. But I find the API a better option than having to
manually edit your ZK nodes.

FWIW


Re: Do we need the MODIFYCOLLECTION Api?

Varun Thacker-4
So let's keep collection.configName and replicationFactor.

If we were to think about this API today, would MODIFYCOLLECTION still be where we put it?

It almost feels like a collection setting. Maybe Collection Properties (SOLR-11960) is where it should live?



Re: Do we need the MODIFYCOLLECTION Api?

Shalin Shekhar Mangar
In reply to this post by Varun Thacker-4


On Fri, Jun 15, 2018 at 7:47 PM Varun Thacker <[hidden email]> wrote:
> On Fri, Jun 15, 2018 at 2:44 PM, David Smiley <[hidden email]> wrote:
>> +1 to get rid of #1, #2, #3, #7.
>>
>> Maybe I'm mistaken but I thought "policy" was a part of the auto scaling framework?
>
> Yeah. And http://lucene.apache.org/solr/guide/solrcloud-autoscaling-api.html#create-and-modify-cluster-policies seems like the way to modify it. So I wonder why MODIFYCOLLECTION should support it?
> Maybe Noble, AB, or Shalin could confirm?

The policy is indeed part of the auto scaling framework, but the support in modify collection exists so that you can switch the policy for a collection. For example, say you have policy1, which you associated with collection xyz at creation time using the "usePolicy" parameter. Now if you want to change the collection to use policy2 instead, then the modify collection API is the way to go. IMO, we need support for this API even though certain parameters are ready to be deprecated.
 



--
Regards,
Shalin Shekhar Mangar.

Re: Do we need the MODIFYCOLLECTION Api?

Varun Thacker-4
Thanks everyone! I've created SOLR-12498 and linked it to this mailing list thread.
