SolrCloud "master mode" planned?


SolrCloud "master mode" planned?

Otis Gospodnetić
Hi,

This thread about the Solr master-slave vs. SolrCloud deployment poll seems to indicate that people find SolrCloud deployment (the ZK part of it) complex:


It could be just how information is presented...
... or how ZK is exposed as something external, which it is...

Are there plans to "hide ZK"?  Or maybe have the notion of master-only (not as in master-slave, but as in running ZK only, not hosting data) mode for SolrCloud nodes (a la ES)?  

I peeked at JIRA, but couldn't find anything about that, although I seem to recall some mention of embedding ZK to make things easier for SolrCloud users.  I think I saw that at some Lucene Revolution talk?

Thanks,
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


Re: SolrCloud "master mode" planned?

Ishan Chattopadhyaya
Hi Otis,
I've been working on, and shall continue working on, a few issues along the lines of "hide ZK":

* SOLR-6736: Uploading configsets can now be done through Solr nodes instead of uploading them to ZK directly.
* SOLR-10272: Use a _default configset, so that users need not deal with the concept of configsets unless they want to.
* SOLR-10446 (SOLR-9057): Users can use CloudSolrClient without access to ZK.
* SOLR-8440: Enabling BasicAuth security through the bin/solr script.
* Ability to edit security.json through the bin/solr script.

With all of this in place (and perhaps a few more things I may be missing), users should hopefully not need to know much about ZK.
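As a rough sketch of what the configset work (SOLR-6736) enables: a configset can be uploaded over HTTP to any Solr node via the Configset API rather than written into ZK directly. The host, zip file, and configset name below are illustrative, not from the issue itself.

```shell
# Build the Configset API upload URL for a given Solr node and configset
# name; a zipped config is then POSTed to it -- no direct ZK access needed.
configset_upload_url() {
  local host="$1" name="$2"
  printf 'http://%s/solr/admin/configs?action=UPLOAD&name=%s' "$host" "$name"
}

# Hypothetical usage against a running node:
# curl -X POST --data-binary @myconf.zip "$(configset_upload_url localhost:8983 myconf)"
```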

1. Do you have suggestions on what more needs to be done for "hiding ZK"?
2. Do you have suggestions on how to track this overall theme of "hiding ZK"? Some of these issues I mentioned are associated with other epics, so I don't know if creating a "hiding ZK" epic and having these (and other issues) as sub-tasks is a good idea (maybe it is). Alternatively, how about tracking these (and other issues) using some label?

Regards,
Ishan






Re: SolrCloud "master mode" planned?

Jan Høydahl / Cominvent
There have been suggestions to add a “node controller” process which again could start Solr and perhaps ZK on a node.

But I cannot recall a proposal for a new "zk" role which would let a node start (embedded) ZK. It would of course make deployment simpler if ZK were hidden as a Solr role/feature and could perhaps be assigned to N nodes, moved when needed, etc. If I'm not mistaken, ZK 3.5 would make such dynamic setups easier, but it is currently in beta.

Also, in these days of containers, I rather like the concept of spinning up N ZK containers that the Solr containers connect to, and letting Kubernetes (or whatever you use) take care of placement, versions, etc. So perhaps the need for a production-ready Solr-managed ZK is not as big as it used to be, or is maybe even undesirable? For production Windows installs I could still clearly see a need, though.
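As a rough illustration of the container approach described above, a three-node ZK ensemble can be declared once and left to the orchestrator. A minimal Docker Compose sketch using the official zookeeper image (service names and image tag are assumptions; Solr containers would then point at zk1:2181,zk2:2181,zk3:2181):

```
# docker-compose.yml sketch: three ZK containers forming one ensemble.
services:
  zk1:
    image: zookeeper:3.4
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
  zk2:
    image: zookeeper:3.4
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
  zk3:
    image: zookeeper:3.4
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
```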

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com





Re: SolrCloud "master mode" planned?

Mike Drob-3
Could the zk role also be guaranteed to run the Overseer (and host no collections)? If we already have that separated out, it would make sense to put it with the embedded ZK. I think you can already configure and place things manually this way, but it would be a huge win to package it all up nicely for users and make it a turnkey operation.

I think it was a great improvement for deployment when we dropped Tomcat; this is the next logical step.

Mike





Re: SolrCloud "master mode" planned?

Malcolm Upayavira Holmes
I have done a *lot* of automating this. Redoing it recently, it was quite embarrassing to realise how much complexity is involved - it is crazy hard to get a basic, production-ready SolrCloud setup running.

One thing that is hard is getting a ZooKeeper ensemble going - using Exhibitor makes it much easier.

Something that has often occurred to me is: why do we require people to download a separate ZooKeeper and work out how to install and configure it, when we already have it embedded? Why can't we just have a 'bin/solr zk start' command which starts an "embedded" ZooKeeper, but without Solr? To really make it neat, we could offer some way (a la Exhibitor) for multiple concurrently started ZK nodes to autodiscover each other; then getting our three ZK nodes up won't be quite so treacherous.

Just a thought.

Upayavira




Re: SolrCloud "master mode" planned?

david.w.smiley@gmail.com

On Apr 26, 2017, at 4:35 PM, Upayavira <[hidden email]> wrote:

I have done a *lot* of automating this. Redoing it recently it was quite embarrassing to realise how much complexity there is involved in it - it is crazy hard to get a basic, production ready SolrCloud setup running.

Would you mind enumerating the sort of issues you ran into deploying ZooKeeper in a production config? Just a quick draft list, to get a sense of what you had to contend with. I recently did it on a Docker/Kontena infrastructure. I did not find it to be hard; maybe medium :-). I got the nodes working out of the box with minimal effort but had to make changes to harden it:
* I found the existing official Docker image for ZooKeeper lacking in that I couldn't easily specify the "auto purge" settings, which default to no purging - unacceptable.
* I set "-XX:+CrashOnOutOfMemoryError" so that the process would end when an OOM occurs, letting Kontena (the Docker orchestrator) notice it's down and restart it (a rare event, obviously). Users not in a container environment might not care about this. This was merely a configuration setting; no Docker image hack needed.
* I also made sure I used the latest ZK 3.4.6 release... I recall 3.4.4 (or maybe even 3.4.5?) cached DNS entries without re-looking them up on failure, which is particularly problematic in a container environment where it's common for services to get a new IP when restarted. Thankfully I did not learn that issue the hard way; I recall a blog post warning of it by Shalin or Martijn Koster. No action from me here other than using an appropriately new version. Originally, out of laziness, I used Confluent's Docker image, but I knew I would have to switch because of this issue.
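For reference, the "auto purge" settings mentioned above are plain zoo.cfg keys; a minimal fragment (the values are illustrative, not from the message):

```
# zoo.cfg excerpt: keep the 3 most recent snapshots and purge the rest
# every 24 hours (out of the box, ZooKeeper purges nothing)
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```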

One thing that is hard is getting a ZooKeeper ensemble going - using Exhibitor makes it much easier.

Something that has often occurred to me is, why do we require people to go download a separate ZooKeeper, and work out how to install and configure it, when we have it embedded already? Why can't we just have a 'bin/solr zk start' command which starts an "embedded" zookeeper, but without Solr. To really make it neat, we offer some way (a la Exhibitor) for multiple concurrently started ZK nodes to autodiscover each other, then getting our three ZK nodes up won't be quite so treacherous.

I've often thought the same -- why not just embed it. People say it's not a "production config", but only because we all keep telling ourselves that in an echo chamber and believing it :-P

~ David






Re: SolrCloud "master mode" planned?

Malcolm Upayavira Holmes



On Wed, 26 Apr 2017, at 10:06 PM, David Smiley wrote:

Would you mind enumerating a list of what sort of issues you ran into deploying ZooKeeper in a production config?  A quick draft list of sorts just to get a sense of what sort of stuff generally you had to contend with.  I recently did it in a Docker/Kontena infrastructure.  I did not find it to be hard; maybe medium :-).  I got the nodes working out of the box with minimal effort but had to make changes to harden it.

I used it for a test case for an app I built when learning more about deployments. It is all on GitHub at http://github.com/odoko-devops/uberstack; there's an example in examples/apache-solr. I gave up on that effort because (a) I was making it more complex than it needed to be, and (b) I couldn't compete with the big guns in the devops industry.

What I needed was to have three ZooKeeper nodes start up and autodiscover each other. That I handled with Exhibitor (from Netflix). They provide a (not production-ready!!) Docker image that, whilst it takes a few minutes, does result in ZK nodes working as an ensemble even though none of them know about each other at the start. It requires an S3 bucket to provide the coordination.

The real lesson was "don't fail, retry". I built a wrapper around Solr so that if ZK wasn't available (and in the correct ensemble), Solr wouldn't start yet - it just kept retrying. Similarly, in https://github.com/odoko-devops/solr-utils there's a tool that, when containerised, will let you create a chroot (must be done after ZK but before Solr), upload configs (after Solr, but before collection creation), create a collection (only once configs are present), etc.
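A minimal sketch of that "don't fail, retry" idea (the probe command and retry count below are assumptions for illustration; the actual wrapper is in the linked repos):

```shell
# Keep running the supplied probe command until it succeeds, or give up
# after N tries; only then would a wrapper go on to start Solr.
wait_for() {
  local tries="$1"; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 1
  done
  return 0
}

# Hypothetical usage: block until ZK answers on 2181, then start Solr.
# wait_for 30 nc -z zookeeper 2181 && bin/solr start -c -z zookeeper:2181
```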

It made use of Rancher to provide an overlay network with DNS used for service discovery - the ZK nodes were accessible to Solr via the hostname "zookeeper", so Solr didn't have to do anything fancy in order to find them.

The outcome of this was a reliable one-click install of three ZK nodes, three SolrCloud nodes, indexed content, and a webapp showing the content. It was pretty cool.

Or should I say, the outcome was cool, the process to get there was painful.

Happy to share more of this if it is useful.

Upayavira





Re: SolrCloud "master mode" planned?

Otis Gospodnetić
Right, that "bin/solr zk start" is sort of how one could present that to users.  I took the liberty of creating https://issues.apache.org/jira/browse/SOLR-10573 after not being able to find any such issues (yet hearing about such ideas at Lucene Revolution).

Ishan & Co, you may want to link other related issues, or use e.g. a "hideZK" label and treat SOLR-10573 just as an umbrella?

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Wed, Apr 26, 2017 at 4:35 PM, Upayavira <[hidden email]> wrote:
I have done a *lot* of automating this. Redoing it recently it was quite embarrassing to realise how much complexity there is involved in it - it is crazy hard to get a basic, production ready SolrCloud setup running.

One thing that is hard is getting a ZooKeeper ensemble going - using Exhibitor makes it much easier.

Something that has often occurred to me is, why do we require people to go download a separate ZooKeeper, and work out how to install and configure it, when we have it embedded already? Why can't we just have a 'bin/solr zk start' command which starts an "embedded" zookeeper, but without Solr. To really make it neat, we offer some way (a la Exhibitor) for multiple concurrently started ZK nodes to autodiscover each other, then getting our three ZK nodes up won't be quite so treacherous.

Just a thought.

Upayavira

On Wed, 26 Apr 2017, at 03:58 PM, Mike Drob wrote:
Could the zk role also be guaranteed to run the Overseer (and no collections)? If we already have that separated out, it would make sense to put it with the embedded zk. I think you can already configure and place things manually this way, but it would be a huge win to package it all up nicely for users and set it to turnkey operation.

I think it was a great improvement for deployment when we dropped tomcat, this is the next logical step.

Mike

On Wed, Apr 26, 2017, 4:22 AM Jan Høydahl <[hidden email]> wrote:
There have been suggestions to add a “node controller” process which again could start Solr and perhaps ZK on a node.

But adding a new “zk” role which would let that node start (embedded) ZK I cannot recall. It would of course make a deploy simpler if ZK was hidden as a solr role/feature and perhaps assigned to N nodes, moved if needed etc. If I’m not mistaken ZK 3.5 would make such more dynamic setups easier but is currently in beta.

Also, in these days of containers, I kind of like the concept of spinning up N ZK containers that the Solr containers connect to and let Kubernetes or whatever you use take care of placement, versions etc. So perhaps the need for a production-ready solr-managed zk is not as big as it used to be, or maybe even undesirable? For production Windows installs I could still clearly see a need though.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

25. apr. 2017 kl. 23.30 skrev Ishan Chattopadhyaya <[hidden email]>:

Hi Otis,
I've been working on, and shall be working on, a few issues on the lines of "hide ZK".

SOLR-6736: Uploading configsets can now be done through Solr nodes instead of uploading them to ZK.
SOLR-10272: Use a _default configset, with the intention of not needing the user to bother about the concept of configsets unless he needs to
SOLR-10446 (SOLR-9057): User can use CloudSolrClient without access to ZK
SOLR-8440: Enabling BasicAuth security through bin/solr script
Ability to edit security.json through the bin/solr script
Having all this in place, and perhaps more that I may be missing, should hopefully mean the user doesn't need to know much about ZK.

1. Do you have suggestions on what more needs to be done for "hiding ZK"?
2. Do you have suggestions on how to track this overall theme of "hiding ZK"? Some of these issues I mentioned are associated with other epics, so I don't know if creating a "hiding ZK" epic and having these (and other issues) as sub-tasks is a good idea (maybe it is). Alternatively, how about tracking these (and other issues) using some label?
Regards,
Ishan



On Wed, Apr 26, 2017 at 2:39 AM, Otis Gospodnetić <[hidden email]> wrote:
Hi,

This thread about Solr master-slave vs. SolrCloud deployment poll seems to point out people find SolrCloud (the ZK part of it) deployment complex:


It could be just how information is presented...
... or how ZK is exposed as something external, which it is...

Are there plans to "hide ZK"?  Or maybe have the notion of master-only (not as in master-slave, but as in running ZK only, not hosting data) mode for SolrCloud nodes (a la ES)?  

I peeked at JIRA, but couldn't find anything about that, although I seem to recall some mention of embedding ZK to make things easier for SolrCloud users.  I think I saw that at some Lucene Revolution talk?

Thanks,
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




Re: SolrCloud "master mode" planned?

Walter Underwood
Not fired up about approaches that have “some nodes more equal than others”, whether it is zk-only or replica-only. That is the opposite of autoscaling.

Getting our Solr Cloud cluster running in test and in prod was surprisingly difficult and unfortunately mysterious. I started with Solr 1.1, so I’m not exactly a noob.

This cluster needs to handle very long text queries on a large collection (17 million docs and growing). After too much work, I’m really happy with the performance. This is 4 shards, 4 replicas, with AWS c4.8xlarge nodes. Yes, that is nearly $200,000 per year, just for the instances.

Here is what I wanted, and what I ended up with after way too much work.

* Collection configs and startup parameters under version control.
* Separation between installing software and installing configs.
* Automated config deploy from Jenkins.
* Data and logs on /solr EBS volume.
* HTTP on port 6090 to match our port-to-app mapping (just gave up on this one, geez).
* Five node Zookeeper ensemble to avoid single point of failure during zk version or instance upgrade (wasted two weeks, gave up).
* Graceful scale out and scale in (not even nearly done).
* Metrics reported to Graphite; the metrics performance bug in 6.4 cost a few weeks.

Separating executables, config, and data should not be this hard, right? I thought we solved that in the 1990’s.

I’ve never had a problem with getting Zookeeper running, but getting the five-node ensemble to work was impossible. I wrote my first concurrent code 35 years ago, so I should be able to do this. We just could not get 3.4.6 to actually work on five nodes, no matter how many weird things we tried, like switching AWS instance types. We ended up using an existing 3-node ensemble that we had wanted to decommission.

The magic solr script commands do not document what happens when you change the port or server directory. Surprisingly, many of them only work after you have a local running Solr instance; that is also not documented. So the port must be passed in to many of the commands. I guess the server directory needs to be passed in too, but I never figured that out.

The required contents of the Zookeeper “filesystem” are undocumented, as far as I can tell. In fact, I found it really hard to figure out exactly which directory to give to either the solr script or the zkCli script. Earlier versions of Solr had $SOLR_HOME/$collection/conf/…, but where does the directory arg fit in that hierarchy? Especially because … solr.xml

Still don’t have a method that I trust to create a new cluster, but updates to our existing clusters are pretty solid. It just seems bogus to have a whole filesystem-based deployment, use that to bootstrap, then never use it again. I have zero trust that it will work the next time. 

I wrote a Python program that takes the URL of any Solr node (I used the load balancer), the collection(s) to update, and the base directory for configs (use the collection name as a subdirectory). It does this:

1. Gets info from Solr (requests package, yay!) and parses out the zk host, including the chroot. Some ugly parsing there, that should be straight JSON.
2. Connects to zk (kazoo package, yay!). Uploads solr.xml from the base of the configs directory.
3. Optionally, removes all the files from the zk config directory we are about to upload to.
4. Uploads all the files, recursively, from $configs/$collection on the filesystem to zk.
5. Optionally, links the config to the same name as a collection.
6. Sends a RELOAD for that collection to the Solr node as an async command.
7. Waits.
8. Waits. Finally completes.
9. Parses the response for succeeding and failed nodes. If there are failed nodes, exit with a failure result code. NOTE: this could leave the cluster in an unfortunate state.
10. Repeat for the next collection on the command line.
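The flow above can be sketched roughly like this. This is a hypothetical, untested sketch, not the actual program: it assumes the requests and kazoo packages, and that the /admin/info/system response exposes the ZK connect string under a "zkHost" key.

```python
# Hypothetical sketch of steps 1-6 above. Endpoint paths and the
# "zkHost" JSON key are assumptions; kazoo/requests calls are
# illustrative, not the real solrtool.py.
import os
import posixpath

def parse_zk_host(system_info: dict) -> str:
    # Step 1: the zk connect string (with chroot) from /admin/info/system
    return system_info["zkHost"]

def local_to_znode(base: str, path: str, collection: str) -> str:
    # Map a file under $configs/$collection to /configs/<collection>/...
    rel = os.path.relpath(path, base).replace(os.sep, "/")
    return posixpath.join("/configs", collection, rel)

def deploy_config(solr_url: str, collection: str, config_dir: str) -> None:
    import requests                       # third-party; steps 1 and 6
    from kazoo.client import KazooClient  # third-party; steps 2-4

    info = requests.get(f"{solr_url}/solr/admin/info/system?wt=json").json()
    zk = KazooClient(hosts=parse_zk_host(info))
    zk.start()
    base = os.path.join(config_dir, collection)
    for root, _, names in os.walk(base):  # step 4: recursive upload
        for name in names:
            path = os.path.join(root, name)
            znode = local_to_znode(base, path, collection)
            with open(path, "rb") as f:
                data = f.read()
            if zk.exists(znode):          # overwrite or create as needed
                zk.set(znode, data)
            else:
                zk.create(znode, data, makepath=True)
    zk.stop()
    # Step 6: async RELOAD through any Solr node (Collections API)
    requests.get(f"{solr_url}/solr/admin/collections",
                 params={"action": "RELOAD", "name": collection,
                         "async": f"reload-{collection}"})
```

The real program also clears old znodes, links the config to the collection, and polls the async request status (steps 3, 5, 7-9), which this sketch omits.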

A typical invocation looks like this:

   python solrtool/solrtool.py -h solr-cloud.test.cheggnet.com --clear -c questions -d configs

Installing a new version of Solr on a 16 node cluster is taking about an hour and a half. We follow the hints in the docs (upgrade the overseer last), and have turned that into a process. That includes some exciting Unix pipelines with “jq” to distinguish between live nodes and nodes in the cluster which are effed. We had one node which never could recover its replicas. We finally shot it and made a fresh one. About a third of the steps are automated. Unfortunately, “solr healthcheck -c colname” cannot be run remotely, so we need a sudo to each system and a manual review of the results.
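The jq pipelines aren't reproduced here, but the live-vs-registered check they perform can be sketched as a small hypothetical Python helper over the Collections API CLUSTERSTATUS response (field names follow the standard response shape; the helper itself is an assumption, not part of the actual process):

```python
# Hypothetical helper mirroring the jq check described above: given a
# parsed CLUSTERSTATUS response, list nodes that host replicas but are
# missing from live_nodes (i.e. the nodes that are effed).
def dead_nodes(cluster_status: dict) -> set:
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    registered = set()
    for coll in cluster.get("collections", {}).values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                registered.add(replica["node_name"])
    return registered - live
```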

If you want to see the process, ping me. Nothing secret, but too big for this email.

Getting to this point was way too hard. And this is not even close to good enough. It needs to upload the new config to a timestamped temp directory, then move/copy it into place after it succeeds. If the reload fails, it should roll back to the previous config and try another reload.

Our configs are under source control. We will never use managed configs.

Other things we needed to do to get here.

* solr.in.sh is under source control.
* etc/jetty.xml is under source control so we can put the logs in /solr.
* Each solrconfig.xml has /solr/… hardcoded as the data directory.
* solr.in.sh has some dancing around to pass in the right properties for the metrics destination and New Relic. This avoids maintaining separate configs for dev, test, and prod. Solr could be better about properties vs host settings vs JVM settings. Putting all of them in solr.in.sh is sloppy.

Here is what we append to solr.in.sh. Line wrapping might make this look odd for you.

# Generate a Graphite prefix from the hostname and environment portion of the domain
environment=`hostname | cut -d . -f 2`
base_hostname=`hostname -s`
graphite_prefix="${environment}.${base_hostname}"
# Default values are the settings for dev3.
graphite_host="metrics.test3.cheggnet.com"
ZK_HOST="zookeeper-eb342a2d.dev3.cloud.cheggnet.com:2181/solr-cloud"
if [[ "$environment" = 'test3' ]]
then
    # dev3 and test3 share the same Graphite metrics host
    ZK_HOST='zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud'
fi
if [[ "$environment" = 'prod2' ]]
then
    graphite_host='metrics-eng.prod.cheggnet.com'
    ZK_HOST='thor-zk01.prod2.cloud.cheggnet.com:2181,thor-zk02.prod2.cloud.cheggnet.com:2181,thor-zk03.prod2.cloud.cheggnet.com:2181/solr-cloud'
fi
SOLR_OPTS="$SOLR_OPTS -Dgraphite_prefix=${graphite_prefix}"
SOLR_OPTS="$SOLR_OPTS -Dgraphite_host=${graphite_host}"
SOLR_OPTS="$SOLR_OPTS -javaagent:/apps/solr6/newrelic/newrelic.jar"
SOLR_OPTS="$SOLR_OPTS -Dnewrelic.environment=${environment}"
ENABLE_REMOTE_JMX_OPTS="true"
SOLR_LOGS_DIR=/solr/logs

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


On Apr 26, 2017, at 3:25 PM, Otis Gospodnetić <[hidden email]> wrote:

Right, that "bin/solr zk start" is sort of how one could present that to users.  I took the liberty of creating https://issues.apache.org/jira/browse/SOLR-10573 after not being able to find any such issues (yet hearing about such ideas at Lucene Revolution).

Ishan & Co, you may want to link other related issues or use e.g. "hideZK" label and treat SOLR-10573 just as an umbrella?

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Wed, Apr 26, 2017 at 4:35 PM, Upayavira <[hidden email]> wrote:
I have done a *lot* of automating this. Redoing it recently, it was quite embarrassing to realise how much complexity is involved; it is crazy hard to get a basic, production-ready SolrCloud setup running.

One thing that is hard is getting a ZooKeeper ensemble going - using Exhibitor makes it much easier.




Re: SolrCloud "master mode" planned?

Greg Pendlebury
Would it be possible to have an optional node that can dynamically assume a leadership position?

There has been some small amount of discussion here about whether we could have a 'search node' join a cluster with an empty/null (or zero-length) hash range in the clusterstate file, so that it hosts no Lucene segments; that helps it avoid GC issues related to commits and NIC saturation related to replication. The node could possibly be tuned and configured purely for search, and a small number of these nodes could be put behind a traditional load balancer, removing the need for search clients to understand ZooKeeper. That last part was attractive to us simply for search clients like JMeter, but it is nice for other reasons too (like firewalling the ZK nodes and Solr).

Our theory (which never evolved beyond idle chat) was that this would almost be possible as it currently stands, and that those sorts of nodes might be an attractive place to host any 'leadership' roles or features which optionally buy in to 'some nodes being more equal than others'.
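Purely as an illustration of that idle chat (this is not a shape Solr supports today, and every name below is made up), the state.json entry for such a search-only shard might look like a shard with an empty hash range, so no documents ever route to it:

```json
{
  "shards": {
    "search_only": {
      "range": "",
      "state": "active",
      "replicas": {
        "core_node9": {
          "node_name": "searchnode1:8983_solr",
          "state": "active"
        }
      }
    }
  }
}
```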

Ta,
Greg


On 27 April 2017 at 11:27, Walter Underwood <[hidden email]> wrote:
Not fired up about approaches that have “some nodes more equal than others”, whether it is zk-only or replica-only. That is the opposite of autoscaling.
