Rolling Deploys and SolrCloud

Mike Schultz
Does anybody have any experience with "rolling deployments" and SolrCloud?

We have a production environment where we deploy new software and config simultaneously to individual servers in a rolling manner. At any point during the deployment, there may be N boxes with old software/config and M boxes with new software/config. Eventually N=0 (no more old software/config) and M=100% (all boxes have new software/config). This is very convenient because each box always has the config its software requires: the new config is present wherever the new software runs, and old software on other boxes never sees it. We can maintain 100% uptime for the service using this technique.
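The rolling scheme described above can be sketched as a small simulation. This is only an illustration of the N-old/M-new bookkeeping, not a deployment tool; the `Node` class, the box names, and the `v1`/`v2` labels are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str  # software and config ship together, so one label covers both


def rolling_deploy(nodes, new_version):
    """Upgrade one node at a time; yield (old_count, new_count) after each step."""
    for node in nodes:
        # In a real deploy the node would be drained, upgraded, and restarted here.
        node.version = new_version
        old = sum(1 for n in nodes if n.version != new_version)
        yield old, len(nodes) - old


# Hypothetical four-box cluster, all starting on v1.
nodes = [Node(f"box{i}", "v1") for i in range(4)]
steps = list(rolling_deploy(nodes, "v2"))
print(steps)  # [(3, 1), (2, 2), (1, 3), (0, 4)]
```

At every intermediate step the cluster is a mix of old and new boxes, and each box is internally consistent because its config travelled with its software.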

If I'm understanding the role that ZooKeeper (ZK) plays in SolrCloud correctly, this no longer works. If config lives in ZK, then it's all or nothing: all old or all new config. If this is true, it presents a bunch of new challenges for deploying software.

So, to ask a concrete question: is it possible to not use ZK for config distribution, i.e. keep the config local to each shard?

Mike Schultz

Re: Rolling Deploys and SolrCloud

Mark Miller-3
Doing this with SolrCloud is not much different from doing it with old-style Solr.

ZooKeeper supports rolling restarts, and AFAIK, so does Solr generally.

While the configs live in ZK, they work the same way as if they were local.

A SolrCore won't try to read them until you reload it. I think that's usually enough?
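As a rough illustration of the reload step, a core can be asked to reload through Solr's CoreAdmin API (`action=RELOAD`). The sketch below only builds the request URL and sends nothing; the base URL and the core name `collection1` are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core):
    """Build a CoreAdmin RELOAD request URL for a single core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url}/admin/cores?{query}"


# Placeholder host, port, and core name.
url = core_reload_url("http://localhost:8983/solr", "collection1")
print(url)  # http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```

Until a request like this is issued (or the node restarts), a running core keeps using the config it loaded earlier, even if the copy in ZK has changed.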

Can SolrCloud read config files from the local filesystem? Not currently.

Mark

(In reply to Mike Schultz's message of Dec 11, 2012, 7:56 PM.)


Re: Rolling Deploys and SolrCloud

Mike Schultz
OK, that makes sense and it's probably workable, but it's still more awkward than having code and configuration deployed together to individual machines.

For example, to deploy new software/config we need to 1) upload the config to ZK, then 2) deploy the new software to the nodes.

What about the span of time between 1) and 2)? If a box bounces during this time, it will come up with the wrong config. Or what if 2) goes awry and some boxes succeed and some fail? It could be very complicated to recover from that.
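The window between steps 1) and 2) can be made concrete with a toy model. This is a sketch under simplifying assumptions: one shared config version in ZK, a node reads the config only when it restarts, and the `Cluster` class and box names are invented for the example.

```python
class Cluster:
    def __init__(self, node_names, version):
        self.zk_config = version  # the single shared config held in ZooKeeper
        self.software = {n: version for n in node_names}
        self.loaded_config = {n: version for n in node_names}

    def upload_config(self, version):
        # Step 1: running cores keep the config they already loaded.
        self.zk_config = version

    def restart(self, node):
        # A restarting node reads whatever config is in ZooKeeper right now.
        self.loaded_config[node] = self.zk_config

    def deploy_software(self, node, version):
        # Step 2, for one node: install the new bits, then restart.
        self.software[node] = version
        self.restart(node)

    def mismatched(self):
        """Nodes whose loaded config does not match their software version."""
        return sorted(n for n in self.software
                      if self.software[n] != self.loaded_config[n])


cluster = Cluster(["box1", "box2", "box3"], "v1")
cluster.upload_config("v2")          # step 1
cluster.restart("box2")              # unplanned bounce before step 2 reaches box2
print(cluster.mismatched())          # ['box2']
for name in ["box1", "box2", "box3"]:
    cluster.deploy_software(name, "v2")  # step 2
print(cluster.mismatched())          # []
```

In this model, only a node that restarts inside the window ends up with old software and new config; the rest stay consistent until step 2 reaches them.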

Another use case: I may want to push new software/config to a single box for a smoke test before rolling to all nodes in production (some might call this testing in production, but it's just real-world safety).

I guess at the end of the day, what I don't understand is this: given that I need to roll new software bits to individual nodes to deploy new software, what good does keeping config in ZK do for me? Why not just keep the config with the software and roll it at the same time?

Re: Rolling Deploys and SolrCloud

Mark Miller-3

On Dec 12, 2012, at 12:52 PM, Mike Schultz <[hidden email]> wrote:

> Ok, that makes sense and it's probably workable, but, it's still more awkward
> than having code and configuration deployed together to individual machines.  
>
> For example, for a deploy of new software/config we need to 1) first upload
> config to zK.  then 2) deploy new software to the nodes.
>
> What about the span of time between 1) and 2)?  If a box bounces during this
> time it will come up with the wrong config.  Or what if 2) goes awry and
> some boxes succeed and some fail?  It could be very complicated to recover
> from that.

Yeah, that's the only sticky spot I was thinking about. But I figured that in general, if a box goes down, you would upgrade it before bringing it back up.

>
> Another use case is, I may want to push a new software/config to a single
> box for a smoke test before rolling to all nodes in production (some might
> call this testing in production but it's just real world safety).

You might file a feature request. There are various things you could consider, such as allowing a core that is part of a collection to override the collection config and point to a different one in ZooKeeper.
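To make the suggested override concrete, here is a purely hypothetical sketch of how such a lookup could behave. Nothing here is an existing Solr feature or API; the function, the config names `prod-v1` and `prod-v2`, and the core names are all invented for illustration.

```python
def resolve_config_names(collection_config, cores, overrides):
    """Map each core to the config it would load.

    `overrides` models the proposed (hypothetical) per-core setting; cores
    without an entry fall back to the collection-wide config.
    """
    return {core: overrides.get(core, collection_config) for core in cores}


# One smoke-test core points at the new config; the rest stay on the old one.
resolved = resolve_config_names(
    "prod-v1",
    ["core_a", "core_b", "core_c"],
    {"core_b": "prod-v2"},
)
print(resolved)  # {'core_a': 'prod-v1', 'core_b': 'prod-v2', 'core_c': 'prod-v1'}
```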

Working with local configs today is a little scary in that it might tie our hands a bit in terms of what we need to support in the future and other features we want to add.

>
> I guess at the end of the day, what I don't understand is, given that I need
> to roll new software bits to individual nodes for the deployment of new
> software, what good does keeping config in zk do for me?  Why not just keep
> the config with the software and roll it at the same time?  

It ensures that all the nodes are using the same config and that none is somehow off. If you have ever juggled configs that should be the same across 100 nodes, you have probably screwed things up here and there. And updating a setting means updating 100 files, etc. Be sure you didn't miss one :)
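The kind of drift described above, where one node's local file is quietly different, can be detected by hashing each node's config. This is a minimal sketch, assuming the config text for every node has already been collected into a dictionary; the box names and the sample setting are illustrative.

```python
import hashlib
from collections import defaultdict


def find_config_drift(configs_by_node):
    """Return nodes whose config differs from the most common version."""
    groups = defaultdict(list)
    for node, text in configs_by_node.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(node)
    # Treat the most common digest as the intended config.
    majority = max(groups.values(), key=len)
    return sorted(n for members in groups.values()
                  if members is not majority for n in members)


# Five hypothetical boxes; one has a hand-edited setting.
configs = {f"box{i}": "<maxWarmingSearchers>2</maxWarmingSearchers>"
           for i in range(1, 6)}
configs["box3"] = "<maxWarmingSearchers>4</maxWarmingSearchers>"
drifted = find_config_drift(configs)
print(drifted)  # ['box3']
```

With a single copy of the config in ZooKeeper, this class of check becomes unnecessary, which is the benefit being argued for here.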

- Mark