Cloud Deployment Strategy... In the Cloud


Steve Davids
Hi,

I am trying to come up with a repeatable process for deploying a SolrCloud
cluster from scratch, along with the appropriate security groups, auto
scaling groups, and custom Solr plugin code. I saw that LucidWorks created
a Solr Scale Toolkit, but that seems to be more of a one-shot deal than
really setting up your environment for the long haul. Here is where we are
right now:

   1. The ZooKeeper ensemble is easily brought up via a CloudFormation script
   2. We have an RPM built to lay down the Solr distribution + custom
   plugins + configuration
   3. Solr machines come up and connect to ZK (see the sketch below)
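
For reference, step 3 on each node amounts to something like the following
(a sketch only; the package name, install path, and ZK hosts are
placeholders):

    # Lay down the Solr distribution + plugins + config from our RPM
    # (hypothetical package name), then start Solr in cloud mode pointed
    # at the ZooKeeper ensemble.
    sudo yum install -y acme-solr-dist.rpm
    /opt/solr/bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181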

Now, we are using Puppet, which could easily create the core.properties file
for the corresponding core and have ZK get bootstrapped, but that seems to
be a no-no these days... So, can anyone think of a way to get ZK
bootstrapped automatically with pre-configured collection configurations?
Also, is there a recommendation on how to deal with machines that are
coming and going? As I see it, machines will be spun up and terminated from
time to time, and we need a process for dealing with that. The first idea
was to just use a common node name, so that if a machine is terminated a
new one can come up and replace that particular node, but on second thought
that would seem to require an auto scaling group *per* node (so it knows
which node name it is). For a large cluster this seems crazy from a
maintenance perspective, especially if you want to be elastic with regard
to the number of live replicas for peak times. The next idea was to have
some outside observer listen for EC2 instances being created or terminated
(via CloudWatch/SQS) and make the appropriate API calls to either add the
replica or delete it; this seems doable, but perhaps not the simplest
solution that could work.
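
To make the observer idea concrete, the add side might look something like
this (a sketch; the host, collection, shard, and node name are all
placeholders):

    # When a new instance comes up, ask the Collections API to place a
    # replica on it. All values here are hypothetical.
    NEW_NODE=ec2-10-0-1-23.compute.internal:8983_solr
    curl "http://solr-lb:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=$NEW_NODE"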

I was hoping others have already gone through this and have valuable advice
to give; we are trying to set up SolrCloud the "right way" so we don't get
nickel-and-dimed to death from an O&M perspective.

Thanks,

-Steve

Re: Cloud Deployment Strategy... In the Cloud

Gili Nachum
Our auto setup sequence is:
1. Deploy 3 ZK nodes.
2. Deploy Solr nodes and start them, connecting to ZK.
3. Upload the collection config to ZK.
4. Call the create-collection REST API (example below).
5. Done. SolrCloud is ready to work.
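
Step 4, for example, is a single Collections API call, roughly like this
(host and names are placeholders):

    # Create the collection against any running Solr node; the config set
    # named by collection.configName must already be in ZK from step 3.
    curl "http://solr1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconf"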

Don't yet have automation for replacing or adding a node.

Re: Cloud Deployment Strategy... In the Cloud

Steve Davids
What tools do you use for the "auto setup"? How do you get your config
automatically uploaded to zk?


Re: Cloud Deployment Strategy... In the Cloud

Erick Erickson
bq: What tools do you use for the "auto setup"? How do you get your config
automatically uploaded to zk?

Both uploading the config to ZK and creating collections are one-time
operations, usually done manually. Currently, uploading the config set is
accomplished with zkCli (yes, it's a little clumsy). There's a JIRA to put
this into solr/bin as a command, though, and either step would be easy
enough to script for any given situation with a shell script or wizard....
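
For reference, the zkCli invocation looks something like this (paths and
ZK hosts are placeholders):

    # Upload a config set with the zkcli script that ships with Solr
    # under server/scripts/cloud-scripts.
    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
      -cmd upconfig -confdir ./my_conf_dir -confname myconf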

Best,
Erick


Re: Cloud Deployment Strategy... In the Cloud

Dan Davis
Ant is very good at this sort of thing, and easier for Java devs to learn
than Make. Python has a library called Fabric that is also very fine, but
for my dev ops it is one more thing to learn.
I tend to divide things into three categories:

 - Things that have to do with system setup and need to be run as root.
For this I write a bash script (I should learn Puppet, but...).
 - Things that have to do with one-time installation as a Solr admin user
with /bin/bash, including upconfig. For this I use an Ant build.
 - Normal operational procedures. For this I typically use the Solr admin
UI or scripts, but I wish I had time to create a good webapp (or money to
purchase Fusion).



Re: Cloud Deployment Strategy... In the Cloud

Steve Davids
Our project built a custom "admin" webapp that we use for various O&M
activities, so I went ahead and added the ability to upload a Zip
distribution which then uses SolrJ to forward the extracted contents to ZK.
This package is built and uploaded via a Gradle build task, which makes
life easy on us by allowing us to jam stuff into ZK sitting on a private
network (local VPC) without necessarily needing to be on a ZK machine. We
then moved on to creating collections (trivial) and adding/removing
replicas. As for adding replicas, I am rather confused as to why I would
need to specify a specific shard for replica placement; before, when I
threw down a core.properties file, the machine would automatically come up
and figure out which shard it should join based on reasonable assumptions.
Why wouldn't the same logic apply here? I then saw that a Rule-based
Replica Placement
<https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement>
feature was added, which I thought would be reasonable, but after looking
at the tests <https://issues.apache.org/jira/browse/SOLR-7577> it appears
to still require a shard parameter for adding a replica, which seems to
defeat the entire purpose. So, after getting bummed out about that, I took
a look at the delete replica request, since with machines coming and going
we need to start dropping them, and found that delete replica requires a
collection, shard, and replica name. If all I have is the name of the
machine, it appears the only way to figure out what to remove is by walking
the clusterstate tree for all collections and determining which replicas
are candidates for removal, which seems unnecessarily complicated.
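
In case it helps anyone else, that clusterstate walk can be scripted
against the CLUSTERSTATUS action, roughly like this (a sketch using jq;
the Solr URL and node name are placeholders):

    #!/bin/bash
    # Sketch: find every replica hosted on a dead node by walking the
    # CLUSTERSTATUS output, then delete each one. SOLR and DEAD_NODE
    # values are hypothetical.
    SOLR=http://solr-lb:8983/solr
    DEAD_NODE=ec2-10-0-1-23.compute.internal:8983_solr

    curl -s "$SOLR/admin/collections?action=CLUSTERSTATUS&wt=json" \
    | jq -r --arg node "$DEAD_NODE" '
        .cluster.collections | to_entries[] as $c
        | $c.value.shards    | to_entries[] as $s
        | $s.value.replicas  | to_entries[]
        | select(.value.node_name == $node)
        | [$c.key, $s.key, .key] | @tsv' \
    | while IFS=$'\t' read -r coll shard replica; do
        curl -s "$SOLR/admin/collections?action=DELETEREPLICA&collection=$coll&shard=$shard&replica=$replica"
      done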

Hopefully I don't come off as complaining; I am just looking at it from a
client perspective. The Collections API doesn't seem simple to use, and
really the only reason I am messing around with it now is that there are
repeated threats to make "zk as truth" the default in the 5.x branch at
some point in the future. I would personally advocate that something like
autoManageReplicas <https://issues.apache.org/jira/browse/SOLR-5748> be
introduced to make life much simpler on clients, as this appears to be the
very thing I am trying to implement externally.

If anyone has happened to build a system to orchestrate Solr on cloud
infrastructure and has some pointers, it would be greatly appreciated.

Thanks,

-Steve


Re: Cloud Deployment Strategy... In the Cloud

Mark Miller
On Wed, Sep 30, 2015 at 10:36 AM Steve Davids <[hidden email]> wrote:

> Our project built a custom "admin" webapp that we use for various O&M
> activities, so I went ahead and added the ability to upload a Zip
> distribution which then uses SolrJ to forward the extracted contents to
> ZK. This package is built and uploaded via a Gradle build task, which
> makes life easy on us by allowing us to jam stuff into ZK sitting on a
> private network (local VPC) without necessarily needing to be on a ZK
> machine. We then moved on to creating collections (trivial) and
> adding/removing replicas. As for adding replicas, I am rather confused as
> to why I would need to specify a specific shard for replica placement;
> before, when I threw down a core.properties file, the machine would
> automatically come up and figure out which shard it should join based on
> reasonable assumptions. Why wouldn't the same logic apply here?


I'd file a JIRA issue for the functionality.


> I then saw that a Rule-based Replica Placement
> <https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement>
> feature was added, which I thought would be reasonable, but after looking
> at the tests <https://issues.apache.org/jira/browse/SOLR-7577> it appears
> to still require a shard parameter for adding a replica, which seems to
> defeat the entire purpose.


I was not involved in the addReplica command, but the predefined stuff
worked that way just to make bootstrapping a cluster really simple. I
don't see why addReplica couldn't follow the same logic if no shard is
specified.


> So, after getting bummed out about that, I took a look at the delete
> replica request, since with machines coming and going we need to start
> dropping them, and found that delete replica requires a collection,
> shard, and replica name. If all I have is the name of the machine, it
> appears the only way to figure out what to remove is by walking the
> clusterstate tree for all collections and determining which replicas are
> candidates for removal, which seems unnecessarily complicated.
>

You should not need the shard for this call. The collection and replica
core node name will be unique. Another JIRA issue?


>
> Hopefully I don't come off as complaining; I am just looking at it from a
> client perspective. The Collections API doesn't seem simple to use, and
> really the only reason I am messing around with it now is that there are
> repeated threats to make "zk as truth" the default in the 5.x branch at
> some point in the future. I would personally advocate that something like
> autoManageReplicas <https://issues.apache.org/jira/browse/SOLR-5748> be
> introduced to make life much simpler on clients, as this appears to be
> the very thing I am trying to implement externally.
>
> If anyone has happened to build a system to orchestrate Solr on cloud
> infrastructure and has some pointers, it would be greatly appreciated.
>
> Thanks,
>
> -Steve

--
- Mark
about.me/markrmiller