synchronizing slave indexes in distributing collections


synchronizing slave indexes in distributing collections

daremind
Hi there,

We want to use Solr's Collection Distribution. Here's a question regarding
recovery from failures of the scripts. To my understanding:

* if snappuller fails on a slave, we could probably implement something
like having the master examine the status messages from all slaves and
notify all slaves to execute snapinstaller only if every status reports
success (see the rough sketch after this list).

* however, if snapinstaller then fails on a slave, there is really no simple
rollback operation that would let all slaves keep the same old index.
Besides, it is usually some hardware, network, or plain Solr problem that
causes snapinstaller to fail, and that same problem may prevent any rollback
operation from executing, even if such an operation existed.
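
For illustration, here is a very rough sketch of the coordination I have in
mind for the first point (untested; the host names, the status-file location,
and the assumption that passwordless ssh is set up are all mine, not something
the stock scripts provide):

    import subprocess

    SLAVES = ["slave1", "slave2", "slave3"]       # hypothetical host names
    STATUS = "/opt/solr/logs/snappuller.status"   # assumed status-file location

    def pull_succeeded(host):
        # Ask the slave (over ssh) for the last line of its snappuller status file.
        result = subprocess.run(["ssh", host, "tail", "-1", STATUS],
                                capture_output=True, text=True)
        return result.returncode == 0 and "failed" not in result.stdout.lower()

    # Only tell the slaves to install the new snapshot if every pull succeeded.
    if all(pull_succeeded(h) for h in SLAVES):
        for host in SLAVES:
            subprocess.run(["ssh", host, "/opt/solr/bin/snapinstaller"], check=True)
    else:
        print("at least one snappuller failed; not running snapinstaller anywhere")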

It seems possible to implement a two-phase-commit-like protocol to provide
automatic recovery and keep all slave indexes consistent at all times.
However, for one thing I don't see that there is a rollback operation for
snapinstaller; for another, this would definitely complicate the system.

So it looks like all we can do is monitor the logs and alert people to fix
the issue, rerun the scripts, etc. whenever failures occur. Is that the
correct understanding?


Thanks,

-Hui

Re: synchronizing slave indexes in distributing collections

hossman

: So it looks like all we can do is monitor the logs and alert people to fix
: the issue, rerun the scripts, etc. whenever failures occur. Is that the
: correct understanding?

I have *never* seen snappuller or snapinstaller fail (except during an
initial rollout of Solr when I forgot to set up the necessary ssh keys).

I suppose we could add an option to snapinstaller to support explicitly
installing a snapshot by name ... then if you detect that slave Z didn't
load the latest snapshot, you could always tell the other slaves to
snapinstall whatever older version slave Z is still using -- but frankly
that seems a little silly -- not to mention that if you couldn't load the
snapshot into Z, odds are Z isn't responding to queries either.

a better course of action might just be to have an automated system which
monitors the distribution status info on the master, and takes any slaves
that don't update it properly out of your load balancer's rotation (and
notifies people to look into it).
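
something along these lines, maybe (a totally untested sketch: it assumes
the slaves report the snapshot they installed back to the master as
logs/snapshot.current.<host> files, and the lb-remove hook and the mail
command are just placeholders for whatever your load balancer and alerting
actually use):

    import glob, os, subprocess

    LOGS = "/opt/solr/logs"

    # Newest snapshot the master has taken.
    with open(os.path.join(LOGS, "snapshot.current")) as f:
        master_version = f.read().strip()

    # One snapshot.current.<host> file per slave (assumed layout).
    for path in glob.glob(os.path.join(LOGS, "snapshot.current.*")):
        host = os.path.basename(path)[len("snapshot.current."):]
        with open(path) as f:
            slave_version = f.read().strip()
        if slave_version != master_version:
            # Hypothetical hook that pulls this slave out of the load
            # balancer's rotation; replace with whatever your LB provides.
            subprocess.run(["/opt/ops/lb-remove", host], check=True)
            # Let a human know so they can look into it.
            subprocess.run(["mail", "-s", "solr slave %s is behind" % host,
                            "ops@example.com"],
                           input="master=%s slave=%s\n" % (master_version,
                                                           slave_version),
                           text=True)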



-Hoss


Re: synchronizing slave indexes in distributing collections

Bill Au
If snapinstaller fails to install the latest snapshot, then chances are
that it would not be able to install any earlier snapshot either.  All it
does is some very simple filesystem operations, after which it invokes the
Solr server to do a commit.  I agree with Chris that the best thing to do is
to take the slave out of rotation and fix the underlying problem.
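
Roughly speaking, it boils down to something like this (a simplified sketch,
not the actual shell script -- the real one uses hard links and an atomic
swap, and the paths and update URL below are just assumptions for a default
setup):

    import os, shutil, urllib.request

    SNAPSHOT = "/opt/solr/data/snapshot.20070817120000"   # hypothetical snapshot dir
    INDEX    = "/opt/solr/data/index"
    UPDATE   = "http://localhost:8983/solr/update"

    # 1. Filesystem part: stage the snapshot next to the live index and swap it in.
    tmp = INDEX + ".tmp"
    shutil.copytree(SNAPSHOT, tmp)            # the real script hard-links instead of copying
    shutil.rmtree(INDEX, ignore_errors=True)
    os.rename(tmp, INDEX)

    # 2. Ask Solr to open a new searcher on the new index by issuing a commit.
    req = urllib.request.Request(UPDATE, data=b"<commit/>",
                                 headers={"Content-Type": "text/xml"})
    urllib.request.urlopen(req)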

Bill


Re: synchronizing slave indexes in distributing collections

daremind
Thanks, guys.

Glad to know the scripts work very well in your experience (well, they are
indeed quite simple). That's roughly how I imagined we should do it, except
that you guys added a very good point -- that the monitoring system can
invoke a script to take the slave out of the load balancer.  I'd like to
implement this idea.


Cheers,

-Hui


Re: synchronizing slave indexes in distributing collections

sunnyShiny06
Hi,

I would like to know how far along you are with your script that takes the slave out of the load balancer.
I have no choice but to do that during updates on the slave server.

Thanks,
