Master/Slave setup

Alex Benjamen
I'm trying to figure out how best to handle replication for our system. (We're
not using the rsync mechanism because we don't want frequent updates
on the slaves.)
 
Current process:
 
1. Master builds a new incremental index once an hour, commits/optimizes, and copies
    the index to an NFS-exported directory.
2. Slave compares the index version in the mounted dir to its own (once every 2 hrs). If it finds a newer
    index, it will: stop Solr, copy over the new index, restart Solr.
 
Things are working fine, but the problem is that there is no autowarming. If we use the master/slave
setup, then rsync will constantly update the index and the caching will not work as well. There
is no reason for us to keep updating the slaves.

Question: is it possible to simply copy over the new index without restarting Solr, so that the Solr
server detects that the index has in fact changed and autowarms based on previous queries?

Should snappuller be used? How does snappuller know not to fetch while the master is indexing
the feeds, or doing an optimize, etc.?
 
Thanks in advance for suggestions
-Alex

Re: Master/Slave setup

Otis Gospodnetic-2
Alex,

I think you should rethink the approach you described and reconsider using the provided replication scripts.

- How often the searchers see the new index depends on how often the snappuller + snapinstaller are run on slaves.
- If you want the searchers to get a new and optimized index every 2 hours, there is no need to optimize the index on slaves every 1 hour.
- If you use snappuller + snapinstaller you will not need to restart Solr (good!) and autowarming will be done (good again!)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Alex Benjamen <[hidden email]>
> To: [hidden email]
> Sent: Thursday, February 28, 2008 10:12:32 PM
> Subject: Master/Slave setup

Re: Master/Slave setup

Walter Underwood, Netflix
In reply to this post by Alex Benjamen
You have no cache at all when you stop and restart Solr. I recommend
using the provided scripts for index distribution. Run snappuller
and snapinstaller every two hours.

The scripts already do the right thing. A snapshot is created after
a commit on the indexer. Snappuller only copies over an index
when it has changed. Snapinstaller updates Solr without restarting it.
Solr autowarms a new cache from the existing one.
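
As a rough sketch of the schedule described above, the wrapper below runs snappuller and then snapinstaller, skipping the install when the pull fails. It is Python purely for illustration. The /opt/solr/bin path and the function names are assumptions, not something stated in this thread, and it assumes both scripts take their settings from their own configuration (so no arguments are passed) and exit non-zero on failure.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: directory holding the Solr distribution scripts on the slave.
SCRIPTS_DIR = "/opt/solr/bin"


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def pull_and_install(run: Callable[[Sequence[str]], int] = run_command,
                     scripts_dir: str = SCRIPTS_DIR) -> bool:
    """Run snappuller, then snapinstaller only if the pull succeeded."""
    if run([f"{scripts_dir}/snappuller"]) != 0:
        # Pull failed: leave the currently installed index untouched.
        return False
    return run([f"{scripts_dir}/snapinstaller"]) == 0
```

A scheduler such as cron could call pull_and_install() every two hours on each slave. The `run` parameter exists only so the sequence can be exercised without the real scripts.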

wunder


RE: Master/Slave setup

Alex Benjamen
In reply to this post by Otis Gospodnetic-2
OK, I'll give it a shot... Couple of issues I see with the snappuller:
 
1. When the master performs a commit and then an optimize, is there nothing to prevent
   snappuller from pulling a non-optimized index?
 
2. Do uncommitted updates constitute a different index version? Suppose I post 10 XML
   files to update and do not commit. If snappuller happens to run at that time, will
   it pull the uncommitted index, or is it smart enough to detect that the newer index is
   not committed/optimized?

I suppose I could write to some file (index version) on the master after the optimize is done,
and modify snappuller to look at that file... but it would be good if that happened "out of the box".
 
Thanks,
Alex
 
 

________________________________

From: Otis Gospodnetic [mailto:[hidden email]]
Sent: Thu 2/28/2008 8:16 PM
To: [hidden email]
Subject: Re: Master/Slave setup

Re: Master/Slave setup

Walter Underwood, Netflix
In solrconfig.xml, configure a listener for "postOptimize" but not for
"postCommit". That listener runs snapshooter. You will only create
snapshots after an optimize. That's what I do.
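
A sketch of what such a listener setup might look like, wrapped in a small Python check that reports which update events trigger snapshooter. The listener element is modeled on the example solrconfig.xml shipped with Solr 1.x (solr.RunExecutableListener with exe/dir/wait parameters). The exe and dir values are assumptions to adjust for the local install, and the helper name snapshooter_events is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sketch of the relevant part of solrconfig.xml. The "exe" and "dir" values
# are assumptions; point them at wherever snapshooter actually lives.
SOLRCONFIG_FRAGMENT = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Snapshot only after an optimize; there is no postCommit listener. -->
    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">solr/bin/snapshooter</str>
      <str name="dir">.</str>
      <bool name="wait">true</bool>
    </listener>
  </updateHandler>
</config>
"""


def snapshooter_events(config_xml: str) -> set:
    """Return the update events whose listener runs snapshooter."""
    root = ET.fromstring(config_xml)
    events = set()
    for listener in root.iter("listener"):
        exe = listener.find("str[@name='exe']")
        if exe is not None and (exe.text or "").strip().endswith("snapshooter"):
            events.add(listener.get("event"))
    return events
```

Running snapshooter_events() against a real solrconfig.xml is a quick way to confirm that no postCommit listener is still creating snapshots of a non-optimized index.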

wunder

On 2/29/08 11:38 AM, "Alex Benjamen" <[hidden email]> wrote:

> OK, I'll give it a shot... Couple of issues I see with the snappuller:
>  
> 1. When the master performs a commit and then an optimize, is there nothing to
>    prevent snappuller from pulling a non-optimized index?

Re: Master/Slave setup

Otis Gospodnetic-2
In reply to this post by Alex Benjamen
But note one thing here.  Pulling a merely modified index (its snapshot) rather than the fully optimized index means you'll only pull the delta, whereas if you fully optimize the index, and then snapshooter runs, and then snappuller runs, the *whole* index will be pulled over the network from master to slave.  This is good in some situations and bad in others.  In your case, it's probably a good thing: you are likely okay with the large network transfer and would not be okay with an unoptimized index being hit at a high query rate.
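
The delta-versus-whole-index trade-off can be illustrated with a toy model. It assumes a file is transferred only when the slave lacks it or holds it at a different size, which is a simplification of what rsync really compares. The segment file names and sizes below are invented for illustration, not taken from a real index.

```python
def files_to_transfer(slave_files: dict, snapshot_files: dict) -> dict:
    """Files in the new snapshot that the slave lacks or holds at another size."""
    return {name: size for name, size in snapshot_files.items()
            if slave_files.get(name) != size}


# Hypothetical index contents: file name -> size in MB.
slave = {"_1.cfs": 900, "_2.cfs": 50}

# After a plain commit, old segments are untouched and one new one appears.
after_commit = {"_1.cfs": 900, "_2.cfs": 50, "_3.cfs": 40}

# After an optimize, everything is merged into a single new segment.
after_optimize = {"_4.cfs": 980}

delta_mb = sum(files_to_transfer(slave, after_commit).values())
full_mb = sum(files_to_transfer(slave, after_optimize).values())
```

In this model the post-commit snapshot costs only the new segment, while the post-optimize snapshot shares no files with the slave's copy, so the entire index crosses the network.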

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Walter Underwood <[hidden email]>
> To: [hidden email]
> Sent: Friday, February 29, 2008 2:36:12 PM
> Subject: Re: Master/Slave setup