Incremental replication...

Incremental replication...

escher2k
I was wondering whether the scripts provided with Solr do incremental replication. Looking at the snapshooter script, it seems that the whole index directory is copied over. Is that correct? If so, isn't performance a problem in the long run? Thanks in advance for the clarification (I hope I am wrong!).

RE: Incremental replication...

Graham Stead-2
We have used replication for a few weeks now and it generally works well.

I believe you'll find that commit operations cause only new segments to be
transferred, whereas optimize operations cause the entire index to be
transferred. Therefore, the amount of data transferred really depends on how
frequently you index new data and how often you call <commit/> and
<optimize/>.

Hope this helps,
-Graham



RE: Incremental replication...

escher2k
Graham Stead-2 wrote:

> We have used replication for a few weeks now and it generally works well.
>
> I believe you'll find that commit operations cause only new segments to be
> transferred, whereas optimize operations cause the entire index to be
> transferred. Therefore, the amount of data transferred really depends on how
> frequently you index new data and how often you call <commit/> and
> <optimize/>.
>
> Hope this helps,
> -Graham

Thanks, Graham. At least from looking at the snapshooter script, it doesn't seem to be doing anything specific to make the copy incremental. The following is a fragment from the script:

snap_name=snapshot.`date +"%Y%m%d%H%M%S"`
name=${data_dir}/${snap_name}
temp=${data_dir}/temp-${snap_name}

if [[ -d ${name} ]]
then
    logMessage snapshot directory ${name} already exists
    logExit aborted 1
fi

if [[ -d ${temp} ]]
then
    logMessage snapshoting of ${name} in progress
    logExit aborted 1
fi

# clean up after INT/TERM
trap 'echo cleaning up, please wait ...;/bin/rm -rf ${name} ${temp};logExit aborted 13' INT TERM

logMessage taking snapshot ${name}

# take a snapshot using hard links into temporary location
# then move it into place atomically
cp -lr ${data_dir}/index ${temp}
mv ${temp} ${name}

Re: Incremental replication...

Bertrand Delacretaz
On 2/13/07, escher2k <[hidden email]> wrote:

> ...At least from looking at the snapshooter script, it doesn't
> seem to be doing anything specific...

The snapshooter script only makes an "instant snapshot" of the index
directory using cp -lr. This does not involve any copying of index
data.
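
To make the hard-link point concrete, here is a minimal sketch. It is not taken from the Solr scripts: the directory layout and the file name _1.cfs are made up for illustration, and it assumes a cp that supports -l (GNU cp does). It links an "index" directory into a snapshot the same way the fragment above does, then checks that both names refer to the same inode.

```shell
#!/bin/sh
# Sketch only: paths and file names are hypothetical, not Solr's.
set -eu
work=$(mktemp -d)
mkdir "$work/index"
echo "segment data" > "$work/index/_1.cfs"

# Same idea as snapshooter: hard-link into a temp dir, then rename into place.
cp -lr "$work/index" "$work/temp-snapshot"
mv "$work/temp-snapshot" "$work/snapshot"

# Both names should report the same inode number: one copy of the data on disk.
inode_index=$(ls -i "$work/index/_1.cfs" | awk '{print $1}')
inode_snap=$(ls -i "$work/snapshot/_1.cfs" | awk '{print $1}')
echo "index inode: $inode_index, snapshot inode: $inode_snap"

# Removing the file from the live index leaves the snapshot readable,
# because the snapshot still holds a link to the same data.
rm "$work/index/_1.cfs"
snap_content=$(cat "$work/snapshot/_1.cfs")
echo "snapshot still holds: $snap_content"

rm -rf "$work"
```

Because the snapshot is only a set of extra directory entries, taking it costs almost no time or disk space regardless of index size.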

The actual replication is done using rsync in the other scripts, by
copying the index snapshot elsewhere.

Rsync only copies what has changed since the last copy, and not many
files change in a Lucene index when adding documents, so it's correct
that replication uses little bandwidth when adding documents.

Index optimization, OTOH, causes much larger changes in the index
directory, so after an optimization rsync will usually have much more
data to transfer.
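
The difference between the two cases can be sketched with a toy model. Everything here is an assumption for illustration: the segment names and sizes are invented, and the comparison is by file name only, which is a simplification of what rsync really does (by default it compares size and modification time). It estimates how many bytes are new between one snapshot and the next.

```shell
#!/bin/sh
# Toy model only: segment names and sizes are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/snap1" "$work/snap2" "$work/snap3"

# make_file PATH SIZE: create a file of SIZE bytes.
make_file() {
    dd if=/dev/zero of="$1" bs="$2" count=1 2>/dev/null
}

# snap1: an index with two segments.
make_file "$work/snap1/_1.cfs" 1000
make_file "$work/snap1/_2.cfs" 1000
# snap2: after a commit, the old segments are unchanged and one small one is added.
make_file "$work/snap2/_1.cfs" 1000
make_file "$work/snap2/_2.cfs" 1000
make_file "$work/snap2/_3.cfs" 100
# snap3: after an optimize, everything is merged into one new segment.
make_file "$work/snap3/_4.cfs" 2100

# bytes_to_transfer OLD NEW: total size of files in NEW that are absent from OLD.
bytes_to_transfer() {
    total=0
    for f in "$2"/*; do
        name=$(basename "$f")
        if [ ! -e "$1/$name" ]; then
            size=$(wc -c < "$f")
            total=$((total + size))
        fi
    done
    echo "$total"
}

commit_bytes=$(bytes_to_transfer "$work/snap1" "$work/snap2")
optimize_bytes=$(bytes_to_transfer "$work/snap2" "$work/snap3")
echo "new bytes after commit:   $commit_bytes"
echo "new bytes after optimize: $optimize_bytes"

rm -rf "$work"
```

In this model the commit adds only the small new segment, while the optimize produces a file as large as the whole index, which matches the behaviour described above.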

-Bertrand

Re: Incremental replication...

Bill Au
FYI, additional information on replication is available on the Solr wiki:

http://wiki.apache.org/solr/CollectionDistribution

Bill


Re: Incremental replication...

Kevin Lewandowski
In reply to this post by escher2k
snapshooter appears to copy all the files, but the files in the snapshot
directories are hard links pointing to segments in the main index
directory. So only new segments end up getting transferred.

We've been running replication on discogs.com for several months and
it works great.
