replication question

replication question

Michael Stoppelman
I've got a question from Doug's original email about replication (
http://www.mail-archive.com/lucene-user@.../msg12709.html):

"1. On the index master, periodically checkpoint the index. Every minute or
so the IndexWriter is closed and a 'cp -lr index index.DATE' command is
executed from Java, where DATE is the current date and time. This
efficiently makes a copy of the index when its in a consistent state by
constructing a tree of hard links. If Lucene re-writes any files (e.g., the
segments file) a new inode is created and the copy is unchanged."

Is closing the IndexWriter really a requirement for taking a snapshot? Or can
one take a snapshot of an index that is being written? I've done this in my
development environment and it seems to work fine without closing the
IndexWriter. Also, the Solr replication shell scripts don't seem to worry
about this either.

M
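
For readers who want to see what the quoted 'cp -lr index index.DATE' step amounts to, here is a minimal Java sketch using only the JDK. The class and method names (HardLinkCheckpoint, checkpoint) are hypothetical and not from Doug's email; the sketch assumes a flat index directory, a filesystem that supports hard links, and that nothing is writing to the index while it runs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkCheckpoint {

    /**
     * Creates checkpointDir and fills it with hard links to every regular
     * file in indexDir. Assumes the index directory is flat (no recursion).
     * Returns the number of files linked.
     */
    public static int checkpoint(Path indexDir, Path checkpointDir) throws IOException {
        // Fails if a checkpoint with this name already exists.
        Files.createDirectory(checkpointDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // A hard link is a second name for the same inode: no data is copied.
                    Files.createLink(checkpointDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }
}
```

The checkpoint stays unchanged only if later writers replace files (delete and recreate, giving a new inode) rather than overwrite them in place, which is the behaviour the quoted email describes for Lucene.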

Re: replication question

Yonik Seeley
On Tue, Dec 16, 2008 at 1:04 AM, Michael Stoppelman <[hidden email]> wrote:

> I've got a question from Doug's original email about replication (
> http://www.mail-archive.com/lucene-user@.../msg12709.html):
>
> "1. On the index master, periodically checkpoint the index. Every minute or
> so the IndexWriter is closed and a 'cp -lr index index.DATE' command is
> executed from Java, where DATE is the current date and time. This
> efficiently makes a copy of the index when its in a consistent state by
> constructing a tree of hard links. If Lucene re-writes any files (e.g., the
> segments file) a new inode is created and the copy is unchanged."
>
> Is closing the IndexWriter really a requirement on taking a snapshot? Or can
> one take a snapshot on an index being written, I've done this in my
> development environment and it seems to work fine w/o closing the
> IndexWriter.

There are subtle race conditions if you try to do this with a changing index.
At any instant in time the index should be consistent, *but* you
can't actually take a snapshot instantaneously.

So this is doable, but it would require some complex retry logic like
IndexReader has when opening an index.
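
As a rough illustration of the retry idea only (this is not how IndexReader is implemented), the sketch below hard-links the directory contents and then checks that the directory listing did not change while the links were being made, retrying if it did. The class RetryingSnapshot, its methods, and the "compare listings before and after" check are all assumptions made for this example; it also assumes a flat index directory and a target outside the index directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

public class RetryingSnapshot {

    /** Returns the sorted file names in a flat directory. */
    public static Set<String> listing(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                names.add(f.getFileName().toString());
            }
        }
        return names;
    }

    /** Deletes a flat directory and its files, if it exists. */
    static void deleteFlatDir(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        for (String name : listing(dir)) {
            Files.delete(dir.resolve(name));
        }
        Files.delete(dir);
    }

    /**
     * Hard-links indexDir into target, retrying while the index changes
     * underneath us. Returns the attempt number that succeeded.
     */
    public static int snapshot(Path indexDir, Path target, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Set<String> before = listing(indexDir);
            deleteFlatDir(target); // discard any partial earlier attempt
            Files.createDirectory(target);
            try {
                for (String name : before) {
                    Files.createLink(target.resolve(name), indexDir.resolve(name));
                }
                // If no file appeared or vanished meanwhile, accept the snapshot.
                if (before.equals(listing(indexDir))) {
                    return attempt;
                }
            } catch (NoSuchFileException e) {
                // A file was deleted (e.g. merged away) mid-snapshot: retry.
            }
        }
        deleteFlatDir(target);
        throw new IOException("index kept changing; gave up after " + maxAttempts + " attempts");
    }
}
```

A call such as RetryingSnapshot.snapshot(indexDir, snapshotDir, 5) returns 1 when the index was quiet during the first attempt.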

> Also the solr replication shell scripts don't seem to worry
> about this either.

Solr takes snapshots when it knows it's not updating the index (new
index changes are internally blocked when calling snapshooter).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: replication question

Michael Stoppelman
Hi Yonik,

Thanks for the response.

Replies inline.

On Tue, Dec 16, 2008 at 6:44 AM, Yonik Seeley <[hidden email]> wrote:

> There are subtle race conditions if you try to do this with a changing index.
> At any instant in time the index should be consistent, *but* you
> can't actually take a snapshot instantaneously.


Is the race condition in writing out the segments.gen or segments_N files?
From my understanding, once index segments are closed by the IndexWriter they
aren't modified again (though they might be deleted if they're merged away).



Re: replication question

Michael McCandless-2

It's better to use SnapshotDeletionPolicy to grab a consistent image
of the index. You don't need to close the IndexWriter or stop
making changes through it. It lets you capture a given
segments_N (and all the index files it needs) and then take your time
copying or backing up all the files in the snapshot.

There's a "green paper", excerpted from the upcoming Lucene in Action  
revision, that covers how to use SnapshotDeletionPolicy for backing up  
an index:

   http://manning.com/free/green_HotBackupsLucene.html

(Disclaimers: 1) I wrote the article; 2) the link is frustrating
because you have to submit your email address, then you get an email with a
link to a zip file, which you unzip to open
index.html. I've been meaning to post the article directly to the
Wiki, so now seems like a good time!)

Mike


