SnapshotDeletionPolicy usage


SnapshotDeletionPolicy usage

jmuguruza
Hi guys,

I want to make use of the possibility of hot backups in 2.3. If I
understand correctly, the only thing I need to do is open the
writers with SnapshotDeletionPolicy, is that correct?

SnapshotDeletionPolicy dp =
    new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
final IndexWriter writer =
    new IndexWriter(dir, true, new StandardAnalyzer(), dp);

And what would be the trade-off of using this policy versus the
default, performance-wise? I have frequently updating indexes (up to
tens of updates every second) that I close periodically, and much less
frequent readers.

thanks
javi

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: SnapshotDeletionPolicy usage

Michael McCandless-2

jm wrote:

> Hi guys,
>
> I want to make use of the possibility of hot backups in 2.3. If I
> understand correctly, the only thing I need to do is open the
> writers with SnapshotDeletionPolicy, is that correct?

Right.

> SnapshotDeletionPolicy dp =
>     new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
> final IndexWriter writer =
>     new IndexWriter(dir, true, new StandardAnalyzer(), dp);

You can also wrap any other deletion policy (it doesn't have to be
KeepOnlyLastCommitDeletionPolicy).

When you want to do a backup, make sure to use try/finally, i.e.:

   IndexCommitPoint cp = dp.snapshot();
   try {
     Collection files = cp.getFileNames();
     // <do copying here>
   } finally {
     dp.release();
   }

> And what would be the trade-off of using this policy versus the
> default, performance-wise? I have frequently updating indexes (up to
> tens of updates every second) that I close periodically, and much less
> frequent readers.

There should be no performance loss as far as indexing throughput goes.

Though obviously while a backup is running, if you are taxing your IO  
system, then flushing/merging by the writer may take longer to run...  
however, it's safe (and maybe a good idea) to throttle the IO of your  
backup so you don't adversely affect ongoing indexing & searching.  
It just means you hold the commit point open for longer...
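The throttled copy Mike describes can be sketched as a plain-Java loop. This is only an illustration, not a Lucene API: `ThrottledBackup` and its `pauseMillis` parameter are hypothetical names, `fileNames` is assumed to be the collection returned by `cp.getFileNames()`, and modern java.nio is used for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

public class ThrottledBackup {

    // Copy each named file from indexDir to backupDir, pausing between
    // files so the backup yields IO to ongoing indexing and searching.
    // A real implementation might throttle per chunk rather than per file.
    public static void copy(Path indexDir, Path backupDir,
                            Collection<String> fileNames,
                            long pauseMillis)
            throws IOException, InterruptedException {
        Files.createDirectories(backupDir);
        for (String name : fileNames) {
            Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                       StandardCopyOption.REPLACE_EXISTING);
            if (pauseMillis > 0) {
                Thread.sleep(pauseMillis); // crude IO throttle
            }
        }
    }
}
```

Because the snapshot pins the files, a slow, throttled copy is safe; the only cost is holding the commit point (and its disk space) open for longer.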

There is a transient cost in disk space: while a backup is running,  
the writer will not delete any segment files referenced by the commit  
point you have open.  So, if the writer goes and merges away some of  
these segments while your backup is running, this will consume some  
extra disk space.  Once you release the snapshot, the disk space will  
be reclaimed the next time the writer flushes, merges or is closed.

Mike




Re: SnapshotDeletionPolicy usage

jmuguruza
Thanks for the reply Mike.

> You can also wrap any other deletion policy (it doesn't have to be
> KeepOnlyLastCommitDeletionPolicy).
>
> When you want to do a backup, make sure to use try/finally, i.e.:
>
>    IndexCommitPoint cp = dp.snapshot();
>    try {
>      Collection files = cp.getFileNames();
>      // <do copying here>
>    } finally {
>      dp.release();
>    }

Ohhh, I was hoping that by using that policy any _external_ process (a
commercial backup tool) would be able to back up the index files in a
consistent way... I am afraid I was asking too much.

If I manage to coordinate with the external process so that it signals
when it wants to do a backup, I take the snapshot in the indexing
process, it then copies the files, and then I release the snapshot,
will that work?

javi



Re: SnapshotDeletionPolicy usage

Michael McCandless-2

jm wrote:

> Thanks for the reply Mike.
>
>> You can also wrap any other deletion policy (it doesn't have to be
>> KeepOnlyLastCommitDeletionPolicy).
>>
>> When you want to do a backup, make sure to use try/finally, i.e.:
>>
>>    IndexCommitPoint cp = dp.snapshot();
>>    try {
>>      Collection files = cp.getFileNames();
>>      // <do copying here>
>>    } finally {
>>      dp.release();
>>    }
>
> Ohhh, I was hoping that by using that policy any _external_ process (a
> commercial backup tool) would be able to back up the index files in a
> consistent way... I am afraid I was asking too much.

Well, an external process doesn't have enough information.

E.g., if you have a reader on the index, it could be keeping files
around (because it has them open) that are not actually referenced by
the most recent commit to the index. You don't need to back up those
files. So you really need the current writer to tell you which files
are referenced.

And, if you keep adding docs via the writer, and merges complete, the  
index files will be changing while you're doing the backup (but, not  
the files referenced by your snapshot).

> If I manage to coordinate with the external process so that it signals
> when it wants to do a backup, I take the snapshot in the indexing
> process, it then copies the files, and then I release the snapshot,
> will that work?

Yes, that will work. Once you have the list of files, you can send
it to any external tool (rsync, tar, robocopy, cp, etc.) to have
it actually do the copying.
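Handing the list to an external tool can be as simple as writing one file name per line to a manifest the tool consumes (e.g. rsync's --files-from). A minimal sketch under that assumption; `SnapshotManifest` and the manifest file name are illustrative, not part of Lucene:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;

public class SnapshotManifest {

    // Write one file name per line; this is the format accepted by,
    // for example, rsync --files-from and GNU tar --files-from.
    public static void write(Path manifest, Collection<String> fileNames)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String name : fileNames) {
            sb.append(name).append('\n');
        }
        Files.write(manifest, sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

The external tool then runs something like `rsync --files-from=backup.manifest indexDir/ backupDir/` and signals back when done, after which the indexing process calls `dp.release()`.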

Mike
