Atomicity and AutoCommit

Atomicity and AutoCommit

Simon Wistow
I currently have a setup that indexes into RAM and then periodically
merges that into a disk-based index.

Searches are done from the disk-based index, and deletes are handled by
keeping a list of deleted documents, filtering them out of search
results and applying the deletes to the index at merge time.

All this was done to make sure that we didn't corrupt the index (which
we'd seen happen a few times when the indexing machine failed for
whatever reason). With this scheme if the machine fails then all that's
lost is the RAM index and the list of deletes. We then just simply play
back all actions since the last merge and we're back to where we
started.
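
The scheme described above can be sketched roughly as follows. This is only an illustration using plain Java collections, not Lucene code: the class BufferedIndexSketch, its Action type and all method names are hypothetical, documents are reduced to string ids, and the journal of actions is assumed to be stored somewhere that survives the crash.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the RAM-buffer-plus-replay scheme; not Lucene API. */
public final class BufferedIndexSketch {
    public enum Kind { ADD, DELETE }

    /** One journaled action: add or delete a document id. */
    public static final class Action {
        final Kind kind;
        final String id;
        private Action(Kind kind, String id) { this.kind = kind; this.id = id; }
        public static Action add(String id) { return new Action(Kind.ADD, id); }
        public static Action delete(String id) { return new Action(Kind.DELETE, id); }
    }

    // Stands in for the on-disk index (assumed to survive a crash).
    private final Set<String> diskIndex = new TreeSet<>();
    // Stands in for the RAM index: documents added since the last merge.
    private final Set<String> ramIndex = new HashSet<>();
    // Ids deleted since the last merge; applied to the disk index at merge time.
    private final Set<String> pendingDeletes = new HashSet<>();

    public void apply(Action a) {
        if (a.kind == Kind.ADD) {
            ramIndex.add(a.id);
        } else {
            ramIndex.remove(a.id);
            pendingDeletes.add(a.id);
        }
    }

    /** Searches only the disk index and filters out ids deleted since the last merge. */
    public List<String> search() {
        List<String> hits = new ArrayList<>();
        for (String id : diskIndex) {
            if (!pendingDeletes.contains(id)) {
                hits.add(id);
            }
        }
        return hits;
    }

    /** Applies pending deletes first, then the buffered adds, then clears both buffers. */
    public void merge() {
        diskIndex.removeAll(pendingDeletes);
        diskIndex.addAll(ramIndex);
        pendingDeletes.clear();
        ramIndex.clear();
    }

    /** A crash loses everything held in memory but leaves the disk index alone. */
    public void simulateCrash() {
        ramIndex.clear();
        pendingDeletes.clear();
    }

    /** Recovery: play back every action journaled since the last merge. */
    public void replay(List<Action> journalSinceLastMerge) {
        for (Action a : journalSinceLastMerge) {
            apply(a);
        }
    }
}
```

The sketch only shows why a crash between merges is recoverable in this model. It says nothing about a crash in the middle of merge() itself, which is where the on-disk index is actually at risk.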

However, it occurred to me that this might all be redundant now with
Lucene 2.3 (it's possible it might always have been redundant, come to
think of it). Should I just open a disk-based index with
autoCommit=false and then periodically commit the changes by close()ing
and then re-open()ing the disk index? Is that atomic? I.e., is there a
situation using this whereby the index could become corrupted?

Thanks,

Simon



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Atomicity and AutoCommit

Michael McCandless-2

When you previously saw corruption was it due to an OS or machine
crash (or power cord got pulled)?  If so, you were likely hitting
LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4
at some point) but is not fixed in 2.3.

If that is what you were hitting, then unfortunately neither buffering
updates into RAM nor using autoCommit=false in 2.3 will fully protect
you from this issue.  Both of these approaches should, though, reduce
your chance of hitting LUCENE-1044, since they both reduce the
frequency of commits to the index.

Mike



Re: Atomicity and AutoCommit

Simon Wistow
On Wed, Feb 27, 2008 at 09:38:55AM -0500, Michael McCandless said:
>
> When you previously saw corruption was it due to an OS or machine
> crash (or power cord got pulled)?  If so, you were likely hitting
> LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4
> at some point) but is not fixed in 2.3.

Yes - it's power outages and other unnatural events (sysadmins
accidentally kill -9ing the process) that caused it.

What are the chances of me backporting the fix to 2.3, or should I just
wait for 2.4?

Come 2.4, is my buffering to RAM redundant?

Thanks,

Simon



Re: Atomicity and AutoCommit

Mark Miller-3
You need to make sure your storage does not lie in response to an fsync
command. If it does (most commercial stuff does), you cannot guarantee
no corruption. Search Google for "your harddrive lies to you" or
something similar.
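
To make the fsync point concrete, here is a minimal sketch in plain Java of a "write, force to disk, then rename" commit. It is not how Lucene writes its files; the class DurableWriteSketch, the method writeDurably and the ".tmp" suffix are made up for illustration. The force(true) call is the step that depends on the storage being honest.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: write a file so that readers see either the old or the new version. */
public final class DurableWriteSketch {
    public static void writeDurably(Path target, byte[] data) throws IOException {
        // Write the new contents next to the target under a temporary name.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to flush file contents and metadata to the device.
            // If the device acknowledges before the bytes are on stable media,
            // nothing in this code can detect that.
            ch.force(true);
        }
        // Publish the fully written file under its final name in one step.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the drive or controller reports the flush as complete while the data is still in a volatile cache, a power loss can still leave the renamed file with missing contents, which is the failure being warned about here.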

It shouldn't be that hard to take the patch from the issue and apply it
to a checked-out version of 2.3, right? I don't think it relies on other
2.4 stuff, as there isn't much of it yet.


Re: Atomicity and AutoCommit

Michael McCandless-2
In reply to this post by Simon Wistow

Simon Wistow wrote:

> On Wed, Feb 27, 2008 at 09:38:55AM -0500, Michael McCandless said:
>>
>> When you previously saw corruption was it due to an OS or machine
>> crash (or power cord got pulled)?  If so, you were likely hitting
>> LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4
>> at some point) but is not fixed in 2.3.
>
> Yes - it's power outages and other unnatural events (sysadmins
> accidentally kill -9ing the process) that caused it.

OK, a power outage can definitely cause corruption.  This is a
long-standing issue (LUCENE-1044) that was only recently uncovered and
is now fixed on trunk (to be 2.4).  But I believe kill -9 should not
cause corruption.

BTW hot backups, as of 2.3, are now very easy.  Just use
SnapshotDeletionPolicy when creating your writer.  Making frequent
backups is a good safeguard too...

> What's the chances of me backporting the fix to 2.3 or should I just
> wait for 2.4?

It unfortunately was a fairly large change; I'm not sure how cleanly
the patch will apply to 2.3.  Maybe try trunk (but beware: the index
format changed with LUCENE-1044 to add an integrity checksum to
the end of the segments_N file)...
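
As a rough illustration of what a trailing integrity checksum buys you, the sketch below appends a CRC32 to a byte payload and verifies it on read. It is not Lucene's segments_N format; the class ChecksumTrailerSketch, the method names seal and open, and the 8-byte trailer layout are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical example of a checksum trailer; not Lucene's on-disk format. */
public final class ChecksumTrailerSketch {
    /** Returns the payload followed by an 8-byte CRC32 of the payload. */
    public static byte[] seal(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Verifies the trailer and returns the payload, or throws if the bytes are damaged. */
    public static byte[] open(byte[] sealed) {
        if (sealed.length < Long.BYTES) {
            throw new IllegalStateException("file too short to hold a checksum");
        }
        int payloadLength = sealed.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(sealed, 0, payloadLength);
        long stored = ByteBuffer.wrap(sealed, payloadLength, Long.BYTES).getLong();
        if (stored != crc.getValue()) {
            throw new IllegalStateException("checksum mismatch: file is incomplete or corrupt");
        }
        return Arrays.copyOf(sealed, payloadLength);
    }
}
```

A reader that finds a mismatch can treat that commit as never having happened and fall back to an earlier one, instead of loading a half-written file.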

> Come 2.4 is my buffering to RAM redundant?

Well, as Mark said, if your IO system does not lie on fsync, then
buffering to RAM is redundant.  If it does lie, you still have open
risk of corruption and so buffering to RAM probably reduces (but
doesn't eliminate) the risk.

Also, as of 2.3, manually buffering to RAMDirectory should no longer
give a big performance win over just giving that RAM to the
IndexWriter as its buffer instead.

Mike
