[jira] [Updated] (LUCENE-5263) Deletes may be silently lost if disk fills up and then frees up

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

[jira] [Updated] (LUCENE-5263) Deletes may be silently lost if disk fills up and then frees up

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5263:

    Attachment: LUCENE-5263.patch

Thanks Shai, great feedback; I folded it all in.

bq. This is not only about disk-full, but about any transient IOE (e.g. ran out of file handles temporarily) that occur. So maybe change the issue's description (and also the CHANGES entry text)?

I'll fix the issue description, and I added a CHANGES.txt entry.

bq. In TestIndexWriterReader I see this:

Yes, this is intentional (it was helpful debugging to see exactly
where I broke r2.isCurrent()).

bq. You think maybe we should move FakeIOE to MDW so that other tests can use it?

I moved it, and shared from TestIndexWriterReader.java and

bq. In ReaderPool.release: why don't you call writer.checkpoint()?

Right, I needed to increment changeCount (so that the writer writes a
new segments file), plus checkpoint with IFD (so it knows about the
new .del file) but NOT increment SIS.version (because there was in
fact no "actual" change to the index contents, just moving state from
RAM to disk).  I put a comment explaining this, and also factored out
a new method.

bq. Separately, I think it will be a useful utility to have a method like Utils.throwEx(Throwable t)

Good idea!  I factored that out.

bq.  I think you can move the newIWC call to inside the 'if (w == null)'.

OK I fixed that.

bq. I implemented the TODO about sub-classing CMS. Perhaps you'll want to include that in your patch as well:

I folded this in too ... thanks!

Also, separately, I managed to re-attach the fang I previously removed
from testThreadInterruptDeadlock/testTwoThreadsInterruptDeadlock, by
fixing IW.ReaderPool.dropAll to throw any exceptions it hits when save
is true, rather than suppressing them and dropping all readers from
the pool.  I think there was another bug lurking there, because the
pool would lose changes previously if an exc was hit writing deletes
on close, and the app then called close again, and it succeeded.

I also fixed a few tests to clean up their static fields ... they were
sometimes angry during distributed beasting!

> Deletes may be silently lost if disk fills up and then frees up
> ---------------------------------------------------------------
>                 Key: LUCENE-5263
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5263
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.6
>         Attachments: LUCENE-5263.patch, LUCENE-5263.patch
> This case is tricky to handle, yet I think realistic: disk fills up
> temporarily, causes an exception in writeLiveDocs, and then the app
> keeps using the IW instance.
> Meanwhile disk later frees up again, IW is closed "successfully".  In
> certain cases, we can silently lose deletes in this case.
> I had already committed
> TestIndexWriterDeletes.testNoLostDeletesOnDiskFull, and Jenkins seems
> happy with it so far, but when I added fangs to the test (cutover to
> RandomIndexWriter from IndexWriter, allow IOE during getReader, add
> randomness to when exc is thrown, etc.), it uncovered some real/nasty
> bugs:
>   * ReaderPool.dropAll was suppressing any exception it hit, because
>     {code}if (priorE != null){code} should instead be {code}if (priorE == null){code}
>   * After a merge, we have to write deletes before committing the
>     segment, because an exception when writing deletes means we need
>     to abort the merge
>   * Several places that were directly calling deleter.checkpoint must
>     also increment the changeCount else on close IW thinks there are
>     no changes and doesn't write a new segments file.
>   * closeInternal was dropping pooled readers after writing the
>     segments file, which would lose deletes still buffered due to a
>     previous exc.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]