[jira] Created: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org
IndexWriter.updateDocument is no longer atomic
----------------------------------------------

                 Key: LUCENE-992
                 URL: https://issues.apache.org/jira/browse/LUCENE-992
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.2
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3


Spinoff from LUCENE-847.

Ning caught that as of LUCENE-843, we lost the atomicity of the delete
+ add in IndexWriter.updateDocument.

Ning suggested a simple fix: move the buffered deletes into
DocumentsWriter and let it do the delete + add atomically.  This has a
nice side effect of also consolidating the "time to flush" logic in
DocumentsWriter.



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-992:
--------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [New])

> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into
> DocumentsWriter and let it do the delete + add atomically.  This has a
> nice side effect of also consolidating the "time to flush" logic in
> DocumentsWriter.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-992:
--------------------------------------

    Attachment: LUCENE-992.patch

Attached patch.

I added a unit test that runs 2 indexing threads (calling
updateDocument) and 2 reader threads and asserts in the reader threads
that the number of documents never changes.

I also slightly changed the exception semantics in IndexWriter:
previously if a disk full (or other) exception was hit when flushing
the buffered docs, the buffered deletes were retained but the
partially flushed buffered docs (if any) were discarded.  I think this
was actually a bug because the buffered deletes must also be discarded
since they refer to document numbers that are no longer valid.  So I
changed it to also clear buffered deletes on exception, and had to
change one unit test (TestIndexWriterDelete) to match this.


> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into
> DocumentsWriter and let it do the delete + add atomically.  This has a
> nice side effect of also consolidating the "time to flush" logic in
> DocumentsWriter.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525271 ]

Ning Li commented on LUCENE-992:
--------------------------------

The patch looks good! A few comments and/or observations:

  - addDocument(Document doc, Analyzer analyzer, Term delTerm): is it better to name it updateDocument?

  - I didn't check all the variable accesses in DocumentsWriter, but it seems abort() should lock for some of the variables it accesses. Or make abort() a synchronized method.

  - Observation: Large documents will block small documents from being flushed if addDocument of large documents is called before that of small ones. This is not the case before LUCENE-843.

> I also slightly changed the exception semantics in IndexWriter:
> previously if a disk full (or other) exception was hit when flushing
> the buffered docs, the buffered deletes were retained but the
> partially flushed buffered docs (if any) were discarded.

  - Observation: Before LUCENE-843, both buffered docs and buffered deletes were retained when such an exception occurs. Now both buffered docs and buffered deletes would be discared if an exception is hit.


> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into
> DocumentsWriter and let it do the delete + add atomically.  This has a
> nice side effect of also consolidating the "time to flush" logic in
> DocumentsWriter.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525377 ]

Michael McCandless commented on LUCENE-992:
-------------------------------------------


>  - addDocument(Document doc, Analyzer analyzer, Term delTerm): is it
>    better to name it updateDocument?

Good, I'll make that change.

> - I didn't check all the variable accesses in DocumentsWriter, but
>   it seems abort() should lock for some of the variables it
>   accesses. Or make abort() a synchronized method.

OK I will make abort synchronized.

> - Observation: Large documents will block small documents from being
>   flushed if addDocument of large documents is called before that of
>   small ones. This is not the case before LUCENE-843.

Right, when multiple documents are in flight at once (because multiple
threads are adding documents), the documents must be flushed in order
of docID.  Each one grabs a unique (sequential) docID at the start
(synchronized), does the indexing un-synchronized, then flushes
(synchronized) but only if it is that documents "turn" to flush (ie it
is the next docID to be written).  So if a large doc grabs docID
first, then a small doc comes through, it's possible for small docs to
finish indexing before large doc does in which case small docs are
buffered, waiting for large doc to flush first.

> I also slightly changed the exception semantics in IndexWriter:
> previously if a disk full (or other) exception was hit when flushing
> the buffered docs, the buffered deletes were retained but the
> partially flushed buffered docs (if any) were discarded.

> - Observation: Before LUCENE-843, both buffered docs and buffered
>   deletes were retained when such an exception occurs. Now both
>   buffered docs and buffered deletes would be discared if an exception
>   is hit.

Right, altough if the exception is hit after the commit point (eg,
while building the compound file) then the buffered docs & deletes
are added to the index.

I plan to commit this in a day or two.


> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into
> DocumentsWriter and let it do the delete + add atomically.  This has a
> nice side effect of also consolidating the "time to flush" logic in
> DocumentsWriter.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Resolved: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-992.
---------------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

> IndexWriter.updateDocument is no longer atomic
> ----------------------------------------------
>
>                 Key: LUCENE-992
>                 URL: https://issues.apache.org/jira/browse/LUCENE-992
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-992.patch
>
>
> Spinoff from LUCENE-847.
> Ning caught that as of LUCENE-843, we lost the atomicity of the delete
> + add in IndexWriter.updateDocument.
> Ning suggested a simple fix: move the buffered deletes into
> DocumentsWriter and let it do the delete + add atomically.  This has a
> nice side effect of also consolidating the "time to flush" logic in
> DocumentsWriter.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]