[jira] Commented: (LUCENE-140) docs out of order

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463247 ]

Michael McCandless commented on LUCENE-140:
-------------------------------------------


Doron,

> (1) the sequence of ops brought by Jason is wrong:
>   ...
>
> Problem here is that the docIDs found in (b) may be altered in step
> (d) and so step (f) would delete the wrong docs. In particular, it
> might attempt to delete ids that are out of the range. This might
> expose exactly the BitVector problem, and would explain the whole
> thing, but I too cannot see how it explains the delete-by-term case.

Right, the case I fixed only happens when Lucene's
deleteDocument(int docNum) is [slightly] misused.  I.e., if you are
"playing by the rules" you would never have hit this bug.  And this
particular use case is indeed incorrect: doc numbers are valid only
for the one reader that you got them from.
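
To illustrate why a doc number is tied to the reader it came from, here is a minimal toy model. It does not use the Lucene API: the class StaleDocNumbers and the method compact are hypothetical, and a "doc number" is simply a position in a list. The assumption is that a merge which drops deleted docs shifts later docs down, so a number obtained before the merge may point at a different doc, or past the end, afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (hypothetical names, not Lucene code).
class StaleDocNumbers {
    // Drop one deleted doc, as a merge would; later docs are renumbered.
    static List<String> compact(List<String> docs, int deletedDocNum) {
        List<String> merged = new ArrayList<>(docs);
        merged.remove(deletedDocNum); // remove(int index): removes by position
        return merged;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a", "b", "c", "d");
        int docNum = before.indexOf("d");        // valid only for 'before'
        List<String> after = compact(before, 0); // "a" dropped, rest shift down

        // The stale number no longer refers to "d" in the new view.
        System.out.println("stale number in range: " + (docNum < after.size()));
        System.out.println("number valid for new view: " + after.indexOf("d"));
    }
}
```

In this sketch the stale number falls outside the new range, which is the kind of out-of-range delete discussed above.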

> I think, however, that the test Mike added does not expose the docs
> out of order bug - I tried this test without the fix and it only
> fails on the "gotException assert" - if you comment out this assert the
> test passes.

Huh, I see my test case (in IndexReader) indeed hitting the original
"docs out of order" exception.  If I take the current trunk and
comment out the (one line) bounds check in BitVector.set and run that
test, it hits the "docs out of order" exception.
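
As a rough illustration of why a bounds check in a bit vector's set method matters, here is a toy sketch. It is not the Lucene BitVector source: ToyBitVector and setUnchecked are hypothetical names, and the byte-array layout is an assumption made for the example. With whole-byte storage, an out-of-range bit that lands in the padding of the last byte is accepted silently unless the size is checked explicitly.

```java
// Toy model only (hypothetical names, not Lucene code).
class ToyBitVector {
    private final byte[] bits;
    private final int size;

    ToyBitVector(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) / 8]; // round up to whole bytes
    }

    // Checked variant: rejects any bit outside [0, size).
    void set(int bit) {
        if (bit < 0 || bit >= size) {
            throw new ArrayIndexOutOfBoundsException(bit);
        }
        setUnchecked(bit);
    }

    // Unchecked variant: bits in the last byte's padding are silently accepted.
    void setUnchecked(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    boolean get(int bit) {
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    public static void main(String[] args) {
        ToyBitVector deleted = new ToyBitVector(10); // 10 docs stored in 2 bytes
        deleted.setUnchecked(12);                    // out of range, yet no error
        System.out.println("bit 12 recorded: " + deleted.get(12));
        try {
            deleted.set(12);                         // checked variant refuses
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("checked set rejected bit 12");
        }
    }
}
```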

Are you sure you applied the change to index/SegmentMerger.java (the one
that tightens the check from a < to a <=)?  I ask because the test did
indeed fail to fail when I first wrote it (when it should have failed).
In digging into why, I found that the check for
"docs out of order" missed the boundary case of the same doc number
twice in a row.  Once I fixed that, the test failed as expected.
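
The boundary case can be sketched as follows. This is a simplified stand-in, not the actual SegmentMerger code: the class DocOrderCheck and both method names are hypothetical. It only shows how a < comparison lets a repeated doc number through while a <= comparison catches it.

```java
// Simplified stand-in (hypothetical names, not the SegmentMerger source).
class DocOrderCheck {
    // Loose check: only rejects a doc number smaller than the previous one.
    static boolean inOrderLoose(int[] docs) {
        int last = -1;
        for (int doc : docs) {
            if (doc < last) {
                return false;
            }
            last = doc;
        }
        return true;
    }

    // Tightened check: also rejects the same doc number twice in a row.
    static boolean inOrderStrict(int[] docs) {
        int last = -1;
        for (int doc : docs) {
            if (doc <= last) {
                return false;
            }
            last = doc;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] repeated = {3, 7, 7, 9}; // doc 7 appears twice in a row
        System.out.println("loose check passes: " + inOrderLoose(repeated));
        System.out.println("strict check passes: " + inOrderStrict(repeated));
    }
}
```

With the loose comparison a test that produces a repeated doc number would not trip the check, which matches the "failed to fail" behaviour described above.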

> (3) maxDoc() computation in SegmentReader is based (on some paths)
> on RandomAccessFile.length(). IIRC I saw cases (in a previous project)
> where File.length() or RAF.length() (not sure which of the two) did
> not always reflect the real length if the system was very busy I/O-wise,
> unless FD.sync() was called (with a performance hit).

Yes I saw this too.  From the follow-on discussion it sounds like we
haven't found a specific known JVM bug here.  Still, it does make me
nervous that we rely on file length to derive maxDoc.

In general I think we should rely on as little as possible from the
file system (there are so many cross platform issues/differences) and
instead explicitly store things like maxDoc into the index.  I will
open a separate Jira issue to track this.  Also I will record this
path in the instrumentation patch for 1.9.1 just to see if we are
actually hitting something here (I think unlikely but possible).


> docs out of order
> -----------------
>
>                 Key: LUCENE-140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-140
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: Linux
> Platform: PC
>            Reporter: legez
>         Assigned To: Michael McCandless
>         Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
>
>
> Hello,
>   I cannot figure out what is happening or why, but it happens every time.
> I get this exception:
> java.lang.IllegalStateException: docs out of order
>         at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to it? I cannot find
> it either in the downloads or in the version list in this form). Everything else seems OK: I
> can search the index, but I cannot optimize it. Even worse, after this
> exception, every time I add new documents and close the IndexWriter, a new segment is
> created! Judging by its size, I think it contains all the documents added before.
> My index is quite big: 500,000 docs, about 5 GB of index directory.
> It is _repeatable_: I drop the index and reindex everything. Afterwards I add a few
> docs, try to optimize, and get the exception above.
> My documents are built like this:
> static Document indexIt(String id_strony, Reader reader, String data_wydania,
>                         String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony ));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [jira] Commented: (LUCENE-140) docs out of order

Doron Cohen
"Michael McCandless (JIRA)" <[hidden email]> wrote on 09/01/2007 03:32:27:

> > I think, however, that the test Mike added does not expose the docs
> > out of order bug - I tried this test without the fix and it only
> > fails on the "gotException assert" - if you comment out this assert the
> > test passes.
>
> Huh, I see my test case (in IndexReader) indeed hitting the original
> "docs out of order" exception.  If I take the current trunk and
> comment out the (one line) bounds check in BitVector.set and run that
> test, it hits the "docs out of order" exception.
>
> Are you sure you applied the change to index/SegmentMerger.java (the one
> that tightens the check from a < to a <=)?  I ask because the test did
> indeed fail to fail when I first wrote it (when it should have failed).
> In digging into why, I found that the check for
> "docs out of order" missed the boundary case of the same doc number
> twice in a row.  Once I fixed that, the test failed as expected.
>

That's a moving target... :-)
You're right, I ran without the tightened SegmentMerger check, imitating
current 1.9.1 "field experience".
