[jira] Commented: (LUCENE-140) docs out of order

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463243 ]

Michael McCandless commented on LUCENE-140:

Jed, thanks for testing the fix!

> Alas, this doesn't appear to be the problem. We are still getting
> it, but we do at least have a little more info. We added the doc and
> lastDoc to the IllegalStateException message and we are getting very strange numbers:
> java.lang.IllegalStateException: docs out of order (-1764 < 0)
> ...
> where doc = -1764 and lastDoc is zero

OK, so we've definitely got something else at play here (bummer!). That
negative doc number is very strange. I will keep looking at this. I
will prepare a patch on 1.9.1 that adds more instrumentation so we
can get more details the next time you hit this ...
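To make the failing check concrete, here is a minimal, hypothetical model of the invariant behind this exception: while merging, each segment's postings are shifted into the merged doc-number space by a per-segment base, and every doc number must be at least the previous one. The class and method names (PostingsOrderCheck, appendPostings, base) are illustrative assumptions, not Lucene's SegmentMerger code; deletions are deliberately left out of the model.

```java
public class PostingsOrderCheck {

    // Hypothetical model: append one segment's postings into the merged
    // doc-number space and enforce the "doc numbers never decrease" invariant.
    public static int appendPostings(int[] segmentDocs, int base, int lastDoc) {
        for (int local : segmentDocs) {
            int doc = local + base; // shift segment-local doc number into merged space
            if (doc < lastDoc) {
                // Instrumented message: include both numbers so the failure is diagnosable.
                throw new IllegalStateException(
                    "docs out of order (" + doc + " < " + lastDoc + ")");
            }
            lastDoc = doc;
        }
        return lastDoc;
    }

    public static void main(String[] args) {
        // Two well-formed segments; the second one starts at merged doc 3.
        int last = appendPostings(new int[] {0, 1, 2}, 0, 0);
        last = appendPostings(new int[] {0, 2}, 3, last);
        System.out.println("last merged doc = " + last); // prints "last merged doc = 5"

        // A corrupted (negative) doc number trips the check immediately.
        try {
            appendPostings(new int[] {-1764}, 0, 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "docs out of order (-1764 < 0)"
        }
    }
}
```

In this model a negative doc number with lastDoc still at 0 fails on the very first posting, which matches the shape of the reported message; it says nothing about where such a number comes from.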

> We do only use the deleteDocuments(Term) method, so we are not sure
> whether this will truly fix our problem, but we note that that
> method calls deleteDocument(int) based on the TermDocs returned for
> the Term - and maybe they can be incorrect???

Right, but I had thought the doc numbers coming in from this path would
be correct. It sounds like we have another issue at play here that
can somehow corrupt even these doc numbers.
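The delete-by-term path described in the quote can be pictured with a toy, in-memory model: look up the term's postings, then delete each listed doc number individually. Everything here (DeleteByTermModel, the map-based postings, the range guard) is a hypothetical illustration of that control flow, not Lucene's IndexReader; the range check is added in the model only to show that a bad doc number coming out of the postings would be passed straight through to the per-document delete.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteByTermModel {
    // Toy postings: term text -> doc numbers containing that term.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // One bit per document; a set bit means "deleted".
    private final BitSet deleted = new BitSet();
    private final int maxDoc;

    public DeleteByTermModel(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    public void addPosting(String term, List<Integer> docs) {
        postings.put(term, docs);
    }

    // Mirrors the described path: enumerate the term's docs, delete each by number.
    public int deleteDocuments(String term) {
        List<Integer> docs = postings.get(term);
        if (docs == null) {
            return 0; // term not present: nothing deleted
        }
        int n = 0;
        for (int doc : docs) {
            deleteDocument(doc);
            n++;
        }
        return n;
    }

    public void deleteDocument(int docNum) {
        // Model-only guard: a number outside [0, maxDoc) means the postings were bad.
        if (docNum < 0 || docNum >= maxDoc) {
            throw new IllegalArgumentException("docNum out of range: " + docNum);
        }
        deleted.set(docNum);
    }

    public boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }
}
```

For example, after `addPosting("id:42", Arrays.asList(3, 7))`, calling `deleteDocuments("id:42")` returns 2 and marks docs 3 and 7 as deleted. The point of the model is that this path trusts the postings completely, so if the doc numbers are wrong, the deletions are wrong too.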

> Out of interest, apart from changing from 1.4.3 to 1.9.1, in the
> JIRA 3.7 release we changed our default merge factor to 4 from
> 10. We hadn't seen this problem before, and suddenly we have had a
> reasonable number of occurrences.

Interesting. Maybe try changing back to 10 and see if it suppresses
the bug? That might give us [desperately needed] more data to cling to
here! On the 1.4.3 -> 1.9.1 change, some of the cases above were even
pre-1.4.x (though they could have had yet another root cause, or been
filesystem problems), so it's hard to draw firm conclusions from this.
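One way to see why the merge factor might matter: under a simplified logarithmic merge policy, a lower merge factor means many more merge operations for the same amount of indexing, and so more chances for a merge-time bug to surface. The model below is an assumption for illustration (every flush creates one level-0 segment; whenever mergeFactor segments accumulate on a level they are merged into one segment on the next level), not IndexWriter's actual merge logic.

```java
public class MergeCountModel {

    // Hypothetical logarithmic merge model: counts merge operations over `flushes`
    // flushes. Each flush adds one level-0 segment; mergeFactor segments on one
    // level are merged into a single segment on the next level, possibly cascading.
    public static int countMerges(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        int[] segmentsPerLevel = new int[64]; // enough levels for any int flush count
        int merges = 0;
        for (int i = 0; i < flushes; i++) {
            segmentsPerLevel[0]++;
            int level = 0;
            while (segmentsPerLevel[level] == mergeFactor) {
                segmentsPerLevel[level] = 0;  // these segments are merged away...
                segmentsPerLevel[level + 1]++; // ...into one larger segment
                merges++;
                level++;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        System.out.println("mergeFactor 10: " + countMerges(1000, 10) + " merges"); // 111
        System.out.println("mergeFactor 4:  " + countMerges(1000, 4) + " merges");  // 330
    }
}
```

Under this model, 1000 flushes trigger 111 merges with a merge factor of 10 but 330 with a merge factor of 4, roughly three times as many. That is only suggestive, but it fits the idea that reverting to 10 could make the problem show up less often.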

> docs out of order
> -----------------
>                 Key: LUCENE-140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-140
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: Linux
> Platform: PC
>            Reporter: legez
>         Assigned To: Michael McCandless
>         Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
> Hello,
>   I cannot find out why (or what) is happening all the time. I get this
> exception:
> java.lang.IllegalStateException: docs out of order
>         at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to 1.3rc1? I cannot
> find it in this form either in the downloads or in the version list). Everything
> else seems OK: I can search through the index, but I cannot optimize it. Even worse,
> after this exception, every time I add new documents and close the IndexWriter, a
> new segment is created! Judging by its size, I think it contains all the previously
> added documents.
> My index is quite big: 500,000 docs, about 5 GB of index directory.
> It is _repeatable_. I drop the index and reindex everything. Afterwards I add a few
> docs, try to optimize, and get the above exception.
> My documents' structure is:
> static Document indexIt(String id_strony, Reader reader, String data_wydania,
>                         String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     // Field.Keyword: indexed and stored as a single, untokenized term
>     doc.add(Field.Keyword("id", id_strony));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     // Field.Text(String, String): indexed, tokenized and stored
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     // Field.Text(String, Reader): indexed and tokenized, but not stored
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

