[jira] Commented: (LUCENE-140) docs out of order

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-140) docs out of order

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463483 ]

Doron Cohen commented on LUCENE-140:
------------------------------------

Jed, is it possible that when re-creating the index, while IndexWriter is constructed with create=true, FSDirectory is opened with create=false?
I suspect so, because otherwise, old  .del files would have been deleted.
If indeed so, newly created segments, which have same names as segments in previous (bad) runs, when opened, would read the (bad) old .del file.
This would possibly expose the bug fixed by Michael.
I may be over speculating here, but if this is the case, it can also explain why changing the merge factor from 4 to 10 exposed the problem.

In fact, let me speculate even further - if indeed when creating the index from scratch, the FSDirectory is (mistakenly) opened with create=false, as long as you always repeated the same sequencing of adding and deleting docs, you were likely to almost not suffer from this mistake, because segments created with same names as (old) .del files simply see docs as deleted before the docs are actually deleted by the program. The search behaves wrongly, not finding these docs before they are actually deleted, but no exception is thrown when adding docs. However once the merge factor was changed from 4 to 10, the matching between old .del files and new segments (with same names) was broken, and the out-of-order exception appeared.

...and if this is not the case, we would need to look for something else...

> docs out of order
> -----------------
>
>                 Key: LUCENE-140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-140
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: unspecified
>         Environment: Operating System: Linux
> Platform: PC
>            Reporter: legez
>         Assigned To: Michael McCandless
>         Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar, indexing-failure.log, LUCENE-140-2007-01-09-instrumentation.patch
>
>
> Hello,
>   I can not find out, why (and what) it is happening all the time. I got an
> exception:
> java.lang.IllegalStateException: docs out of order
>         at
> org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at
> org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at
> org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens either in 1.2 and 1.3rc1 (anyway what happened to it? I can not find
> it neither in download nor in version list in this form). Everything seems OK. I
> can search through index, but I can not optimize it. Even worse after this
> exception every time I add new documents and close IndexWriter new segments is
> created! I think it has all documents added before, because of its size.
> My index is quite big: 500.000 docs, about 5gb of index directory.
> It is _repeatable_. I drop index, reindex everything. Afterwards I add a few
> docs, try to optimize and receive above exception.
> My documents' structure is:
>   static Document indexIt(String id_strony, Reader reader, String data_wydania,
> String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony ));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]