[jira] Commented: (LUCENE-140) docs out of order

JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-140?page=comments#action_12376780 ]

Jason Lambert commented on LUCENE-140:
--------------------------------------

I was having this problem intermittently while indexing from multiple threads, and I have found that the following sequence of steps can cause this error (with Lucene 1.3 and 1.4.x):

- Open an IndexReader (#1) over an existing index (this reader is used for searching while updating the index)
- Using this reader (#1), search for the document(s) you want to update and obtain their document ID numbers
- Create an IndexWriter and add several new documents to the index (for me, this writing is done in other threads) (*)
- Close the IndexWriter (*)
- Open another IndexReader (#2) over the index
- Delete the previously found documents by their document ID numbers using reader #2
- Close reader #2
- Create another IndexWriter (#2) and re-add the updated documents
- Close IndexWriter #2
- Close the original IndexReader (#1) and open a new reader for general searching
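
To make the failure mode concrete, here is a toy model in plain Java (not Lucene's API) that replays the sequence above. It assumes, as a simplification, that a document ID is just a position in the index and that the merge triggered by the extra writes (*) drops previously deleted documents and shifts every later document down. `StaleDocIdDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene's API: a document ID is simply a position in the
// index, and a merge physically drops deleted documents, shifting every
// later document down (a simplified stand-in for segment merging).
public class StaleDocIdDemo {
    // Position = document ID, value = the document's key; null = deleted.
    private final List<String> docs = new ArrayList<>();

    void add(String key) {
        docs.add(key);
    }

    // Like opening an IndexReader: a snapshot whose IDs are valid only for this view.
    List<String> openReader() {
        return new ArrayList<>(docs);
    }

    // Like deleting by document number: marks the slot as deleted.
    void deleteById(int id) {
        if (id < 0 || id >= docs.size()) {
            throw new ArrayIndexOutOfBoundsException(id);
        }
        docs.set(id, null);
    }

    // Like a merge triggered by closing a writer: deleted docs vanish
    // and every later document gets a smaller ID.
    void merge() {
        docs.removeIf(d -> d == null);
    }

    // Keys of the documents that are still live.
    List<String> liveKeys() {
        List<String> live = new ArrayList<>();
        for (String d : docs) {
            if (d != null) {
                live.add(d);
            }
        }
        return live;
    }

    // Replays the reported sequence; returns the keys left in the index.
    public static List<String> run() {
        StaleDocIdDemo index = new StaleDocIdDemo();
        index.add("a");
        index.add("b");
        index.add("c");
        index.deleteById(0);                       // an earlier delete, not yet merged away

        List<String> reader1 = index.openReader(); // reader #1
        int idOfC = reader1.indexOf("c");          // ID 2 in reader #1's view

        index.add("d");                            // (*) another thread adds a document...
        index.merge();                             // (*) ...and closing its writer merges

        index.deleteById(idOfC);                   // reader #2 deletes with the stale ID
        return index.liveKeys();
    }

    public static void main(String[] args) {
        // "c" was meant to be deleted, but "d" is gone instead.
        System.out.println(run()); // prints [b, c]
    }
}
```

The ID that pointed at "c" in reader #1's view points at "d" once the merge has run, so the wrong document is deleted. With fewer documents the same stale ID could fall past the end of the index, which would be consistent with the occasional ArrayIndexOutOfBoundsException during deletion.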
 
If I ensure that the steps marked with an asterisk (*) do not happen during this sequence (i.e. by using thread synchronization), I never get this error. Otherwise, it happens intermittently while closing the second IndexWriter (#2), and sometimes I get an ArrayIndexOutOfBoundsException during the deletion. These 'extra' writes update the 'segments' file after reader #1 was opened, and the updated file is then read when the second IndexReader (#2) is opened, so the document IDs obtained from reader #1 no longer necessarily refer to the same documents.

As a result, documents can end up being deleted using stale or even non-existent IDs, which is most likely what causes the index corruption that this error signals.
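
One way to apply the thread-synchronization fix is to route every index modification through a single lock, so that no other thread can add documents between looking up the IDs and deleting by them. This is a hypothetical sketch: `SerializedUpdater` is not a Lucene class, and a plain list stands in for the index. In a real application the lock would be held around the IndexReader and IndexWriter calls of the whole update sequence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: SerializedUpdater is not a Lucene class, and the list
// below stands in for the index (position = document ID). One lock covers
// every modification, including the whole look-up/delete/re-add sequence.
public class SerializedUpdater {
    private final ReentrantLock indexLock = new ReentrantLock();
    private final List<String> docs = new ArrayList<>();

    // Plain add, as done by the other indexing threads (the (*) steps).
    public void add(String key) {
        indexLock.lock();
        try {
            docs.add(key);
        } finally {
            indexLock.unlock();
        }
    }

    // The update: the ID is looked up and used while the lock is held, so no
    // other thread can renumber documents in between.
    public void update(String key) {
        indexLock.lock();
        try {
            int id = docs.indexOf(key); // stands in for "search with reader #1"
            if (id >= 0) {
                docs.remove(id);        // stands in for "delete by ID with reader #2"
            }
            docs.add(key);              // stands in for "re-add with IndexWriter #2"
        } finally {
            indexLock.unlock();
        }
    }

    public List<String> snapshot() {
        indexLock.lock();
        try {
            return new ArrayList<>(docs);
        } finally {
            indexLock.unlock();
        }
    }

    // Runs updates of existing docs concurrently with adds of new ones.
    public static List<String> demo() {
        SerializedUpdater updater = new SerializedUpdater();
        for (int i = 0; i < 50; i++) {
            updater.add("doc" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 50; i++) {
            final String existing = "doc" + i;
            final String fresh = "doc" + (i + 50);
            pool.submit(() -> updater.update(existing)); // updating thread
            pool.submit(() -> updater.add(fresh));       // concurrent writer
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return updater.snapshot();
    }

    public static void main(String[] args) {
        List<String> docs = demo();
        // Every key is present exactly once.
        System.out.println(docs.size() + " docs, " + new HashSet<>(docs).size() + " distinct");
    }
}
```

A single coarse lock serializes all writers, which costs throughput, but it closes the window in which the 'segments' file can change underneath document IDs that are still in use.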

> docs out of order
> -----------------
>
>          Key: LUCENE-140
>          URL: http://issues.apache.org/jira/browse/LUCENE-140
>      Project: Lucene - Java
>         Type: Bug
>   Components: Index
>     Versions: unspecified
>  Environment: Operating System: Linux
> Platform: PC
>     Reporter: legez
>     Assignee: Lucene Developers
>  Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
>
> Hello,
>   I cannot figure out why (or what) is happening, but it happens all the
> time. I get this exception:
> java.lang.IllegalStateException: docs out of order
>         at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
>         at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
>         at Optimize.main(Optimize.java:29)
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to 1.3rc1? I cannot
> find it either in the downloads or in the version list in this form). Everything
> else seems OK: I can search through the index, but I cannot optimize it. Even
> worse, after this exception, every time I add new documents and close the
> IndexWriter, a new segment is created! Judging by its size, I think it contains
> all the documents added before.
> My index is quite big: 500,000 docs, about 5 GB of index directory.
> It is _repeatable_. I drop the index and reindex everything. Afterwards I add a
> few docs, try to optimize, and receive the above exception.
> My documents are built like this:
>   static Document indexIt(String id_strony, Reader reader, String data_wydania,
>                           String id_wydania, String id_gazety, String data_wstawienia)
>   {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony));                     // page ID, untokenized
>     doc.add(Field.Keyword("data_wydania", data_wydania));        // issue date
>     doc.add(Field.Keyword("id_wydania", id_wydania));            // issue ID
>     doc.add(Field.Text("id_gazety", id_gazety));                 // newspaper ID, tokenized
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));  // insertion date
>     doc.add(Field.Text("tresc", reader));                        // page content, read from the Reader
>     return doc;
>   }
> Sincerely,
> legez
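
For context on what the exception itself checks: when SegmentMerger merges the postings of a term, each segment's document numbers are roughly offset by the number of documents in the preceding segments, and the merged list is expected to be increasing. The toy re-creation below is not Lucene's actual code (the `PostingsMerge` class is hypothetical, and remapping around deleted documents is omitted). It shows how a corrupted segment whose postings reference a document number beyond its own size trips that check.

```java
import java.util.Arrays;
import java.util.List;

// Toy re-creation (not Lucene's source) of the invariant behind
// "docs out of order": while merging one term's postings, each segment's doc
// numbers are offset by the number of docs in the preceding segments, and the
// merged list must be strictly increasing. Deleted-doc remapping is omitted.
public class PostingsMerge {
    public static int[] appendPostings(List<int[]> segmentPostings, int[] segmentSizes) {
        int total = 0;
        for (int[] postings : segmentPostings) {
            total += postings.length;
        }
        int[] merged = new int[total];
        int base = 0;     // first merged doc number of the current segment
        int lastDoc = -1; // last doc number written to the merged list
        int out = 0;
        for (int s = 0; s < segmentPostings.size(); s++) {
            for (int doc : segmentPostings.get(s)) {
                int mergedDoc = base + doc;
                if (mergedDoc <= lastDoc) {
                    throw new IllegalStateException("docs out of order");
                }
                merged[out++] = mergedDoc;
                lastDoc = mergedDoc;
            }
            base += segmentSizes[s];
        }
        return merged;
    }

    public static void main(String[] args) {
        // Healthy: segment 0 has 2 docs, segment 1 has 3 docs.
        int[] ok = appendPostings(
                Arrays.asList(new int[] {0, 1}, new int[] {0, 2}), new int[] {2, 3});
        System.out.println(Arrays.toString(ok)); // prints [0, 1, 2, 4]

        // Corrupt: segment 0 claims 2 docs, but its postings reference doc 5.
        try {
            appendPostings(Arrays.asList(new int[] {0, 5}, new int[] {0}), new int[] {2, 1});
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints docs out of order
        }
    }
}
```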
