DO NOT REPLY [Bug 23650] - docs out of order


Bugzilla from bugzilla@apache.org
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=23650>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=23650





------- Additional Comments From [hidden email]  2005-06-14 16:42 -------
More data integrity issues: docs out of order

Hi,
I am seeing an issue similar to the one reported in:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23650
On examining the segments, I found the following inconsistencies:
(a) The segments being merged contained doc numbers greater than maxDoc.
I don't know how the index got into this state, but it occurs with the
standard Lucene code.
(b) Strangely, some documents had terms with zero frequency. When this
occurred, the zero-frequency term had several postings for docid 0.
Example, printed as (docid freq), with maxDoc = 7749 and no deletions:
Merging msgBody; text=it; sz=2  --- The field name is msgBody and term is "it"
                                    and two segments have the term.
(0 0)(0 0)(0 0)..........(0 0)(4 6)(5 3)(6 1)(9 1)(10 2)(12 1)......
...(6791 2)(6794 3)(6796 2)(6798 16)(6801 1)(6805 1)(6806 5)
(6808 1)(6810 1)(6815 2)(6816 3)(6817 1)(6818 1)(6821 4)(6822 1)
(6824 3)(6826 4)(6828 1)(6829 3)(12549 2)doc exceeds count 7749
(13570 1)doc exceeds count 7749(14896 1)doc exceeds count 7749
(15028 1)doc exceeds count 7749(15357 1)doc exceeds count 7749
(15427 1)doc exceeds count 7749(15534 1)doc exceeds count 7749
(15535 1)doc exceeds count 7749(15653 1)doc exceeds count 7749
(16530 1)doc exceeds count 7749(17108 1).......
(c) The zero frequency was not limited to document 0; there was at least
one other instance.
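
To make the symptoms in (a) to (c) easier to spot, a posting list can be
audited before it is merged. The following is only a stand-alone sketch, not
Lucene code: the class name PostingAudit, the int[]{docid, freq}
representation, and the rule that doc ids must be strictly increasing are all
assumptions made for illustration.

```java
/** Hypothetical stand-alone auditor; not part of Lucene. */
public class PostingAudit {

    /**
     * Scans (docid, freq) pairs and counts three kinds of anomaly.
     * Returns {zeroFreq, exceedsMaxDoc, notIncreasing}.
     */
    public static int[] audit(int[][] postings, int maxDoc) {
        int zeroFreq = 0;
        int exceedsMaxDoc = 0;
        int notIncreasing = 0;
        int lastDoc = -1; // no document seen yet

        for (int[] p : postings) {
            int doc = p[0];
            int freq = p[1];
            if (freq == 0) {
                zeroFreq++;          // term listed for a doc but never occurs in it
            }
            if (doc >= maxDoc) {
                exceedsMaxDoc++;     // doc number outside the segment
            }
            if (doc <= lastDoc) {
                notIncreasing++;     // assumption: doc ids must strictly increase
            }
            lastDoc = doc;
        }
        return new int[] {zeroFreq, exceedsMaxDoc, notIncreasing};
    }

    public static void main(String[] args) {
        // A few pairs taken from the dump above; maxDoc = 7749.
        int[][] sample = {
            {0, 0}, {0, 0}, {4, 6}, {5, 3}, {6829, 3}, {12549, 2}, {13570, 1}
        };
        int[] counts = audit(sample, 7749);
        System.out.println("zero freq: " + counts[0]
                + ", exceeds maxDoc: " + counts[1]
                + ", not increasing: " + counts[2]);
    }
}
```
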

One workaround that seemed to resolve the issue was to keep maxDoc as a
member variable in SegmentMergeInfo and to skip the posting (or throw an
exception) whenever an inconsistent state is detected.

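****ADD to SegmentMerger, just before the "docs out of order" check.
   // Skip postings that report a term frequency of zero.
   if (postings.freq() == 0) {
      continue;
   }
   // Skip postings whose doc number lies outside the segment.
   // smi.maxDoc is the member variable added to SegmentMergeInfo.
   if (doc >= smi.maxDoc) {
      //sbLog.append("doc exceeds count \r\n " + smi.maxDoc);
      continue;
   }
****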

At the least, adding such a check would keep the segments from being
corrupted and would get us closer to the real problem: why freq is 0 and why
the doc number exceeds maxDoc. Note that this code already includes the fix
for the other segment corruption issue that I reported previously (namely,
using a zero-length file).
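
For anyone who wants to try the two checks outside of a merge, here is a
minimal self-contained sketch. It is not the SegmentMerger patch itself: the
class PostingFilter, the Posting type, and the keepValid method are
hypothetical names, and only the two conditions (freq == 0 and doc >= maxDoc)
come from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch of the workaround; not Lucene code. */
public class PostingFilter {

    /** One (docid, freq) pair, as printed in the dump. */
    public static final class Posting {
        public final int doc;
        public final int freq;

        public Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    /** Returns only the postings with freq > 0 and doc < maxDoc, in input order. */
    public static List<Posting> keepValid(List<Posting> postings, int maxDoc) {
        List<Posting> valid = new ArrayList<>();
        for (Posting p : postings) {
            if (p.freq == 0) {
                continue; // same test as postings.freq() == 0 in the snippet
            }
            if (p.doc >= maxDoc) {
                continue; // same test as doc >= smi.maxDoc in the snippet
            }
            valid.add(p);
        }
        return valid;
    }

    public static void main(String[] args) {
        List<Posting> in = new ArrayList<>();
        in.add(new Posting(0, 0));      // zero frequency: dropped
        in.add(new Posting(4, 6));      // kept
        in.add(new Posting(6829, 3));   // kept
        in.add(new Posting(12549, 2));  // beyond maxDoc = 7749: dropped
        for (Posting p : keepValid(in, 7749)) {
            System.out.print("(" + p.doc + " " + p.freq + ")");
        }
        System.out.println();
    }
}
```

Silently skipping bad postings hides the underlying corruption, so logging
each skipped posting, or throwing instead of continuing, may be preferable
while the root cause is still unknown.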

