I'd like to add that, with assertions enabled, I get a stack trace like this:
java.lang.AssertionError
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:903)
    at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
    at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
    at org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
    at org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
    at org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
    at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
    at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
    at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
    at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
    at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
    at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
    at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
    at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
    at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
    at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
    at org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource$1.advanceExact(ValueSource.java:215)
    at com.wolfram.textsearch.MultiplicationDoubleValuesSource$1.advanceExact(MultiplicationDoubleValuesSource.java:60)
    ... 27 more
This means that posPendingCount in Lucene50PostingsReader is 0 when
nextPosition() is called.
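To make the failing invariant concrete, here is a minimal sketch of the bookkeeping, assuming the simplified rule that advancing to a document adds its term frequency to the pending count and every nextPosition() call consumes one pending position. PendingPositionsModel and its methods are hypothetical; this is not the real EverythingEnum, and it leaves out how positions of skipped documents are discarded. With freq = 1 (as in the dump of the enum's fields), a second nextPosition() on the same document hits the same condition the assertion reports.

```java
/**
 * Simplified, hypothetical model of the posPendingCount bookkeeping in
 * Lucene50PostingsReader's EverythingEnum (not the real class): advancing to a
 * document adds its freq to the pending count, and every nextPosition() call
 * consumes one pending position. Discarding positions of skipped docs is omitted.
 */
class PendingPositionsModel {
    private int posPendingCount = 0; // positions not yet consumed
    private int freq = 0;            // term frequency of the current document

    /** Moves to the next document, whose term frequency is {@code termFreq}. */
    void advanceToDoc(int termFreq) {
        freq = termFreq;
        posPendingCount += termFreq;
    }

    /** Consumes one position, failing the way the assertion does when none are pending. */
    void nextPosition() {
        if (posPendingCount <= 0) {
            throw new AssertionError("nextPosition() called with posPendingCount="
                + posPendingCount + ", freq=" + freq);
        }
        posPendingCount--;
    }

    int pending() {
        return posPendingCount;
    }

    public static void main(String[] args) {
        PendingPositionsModel m = new PendingPositionsModel();
        m.advanceToDoc(1);    // freq = 1, as in the dump
        m.nextPosition();     // consumes the only position; pending drops to 0
        try {
            m.nextPosition(); // a second read for the same document
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints `nextPosition() called with posPendingCount=0, freq=1`, i.e. the reader was asked for more positions than the current document has.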
At the point where the assertion fails, these are the field values of the EverythingEnum instance:
    encoded = {byte[512]@2705} [0, 112, 7, 20, -48, -8, 16, 96, -99, 25, +502 more]
    docDeltaBuffer = {int[147]@2706} [1164, 2, 506, 183, 3, 190, 1, 1, 1, 57, +137 more]
    freqBuffer = {int[147]@2707} [1, 1, 1, 1, 1, 2, 1, 2, 1, 3, +137 more]
    posDeltaBuffer = {int[147]@2708} [7, 7, 333, 248, 262, 157, 414, 104, 157, 409, +137 more]
    payloadLengthBuffer = null
    offsetStartDeltaBuffer = null
    offsetLengthBuffer = null
    payloadBytes = null
    payloadByteUpto = 0
    payloadLength = 0
    lastStartOffset = 0
    startOffset = -1
    endOffset = -1
    docBufferUpto = 3
    posBufferUpto = 3
    skipper = null
    skipped = false
    startDocIn = {ByteBufferIndexInput$SingleBufferImpl@2709} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
    docIn = {ByteBufferIndexInput$SingleBufferImpl@2710} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
    posIn = {ByteBufferIndexInput$SingleBufferImpl@2704} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.pos]"
    payIn = null
    payload = null
    indexHasOffsets = false
    indexHasPayloads = false
    docFreq = 69
    totalTermFreq = 146
    docUpto = 3
    doc = 1672
    accum = 1672
    freq = 1
    position = 333
    posPendingCount = 0
    posPendingFP = -1
    payPendingFP = 0
    docTermStartFP = 683174
    posTermStartFP = 236756
    payTermStartFP = 0
    lastPosBlockFP = 236949
    skipOffset = -1
    nextSkipDoc = 2147483647
    needsOffsets = false
    needsPayloads = false
    singletonDocID = -1
Maybe this information helps to see what's going on, or at least to add
some code somewhere that makes this issue easier to diagnose.
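One shape such diagnostic code could take is a delegating check that counts position reads per document and fails with the document ID and call count, rather than a bare AssertionError inside the codec. A minimal sketch, assuming a hypothetical PositionIterator interface (docID(), freq(), nextPosition(), loosely mirroring the same-named PostingsEnum methods) and a toy ArrayPositions implementation so it runs standalone. Whether the same counting could be wired around the real PostingsEnum in this setup is an untested assumption. Doc 1672 and position 333 are borrowed from the dump for illustration.

```java
/** Hypothetical, minimal stand-in for the position-related methods of a postings iterator. */
interface PositionIterator {
    int docID();
    int freq();
    int nextPosition();
}

/** Toy in-memory iterator: positions[i] holds the positions of docs[i]. */
final class ArrayPositions implements PositionIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses
    private final int[] docs;
    private final int[][] positions;
    private int idx = -1;   // index of the current document
    private int posIdx = 0; // next position to return for the current document

    ArrayPositions(int[] docs, int[][] positions) {
        this.docs = docs;
        this.positions = positions;
    }

    /** Advances to the next document and returns its ID, or NO_MORE_DOCS. */
    int nextDoc() {
        idx++;
        posIdx = 0;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override public int docID() { return idx < 0 ? -1 : (idx < docs.length ? docs[idx] : NO_MORE_DOCS); }
    @Override public int freq() { return positions[idx].length; }
    @Override public int nextPosition() { return positions[idx][posIdx++]; }
}

/**
 * Hypothetical diagnostic wrapper: counts nextPosition() calls per document and
 * fails as soon as a caller asks for more than freq() positions.
 */
final class CheckedPositions implements PositionIterator {
    private final PositionIterator in;
    private int lastDoc = -1; // document the counter below refers to
    private int consumed = 0; // positions already read for lastDoc

    CheckedPositions(PositionIterator in) {
        this.in = in;
    }

    @Override public int docID() { return in.docID(); }
    @Override public int freq() { return in.freq(); }

    @Override public int nextPosition() {
        int doc = in.docID();
        if (doc != lastDoc) { // the underlying iterator moved on: start a new count
            lastDoc = doc;
            consumed = 0;
        }
        if (consumed >= in.freq()) {
            throw new IllegalStateException("doc " + doc + ": nextPosition() call #"
                + (consumed + 1) + " but freq=" + in.freq());
        }
        consumed++;
        return in.nextPosition();
    }

    public static void main(String[] args) {
        ArrayPositions base = new ArrayPositions(new int[] {1672}, new int[][] {{333}});
        CheckedPositions checked = new CheckedPositions(base);
        base.nextDoc();                             // doc 1672, freq = 1
        System.out.println(checked.nextPosition()); // 333
        try {
            checked.nextPosition();                 // over-read of the same document
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // doc 1672: nextPosition() call #2 but freq=1
        }
    }
}
```

The counter resets only when docID() changes, so a caller that re-reads positions for a document it has already exhausted is reported with that document's ID.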
Thanks!
On 24/11/20 at 11:36, Nicolás Lichtmaier wrote:
> This is reproducible only within our product; I haven't yet been able
> to isolate it and reproduce it standalone. It's Java 11.
>
> Yes, I've run CheckIndex with the "-slow" option and with assertions
> enabled.
>
> On 24/11/20 at 11:32, Adrien Grand wrote:
>> This is related to phrase matching indeed. Positions are stored in
>> blocks of 128 values, and each block is encoded with its own number
>> of bits per value. The error you are seeing suggests that one block
>> reports 69 bits per value, which is out of range (the table indexed
>> in the exception has length 33, i.e. 0 to 32 bits per value).
>>
>> The fact that CheckIndex didn't complain is surprising. Did you only
>> verify checksums (the -fast option) or did you run the full CheckIndex?
>>
>> Is your problem reproducible? If yes, does it still reproduce if you
>> move to a recent JVM?
>>
>> On Tue, Nov 24, 2020 at 3:22 PM Nicolás Lichtmaier <[hidden email]> wrote:
>>
>> Lucene 8.7's CheckIndex says there are no errors in the index.
>>
>> On closer inspection this seems related to phrase matching...
>>
>> On 24/11/20 at 05:18, Adrien Grand wrote:
>> > Can you run CheckIndex on your index to make sure it is not corrupt?
>> >
>> > On Tue, Nov 24, 2020 at 1:01 AM Nicolás Lichtmaier <[hidden email]> wrote:
>> >
>> >> I'm seeing errors like this one (using backwards codecs):
>> >>
>> >> java.lang.ArrayIndexOutOfBoundsException: Index 69 out of bounds for length 33
>> >>     at org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:196)
>> >>     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.refillPositions(Lucene50PostingsReader.java:721)
>> >>     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:924)
>> >>     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
>> >>     at org.apache.lucene.search.SloppyPhraseMatcher.advancePP(SloppyPhraseMatcher.java:262)
>> >>     at org.apache.lucene.search.SloppyPhraseMatcher.nextMatch(SloppyPhraseMatcher.java:173)
>> >>     at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:58)
>> >>     at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
>> >>     at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
>> >>     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>> >>     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
>> >>     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
>> >>     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
>> >>     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
>> >>     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
>> >>     at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
>> >>     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
>> >>     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
>> >>     at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
>> >>     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
>> >>     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
>> >>     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
>> >>     at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
>> >>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
>> >>
>> >> They seem to be connected with double values stored as "docvalues" and
>> >> used in formulas to affect the scores.
>> >>
>> >> Is there any known incompatibility? Is this something that should work?
>> >> Must I rebuild the indices with 8.7? (That would be bad for our use case
>> >> here.)
>> >>
>> >> Thanks!
>> >>
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [hidden email]
>> >> For additional commands, e-mail: [hidden email]
>> >>
>> >>
>>
>>
>>
>> --
>> Adrien
>
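Adrien's explanation in the thread (positions stored in blocks of 128 values, each encoded with its own number of bits per value) also accounts for the shape of the first exception, `Index 69 out of bounds for length 33` in `ForUtil.readBlock`. A minimal sketch, assuming a simplified frame-of-reference layout in which the first byte of a block holds the bit width and indexes lookup tables with 33 entries (widths 0 to 32). BlockHeaderModel and its methods are hypothetical and omit details of Lucene's actual ForUtil, such as its special handling of blocks whose values are all equal. A header byte of 69 can never be a valid width, so it points to either corrupt data or a header read from the wrong offset.

```java
/**
 * Simplified, hypothetical model of a FOR-style block reader (not Lucene's ForUtil):
 * each block of BLOCK_SIZE values starts with a byte holding its bits per value,
 * and that byte indexes lookup tables with one entry per legal width, 0 to 32.
 */
final class BlockHeaderModel {
    static final int BLOCK_SIZE = 128; // values per block, as described in the thread
    static final int MAX_BITS = 32;    // widest legal bits-per-value

    // ENCODED_SIZES[b] = bytes needed to pack BLOCK_SIZE values of b bits each
    static final int[] ENCODED_SIZES = new int[MAX_BITS + 1];

    static {
        for (int b = 0; b <= MAX_BITS; b++) {
            ENCODED_SIZES[b] = (BLOCK_SIZE * b + 7) / 8;
        }
    }

    /** Unchecked lookup: a header of 69 throws ArrayIndexOutOfBoundsException. */
    static int encodedSizeUnchecked(int numBits) {
        return ENCODED_SIZES[numBits];
    }

    /** Checked lookup that names the bad header and where it was read. */
    static int encodedSizeChecked(int numBits, long filePointer) {
        if (numBits < 0 || numBits > MAX_BITS) {
            throw new IllegalStateException("invalid block header at fp=" + filePointer
                + ": numBits=" + numBits + ", expected 0.." + MAX_BITS);
        }
        return ENCODED_SIZES[numBits];
    }

    public static void main(String[] args) {
        System.out.println(encodedSizeChecked(7, 0L)); // 128 values * 7 bits = 112 bytes
        try {
            encodedSizeUnchecked(69);
        } catch (ArrayIndexOutOfBoundsException e) {
            // On recent JDKs (Java 11 in the thread) the message reads
            // "Index 69 out of bounds for length 33".
            System.out.println(e.getMessage());
        }
        try {
            encodedSizeChecked(69, 0L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The checked variant is the kind of validation that would turn a bare index error into a message pointing at the offending header and file offset.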