Lucene 8.7 error searching an index created with 8.3

Lucene 8.7 error searching an index created with 8.3

Nicolás Lichtmaier-2
I'm seeing errors like this one (using backwards codecs):

java.lang.ArrayIndexOutOfBoundsException: Index 69 out of bounds for length 33
     at org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:196)
     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.refillPositions(Lucene50PostingsReader.java:721)
     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:924)
     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
     at org.apache.lucene.search.SloppyPhraseMatcher.advancePP(SloppyPhraseMatcher.java:262)
     at org.apache.lucene.search.SloppyPhraseMatcher.nextMatch(SloppyPhraseMatcher.java:173)
     at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:58)
     at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
     at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
     at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
     at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
     at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)

They seem to be connected with double values stored as "docvalues" and
used in formulas to affect the scores.

Is there any known incompatibility? Is this something that should work?
Must I rebuild the indices with 8.7? (That would be bad for our use case
here.)

Thanks!


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene 8.7 error searching an index created with 8.3

Adrien Grand
Can you run CheckIndex on your index to make sure it is not corrupt?

--
Adrien

Re: Lucene 8.7 error searching an index created with 8.3

Nicolás Lichtmaier-2
Lucene 8.7's CheckIndex says there are no errors in the index.

On closer inspection this seems related to phrase matching...

Re: Lucene 8.7 error searching an index created with 8.3

Adrien Grand
This is related to phrase matching indeed. Positions are stored in blocks
of 128 values, where every block is encoded with a different number of bits
per value. And the error you are seeing suggests that one block reports 69
bits per value.
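
To illustrate why a header value of 69 would surface as exactly this exception, here is a minimal sketch, not the actual Lucene source. It assumes each block starts with one byte holding the bits per value, and that the reader looks up the block's encoded size in a table with one slot per legal value (0 to 32, hence "length 33"). The class name BlockHeaderSketch, the method encodedSize, and the plain bit-packing size formula are hypothetical simplifications.

```java
final class BlockHeaderSketch {
    // Number of values per block, as described above.
    static final int BLOCK_SIZE = 128;
    // Values are ints, so a legal block never needs more than 32 bits per value.
    static final int MAX_BITS_PER_VALUE = 32;

    // One slot per legal bits-per-value (0..32), so the table has length 33.
    private static final int[] ENCODED_SIZES = new int[MAX_BITS_PER_VALUE + 1];

    static {
        for (int bpv = 0; bpv <= MAX_BITS_PER_VALUE; bpv++) {
            // Assumption: plain bit packing, 128 values * bpv bits / 8 bits per byte.
            ENCODED_SIZES[bpv] = BLOCK_SIZE * bpv / 8;
        }
    }

    /** Returns the encoded size in bytes of a block whose first byte is its bits per value. */
    static int encodedSize(byte[] block) {
        int numBits = block[0] & 0xFF; // header byte read as an unsigned value
        // No range check here: an out-of-range header fails on the table lookup.
        return ENCODED_SIZES[numBits];
    }

    public static void main(String[] args) {
        // A well-formed header: 7 bits per value.
        System.out.println(encodedSize(new byte[] {7}));
        try {
            // A header of 69 is outside 0..32 and fails on the lookup.
            encodedSize(new byte[] {69});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unexpected header: " + e.getMessage());
        }
    }
}
```

In this sketch a header of 69 can only come from reading bytes that are not a real block header, for example because the reader is positioned at the wrong offset or the data is damaged.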

The fact that CheckIndex didn't complain is surprising. Did you only verify
checksums (the -fast option) or did you run the full CheckIndex?

Is your problem reproducible? If yes, does it still reproduce if you move
to a recent JVM?

--
Adrien

Re: Lucene 8.7 error searching an index created with 8.3

Nicolás Lichtmaier-2
This is reproducible only within our product; I haven't yet been able to
isolate it and reproduce it standalone. We are running Java 11.

Yes, I've run CheckIndex with the "-slow" option and with assertions
enabled.

Re: Lucene 8.7 error searching an index created with 8.3

Nicolás Lichtmaier-2
I'd like to add that if I enable assertions I get a stack trace like this:


java.lang.AssertionError
     at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$EverythingEnum.nextPosition(Lucene50PostingsReader.java:903)
     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
     at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
     at org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
     at org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
     at org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
     at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
     at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
     at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
     at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:67)
     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:194)
     at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
     at org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource$1.advanceExact(ValueSource.java:215)
     at com.wolfram.textsearch.MultiplicationDoubleValuesSource$1.advanceExact(MultiplicationDoubleValuesSource.java:60)
     ... 27 more

This means that posPendingCount in Lucene50PostingsReader is 0 when
nextPosition() is called.

At the point the assertion fails these are the other values in this object:


> encoded = {byte[512]@2705} [0, 112, 7, 20, -48, -8, 16, 96, -99, 25, +502 more]
> docDeltaBuffer = {int[147]@2706} [1164, 2, 506, 183, 3, 190, 1, 1, 1, 57, +137 more]
> freqBuffer = {int[147]@2707} [1, 1, 1, 1, 1, 2, 1, 2, 1, 3, +137 more]
> posDeltaBuffer = {int[147]@2708} [7, 7, 333, 248, 262, 157, 414, 104, 157, 409, +137 more]
> payloadLengthBuffer = null
> offsetStartDeltaBuffer = null
> offsetLengthBuffer = null
> payloadBytes = null
> payloadByteUpto = 0
> payloadLength = 0
> lastStartOffset = 0
> startOffset = -1
> endOffset = -1
> docBufferUpto = 3
> posBufferUpto = 3
> skipper = null
> skipped = false
> startDocIn = {ByteBufferIndexInput$SingleBufferImpl@2709} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
> docIn = {ByteBufferIndexInput$SingleBufferImpl@2710} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.doc]"
> posIn = {ByteBufferIndexInput$SingleBufferImpl@2704} "MMapIndexInput(path="/usr/local/Wolfram/Mathematica/12.3/Documentation/English/SearchIndex/3/_0.cfs") [slice=_0_Lucene50_0.pos]"
> payIn = null
> payload = null
> indexHasOffsets = false
> indexHasPayloads = false
> docFreq = 69
> totalTermFreq = 146
> docUpto = 3
> doc = 1672
> accum = 1672
> freq = 1
> position = 333
> posPendingCount = 0
> posPendingFP = -1
> payPendingFP = 0
> docTermStartFP = 683174
> posTermStartFP = 236756
> payTermStartFP = 0
> lastPosBlockFP = 236949
> skipOffset = -1
> nextSkipDoc = 2147483647
> needsOffsets = false
> needsPayloads = false
> singletonDocID = -1

Maybe this information helps to see what's going on, or at least shows
where some code could be added to help clarify this issue.
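
As a way to reason about the failed assertion, here is a minimal sketch of the counting invariant that posPendingCount appears to track. It is a simplified model, not the Lucene implementation: it assumes the counter grows by the document's frequency when the enum moves to a document and shrinks by one per position read. The class name PositionBudgetSketch and the exception it throws are hypothetical. In the dump above, freq = 1 and position = 333 equals posDeltaBuffer[2], which would be consistent with the current document's only position having already been read once; that is an inference from the dump, not something confirmed in this thread.

```java
final class PositionBudgetSketch {
    private int freq;            // positions in the current document
    private int posPendingCount; // positions not yet consumed

    /** Moves to the next document, which has freqOfDoc positions. */
    void nextDoc(int freqOfDoc) {
        freq = freqOfDoc;
        posPendingCount += freqOfDoc;
    }

    /** Consumes one position and returns how many remain for the current document. */
    int nextPosition() {
        if (posPendingCount <= 0) {
            // Stands in for the assertion that fails in the stack trace above.
            throw new IllegalStateException("nextPosition() called with posPendingCount = 0");
        }
        if (posPendingCount > freq) {
            // Drop positions left unread in earlier documents.
            posPendingCount = freq;
        }
        posPendingCount--;
        return posPendingCount;
    }

    public static void main(String[] args) {
        PositionBudgetSketch postings = new PositionBudgetSketch();
        postings.nextDoc(1);
        System.out.println("remaining: " + postings.nextPosition());
        try {
            // A second read on a document with a single position breaks the invariant.
            postings.nextPosition();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, the counter reaches 0 only when every position of the current document has been consumed, so one thing worth checking is whether two consumers are advancing the same positions enum for the same document.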

Thanks!

