Re: Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)


Christine Poerschke (BLOOMBERG/ LONDON)
Hello Florin Babes,

Thanks for this detailed report! I agree that an ArrayIndexOutOfBoundsException during SolrFeature computation sounds like a bug. Would you like to open a SOLR JIRA issue for it?

Here are some investigative ideas, in no particular order:

Reproducibility: if a failed query is run again, does it also fail the second time around (when some caches may be used)?

Data as a factor: is your setup single-sharded or multi-sharded? In a multi-sharded setup, if the same query fails on some shards but succeeds on others (and all shards have some documents that match the query), this would support the theory that a certain combination of data and features leads to the exception.

Feature vs. Model: you mention use of a MultipleAdditiveTrees model. If the same features are used in a LinearModel instead, do the same errors happen? Or if no model is used and only feature extraction [1] is done, does that give errors?
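
For the feature-extraction-only check, a request along these lines could be used. This is a sketch based on the [features] document transformer described in [1]; the store name and the efi.term value are taken from the report below, and the quoting of the multi-word value is an assumption that may need adjusting.

```
q=lg cx 4k oled 120 hz
defType=edismax
fl=id,score,[features store=feature_store efi.term='lg cx 4k oled 120 hz']
```

No rq parameter is sent here, so no model is applied and only the feature values are computed for the returned documents.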

Identification of the troublesome feature(s): narrowing down to a single feature or a small combination of features could make it easier to figure out the problem. Assuming the existing logging doesn't identify the features, replacing org.apache.solr.ltr.feature.SolrFeature with a com.mycompany.solr.ltr.feature.MySolrFeature containing instrumentation could provide insights. For example, the existing code [2] logs feature names for UnsupportedOperationException; if it also caught ArrayIndexOutOfBoundsException, it could log the feature name before rethrowing the exception.
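
The catch, log and rethrow idea can be sketched in isolation as follows. This is only an illustration of the pattern, not the SolrFeature code: InstrumentedScorer, FloatScorer and the log callback are hypothetical names, and a real MySolrFeature would apply the same try/catch inside its scorer's score() method.

```java
import java.util.function.Consumer;

/**
 * Illustrative sketch only: a stand-in for the idea of instrumenting a feature
 * scorer. None of these types are Solr or Lucene classes.
 */
final class InstrumentedScorer {

    /** Hypothetical minimal scorer; Lucene's Scorer.score() also declares IOException. */
    interface FloatScorer {
        float score();
    }

    private final String featureName;
    private final FloatScorer delegate;
    private final Consumer<String> log;

    InstrumentedScorer(String featureName, FloatScorer delegate, Consumer<String> log) {
        this.featureName = featureName;
        this.delegate = delegate;
        this.log = log;
    }

    float score() {
        try {
            return delegate.score();
        } catch (ArrayIndexOutOfBoundsException e) {
            // Record which feature failed, then let the original exception propagate.
            log.accept("Feature '" + featureName + "' failed: " + e);
            throw e;
        }
    }

    public static void main(String[] args) {
        int[] slots = new int[2];
        // Reading slots[2] deliberately triggers an ArrayIndexOutOfBoundsException.
        InstrumentedScorer bad =
            new InstrumentedScorer("featureB", () -> slots[2], System.err::println);
        try {
            bad.score();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rethrown after logging");
        }
    }
}
```

Because the exception is rethrown unchanged, the response and stack trace stay the same as before; the only difference is one extra log line naming the feature.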

Based on your detail below and this conditional in the code [3], probably at least two features will be necessary to hit the issue. For investigative purposes, though, two features could potentially still be simplified to effectively one feature, e.g. if one feature is a SolrFeature and the other is a ValueFeature, or if featureA and featureB are both SolrFeature features with _identical_ parameters but different names.
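
As an illustration of the identical-parameters variant, a feature store could look roughly like this. The store name debug_store and the names featureA and featureB are placeholders; the field and mm value are copied from the report below.

```json
[
  {
    "store": "debug_store",
    "name": "featureA",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=query_field_2 mm=5}${term}" }
  },
  {
    "store": "debug_store",
    "name": "featureB",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=query_field_2 mm=5}${term}" }
  }
]
```

For the other variant, featureB could instead use the class org.apache.solr.ltr.feature.ValueFeature with a constant "value" parameter.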

Hope that helps.

Regards,

Christine

[1] https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#extracting-features
[2] https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L243
[3] https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java#L520-L525

From: [hidden email] At: 01/04/21 17:31:44 To: [hidden email]
Subject: Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

Hello,
We are trying to upgrade Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we use
LTR in production with a MultipleAdditiveTrees model. On Solr 8.6.3 we
receive an error when we try to compute some SolrFeatures. We didn't find
any pattern in the queries that fail.
Example:
We have the following raw query parameters:
q=lg cx 4k oled 120 hz -> just one of many examples
term_dq=lg cx 4k oled 120 hz
rq={!ltr model=model reRankDocs=1000 store=feature_store efi.term=${term_dq}}
defType=edismax
mm=2<75%
The features are something like this:
{
      "name":"similarity_query_fileld_1",
      "class":"org.apache.solr.ltr.feature.SolrFeature",
      "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
      "store":"feature_store"
},
{
      "name":"similarity_query_field_2",
      "class":"org.apache.solr.ltr.feature.SolrFeature",
      "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
      "store":"feature_store"
}

We are testing ~6300 production queries and for about 1% of them we receive
the following error message:
"metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
    "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2",

The stack trace is:
org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:500)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.lucene.search.DisiPriorityQueue.add(DisiPriorityQueue.java:102)
at org.apache.lucene.search.MinShouldMatchSumScorer.advanceTail(MinShouldMatchSumScorer.java:246)
at org.apache.lucene.search.MinShouldMatchSumScorer.updateFreq(MinShouldMatchSumScorer.java:312)
at org.apache.lucene.search.MinShouldMatchSumScorer.score(MinShouldMatchSumScorer.java:320)
at org.apache.solr.ltr.feature.SolrFeature$SolrFeatureWeight$SolrFeatureScorer.score(SolrFeature.java:242)
at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer$SparseModelScorer.score(LTRScoringQuery.java:595)
at org.apache.solr.ltr.LTRScoringQuery$ModelWeight$ModelScorer.score(LTRScoringQuery.java:540)
at org.apache.solr.ltr.LTRRescorer.scoreFeatures(LTRRescorer.java:183)
at org.apache.solr.ltr.LTRRescorer.rescore(LTRRescorer.java:122)
at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:119)


We've searched the mailing lists and the issue tracker and didn't find an
existing bug report.
Could you please give us a hint about what we can do to fix this?

Thanks,
Florin Babes



Re: Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

Florin Babes
Hello Christine, and thank you for your help!

So, we've investigated further based on your suggestions and have the
following things to note:

Reproducibility: The same queries fail with the same error on repeated
runs.
Data as a factor: Our setup is single-sharded, so we can't investigate
this further.
Feature vs. Model: We also tried a dummy LinearModel with only two
features and the problem still occurs.
Identification of the troublesome feature(s): We've narrowed our model down
to only two features, and the problem always occurs (for some queries, not
all) when we have one feature with mm=1 and one feature with mm>=3. The
problem also occurs when we only do feature extraction, and it always seems
to occur on the feature with the larger mm. The errors seem to be related
to the size of the head DisiPriorityQueue created here:
https://github.com/apache/lucene-solr/blob/branch_8_6/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L107
as the error changes when we change the mm of the second feature:

One feature with mm=1 and one with mm=3 -> Index 4 out of bounds for length 4
One feature with mm=1 and one with mm=5 -> Index 2 out of bounds for length 2
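
As a rough check of that reading, the two array lengths are what one would get if the head queue were sized as (number of clauses - mm + 1) for a six-term query such as "lg cx 4k oled 120 hz". The sketch below is not Lucene code: the sizing formula is an assumption about the linked constructor, the assumption that each whitespace-separated term becomes one clause depends on the field's analysis, and HeadQueueSizing is a hypothetical name.

```java
/** Sketch only: mirrors the sizing arithmetic, not Lucene's actual classes. */
final class HeadQueueSizing {

    /** Assumed sizing rule: number of clauses - minShouldMatch + 1. */
    static int headCapacity(int numClauses, int minShouldMatch) {
        return numClauses - minShouldMatch + 1;
    }

    /** Adds one more entry than a fixed-size array can hold; returns the failing index. */
    static int overflowIndex(int capacity) {
        int[] heap = new int[capacity];
        int size = 0;
        try {
            for (int i = 0; i <= capacity; i++) {
                heap[size] = i; // no capacity check before writing
                size++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            return size; // the index that was out of bounds
        }
        return -1; // not reached: the loop always writes one entry too many
    }

    public static void main(String[] args) {
        // Assumption: one clause per whitespace-separated term.
        int numClauses = "lg cx 4k oled 120 hz".split(" ").length;
        for (int mm : new int[] {3, 5}) {
            int capacity = headCapacity(numClauses, mm);
            System.out.println("mm=" + mm + " -> head capacity " + capacity
                + ", overflow at index " + overflowIndex(capacity));
        }
    }
}
```

Under these assumptions mm=3 gives a capacity of 4 and mm=5 a capacity of 2, which matches the two error messages and suggests the queue receives one more entry than it was sized for.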

You can find below the dummy feature-store.

[
    {
        "store": "dummystore",
        "name": "similarity_name_mm_1",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {
            "q": "{!dismax qf=name mm=1}${term}"
        }
    },
    {
        "store": "dummystore",
        "name": "similarity_names_mm_3",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {
            "q": "{!dismax qf=name mm=3}${term}"
        }
    }
]

The problem first occurs in Solr 8.6.0: we tried multiple versions below
8.6 and at or above 8.6, and the failures start with 8.6.0. We tend to
believe the cause is the following changes:
https://issues.apache.org/jira/browse/SOLR-14364 as they are the only major
LTR-related changes introduced in Solr 8.6.0.

I've created a Solr JIRA bug/issue ticket here:
https://issues.apache.org/jira/browse/SOLR-15071

Thank you for your help!

On Tue, 5 Jan 2021 at 19:40, Christine Poerschke (BLOOMBERG/ LONDON) <
[hidden email]> wrote:
