[Apache Solr ReRanking] Sort Clauses Bug


Alessandro Benedetti
Hi all,
I was playing a bit with the reranking capability and I discovered that:

*Sort by score, then by secondary field -> OK*
http://localhost:8983/solr/books/select?q=vegeta ssj&sort=score desc,downloads desc&fl=id,title,score,downloads

*ReRank, Sort by score, then by secondary field -> KO*
http://localhost:8983/solr/books/select?q=*:*&rq={!rerank reRankQuery=$rqq reRankDocs=1200 reRankWeight=3}&rqq=(vegeta ssj)&sort=score desc,downloads desc&fl=id,title,score,downloads
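The URLs above contain raw spaces and special characters (braces, $, parentheses) that an HTTP client would normally percent-encode. A minimal Python sketch of how the same two requests could be built with the standard library's urllib.parse.urlencode, assuming the local "books" collection from the examples; it only builds the URLs and does not contact Solr:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Select handler of the local "books" collection used in the examples above.
SELECT_URL = "http://localhost:8983/solr/books/select"


def build_url(params: dict) -> str:
    """Percent-encode the parameters (spaces, braces, '$', ...) into a URL."""
    return SELECT_URL + "?" + urlencode(params)


# OK scenario: plain query, sorted by score and then by downloads.
plain_url = build_url({
    "q": "vegeta ssj",
    "sort": "score desc,downloads desc",
    "fl": "id,title,score,downloads",
})

# KO scenario: match-all main query, top 1200 reranked by the rqq query.
rerank_url = build_url({
    "q": "*:*",
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=1200 reRankWeight=3}",
    "rqq": "(vegeta ssj)",
    "sort": "score desc,downloads desc",
    "fl": "id,title,score,downloads",
})

if __name__ == "__main__":
    print(plain_url)
    print(rerank_url)
    # Decoding the query string gives back the original parameter values.
    print(parse_qs(urlsplit(rerank_url).query)["rq"])
```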

Is this intended? It sounds counter-intuitive to me, and I wanted to check before opening a Jira issue.
Tested on 8.1.1, but it is probably present in master as well.

Regards
--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: [Apache Solr ReRanking] Sort Clauses Bug

Erick Erickson
Hmmm, can we see a bit of sample output? I always have to read this backwards: the outer query results are sent to the inner query, so my _guess_ is that the sort is applied to the “q=*:*” and then the top 1,200 are re-sorted by score by the rerank. But then I’m often confused about this.

Erick
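To make the behaviour under discussion concrete, here is a small Python simulation of the reranking flow as described above (the sort picks the top reRankDocs, then rescoring reorders them by score). It assumes the documented reRankWeight semantics (reRankWeight times the rerank query score is added to the original score of matching documents) and that ties after rescoring fall back to the internal document id, which is how Lucene's QueryRescorer orders them as far as I know. The documents and scores are made up; this is a model, not Solr's code.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    doc_id: int                     # internal (Lucene-style) document id
    downloads: int
    rerank_score: Optional[float]   # score for the rqq query; None if it doesn't match
    score: float = 1.0              # q=*:* gives every document the same score


def current_rerank(docs: List[Doc], rerank_docs: int, rerank_weight: float) -> List[Doc]:
    # 1) First pass: the global sort (score desc, downloads desc) selects the top K.
    first_pass = sorted(docs, key=lambda d: (-d.score, -d.downloads))
    top_k, rest = first_pass[:rerank_docs], first_pass[rerank_docs:]
    # 2) Rescore: matching docs get reRankWeight * rerank score added to their score.
    rescored = [
        replace(d, score=d.score + rerank_weight * d.rerank_score)
        if d.rerank_score is not None else d
        for d in top_k
    ]
    # 3) Re-sort by the new score only; ties fall back to the doc id (assumed),
    #    so the "downloads desc" clause no longer has any effect here.
    rescored.sort(key=lambda d: (-d.score, d.doc_id))
    # Documents beyond the top K keep their first-pass order after the reranked ones.
    return rescored + rest


DOCS = [
    Doc(0, downloads=10, rerank_score=2.0),
    Doc(1, downloads=50, rerank_score=2.0),
    Doc(2, downloads=30, rerank_score=None),
    Doc(3, downloads=90, rerank_score=None),
]

if __name__ == "__main__":
    for d in current_rerank(DOCS, rerank_docs=1200, rerank_weight=3.0):
        print(d.doc_id, d.score, d.downloads)
```

With these toy values documents 0 and 1 both end up at 7.0, and document 0 comes first despite having fewer downloads, which mirrors the KO scenario.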

> On Sep 25, 2019, at 5:47 PM, Alessandro Benedetti <[hidden email]> wrote:

Re: [Apache Solr ReRanking] Sort Clauses Bug

Alessandro Benedetti
In the first (OK) scenario, the search results are sorted by score desc, and when two scores are identical, the secondary sort field is applied.

In the KO scenario, only score desc (the reranked score) is taken into consideration; the secondary sort by the sort field is ignored.

I suspect an intuitive expected result would be the same behaviour that happens with no reranking, so:
1) sort the final results by reranked score desc
2) when the reranked scores are identical, sort by the secondary sort field

Is that clearer?
Am I making any wrong assumptions?
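The expected ordering described above amounts to a two-key sort on the reranked results. A minimal Python sketch with four hypothetical hits whose reranked scores are given directly (negating the keys works here because both are numeric):

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    doc_id: int
    downloads: int
    score: float  # score after reranking


def expected_order(hits: List[Hit]) -> List[Hit]:
    # 1) reranked score desc; 2) on identical scores, downloads desc.
    return sorted(hits, key=lambda h: (-h.score, -h.downloads))


HITS = [Hit(0, 10, 7.0), Hit(1, 50, 7.0), Hit(2, 30, 1.0), Hit(3, 90, 1.0)]

if __name__ == "__main__":
    print([h.doc_id for h in expected_order(HITS)])
```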


On Thu, 26 Sep 2019, 14:34 Erick Erickson, <[hidden email]> wrote:


Re: [Apache Solr ReRanking] Sort Clauses Bug

Erick Erickson
OK, so to restate: you expect the specified sort to be applied to both the “outer” and “inner” queries. Makes sense; seems like a good enhancement.

Hmm, I wonder if you can put the sort parameter in with the rerank specification, like: rq={!rerank reRankQuery=$rqq reRankDocs=1200 reRankWeight=3 sort="score desc,downloads desc"}

That doesn’t address your initial point; I’m just curious whether it’d do as a workaround in the meantime.

Best,
Erick


> On Sep 26, 2019, at 10:54 AM, Alessandro Benedetti <[hidden email]> wrote:

Re: [Apache Solr ReRanking] Sort Clauses Bug

Alessandro Benedetti
Personally, I was expecting the sort request parameter to be applied to the final search results:
1) run the original query, get the top K based on score
2) run the rerank query on the top K, recalculate the scores
3) finally, apply the sort

But when you mentioned "you expect the sort specified to be applied to both the “outer” and “inner” queries", I changed my mind: it is probably a better solution to give the user the flexibility to control both the original query sort (to affect the top K retrieval) and the final sort (the one sorting the reranked results).

*Currently the 'sort' global request parameter affects the way the top K are retrieved; then they are reranked.*
Unfortunately, the workaround you suggested through the local params of the rerank query parser doesn't seem to work at all in 8.1.1 :(
Unless it was introduced in 8.2, I think it is a good idea to create the Jira issue with this in mind:
1) we want to be able to decide the sort for both the original query (to assess the top K) and the final results
2) we need to decide which request parameter should do what, e.g.
should the 'sort' request param affect *the original query* OR the final results?
should the 'sort' in the local params of the reRank query parser affect the original query OR *the final results*?

My personal preference is in bold, but I don't hold a strong position either way.
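A hedged Python sketch of the proposal above: one sort chooses the top K candidates and a second, independent sort orders the reranked results. The parameter names first_pass_sort and final_sort are hypothetical and do not exist in Solr; which request parameter (the global 'sort' or a local param of the rerank parser) should map to which is exactly the open question, so the sketch leaves that choice out. Rescoring again assumes score + reRankWeight * rerank score for matching documents, and the documents are made up.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

# A sort spec is a list of (field, descending) pairs; "score" means the current score.
SortSpec = List[Tuple[str, bool]]


@dataclass(frozen=True)
class Doc:
    doc_id: int
    fields: Dict[str, int]          # stored sort fields, e.g. {"downloads": 50}
    rerank_score: Optional[float]   # None if the doc doesn't match the rerank query
    score: float = 1.0              # first-pass score (constant for q=*:*)


def sort_value(doc: Doc, field: str):
    return doc.score if field == "score" else doc.fields[field]


def apply_sort(docs: List[Doc], spec: SortSpec) -> List[Doc]:
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key sort.
    out = list(docs)
    for field, descending in reversed(spec):
        out.sort(key=lambda d: sort_value(d, field), reverse=descending)
    return out


def proposed_rerank(docs: List[Doc], first_pass_sort: SortSpec, final_sort: SortSpec,
                    rerank_docs: int, rerank_weight: float) -> List[Doc]:
    # 1) The first-pass sort decides which K docs are candidates for reranking.
    first_pass = apply_sort(docs, first_pass_sort)
    top_k, rest = first_pass[:rerank_docs], first_pass[rerank_docs:]
    # 2) Rescore the top K (matching docs only): score += reRankWeight * rerank score.
    rescored = [
        replace(d, score=d.score + rerank_weight * d.rerank_score)
        if d.rerank_score is not None else d
        for d in top_k
    ]
    # 3) The final sort orders the reranked docs; the rest keep first-pass order.
    return apply_sort(rescored, final_sort) + rest


BY_SCORE_THEN_DOWNLOADS: SortSpec = [("score", True), ("downloads", True)]
DOCS = [
    Doc(0, {"downloads": 10}, 2.0),
    Doc(1, {"downloads": 50}, 2.0),
    Doc(2, {"downloads": 30}, None),
    Doc(3, {"downloads": 90}, None),
]

if __name__ == "__main__":
    ranked = proposed_rerank(DOCS, BY_SCORE_THEN_DOWNLOADS, BY_SCORE_THEN_DOWNLOADS,
                             rerank_docs=1200, rerank_weight=3.0)
    print([d.doc_id for d in ranked])
```

With both sorts set to score desc,downloads desc this prints [1, 0, 3, 2], so the downloads tie-break survives reranking; a different first_pass_sort only changes which documents become rerank candidates.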

Cheers


On Thu, Sep 26, 2019 at 5:23 PM Erick Erickson <[hidden email]>
wrote:
