Solr 7 MoreLikeThis boost calculation

Jesse Wang
Hi folks,

Looks like LUCENE-5795 (https://github.com/apache/lucene-solr/commit/173a44e67c7c3c1a9ffbe7259ea8b45f1f53b015#diff-d3409eb300a059322d46e4c9f43717ed) changed the “lessThan” condition in the FreqQ PriorityQueue to be a true less-than comparison, so that only the top N terms are collected.

However, when calculating the boost, MoreLikeThis::createQuery() still uses the PriorityQueue’s first pop() as the bestScore, when that would now actually be the least element. We worked around this in our production instance by iterating through the entire PriorityQueue to find the largest score (acceptable for us since we only ever use a very small N = 5).
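
A minimal sketch of the workaround described above, not the actual Lucene code. It uses java.util.PriorityQueue as a stand-in for Lucene's own PriorityQueue, and the names ScoredTerm, BestScoreWorkaround, topTerms and maxScore are hypothetical, introduced only for illustration. It shows that a min-heap holding the top N terms returns the lowest retained score first, so the largest score has to be found by scanning the queue:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a scored term; only the fields needed here.
final class ScoredTerm {
    final String word;
    final float score;

    ScoredTerm(String word, float score) {
        this.word = word;
        this.score = score;
    }
}

final class BestScoreWorkaround {
    // Keep only the topN highest-scoring terms using a min-heap:
    // the head of the queue is always the lowest score retained so far.
    static PriorityQueue<ScoredTerm> topTerms(List<ScoredTerm> terms, int topN) {
        PriorityQueue<ScoredTerm> q =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredTerm t) -> t.score));
        for (ScoredTerm t : terms) {
            q.add(t);
            if (q.size() > topN) {
                q.poll(); // evict the current minimum
            }
        }
        return q;
    }

    // Workaround: scan every queued term to find the largest score.
    // Returns Float.NEGATIVE_INFINITY for an empty queue.
    static float maxScore(PriorityQueue<ScoredTerm> q) {
        float best = Float.NEGATIVE_INFINITY;
        for (ScoredTerm t : q) {
            best = Math.max(best, t.score);
        }
        return best;
    }

    public static void main(String[] args) {
        List<ScoredTerm> terms = Arrays.asList(
            new ScoredTerm("a", 100f), new ScoredTerm("b", 50f),
            new ScoredTerm("c", 20f), new ScoredTerm("d", 10f),
            new ScoredTerm("e", 5f));
        PriorityQueue<ScoredTerm> q = topTerms(terms, 4);
        System.out.println("head (least retained): " + q.peek().score);
        System.out.println("largest score:         " + maxScore(q));
    }
}
```

The linear scan costs O(N) per query, which is negligible for a small N such as 5.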

 

I’m wondering whether this behavior is intended or a known bug?

Thanks,
-Jesse

Re: Solr 7 MoreLikeThis boost calculation

Alessandro Benedetti
Hi Jesse,
you are correct: the variable 'bestScore' used in
createQuery(PriorityQueue<ScoreTerm> q) should really be named "minScore".

It is used to normalise the term scores:
tq = new BoostQuery(tq, boostFactor * myScore / bestScore);
e.g.

Queue -> Term1:100, Term2:50, Term3:20, Term4:10

The minScore will be 10 and, taking boostFactor = 1, the normalised scores will be:
Term1:10, Term2:5, Term3:2, Term4:1

These values will be used to build the boost term queries.
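
The arithmetic above can be reproduced with a short sketch. It only mirrors the expression boostFactor * myScore / bestScore quoted earlier and is not the Lucene implementation; the class name BoostNormalisation, the method boosts and the choice boostFactor = 1 are assumptions made for illustration. It computes the boosts twice, once dividing by the minimum score (10) and once by the maximum (100), so the two normalisations can be compared:

```java
import java.util.Arrays;

final class BoostNormalisation {
    // Applies boostFactor * score / divisor to every score,
    // where divisor plays the role of 'bestScore' in the quoted line.
    static float[] boosts(float[] scores, float boostFactor, float divisor) {
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = boostFactor * scores[i] / divisor;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {100f, 50f, 20f, 10f};
        // Divide by the minimum score, as the first pop() does after LUCENE-5795.
        System.out.println(Arrays.toString(boosts(scores, 1f, 10f)));
        // Divide by the maximum score, as in the workaround of scanning the queue.
        System.out.println(Arrays.toString(boosts(scores, 1f, 100f)));
    }
}
```

Either divisor preserves the ratios between the term boosts; only the absolute scale changes (the smallest boost equals boostFactor in one case, the largest in the other).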

I see no particular problem with that.
What is your concern?



---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html