Function query to boost scores by a constant if all terms are present

Bill Dueber-2
Let me describe what I'm trying to accomplish, first, since what I think is
the solution is almost always wrong. :-)

I'm doing dismax queries with mm set such that not all terms need to match,
e.g. only 2 of 3 query terms need to match.

Most of the time, items that match all three terms will float to the top by
normal ranking, but sometimes there are only two terms that are like a rash
across the record, and they end up with a higher score than some items that
match all three query terms.

I'd like to boost items with all the query terms to the top *without
changing their order*.

My first thought was to use a simple boost query allfields:(a AND b AND c),
but the order of the set of records that contain all three terms changes
when I do that. What I *think* I need to do is basically to say, "Hey, all
the items with all three terms get an extra 40,000 points, but change
nothing else".

I keep thinking I can get what I need with a subquery and map, but keep
failing.

Any advice would be very, very welcome.

 -Bill-



--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: Function query to boost scores by a constant if all terms are present

iorixxx
> Most of the time, items that match all three terms will float to the top by
> normal ranking, but sometimes there are only two terms that are like a rash
> across the record, and they end up with a higher score than some items that
> match all three query terms.
>
> I'd like to boost items with all the query terms to the top *without
> changing their order*.
>
> My first thought was to use a simple boost query allfields:(a AND b AND c),
> but the order of the set of records that contain all three terms changes
> when I do that. What I *think* I need to do is basically to say, "Hey, all
> the items with all three terms get an extra 40,000 points, but change
> nothing else".

This is a hard task, and I am not sure it is possible. You would probably need to change the similarity algorithm for that, because the final score is composed of many factors: coord, norm, tf-idf, ...

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

Maybe you can try customizing coord(q,d). But there can always be cases like the one you describe. For example, a very long document containing all three terms will be penalized for its length, so a very short document with only two of the query terms can rank above it.
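
To make the length effect concrete, here is a toy calculation. It is only a sketch: it uses the tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms) and coord = overlap/maxOverlap formulas from Lucene's DefaultSimilarity, but it treats idf, queryNorm and all boosts as 1, and the document lengths are made-up numbers.

```python
import math

def toy_score(matched_terms, query_terms, doc_length, term_freq=1):
    """Simplified classic-Lucene-style score: coord * sum(tf * lengthNorm).

    Assumption: idf, queryNorm and boosts are left out (treated as 1),
    and every matched term occurs term_freq times.
    """
    coord = matched_terms / query_terms        # fraction of query terms matched
    tf = math.sqrt(term_freq)                  # term-frequency factor
    length_norm = 1.0 / math.sqrt(doc_length)  # longer fields score lower
    return coord * matched_terms * tf * length_norm

# Hypothetical documents: a 400-term record with all three terms,
# and a 16-term record with only two of them.
long_doc_all_terms = toy_score(matched_terms=3, query_terms=3, doc_length=400)
short_doc_two_terms = toy_score(matched_terms=2, query_terms=3, doc_length=16)

print(long_doc_all_terms, short_doc_two_terms)
```

With these numbers the short two-term document scores higher, even though coord already favors the three-term one.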

It is easy to "rank items with all three terms" so that they come first (omitNorms="true" and omitTermFreqAndPositions="true" should almost do it), but the "change nothing else" part is not.

The easiest thing may be to run an additional query with a pure AND operator and display those results in a special way.
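
A minimal client-side sketch of that idea, assuming you already have the ids from the relaxed (mm) query in score order and the ids from the pure AND query. The function name and the document ids are hypothetical, and this only reorders the hits you have actually fetched, not the whole result set.

```python
def all_terms_first(relaxed_hits, strict_ids):
    """Stable partition: hits that matched every term come first,
    and the original score order is kept inside each group."""
    strict_ids = set(strict_ids)
    top = [doc_id for doc_id in relaxed_hits if doc_id in strict_ids]
    rest = [doc_id for doc_id in relaxed_hits if doc_id not in strict_ids]
    return top + rest

# Hypothetical ids: relaxed query results in score order,
# and the ids returned by the pure AND query.
relaxed = ["d7", "d2", "d9", "d4", "d1"]
strict = ["d2", "d4"]

reordered = all_terms_first(relaxed, strict)
print(reordered)
```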


     
Re: Function query to boost scores by a constant if all terms are present

Jan Høydahl / Cominvent
You can use the map() function for this, see http://wiki.apache.org/solr/FunctionQuery#map

q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence)

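This adds a constant boost of 100.0 whenever the $qq query returns a non-zero score, which it does only when all three terms match. For documents that do not match $qq, query($qq) evaluates to 0 and map() turns that into a boost of 0. Since every document that matches all three terms gets the same constant, their relative order does not change; use a constant larger than the biggest score gap you expect (your 40,000, for example) to make sure they all end up on top.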
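
To illustrate the arithmetic outside Solr, here is a small sketch. solr_map() only mimics the documented behaviour of map(x,min,max,target,default); the scores are invented, and the model ignores queryNorm and any other scaling Solr applies to the added value.

```python
def solr_map(x, lo, hi, target, default=None):
    """Mimic of map(x,min,max,target[,default]):
    target if lo <= x <= hi, otherwise default (or x when no default)."""
    if lo <= x <= hi:
        return target
    return x if default is None else default

def boosted(base_score, qq_score, bonus=100.0):
    # Assumption: bf is additive, so the function value is simply
    # added to the main dismax score.
    return base_score + solr_map(qq_score, 0, 0, 0, bonus)

# Hypothetical hits: (id, dismax score, score of the all-terms query $qq).
# A $qq score of 0.0 means the document does not contain all three terms.
hits = [("a", 9.0, 0.0), ("b", 7.5, 1.2), ("c", 6.0, 0.0), ("d", 4.0, 0.8)]

ranked = sorted(hits, key=lambda h: boosted(h[1], h[2]), reverse=True)
ranked_ids = [h[0] for h in ranked]
print(ranked_ids)
```

Documents b and d move ahead of a and c, and within each group the original order (b before d, a before c) is kept.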

PS: You can achieve the same in a Lucene query, using q=a fox _val_:"map(query($qq),0,0,0,100.0)"
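
When sending the dismax request above from a script, the parameters need URL-encoding, since bf and qq contain "$", parentheses and spaces. A sketch using Python's standard library follows. The host and path are hypothetical, mm=2 is taken from the original question, and I have assumed the user query uses the same three terms as $qq.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "q": "quick brown fence",           # assumed user query
    "defType": "dismax",
    "qf": "allfields",
    "mm": "2",                          # 2 of 3 terms must match
    "bf": "map(query($qq),0,0,0,100.0)",
    "qq": "allfields:(quick AND brown AND fence)",
}

query_string = urlencode(params)        # percent-encodes $, (, ), : and spaces
url = "http://localhost:8983/solr/select?" + query_string  # hypothetical host

print(url)
```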

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 17. aug. 2010, at 22.48, Ahmet Arslan wrote:

> This is a hard task, and I am not sure it is possible. You would probably need to change the similarity algorithm for that, because the final score is composed of many factors: coord, norm, tf-idf, ...
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>
> Maybe you can try customizing coord(q,d). But there can always be cases like the one you describe. For example, a very long document containing all three terms will be penalized for its length, so a very short document with only two of the query terms can rank above it.
>
> It is easy to "rank items with all three terms" so that they come first (omitNorms="true" and omitTermFreqAndPositions="true" should almost do it), but the "change nothing else" part is not.
>
> The easiest thing may be to run an additional query with a pure AND operator and display those results in a special way.