# Function query to boost scores by a constant if all terms are present

## Function query to boost scores by a constant if all terms are present

Let me describe what I'm trying to accomplish, first, since what I think is the solution is almost always wrong. :-)

I'm doing dismax queries with mm set such that not all terms need to match, e.g. only 2 of 3 query terms need to match.

Most of the time, items that match all three terms will float to the top by normal ranking, but sometimes there are only two terms that are like a rash across the record, and they end up with a higher score than some items that match all three query terms.

I'd like to boost items with all the query terms to the top *without changing their order*.

My first thought was to use a simple boost query `allfields:(a AND b AND c)`, but the order of the set of records that contain all three terms changes when I do that. What I *think* I need to do is basically to say, "Hey, all the items with all three terms get an extra 40,000 points, but change nothing else".

I keep thinking I can get what I need with a subquery and map, but keep failing. Any advice would be very, very welcome.

-Bill-

--
Bill Dueber
Library Systems Programmer
University of Michigan Library

## Re: Function query to boost scores by a constant if all terms are present

> Most of the time, items that match all three terms will float to the top by normal ranking, but sometimes there are only two terms that are like a rash across the record, and they end up with a higher score than some items that match all three query terms.
>
> I'd like to boost items with all the query terms to the top *without changing their order*.
>
> My first thought was to use a simple boost query allfields:(a AND b AND c), but the order of the set of records that contain all three terms changes when I do that. What I *think* I need to do is basically to say, "Hey, all the items with all three terms get an extra 40,000 points, but change nothing else".

This is a hard task, and I am not sure it is possible, but you would need to change the similarity algorithm for it. The final score is composed of many factors: coord, norm, tf-idf ...

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

Maybe you can try to customize coord(q,d). But there can always be cases like the one you describe. For example, a very long document containing all three terms will be penalized for its length, and a very short document with two query terms can pop up before it.

It is easy to rank items with all three terms so that they come first (omitNorms="true" and omitTermFreqAndPositions="true" should almost do it), but the "change nothing else" part is not.

The easiest thing may be to run an additional query with a pure AND operator and display those results in a special way.
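The additional-query idea can also be applied on the client side without a separate display area. The following is a minimal sketch, assuming you have already fetched the ranked document IDs from the normal mm query and the set of IDs matched by the pure AND query. The function name and the sample IDs are hypothetical. A stable partition moves the all-terms documents to the front while keeping the original relative order inside each group.

```python
def float_all_terms_first(ranked_ids, all_terms_ids):
    """Move documents matching all terms to the front, preserving order.

    ranked_ids:    document IDs in the order returned by the mm query.
    all_terms_ids: IDs returned by the pure AND query (any iterable).
    """
    all_terms = set(all_terms_ids)
    matched = [doc for doc in ranked_ids if doc in all_terms]
    rest = [doc for doc in ranked_ids if doc not in all_terms]
    return matched + rest


# Hypothetical example: d3 matches only two terms but ranked first.
ranked = ["d3", "d1", "d2", "d4"]
reordered = float_all_terms_first(ranked, {"d1", "d2", "d4"})
print(reordered)
```

This only reorders the IDs that were actually fetched, so with paged results both queries would need to return enough rows to cover the pages being shown.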

## Re: Function query to boost scores by a constant if all terms are present

You can use the map() function for this, see http://wiki.apache.org/solr/FunctionQuery#map

    q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence)

This adds a constant boost of 100.0 if the $qq query returns a non-zero score, which it does whenever all three terms match.

PS: You can achieve the same in a Lucene query, using

    q=a fox _val_:"map(query($qq),0,0,0,100.0)"

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 17. aug. 2010, at 22.48, Ahmet Arslan wrote:

>> Most of the time, items that match all three terms will float to the top by normal ranking, but sometimes there are only two terms that are like a rash across the record, and they end up with a higher score than some items that match all three query terms.
>>
>> I'd like to boost items with all the query terms to the top *without changing their order*.
>>
>> My first thought was to use a simple boost query allfields:(a AND b AND c), but the order of the set of records that contain all three terms changes when I do that. What I *think* I need to do is basically to say, "Hey, all the items with all three terms get an extra 40,000 points, but change nothing else".
>
> This is a hard task, and I am not sure it is possible. But you need to change similarity algorithm for that. Final score is composed of many factors. coord, norm, tf-idf ...
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>
> May be you can try to customize coord(q,d). But there can be always some cases that you describe. For example very long document containing three terms will be punished due to its length. A very short document with two query terms can pop-up before it.
>
> It is easy to "rank items with all three terms" so that they comes first, (omitNorms="true" and omitTermFreqAndPositions="true" should almost do it) but "change nothing else" part is not.
>
> Easiest thing can be throw additional query with pure AND operator and display these result in a special way.
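To see why a constant additive boost keeps the relative order of the all-terms documents, here is a small Python sketch. `solr_map` is a hypothetical stand-in that mimics the documented behavior of Solr's `map(x,min,max,target,default)`. The scores are made up, and the simulation assumes the `bf` value is simply added to the base score. `build_params` only assembles the request parameters from the message above and does not contact a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def solr_map(x, lo, hi, target, default=None):
    """Mimic map(x,min,max,target[,default]): values in [lo, hi] become
    target; other values become default, or stay unchanged if no default."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


def build_params(q, qq, boost=100.0, field="allfields"):
    """Assemble the dismax parameters from the reply (no request is sent)."""
    return {
        "q": q,
        "defType": "dismax",
        "qf": field,
        "bf": f"map(query($qq),0,0,0,{boost})",
        "qq": qq,
    }


def rerank(base_scores, qq_scores, boost):
    """Add the constant boost to every doc with a non-zero $qq score."""
    boosted = {
        doc: score + solr_map(qq_scores.get(doc, 0.0), 0, 0, 0, boost)
        for doc, score in base_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)


# Hypothetical scores: d3 matches only two terms but has the top base score.
base = {"d1": 3.1, "d2": 2.7, "d3": 4.0, "d4": 1.2}
qq = {"d1": 0.9, "d2": 1.5, "d4": 0.2}  # d3 is missing from the AND query

order = rerank(base, qq, boost=100.0)
params = build_params("a fox", "allfields:(quick AND brown AND fence)")
query_string = urlencode(params)
print(order)
print(query_string)
```

The boost only guarantees that all-terms documents come first if it is larger than any base score a partial match can reach, which is the reason for picking something as large as the 40,000 points mentioned in the original question.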