Possible to set minimum score/relevance?

bo_b
Hello,

I was wondering whether it is possible to set a minimum score/relevance for search results? And how is the score calculated anyway? I thought I read somewhere that Lucene scores were normalized between 0..1, but that doesn't seem to be the case for Solr?

In our case we have indexed a vBulletin database of 7 million posts. On one of our search pages, we would like to have a sidebar that includes a link to our vBulletin search saying "Found xxxx extra results in vbulletin".

But searches in the vBulletin database return an awful lot of hits (like 100,000+ for some queries), even though perhaps only the first handful seem relevant. So ideally we would like the link to say "Found 12 extra results in vbulletin" if the first 12 results had a high score and results 13 to 100,000 had a low score.

Best regards,
Bo

Re: Possible to set minimum score/relevance?

Yonik Seeley-2
On 10/16/06, bo_b <[hidden email]> wrote:
> I was wondering if it was possible to set a minimum score/relevance for
> search results? And how is the score calculated anyway?

http://lucene.apache.org/java/docs/scoring.html

Making an arbitrary cutoff mean something would be quite difficult.

> I thought i read
> somewhere that lucene scores were normalized between 0..1, but that doesnt
> seem to be the case for solr?

Solr never normalizes scores since it may be easily done by the client
- the maxScore is given in the results, so just divide all scores by
maxScore.  If Solr normalized scores, information would be thrown away
and clients wouldn't be able to un-normalize if needed.
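
A minimal sketch of that client-side division, assuming the results have already been parsed into a Python dict. The `maxScore`, `docs`, `id`, and `score` keys below mirror a typical response that requested the score field, but the exact shape and the sample values are assumptions for illustration only.

```python
def normalize_scores(docs, max_score):
    """Return (id, score / max_score) pairs, so the top hit gets 1.0."""
    if max_score <= 0:
        # Nothing sensible to divide by; report every hit as 0.0.
        return [(doc["id"], 0.0) for doc in docs]
    return [(doc["id"], doc["score"] / max_score) for doc in docs]


# Hypothetical parsed response fragment (made-up ids and scores).
response = {
    "maxScore": 4.0,
    "docs": [
        {"id": "post-1", "score": 4.0},
        {"id": "post-2", "score": 1.0},
    ],
}

normalized = normalize_scores(response["docs"], response["maxScore"])
```

Note that the normalized values are only relative to the best hit of this one query, so they are not comparable across different queries.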

> In our case we have indexed a 7 million posts vbulletin database. On a
> search page we have, we would like to be able to have a sidebar which
> includes a link to our vbulletin search that says "Found xxxx extra results
> in vbulletin".
>
> But searches in the vbulletin database returns an awful lots of hits(like
> 100.000+ for some queries), even though perhaps only the first handful seem
> relevant. So ideally we would like the link to say "Found 12 extra results
> in vbulletin", if the first 12 results had a high score, and result 13 to
> 100.000 had a low score.

You could try to analyze the scores yourself and see if there is a
natural "break".
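
One possible way to look for such a break, sketched under assumptions: the scores are already sorted from best to worst, and a "break" is taken to mean the largest relative drop between two neighbouring scores. The function name and the `min_drop` threshold are hypothetical, and this is a heuristic rather than anything Solr or Lucene provides.

```python
def find_score_break(scores, min_drop=0.5):
    """Return how many hits come before the largest relative drop.

    `scores` must be sorted in descending order. Returns None when no
    drop between neighbours is at least `min_drop` (a fraction, 0..1).
    """
    best_index = None
    best_drop = 0.0
    for i in range(1, len(scores)):
        prev, cur = scores[i - 1], scores[i]
        if prev <= 0:
            break  # remaining scores are zero or negative; stop looking
        drop = (prev - cur) / prev
        if drop >= min_drop and drop > best_drop:
            best_index, best_drop = i, drop
    return best_index


# Made-up scores: three strong hits, then a sharp fall-off.
cutoff = find_score_break([9.0, 8.5, 8.1, 2.0, 1.9, 1.8])
no_cutoff = find_score_break([3.0, 2.9, 2.8])
```

Here `cutoff` is the number of results above the break, and `no_cutoff` is `None` because the made-up scores decline smoothly.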

-Yonik

Re: Possible to set minimum score/relevance?

Chris Hostetter-3

: Making an arbitrary cuttoff mean something would be quite difficult.

The specifics on this are discussed in the Lucene Java FAQ...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03

: > But searches in the vbulletin database returns an awful lots of hits(like
: > 100.000+ for some queries), even though perhaps only the first handful seem
: > relevant. So ideally we would like the link to say "Found 12 extra results
: > in vbulletin", if the first 12 results had a high score, and result 13 to
: > 100.000 had a low score.

The real question is: are you just going to display that text, or is it
going to be a link to the actual search? If you're going to give the user
a link, then you're going to want to make sure the page they get to
matches up with their expectation from the link text, so saying there are
only 12 results when there are really 100,000 is going to be a bald-faced
lie -- what you should do is re-evaluate your query structure so that you
only get the really good results (the 12) and have optional UI elements
allowing people to relax the search criteria to get the full 100K.

What criteria you should use to keep the result set small really depends
on how you define "good" results vs "bad" results.



-Hoss

Re: Possible to set minimum score/relevance?

bo_b
Chris Hostetter wrote:
> : > But searches in the vbulletin database returns an awful lots of hits(like
> : > 100.000+ for some queries), even though perhaps only the first handful seem
> : > relevant. So ideally we would like the link to say "Found 12 extra results
> : > in vbulletin", if the first 12 results had a high score, and result 13 to
> : > 100.000 had a low score.
>
> The real question is: are you just going to display that text, or is it
> going to be a link to the actual search? If you're going to give the user
> a link, then you're going to want to make sure the page they get to
> matches up with their expectation from the link text, so saying there are
> only 12 results when there are really 100,000 is going to be a bald-faced
> lie -- what you should do is re-evaluate your query structure so that you
> only get the really good results (the 12) and have optional UI elements
> allowing people to relax the search criteria to get the full 100K.
>
> What criteria you should use to keep the result set small really depends
> on how you define "good" results vs "bad" results.
>
> -Hoss

There will be a link to the actual search, and I agree that the number of results on the result page needs to match what the link text says.

But anyway, we just discovered that the minimum match feature of the dismax request handler lets us narrow the number of search results down quite a bit.

We take a user's query and expand it semantically through an external component (a bit like the synonyms.txt file, but with weights assigned to each of the synonyms) before feeding it to Solr, so a 2-word query might end up as a 10-word query, which causes a huge increase in results.

Using the minimum match feature seems to work out well for bringing it back down to realistic levels.
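
For anyone finding this later, a rough sketch of what such a request can look like. `mm` (minimum match) and `qf` (query fields) are DisMax parameters; everything else here is an assumption for illustration: the base URL, the field names, the `75%` value, and the use of `defType=dismax`, which is how later Solr versions select the parser (older setups select the handler differently, for example through the `qt` parameter).

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query, min_match="75%", fields="title body"):
    """Build a select URL that asks DisMax to require `min_match` of the clauses."""
    params = {
        "defType": "dismax",  # assumption: parser selected via defType
        "q": user_query,      # the (possibly expanded) query terms
        "qf": fields,         # hypothetical field names to search
        "mm": min_match,      # minimum number/percentage of clauses that must match
        "rows": 0,            # only the hit count is needed for the sidebar link
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and path; adjust to the actual installation.
url = build_dismax_url("http://localhost:8983/solr/select", "solr minimum score")
```

With `rows=0` the response carries only the total hit count, which is all the "Found N extra results" link needs, and the same `mm` value can be reused on the full result page so both numbers agree.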

Thanks,
Bo