Solr Score threshold 'reasonably', independent of results returned

7 messages

Ramzi Alqrainy
Usually, search results are sorted by their score (how well the document matched the query), but it is also common to need to sort on supplied data.
Boosting affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value, whether at the document or field level, is optional.

When the results are returned with scores, we want to be able to keep only the results that are above some score (i.e. only results of a certain quality). Is it possible to do this when the returned subset could be anything?

I ask because on some queries a score of, say, 0.008 corresponds to a decent match, whereas on other queries a higher score corresponds to a poor match.
I have written pseudocode to achieve what I described.
Note: I have attached my code as a screenshot.



Please find the URL below:
https://issues.apache.org/jira/browse/SOLR-3747

I think this task will help us avoid poor matches. For example:
Query: solr lucene apple
Doc1: solr solr apple (score: 10)
Doc2: lucene apple (score: 7)
Doc3: test solr test (score: 2)
Actually, I don't need Doc3 in my results; I want to ignore it. If we implement this task, we can give Solr a threshold of 30%, and Solr will then ignore Doc3 according to the calculation below, where each score is expressed as a percentage of the top score:
Doc1: 100% (10/10 * 100%)
Doc2: 70% (7/10 * 100%)
Doc3: 20% (2/10 * 100%)
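A minimal Python sketch of the calculation above, assuming the hits are already available as (id, score) pairs, for example from a request with fl=id,score (Solr can also report the top score of a request as maxScore when scores are requested). The helper name filter_by_relative_threshold is hypothetical; this is not the code from the screenshot, and it is not a built-in Solr feature.

```python
from typing import List, Tuple


def filter_by_relative_threshold(
    results: List[Tuple[str, float]], threshold: float = 0.30
) -> List[Tuple[str, float]]:
    """Keep hits whose score is at least `threshold` times the top score.

    Hypothetical helper: each score is expressed as a fraction of the
    maximum score in this one result set, as in the Doc1/Doc2/Doc3 example.
    """
    if not results:
        return []
    max_score = max(score for _, score in results)
    if max_score <= 0:
        # Nothing to normalize against; return the hits unchanged.
        return list(results)
    return [(doc, score) for doc, score in results if score / max_score >= threshold]


hits = [("Doc1", 10.0), ("Doc2", 7.0), ("Doc3", 2.0)]
top = max(score for _, score in hits)
for doc, score in hits:
    print(f"{doc}: {score / top:.0%}")  # Doc1: 100%, Doc2: 70%, Doc3: 20%
print(filter_by_relative_threshold(hits, 0.30))
# [('Doc1', 10.0), ('Doc2', 7.0)]
```

Because every score is divided by the top score of the same request, the top hit always passes, so this rule cannot return an empty page; the percentages are still only meaningful within one result set.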

Re: Solr Score threshold 'reasonably', independent of results returned

Mou
Hi,
I think this totally depends on your requirements, and so it is only applicable to a particular user scenario. The score does not have any absolute meaning; it is always relative to the query. If you want to watch some particular queries and show only results with a score above a previously set threshold, you can use this.

If I always have that x% threshold in place, there may be many queries that would not return anything, and I certainly do not want that.
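Mou's concern is easiest to see with a fixed, query-independent cutoff. The Python sketch below applies one cutoff to two queries; query A reuses the scores from the example in the first post, while the query B scores and the cutoff value are hypothetical (echoing the 0.008 "decent match" mentioned there).

```python
from typing import Dict, List


def filter_by_absolute_threshold(scores: List[float], cutoff: float) -> List[float]:
    """Keep only scores at or above a fixed cutoff, regardless of the query."""
    return [s for s in scores if s >= cutoff]


# Hypothetical raw scores for two different queries against the same index.
query_scores: Dict[str, List[float]] = {
    "query A": [10.0, 7.0, 2.0],
    "query B": [0.008, 0.006, 0.001],
}

CUTOFF = 1.0  # hypothetical cutoff that looks reasonable for query A

for query, scores in query_scores.items():
    kept = filter_by_absolute_threshold(scores, CUTOFF)
    print(query, "->", kept if kept else "no results")
# query A -> [10.0, 7.0, 2.0]
# query B -> no results
```

The same cutoff keeps everything for one query and empties the page for the other, even if query B's top hits are decent matches.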

Re: Solr Score threshold 'reasonably', independent of results returned

Ravish Bhagdev
Commercial solutions often show a percentage that is meant to signify the
quality of a match. Solr has a relative score, and you cannot tell just by
looking at this value whether a result is relevant enough to be on the first
page or not. The score depends on "what else is in the index", so it is not
easy to normalize in the way you suggest.
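One concrete reason the score depends on the rest of the index is inverse document frequency (IDF). A small Python sketch, assuming Lucene's classic TF-IDF similarity (DefaultSimilarity, the default at the time of this thread; Lucene/Solr 6.0 and later default to BM25, whose IDF formula differs). The index sizes and document frequencies are hypothetical.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """IDF as in Lucene's DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# The same term in the same document, but two different (hypothetical) indexes:
# in the first the term is rare, in the second half the documents contain it.
rare = classic_idf(num_docs=1000, doc_freq=9)
common = classic_idf(num_docs=1000, doc_freq=499)
print(round(rare, 3))    # 5.605
print(round(common, 3))  # 1.693
```

Adding or removing other documents that contain a query term shifts its IDF, and therefore the raw scores, even when the matching document itself is unchanged.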

Ravish

(In reply to Mou, Wed, Aug 22, 2012 at 4:03 PM)

Re: Solr Score threshold 'reasonably', independent of results returned

Ramzi Alqrainy
It will never return an empty result, because the threshold is relative to the score of the previous result:

If score < 0.25 * last_score then stop

Since score > 0 and last_score is 0 for the initial hit, it will not stop at the first hit.
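A minimal Python sketch of this stop rule, assuming hits arrive sorted by descending score; the helper name cut_on_score_drop is hypothetical, and the 0.25 ratio comes from the rule above. Unlike the percentage-of-top-score idea in the first post, this compares each hit with the one immediately before it.

```python
from typing import List, Tuple


def cut_on_score_drop(
    ranked: List[Tuple[str, float]], ratio: float = 0.25
) -> List[Tuple[str, float]]:
    """Walk hits in descending score order and stop at the first hit whose
    score is below `ratio` times the previous hit's score."""
    kept: List[Tuple[str, float]] = []
    last_score = 0.0  # no previous hit yet, so the first hit is always kept
    for doc, score in ranked:
        if score < ratio * last_score:
            break
        kept.append((doc, score))
        last_score = score
    return kept


hits = [("Doc1", 10.0), ("Doc2", 7.0), ("Doc3", 2.0)]
print(cut_on_score_drop(hits))
# [('Doc1', 10.0), ('Doc2', 7.0), ('Doc3', 2.0)]

other = [("DocA", 10.0), ("DocB", 7.0), ("DocC", 1.0), ("DocD", 0.9)]
print(cut_on_score_drop(other))
# [('DocA', 10.0), ('DocB', 7.0)]
```

With the first post's example, Doc3 survives this rule (2 is more than 0.25 * 7). Also, the loop stops rather than skips, so DocD is dropped even though its score is close to DocC's.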

Re: Solr Score threshold 'reasonably', independent of results returned

Ramzi Alqrainy
In reply to this post by Ravish Bhagdev
You are right, Mr. Ravish, because this depends on the ranking formula and the search fields, but please allow me to point out that in some cases the Solr score can help us decide whether a document is relevant or not.

Re: Solr Score threshold 'reasonably', independent of results returned

Lance Norskog-2
Not really. The percentage given in other search packages is fairly
bogus. You have to do a global batch analysis of all of the index to
get a true scale for relevance.

(In reply to Ramzi Alqrainy, Sat, Aug 25, 2012 at 1:38 PM)


--
Lance Norskog

Re: Solr Score threshold 'reasonably', independent of results returned

Chris Hostetter-3

: Not really. The percentage given in other search packages is fairly
: bogus. You have to do a global batch analysis of all of the index to
: get a true scale for relevance.

Exactly...

https://wiki.apache.org/solr/FAQ#Why_Aren.27t_Scores_returned_as_a_percentage.3F_How_Do_I_normalize_Scores.3F
https://wiki.apache.org/lucene-java/ScoresAsPercentages

*you* -- as the person in control of your Solr instance, who knows
everything about every document in the index, and has total control over
the set of valid queries being executed against the index -- you *MAY* be
able to compute a meaningful "threshold" of scores, based on the
constraints you know/enforce.  But Solr can't do this, because in
general Solr doesn't know those constraints (or if those constraints even
exist) for an arbitrary index.
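If you have derived such a threshold for your own constrained set of queries, one way to apply it on the Solr side (not something proposed in this thread) is a function-range filter over the main query's score, fq={!frange l=...}query($q). The Python sketch below only builds the request URL; the host, the core name mycore, the query and the 0.5 cutoff are all hypothetical, and the cutoff is still a raw, query-dependent score, so every caveat above applies.

```python
from urllib.parse import urlencode


def build_thresholded_query(base_url: str, user_query: str, min_score: float,
                            rows: int = 10) -> str:
    """Build a Solr select URL that drops documents scoring below min_score."""
    params = {
        "q": user_query,
        # frange keeps documents whose function value is >= l (the lower bound
        # is inclusive by default); query($q) evaluates to each document's
        # score for the main query q.
        "fq": "{!frange l=%s}query($q)" % min_score,
        "fl": "id,score",
        "rows": rows,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


url = build_thresholded_query(
    "http://localhost:8983/solr/mycore/select",  # hypothetical host and core
    "solr lucene apple",
    0.5,  # hypothetical cutoff, only meaningful for your own constrained queries
)
print(url)
```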


-Hoss