Modify solr score

Modify solr score

tstusr
Hi.

We are building an application that searches for specific topics: the more topic words a document captures, the higher its score should be.

We have two test scenarios. The first uses documents that users have tagged as relevant; the second uses documents from outside our domain.

In the first scenario, captured terms make up 1-2% of all the words in a document. In the second scenario, the ratio is less than 0.005%.

Nevertheless, the scores are almost equal: ~0.85 for the first scenario and ~0.8 for the second.


So what we want is to lower the score we report in the second scenario, in some way that depends on the percentage of words captured.


Is there a way to store those values in a field and use them as a query boost? Or a way to override the default score calculation to change relevancy?


Thanks in advance...

Re: Modify solr score

alessandro.benedetti
This has been discussed countless times: never rely on score values.
Rely on the ranking of your results.
It seems you model a <topic> as a list of keywords and then simply run a query for each topic.
Essentially, for you, a <topic> is a query.

The ranking of your results will already be affected by how many times such keywords appear in each result (term frequency).
You can also try different query parsers (such as dismax/edismax) and tune the mm percentage to establish how strictly results must match the input query [1].
Can you elaborate on how you would like to customize the score? Which factor would you like to modify?
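
As a rough illustration of the mm suggestion, the request parameters for an edismax query might be assembled like this. It is only a sketch: the field name `text`, the keyword list and the helper `build_topic_query` are made up for the example, while `defType`, `q`, `qf`, `mm` and `fl` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_topic_query(keywords, mm="75%", qf="text"):
    """Build URL-encoded request parameters for an edismax query over topic keywords."""
    params = {
        "defType": "edismax",      # use the Extended DisMax query parser
        "q": " ".join(keywords),   # the topic, expressed as a bag of keywords
        "qf": qf,                  # field(s) to search (hypothetical field name)
        "mm": mm,                  # minimum share of keywords that must match
        "fl": "id,score",          # return the id and score of each hit
    }
    return urlencode(params)


# Require at least 75% of the topic keywords to be present in a document.
print(build_topic_query(["solr", "lucene", "ranking", "score"]))
```

Raising `mm` toward `100%` requires more of the topic keywords to be present before a document matches at all, which removes weak matches instead of trying to interpret their scores.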

Cheers

[1] https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Modify solr score

tstusr
Since we report the score, we assumed there would be some relation between the score and the ranking. As far as we know, scoring (and therefore ranking) is calculated from tf-idf.

What we want is a qualitative ranking: for a given topic, we will tag documents as "very related", "fairly related" or "poorly related". To calibrate that, we selected some documents that are completely unrelated to the topic.

On a very related document we found a word ratio of ~2% and a score of ~0.85 (which we think is related to ranking). On a test document we found a ratio of less than 0.01%, yet its score is higher than the first one. We expected unrelated documents (those with a lower ratio) to report lower scores, so that we could use them as the minimum and build the scale from there.

Our idea is to multiply (or otherwise adjust) the default score Solr gives us by each document's ratio, so that unrelated documents are penalized while those with higher ratios are boosted.
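
The adjustment described above could be prototyped outside Solr along these lines. This is a minimal sketch under stated assumptions: the helper names, the simple regex tokenization and the two sample texts are invented for illustration, and Solr's own analysis chain would tokenize differently. Only the ~0.85 and ~0.8 scores come from the numbers reported in this thread.

```python
import re


def hit_ratio(text, vocabulary):
    """Fraction of the words in `text` that belong to the topic vocabulary."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    vocab = {term.lower() for term in vocabulary}
    hits = sum(1 for word in words if word in vocab)
    return hits / len(words)


def adjusted_score(solr_score, ratio):
    """Scale the score returned by Solr by the document's hit ratio."""
    return solr_score * ratio


# Hypothetical topic vocabulary and documents, for illustration only.
topic = ["solr", "lucene", "scoring"]
samples = [
    ("related", "solr ranks documents with lucene scoring and solr boosts", 0.85),
    ("unrelated", "the weather was fine and the match ended without solr", 0.80),
]

for name, text, score in samples:
    ratio = hit_ratio(text, topic)
    print(name, round(ratio, 3), round(adjusted_score(score, ratio), 3))
```

Because the ratio differs by orders of magnitude between the two scenarios, the product separates them even when the raw scores are nearly equal.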

Greetings, and thanks for your help.

Re: Modify solr score

Walter Underwood
It isn’t going to work. The score is not an absolute relevance measurement. It only says that the first document is more relevant than the second, and so on.

Scores are not comparable between different queries. The score cannot be used to say that the first hit for query A is a better match than the first hit for query B.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


> On Apr 21, 2017, at 9:35 AM, tstusr <[hidden email]> wrote:
>
> Since we report the score, we think there will be some relation between them.


Re: Modify solr score

tstusr
Well, maybe I explained it wrong.

We have entry points, each of which is related to a topic. That means that when we select the first topic, all the information returned has to be related in some way to that topic's vocabulary. So we think it can work, since we select documents unrelated to the vocabulary of each entry point in order to establish a minimum threshold. That is why we are trying to use the hit ratio to modify the score.

After we rank on those topics, all the remaining work is about faceting, word selection and so on.

Greetings.

Re: Modify solr score

Walter Underwood
Using a minimum score cut off does not work. The score is not an absolute estimate of relevance.

The idf component of the score is a whole-corpus metric. When you add or delete documents, the scores for the exact same query can change.
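
A small numeric sketch of why a fixed cutoff drifts. It assumes a BM25-style idf, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents in the index and n the number containing the term. The function name and the corpus sizes are invented for the example, and the similarity actually configured in a given Solr install may differ.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style idf: terms found in a smaller share of the corpus get a larger value."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Same term, same 10 matching documents, but the corpus grows from 1,000 to 2,000 docs.
before = bm25_idf(doc_freq=10, doc_count=1000)
after = bm25_idf(doc_freq=10, doc_count=2000)

print(f"idf before: {before:.3f}, idf after: {after:.3f}")
```

Nothing about the matching documents changed, yet the idf factor (and therefore the score for the exact same query) went up, so a threshold tuned on the smaller index no longer means the same thing.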

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


> On Apr 21, 2017, at 10:18 AM, tstusr <[hidden email]> wrote:
>
> Well, maybe I explained it wrong.


Re: Modify solr score

tstusr
Well, I know they can change.

I think the main problem here is that, at this point, documents completely unrelated to a topic are being ranked as high as related documents. So, in order to penalize them, we are trying to use the ratio of term frequency to document length in words.

Nevertheless, we haven't been able to find a practical way to do it.

Greetings.

Re: Modify solr score

Rick Leir-2
Ulf: Maybe there is a way you could filter out the unrelated documents. Qf?
Rick
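
One way to act on the filtering idea, sketched under the assumption that the hit ratio is computed at index time and stored as a fraction in a numeric field. The field names `topic_ratio` and `text`, the helper name and the 0.005 threshold (0.5% expressed as a fraction) are all hypothetical. `fq` and the edismax `boost` parameter are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_filtered_query(keywords, min_ratio=0.005, ratio_field="topic_ratio"):
    """Request parameters that drop low-ratio documents and boost the rest by the ratio."""
    params = [
        ("defType", "edismax"),
        ("q", " ".join(keywords)),
        ("qf", "text"),  # hypothetical search field
        # Filter query: keep only documents whose stored ratio is at least min_ratio.
        ("fq", f"{ratio_field}:[{min_ratio} TO *]"),
        # Multiplicative boost: the relevance score is multiplied by the field value.
        ("boost", ratio_field),
        ("fl", "id,score"),
    ]
    return urlencode(params)


print(build_filtered_query(["solr", "lucene"]))
```

The filter removes unrelated documents outright, which avoids comparing scores against a cutoff, while the boost only reorders the documents that remain.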

On April 21, 2017 2:18:59 PM EDT, tstusr <[hidden email]> wrote:

>Well, I know they can change.

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Modify solr score

Erik Hatcher-4
In reply to this post by tstusr
This may be suggesting a solution that is too experimental, or using the wrong hammer for the job, but to me it sounds like you could use “payloads” to rank documents by how strongly each term relates to them.

See SOLR-1485 for the recent work I’ve been doing (and aim to get committed soon). You could index documents in this way:

   id, weighted_terms_dpf
   1, A|5.0 B|95.0
   2, A|88.7 B|0.1

And then search for “A” and use the 88.7 value to factor into the score or sorting.  
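
To make the idea concrete, here is a plain-Python emulation of the lookup, using the two example documents above. It is a sketch of the concept only, not the SOLR-1485 implementation or any Solr API; the function names are invented for the example.

```python
def parse_weighted_terms(field_value):
    """Parse a delimited-payload string such as 'A|5.0 B|95.0' into {term: weight}."""
    weights = {}
    for token in field_value.split():
        term, _, weight = token.partition("|")
        weights[term] = float(weight)
    return weights


def rank_by_term(docs, term):
    """Order document ids by the payload weight of `term`, highest first."""
    scored = [
        (parse_weighted_terms(value).get(term, 0.0), doc_id)
        for doc_id, value in docs.items()
    ]
    return [doc_id for _weight, doc_id in sorted(scored, reverse=True)]


# The two example documents, keyed by id.
docs = {
    "1": "A|5.0 B|95.0",
    "2": "A|88.7 B|0.1",
}

print(rank_by_term(docs, "A"))
```

Searching for “A” puts document 2 first because its stored weight for that term is 88.7, against 5.0 for document 1. The weight is set at index time, so it does not shift as other documents are added or removed.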

        Erik



> On Apr 21, 2017, at 12:35 PM, tstusr <[hidden email]> wrote:
>
> Since we report the score, we think there will be some relation between them.