concise definition of Lucene score?

concise definition of Lucene score?

jloken
Hi all,

I have attempted to find a concise definition of how the Lucene score is
calculated, something that can be understood by most people.

The information I found is accurate, but not particularly concise.
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html

If there is no boosting or sorting involved, how is a default sort
calculated?

Many thanks,
Jon


BiP Solutions Limited is a company registered in Scotland with Company Number SC086146 and VAT number 383030966 and having its registered office at Park House, 300 Glasgow Road, Shawfield, Glasgow, G73 1SQ ****************************************************************************
This e-mail (and any attachment) is intended only for the attention of the addressee(s). Its unauthorised use, disclosure, storage or copying is not permitted. If you are not the intended recipient, please destroy all copies and inform the sender by return e-mail.
This e-mail (whether you are the sender or the recipient) may be monitored, recorded and retained by BiP Solutions Ltd.
E-mail monitoring/ blocking software may be used, and e-mail content may be read at any time. You have a responsibility to ensure laws are not broken when composing or forwarding e-mails and their contents.
****************************************************************************

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: concise definition of Lucene score?

Grant Ingersoll-2
What's not concise about a complex math formula?  :-)

The basic term-vector (vector space model) approach to IR, which Lucene
more or less implements, says that the score for a document given a
query is the cosine of the angle formed between the query vector and
the document vector.

I like to draw a standard x-y axis graph and then draw two vectors,  
one being the query and the other the document and then draw in the  
angle between them (w/ their tails at the same point).  The score is  
the cosine of that angle (more or less).  That is usually sufficient  
for "most people".  People who like math can read the formula.
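
For anyone who would rather see the picture as code, here is a minimal
sketch of the cosine calculation described above. The class and method
names (CosineScore, cosine) are made up for illustration and are not part
of Lucene, and the vectors are assumed to be plain per-term weights with
one slot per term in the vocabulary. Lucene's real scoring adds further
factors, so treat this as the idea only.

```java
// Hypothetical helper, not a Lucene class: cosine of the angle between
// a query vector and a document vector of per-term weights.
final class CosineScore {

    static double cosine(double[] query, double[] doc) {
        if (query.length != doc.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double querySquared = 0.0;
        double docSquared = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            querySquared += query[i] * query[i];
            docSquared += doc[i] * doc[i];
        }
        // A zero-length vector has no direction; report "no similarity".
        if (querySquared == 0.0 || docSquared == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(querySquared) * Math.sqrt(docSquared));
    }

    public static void main(String[] args) {
        double[] query = {1, 1, 0};   // query mentions terms 0 and 1
        double[] docA  = {2, 2, 0};   // same direction as the query
        double[] docB  = {0, 0, 3};   // shares no terms with the query
        System.out.println("docA: " + cosine(query, docA)); // close to 1.0
        System.out.println("docB: " + cosine(query, docB)); // 0.0
    }
}
```

A document pointing the same way as the query scores near 1 regardless of
its length, and a document sharing no terms with the query scores 0.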

-Grant

On Sep 3, 2008, at 6:17 AM, Jon Loken wrote:

> Hi all,
>
> I have attempted to find a concise definition of how the Lucene  
> score is
> calculated, something that can be understood by most people.
>
> The information I found is accurate, but not particularly concise.
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html
>
> If there is no boosting or sorting involved, how is a default sort
> calculated?
>
> Many thanks,
> Jon

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










Re: concise definition of Lucene score?

hossman

: I have attempted to find a concise definition of how the Lucene score is
: calculated, something that can be understood by most people.

The answer tends to vary based on exactly what type of query you are
talking about ... TermQuery?  PhraseQuery?  BooleanQuery containing a mix?

I'm going to take a shot in the dark and guess that if you feel like the
explanation on the Similarity docs is too verbose, then perhaps what you
are looking for isn't a definition, but a simple example.

The explain() method can be used to show exactly what the score
calculation is for a given query and a given document.  While it won't
always show you the *full* picture of what types of scores might be
produced by the query (i.e. if a doc matches all clauses of a BooleanQuery,
it won't show you that there would be a coordFactor if it had only matched
one), it is the most straightforward way to get a simple understanding of
"why is this the score?" for any concrete example.
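
In Lucene itself you would call explain(query, docId) on the searcher and
print the Explanation it returns. To give a feel for the kind of factors
that output lists, here is a rough plain-Java sketch that does not use
Lucene at all. It assumes the default formulas given in the Similarity
javadoc (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)),
lengthNorm = 1 / sqrt(numTerms), coord = overlap / maxOverlap), no boosts,
and it leaves out queryNorm and the lossy encoding of norms. The class and
method names are hypothetical.

```java
// Hypothetical sketch, not Lucene code: the main factors of the default
// score, under the assumptions stated in the text above.
final class ClassicScoreSketch {

    // How often the term occurs in the document field.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Rarer terms (low docFreq) get a higher weight.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Shorter fields get a higher weight.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Fraction of the query's clauses that the document matched.
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // Contribution of one matching term; idf is applied twice, once for
    // the query side and once for the document side.
    static double termScore(int freq, int docFreq, int numDocs, int fieldLength) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Two-clause BooleanQuery, document matches only the first clause.
        double matched = termScore(4, 9, 1000, 100);
        double score = coord(1, 2) * matched;
        System.out.println("term contribution: " + matched);
        System.out.println("score with coord(1,2): " + score);
    }
}
```

The coord(1, 2) factor in main is the coordFactor case mentioned above: it
only differs from 1 when a document matches some but not all clauses.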


-Hoss

