idf in scores

adb
I've been trying to understand how idf is arrived at for a query. I have a
single Document with 9 fields. One field, "subject", has the phrase "RFC2822 -
Internet Message Format" and a second, "body", has the contents of RFC 2822.

The other fields contain additional metadata. If I search for subject:message
I get the following explanation.

0.15342641 = fieldWeight(subject:message in 0), product of:
   1.0 = tf(termFreq(subject:message)=1)
   0.30685282 = idf(docFreq=1)
   0.5 = fieldNorm(field=subject, doc=0)

Why does idf get that value?
Antony
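The fieldWeight line in the explanation is simply the product of the three factors listed beneath it. The standalone Java check below recombines them; the class and method names are hypothetical and this is not Lucene code. The tf and fieldNorm inputs follow the formulas documented for Lucene's DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)). The assumption that the subject field was analyzed into four tokens with no boosts is ours, chosen because it matches the 0.5 shown.

```java
// Hypothetical standalone check, not Lucene code: recombines the factors
// printed by explain() for subject:message.
public class FieldWeightCheck {

    // tf as documented for DefaultSimilarity: square root of the term frequency.
    public static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // lengthNorm as documented for DefaultSimilarity: 1/sqrt(number of terms).
    // Assumes no field or document boosts.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // fieldWeight is the plain product of the three factors.
    public static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float tf = tf(1);                 // "message" occurs once in subject
        float idf = 0.30685282f;          // taken as-is from the explain output
        // Assumption: "RFC2822 - Internet Message Format" analyzed to 4 tokens.
        float fieldNorm = lengthNorm(4);
        System.out.println("tf=" + tf + " fieldNorm=" + fieldNorm
                + " fieldWeight=" + fieldWeight(tf, idf, fieldNorm));
    }
}
```

Lucene stores the norm in a single byte, so an explained fieldNorm is generally a rounded form of lengthNorm; 0.5 is a power of two and should survive that encoding exactly.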


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: idf in scores

Yonik Seeley-2
On 11/7/06, Antony Bowesman <[hidden email]> wrote:

> I've been trying to understand how idf is arrived at for a query. I have a
> single Document with 9 fields. One field, "subject", has the phrase "RFC2822 -
> Internet Message Format" and a second, "body", has the contents of RFC 2822.
>
> The other fields contain additional metadata. If I search for subject:message
> I get the following explanation.
>
> 0.15342641 = fieldWeight(subject:message in 0), product of:
>    1.0 = tf(termFreq(subject:message)=1)
>    0.30685282 = idf(docFreq=1)
>    0.5 = fieldNorm(field=subject, doc=0)
>
> Why does idf get that value?

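idf depends only on the corpus, not on the individual document being scored.
The formula is documented here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
For DefaultSimilarity it is idf = 1 + log(numDocs/(docFreq+1)), using the natural log.
Your index has one document and the term occurs in it, so numDocs = 1 and docFreq = 1:
1+log(1/2) = 0.30685282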

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
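To see why the value depends only on the corpus, here is a sketch of the DefaultSimilarity idf formula in plain Java. It reimplements the documented formula rather than calling Lucene; the class name and the larger corpus sizes in main are hypothetical, chosen only to illustrate the trend.

```java
// Sketch of the idf formula documented for Lucene's DefaultSimilarity:
// idf = 1 + ln(numDocs / (docFreq + 1)). Not a call into Lucene itself.
public class IdfSketch {

    public static float idf(int docFreq, int numDocs) {
        // Math.log is the natural logarithm.
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // The case in this thread: a one-document index containing the term.
        System.out.println("numDocs=1,   docFreq=1  -> " + idf(1, 1));
        // Hypothetical larger index where the term is rare: idf grows.
        System.out.println("numDocs=100, docFreq=1  -> " + idf(1, 100));
        // Hypothetical larger index where the term is nearly everywhere.
        System.out.println("numDocs=100, docFreq=99 -> " + idf(99, 100));
    }
}
```

Nothing about the matching document enters the calculation: a term found in almost every document gets an idf near (or below) 1, while the same term in a larger index where it appears only once is weighted far more heavily.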

Re: idf in scores

adb
Yonik Seeley wrote:
>
> idf is dependent only on the corpus, not on the individual document.
> The formula is here:
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
> 1+log(1/2) = 0.30685282

Thanks, Yonik. Whilst not everything is completely clear yet, it is much clearer now!
Antony


