explain() - fieldnorm

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

explain() - fieldnorm

JensBurkhardt
Hey everybody,

As my subject is telling, i have a little problem with analyzing the explain() output.
I know, that the fieldnorm value consists out of "documentboost, fieldboost and lengthNorm".
Is is possible to recieve the single values? I know that they are multiplied while indexing but
can they be stored so that i can read them when i analyze my search?
The Problem is, that i have 2 Documents I want to compare but the only difference is the fieldnorm value
and i don't know which value exactly makes this difference.

Best regards
Jens Burkhardt
Reply | Threaded
Open this post in threaded view
|

Re: explain() - fieldnorm

hossman
: As my subject is telling, i have a little problem with analyzing the
: explain() output.
: I know, that the fieldnorm value consists out of "documentboost, fieldboost
: and lengthNorm".
: Is is possible to recieve the single values? I know that they are multiplied
: while indexing but
: can they be stored so that i can read them when i analyze my search?

the number of terms the docs have in a given field can be determined by
doing a nested iteration over a TermEnum and TermDoc and keeping count,
but there is no way to keep extract the document boost vs the field boost
-- if you want to know what those were later you have to store them
yourselves (in a stored field perhaps).

: The Problem is, that i have 2 Documents I want to compare but the only
: difference is the fieldnorm value
: and i don't know which value exactly makes this difference.

typically the answer to that question for me is "length" because i don't
use field boosts and doc boosts -- if you *do* use field boosts or doc
boosts, you would typically know what you had, and could check what boost
values you had used later (based on whatever source you orriginally built
your index from)




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: explain() - fieldnorm

JensBurkhardt
Okay, thanks a lot. Maybe I should change my indexing behavior ;-) .

Greetings
Jens

hossman wrote
: As my subject is telling, i have a little problem with analyzing the
: explain() output.
: I know, that the fieldnorm value consists out of "documentboost, fieldboost
: and lengthNorm".
: Is is possible to recieve the single values? I know that they are multiplied
: while indexing but
: can they be stored so that i can read them when i analyze my search?

the number of terms the docs have in a given field can be determined by
doing a nested iteration over a TermEnum and TermDoc and keeping count,
but there is no way to keep extract the document boost vs the field boost
-- if you want to know what those were later you have to store them
yourselves (in a stored field perhaps).

: The Problem is, that i have 2 Documents I want to compare but the only
: difference is the fieldnorm value
: and i don't know which value exactly makes this difference.

typically the answer to that question for me is "length" because i don't
use field boosts and doc boosts -- if you *do* use field boosts or doc
boosts, you would typically know what you had, and could check what boost
values you had used later (based on whatever source you orriginally built
your index from)




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Reply | Threaded
Open this post in threaded view
|

Re: explain() - fieldnorm

JensBurkhardt
In reply to this post by hossman
another problem just occurred. These are the results from explain() :

0.27576536 = (MATCH) product of:
  0.827296 = (MATCH) sum of:
    0.827296 = (MATCH) sum of:
      0.24544832 = (MATCH) weight(ti:genetik in 1849319), product of:
        0.015469407 = queryWeight(ti:genetik), product of:
          10.577795 = idf(docFreq=270)
          0.0014624415 = queryNorm
        15.866693 = (MATCH) fieldWeight(ti:genetik in 1849319), product of:
          1.0 = tf(termFreq(ti:genetik)=1)
          10.577795 = idf(docFreq=270)
          1.5 = fieldNorm(field=ti, doc=1849319)
      0.58184767 = (MATCH) weight(au:knippers in 1849319), product of:
        0.020028148 = queryWeight(au:knippers), product of:
          13.695007 = idf(docFreq=11)
          0.0014624415 = queryNorm
        29.051497 = (MATCH) fieldWeight(au:knippers in 1849319), product of:
          1.4142135 = tf(termFreq(au:knippers)=2)
          13.695007 = idf(docFreq=11)
          1.5 = fieldNorm(field=au, doc=1849319)
  0.33333334 = coord(1/3)

0.27576536 = (MATCH) product of:
  0.827296 = (MATCH) sum of:
    0.827296 = (MATCH) sum of:
      0.24544832 = (MATCH) weight(ti:genetik in 3221603), product of:
        0.015469407 = queryWeight(ti:genetik), product of:
          10.577795 = idf(docFreq=270)
          0.0014624415 = queryNorm
        15.866693 = (MATCH) fieldWeight(ti:genetik in 3221603), product of:
          1.0 = tf(termFreq(ti:genetik)=1)
          10.577795 = idf(docFreq=270)
          1.5 = fieldNorm(field=ti, doc=3221603)
      0.58184767 = (MATCH) weight(au:knippers in 3221603), product of:
        0.020028148 = queryWeight(au:knippers), product of:
          13.695007 = idf(docFreq=11)
          0.0014624415 = queryNorm
        29.051497 = (MATCH) fieldWeight(au:knippers in 3221603), product of:
          1.4142135 = tf(termFreq(au:knippers)=2)
          13.695007 = idf(docFreq=11)
          1.5 = fieldNorm(field=au, doc=3221603)
  0.33333334 = coord(1/3)

As you can see, both are exactly the same. The thing i don't understand is, that the two documents have different documentboosts (the first one got an boost of 1.62 , the second of 1.65) - the boosts are different because the two books got different publication years - but explain() tells me that my fieldNorm value is 1.5.
While indexing i use a new similarity class where lengthNorm just returns 1, so the field length does not matter anymore.

Best Regards

Jens Burkhardt

hossman wrote
: As my subject is telling, i have a little problem with analyzing the
: explain() output.
: I know, that the fieldnorm value consists out of "documentboost, fieldboost
: and lengthNorm".
: Is is possible to recieve the single values? I know that they are multiplied
: while indexing but
: can they be stored so that i can read them when i analyze my search?

the number of terms the docs have in a given field can be determined by
doing a nested iteration over a TermEnum and TermDoc and keeping count,
but there is no way to keep extract the document boost vs the field boost
-- if you want to know what those were later you have to store them
yourselves (in a stored field perhaps).

: The Problem is, that i have 2 Documents I want to compare but the only
: difference is the fieldnorm value
: and i don't know which value exactly makes this difference.

typically the answer to that question for me is "length" because i don't
use field boosts and doc boosts -- if you *do* use field boosts or doc
boosts, you would typically know what you had, and could check what boost
values you had used later (based on whatever source you orriginally built
your index from)




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Reply | Threaded
Open this post in threaded view
|

Re: explain() - fieldnorm

Grant Ingersoll-2

On Mar 25, 2008, at 12:10 PM, JensBurkhardt wrote:

>
> As you can see, both are exactly the same. The thing i don't  
> understand is,
> that the two documents have different documentboosts (the first one  
> got an
> boost of 1.62 , the second of 1.65) - the boosts are different  
> because the
> two books got different publication years - but explain() tells me  
> that my
> fieldNorm value is 1.5.

Document boosts do not have much granularity due to the limited number  
of bits in the norm.  I seem to recall Yonik publishing a list of  
values at one time on the mailing list, but I can't for the life of me  
conjure the keywords to find it at the moment, as it was one a related  
topic.  Perhaps his memory is better than mine...

HTH,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]