Possible bug in scoring function for TermQuery?

Karl Wright
The following code in the TermWeight subclass of TermQuery seems inconsistent:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }
 
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense!!!
      value = queryWeight * idf;                  // idf for document
    }

The inconsistency is this: for a query with only one term, the normalized weight should be unity (1.0). In that case the queryNorm passed into normalize() will be sqrt(1/sumOfSquaredWeights()), which is 1/(idf * getBoost()), so queryWeight * queryNorm is already 1.0. Multiplying by idf again makes value equal to idf rather than 1.0, so the extra idf factor in normalize() seems superfluous.
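To make the arithmetic concrete, here is a minimal standalone sketch of the single-term case. It does not use Lucene's classes: TermWeightSketch and its method names are hypothetical, and it assumes queryNorm is 1/sqrt(sumOfSquaredWeights) for a one-term query, as described above. It only illustrates the numbers, with and without the extra idf factor.

```java
public class TermWeightSketch {

    // Mirrors sumOfSquaredWeights(): (idf * boost) squared.
    public static float sumOfSquaredWeights(float idf, float boost) {
        float queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    // ASSUMPTION: for a single-term query, queryNorm = 1 / sqrt(sumOfSquaredWeights).
    public static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // value as computed by the quoted normalize(): normalized query weight times idf.
    public static float currentValue(float idf, float boost) {
        float queryWeight = idf * boost;
        queryWeight *= queryNorm(sumOfSquaredWeights(idf, boost));
        return queryWeight * idf;
    }

    // value with the extra idf factor removed: just the normalized query weight.
    public static float proposedValue(float idf, float boost) {
        float queryWeight = idf * boost;
        queryWeight *= queryNorm(sumOfSquaredWeights(idf, boost));
        return queryWeight;
    }

    public static void main(String[] args) {
        float idf = 4.0f;   // illustrative value, not taken from a real index
        float boost = 1.0f;
        System.out.println("current  value = " + currentValue(idf, boost));
        System.out.println("proposed value = " + proposedValue(idf, boost));
    }
}
```

With these illustrative inputs the current formula yields a value equal to idf, while the version without the extra factor yields 1.0.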
 
I therefore think that the correct code should be:
 
    public float sumOfSquaredWeights() throws IOException {
      idf = getSimilarity(searcher).idf(term, searcher); // compute idf
      queryWeight = idf * getBoost();             // compute query weight
      return queryWeight * queryWeight;           // square it
    }
 
    public void normalize(float queryNorm) {
      this.queryNorm = queryNorm;
      queryWeight *= queryNorm;                   // normalize query weight
      // KDW - extra idf term makes no sense; remove it.
      // value = queryWeight * idf;                  // idf for document
      value = queryWeight;
    }

 
Karl
