hl.requireFieldMatch and idf


hl.requireFieldMatch and idf

Koji Sekiguchi
Hello,

If an index has (many) deleted docs and has not been optimized,
highlighting sometimes doesn't work when I set hl.requireFieldMatch=true.

Cause:
If hl.requireFieldMatch is set to true, DefaultSolrHighlighter.getQueryScorer()
uses the QueryScorer(Query,IndexReader,String) constructor in the Lucene
highlighter.
That constructor calls getIdfWeightedTerms() to get an array of
WeightedTerm.
getIdfWeightedTerms() calculates an idf to weight each term,
and the calculated idf can be negative on an un-optimized index.

Does DefaultSolrHighlighter.getQueryScorer() use
QueryScorer(Query,IndexReader,String)
by design? If not, I'm happy to open a ticket.


Thank you,

Koji


Re: hl.requireFieldMatch and idf

Mike Klaas

On 27-Mar-08, at 1:46 AM, Koji Sekiguchi wrote:

> Hello,
>
> If an index has (many) deleted docs and has not been optimized,
> highlighting sometimes doesn't work when I set hl.requireFieldMatch=true.
>
> Cause:
> If hl.requireFieldMatch is set to true,
> DefaultSolrHighlighter.getQueryScorer()
> uses the QueryScorer(Query,IndexReader,String) constructor in the Lucene
> highlighter.
> That constructor calls getIdfWeightedTerms() to get an array of
> WeightedTerm.
> getIdfWeightedTerms() calculates an idf to weight each term,
> and the calculated idf can be negative on an un-optimized index.

Okay, _this_ is the true bug.  I don't see how Lucene can return a
negative idf, optimized index or no.

>
> Does DefaultSolrHighlighter.getQueryScorer() use
> QueryScorer(Query,IndexReader,String)
> by design? If not, I'm happy to open a ticket.

Indeed, it is by design: this is how requireFieldMatch is implemented,
as the Lucene highlighter will then require the field to match as well as
the term.  A consequence of this is that the idfs are also folded into
the score, which is what triggers the bug you are seeing.

I think it would be best to open a bug on the Lucene side of things,
with a test case that triggers a negative idf.

thanks,
-Mike

Re: hl.requireFieldMatch and idf

Koji Sekiguchi
Mike,

Thank you for your response.

>> Cause:
>> If hl.requireFieldMatch is set to true,
>> DefaultSolrHighlighter.getQueryScorer()
>> uses the QueryScorer(Query,IndexReader,String) constructor in the Lucene
>> highlighter.
>> That constructor calls getIdfWeightedTerms() to get an array of
>> WeightedTerm.
>> getIdfWeightedTerms() calculates an idf to weight each term,
>> and the calculated idf can be negative on an un-optimized index.
>
> Okay, _this_ is the true bug.  I don't see how Lucene can return a
> negative idf, optimized index or no.
I think docFreq still counts deleted docs, which is how Lucene behaves,
while totalNumDocs counts only live documents. With enough deletions,
docFreq+1 exceeds e * totalNumDocs, and the following formula then yields a negative idf:

// o.a.l.s.highlight.QueryTermExtractor.java
float idf=(float)(Math.log((float)totalNumDocs/(double)(docFreq+1)) + 1.0);
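To make the arithmetic concrete, here is a small Java sketch that evaluates the formula above for made-up counts. The idf drops below zero once ln(totalNumDocs/(docFreq+1)) < -1, i.e. once docFreq+1 exceeds e * totalNumDocs (about 27.2 when there are 10 live docs). The class name IdfDemo and all the sample numbers are hypothetical; only the idf expression itself is copied from the QueryTermExtractor line quoted above.

```java
// Hypothetical demo class; only the idf expression is taken from
// QueryTermExtractor.java as quoted in this thread. All counts are made up.
public class IdfDemo {

    // Same expression as the quoted line from QueryTermExtractor.
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log((float) totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Optimized index: docFreq never exceeds the live document count.
        System.out.println("totalNumDocs=100, docFreq=10 -> idf=" + idf(100, 10));
        // Un-optimized index with 10 live docs: docFreq still counts
        // postings from deleted docs. The sign flips between 26 and 27,
        // because e * 10 is about 27.18.
        System.out.println("totalNumDocs=10,  docFreq=26 -> idf=" + idf(10, 26));
        System.out.println("totalNumDocs=10,  docFreq=27 -> idf=" + idf(10, 27));
        System.out.println("totalNumDocs=10,  docFreq=50 -> idf=" + idf(10, 50));
    }
}
```

The last two lines print negative values, which is the situation described above: a term that really occurs in the live docs gets a negative weight.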

>> Does DefaultSolrHighlighter.getQueryScorer() use
>> QueryScorer(Query,IndexReader,String)
>> by design? If not, I'm happy to open a ticket.
>
> Indeed, it is by design: this is how requireFieldMatch is implemented,
> as the Lucene highlighter will then require the field to match as well as
> the term.  A consequence of this is that the idfs are also folded into
> the score, which is what triggers the bug you are seeing.
Can we use QueryScorer(Query,String) instead of
QueryScorer(Query,IndexReader,String) to implement
hl.requireFieldMatch=true? I've opened SOLR-517 to follow up on this problem.
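A hypothetical, self-contained sketch (not Lucene code) of why a negative idf suppresses highlighting and why QueryScorer(Query,String) would sidestep it. It assumes two things about the Lucene highlighter of this era: that QueryScorer(Query,String) takes term weights from the query without idf weighting, and that the HTML formatter only wraps a token in tags when its score is positive. The class HighlightSketch, its highlight() method, and the sample text and counts are all made up for illustration.

```java
import java.util.Map;

// Hypothetical model, not Lucene code. Assumption: a token is highlighted
// only when its score is > 0, as Lucene's HTML formatter appears to do.
public class HighlightSketch {

    // idf expression quoted in this thread (QueryTermExtractor).
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log((float) totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Wraps each token whose weight is positive; tokens absent from the map score 0.
    static String highlight(String text, Map<String, Float> weights) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            float score = weights.getOrDefault(token, 0f);
            out.append(score > 0 ? "<B>" + token + "</B>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "apache solr highlighting";
        // Made-up stats for "solr" on an un-optimized index:
        // 10 live docs, docFreq 50 (deleted docs still counted).
        float idfWeighted = 1.0f * idf(10, 50); // assumed QueryScorer(Query,IndexReader,String) path
        float unweighted = 1.0f;                // assumed QueryScorer(Query,String) path

        System.out.println(highlight(text, Map.of("solr", idfWeighted))); // apache solr highlighting
        System.out.println(highlight(text, Map.of("solr", unweighted)));  // apache <B>solr</B> highlighting
    }
}
```

Under these assumptions the idf-weighted scorer leaves the matching term unhighlighted, while the unweighted one still marks it; the field restriction is untouched in both cases because it comes from the fieldName argument, not from the idf.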

Thank you,

Koji