[jira] [Updated] (LUCENE-8083) Give similarities better values for maxScore

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-8083:
    Attachment: LUCENE-8083.patch

Here is a patch that improves BM25's maxScore by taking maxFreq into account, and that implements maxScore for all SimilarityBase implementations by passing freq=maxFreq and docLen=1 to the score method. I also added new tests that are specific to this maxScore method.
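The idea can be shown with a minimal sketch. It assumes a scoring function that never decreases as freq grows and never increases as docLen grows. For such a function, evaluating it at freq=maxFreq and docLen=1 gives a safe upper bound. The class `MaxScoreSketch`, its methods, and the textbook BM25 formula with the parameters `idf`, `k1`, `b` and `avgDocLen` are illustrative assumptions, not Lucene's `Similarity` API. Applying the docLen=1 trick to BM25 as well is also an assumption of this sketch.

```java
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical sketch (not Lucene's API): an upper bound on a term's score,
 * obtained by evaluating the scoring function at freq = maxFreq and docLen = 1.
 * This is only a valid bound if the score never decreases as freq grows and
 * never increases as docLen grows.
 */
final class MaxScoreSketch {

  /** Evaluates score(maxFreq, 1), the best case for a monotone scoring function. */
  static double maxScore(DoubleBinaryOperator score, double maxFreq) {
    return score.applyAsDouble(maxFreq, 1.0);
  }

  /** Textbook BM25 term score; idf, k1, b and avgDocLen are assumed parameters. */
  static DoubleBinaryOperator bm25(double idf, double k1, double b, double avgDocLen) {
    return (freq, docLen) ->
        idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLen / avgDocLen));
  }

  /** Freq-agnostic BM25 bound: the limit of the score as freq grows without bound. */
  static double bm25AsymptoticBound(double idf, double k1) {
    return idf * (k1 + 1);
  }

  public static void main(String[] args) {
    double idf = 2.0, k1 = 1.2, b = 0.75, avgDocLen = 100;
    DoubleBinaryOperator score = bm25(idf, k1, b, avgDocLen);
    System.out.println("freq-agnostic bound: " + bm25AsymptoticBound(idf, k1));
    System.out.println("bound with maxFreq=3: " + maxScore(score, 3));
  }
}
```

Because the BM25 term saturates, the bound computed at a small maxFreq is lower than the freq-agnostic limit idf * (k1 + 1). A lower bound gives a query more room to skip documents.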

In practice, this means that the LUCENE-4100 optimizations now work well with similarities whose score saturates quickly as term frequency increases, such as all DFR similarities, IBSimilarity with DistributionSPL, AxiomaticF2EXP and AxiomaticF2LOG. They might also work well with other similarities in the future if we start recording the maximum term frequency per term (or, as a good start, per field).
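The dependence on saturation can be illustrated with two made-up scoring functions, `saturating` and `logarithmic`, in a hypothetical `SaturationSketch` class. Neither is a DFR, IB or Axiomatic formula, and docLen is ignored for simplicity. The sketch assumes that when no per-term maximum is recorded, the bound has to be computed at a very large frequency, here `Integer.MAX_VALUE`.

```java
/**
 * Hypothetical illustration: a freq-based bound is only useful when the score
 * saturates or when the real maximum frequency is known. Both scoring
 * functions are invented for this sketch.
 */
final class SaturationSketch {

  /** Saturates: approaches idf as freq grows. */
  static double saturating(double idf, double freq) {
    return idf * freq / (freq + 1);
  }

  /** Does not saturate: keeps growing (slowly) with freq. */
  static double logarithmic(double idf, double freq) {
    return idf * Math.log1p(freq);
  }

  public static void main(String[] args) {
    double idf = 2.0;
    // Assumption: with no recorded per-term maximum, the bound must use a huge freq.
    double unknownMaxFreq = Integer.MAX_VALUE;
    // Assumption: a per-term maximum term frequency that an index could record.
    double recordedMaxFreq = 5;
    System.out.printf("saturating,  unknown maxFreq:  %.3f%n", saturating(idf, unknownMaxFreq));
    System.out.printf("logarithmic, unknown maxFreq:  %.3f%n", logarithmic(idf, unknownMaxFreq));
    System.out.printf("logarithmic, recorded maxFreq: %.3f%n", logarithmic(idf, recordedMaxFreq));
  }
}
```

The saturating function gives a bound close to idf even when maxFreq is unknown. The logarithmic one gives a bound many times larger, which leaves little room to skip documents. A recorded maxFreq brings that bound back down.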

> Give similarities better values for maxScore
> --------------------------------------------
>                 Key: LUCENE-8083
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8083
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8083.patch
>
> The benefits of LUCENE-4100 largely depend on the quality of the upper bound of the scores that is provided by the similarity.
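The following simplified MaxScore-style check shows why the quality of the upper bound matters. It assumes that the top-k collector exposes the score of its current k-th best hit as a minimum competitive score. The class `MaxScorePruningSketch` is hypothetical and does not reflect how Lucene's disjunction scorers are implemented. It only shows that tighter per-term bounds let more terms be treated as non-essential.

```java
import java.util.Arrays;

/**
 * Simplified MaxScore-style pruning (hypothetical, not Lucene's scorer).
 * Terms are sorted by their score upper bound. The longest prefix whose bounds
 * sum to less than the minimum competitive score is "non-essential": a document
 * matching only those terms cannot enter the top-k, so it never needs scoring.
 */
final class MaxScorePruningSketch {

  /** Returns how many of the lowest-bound terms are non-essential. */
  static int countNonEssential(double[] maxScores, double minCompetitiveScore) {
    double[] sorted = maxScores.clone();
    Arrays.sort(sorted); // ascending upper bounds
    double prefixSum = 0;
    int count = 0;
    for (double bound : sorted) {
      prefixSum += bound;
      if (prefixSum >= minCompetitiveScore) {
        break; // adding this term could make a document competitive
      }
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    double minCompetitiveScore = 4.0; // score of the current k-th best hit (assumed)
    double[] looseBounds = {4.4, 4.4, 4.4}; // e.g. freq-agnostic bounds
    double[] tightBounds = {2.0, 2.5, 4.0}; // e.g. bounds that use maxFreq
    System.out.println(countNonEssential(looseBounds, minCompetitiveScore));
    System.out.println(countNonEssential(tightBounds, minCompetitiveScore));
  }
}
```

With the loose bounds, no term can be set aside. With the tighter bounds, the lowest-bound term becomes non-essential, so documents that match only that term are never scored.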

This message was sent by Atlassian JIRA