[jira] [Updated] (LUCENE-4100) Maxscore - Efficient Scoring


JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-4100:
    Attachment: LUCENE-4100.patch

Here is a patch:
 - more docs and tests
 - replaces needsScores with a SearchMode enum as suggested by Robert
 - the MAXSCORE optimization works with top-level disjunctions and filtered disjunctions (FILTER or MUST_NOT)
 - TopScoreDocsCollector sets the totalHitCount to -1 when the optimization is used since the total hit count is unknown
 - MaxScoreScorer was changed to reason on integers rather than doubles to avoid floating-point arithmetic issues. To do that, it scales all max scores into 0..2^16, rounding up when scaling the max scores of sub clauses and rounding down when scaling the min competitive score. This makes sure we never miss matches, at the cost of potentially more false positives, which is fine.
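
The rounding directions described in the last item can be illustrated with a minimal sketch. This is not code from the patch: the class name ScaledScores, its method names, and the way the scaling factor is derived are all hypothetical, and the sketch ignores any rounding error in the multiplication itself.

```java
/**
 * Illustrative sketch only (hypothetical names, not taken from the patch):
 * scale scores into the integer range 0..2^16 with conservative rounding.
 */
public final class ScaledScores {

    /** Upper end of the integer range mentioned in the message. */
    public static final int SCALE = 1 << 16;

    private ScaledScores() {}

    /** Factor that maps [0, maxPossibleScore] onto [0, SCALE] (assumed derivation). */
    public static double scalingFactor(double maxPossibleScore) {
        if (!(maxPossibleScore > 0) || Double.isInfinite(maxPossibleScore)) {
            throw new IllegalArgumentException("maxPossibleScore must be positive and finite");
        }
        return SCALE / maxPossibleScore;
    }

    /** Round up, so the max score of a sub clause is never underestimated. */
    public static long scaleMaxScore(double maxScore, double factor) {
        return (long) Math.ceil(maxScore * factor);
    }

    /** Round down, so the min competitive score is never overestimated. */
    public static long scaleMinCompetitiveScore(double minScore, double factor) {
        return (long) Math.floor(minScore * factor);
    }

    /**
     * True only if the clauses together cannot reach the min competitive score.
     * The comparison uses integers, so the result does not depend on the order
     * in which the clause scores are summed.
     */
    public static boolean canSkip(double[] clauseMaxScores, double minCompetitiveScore, double factor) {
        long sum = 0;
        for (double maxScore : clauseMaxScores) {
            sum += scaleMaxScore(maxScore, factor);
        }
        return sum < scaleMinCompetitiveScore(minCompetitiveScore, factor);
    }
}
```

Because the clause bounds are rounded up and the threshold is rounded down, a borderline document is kept rather than skipped: with a factor of 6553.6, clauses with max scores 0.5 and 0.5 are not skipped against a min competitive score of 1.0, which is the "more false positives" trade-off mentioned above.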

The patch is already huge (mostly due to the needsScores/searchMode change), so I wanted to do the strict minimum here for this feature to be useful. We'll need follow-ups to make the optimization work with the paging collector, with conjunctions that have more than one scoring clause, and with TopFieldCollector when the first sort field is the score, and to integrate it with IndexSearcher (currently you need to create the collector manually to use it), etc.

> Maxscore - Efficient Scoring
> ----------------------------
>                 Key: LUCENE-4100
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4100
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/query/scoring, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Stefan Pohl
>              Labels: api-change, gsoc2014, patch, performance
>             Fix For: 4.9, 6.0
>         Attachments: LUCENE-4100.patch, LUCENE-4100.patch, contrib_maxscore.tgz, maxscore.patch
> At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, that I find deserves more attention among Lucene users (and developers).
> I implemented a proof of concept and did some performance measurements with example queries and lucenebench, the package of Mike McCandless, resulting in very significant speedups.
> This ticket is to start the discussion on including the implementation in Lucene's codebase. Because the technique requires the Lucene user/developer to be aware of it, it seems best for it to become a contrib/module package so that it can be consciously chosen.

This message was sent by Atlassian JIRA