[jira] [Commented] (LUCENE-4100) Maxscore - Efficient Scoring

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203051#comment-16203051 ]

Robert Muir commented on LUCENE-4100:

Can we avoid the ScoreMode.merge? This seems really, really confusing. In general I don't think we should support such merging in MultiCollector or anywhere else; we should simply throw an exception if the score modes differ.
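Here is a minimal sketch of the "throw instead of merge" behaviour. It assumes a hypothetical SketchScoreMode enum that stands in for the patch's ScoreMode (the value names are assumptions, not the patch's API). It also assumes a hypothetical helper, ScoreModeCheck.requireSame, that a MultiCollector-like wrapper could call on the modes reported by its sub-collectors:

```java
import java.util.List;

// Hypothetical stand-in for the patch's ScoreMode enum; value names are assumptions.
enum SketchScoreMode { COMPLETE, COMPLETE_NO_SCORES, TOP_SCORES }

final class ScoreModeCheck {
  private ScoreModeCheck() {}

  /**
   * Returns the score mode shared by all sub-collectors, or throws if they disagree,
   * instead of trying to "merge" incompatible modes into some third mode.
   */
  static SketchScoreMode requireSame(List<SketchScoreMode> modes) {
    if (modes.isEmpty()) {
      throw new IllegalArgumentException("at least one collector is required");
    }
    SketchScoreMode first = modes.get(0);
    for (SketchScoreMode mode : modes) {
      if (mode != first) {
        throw new IllegalArgumentException(
            "collectors disagree on score mode: " + first + " vs " + mode);
      }
    }
    return first;
  }
}
```

Failing fast keeps the rule simple: a caller who wraps collectors with different needs gets an error that tells them so, rather than an implicit mode that silently changes how scoring behaves.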

I think the enum should be revisited and simplified further. At a minimum, it must capture two booleans from the user: whether scores are needed, and whether an exact total hit count is needed. Perhaps two booleans instead of the enum would be easier for now.
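The two-boolean idea could look roughly like this. This is a hypothetical sketch: ScoringNeeds and allowsScoreBasedSkipping are made-up names, not Lucene API. The rule inside the method is also an assumption. It says that maxscore-style skipping of non-competitive documents only makes sense when hits are ranked by score and nobody needs every match counted.

```java
/** Hypothetical replacement for the ScoreMode enum: the two facts the user has to supply. */
final class ScoringNeeds {
  final boolean needsScores;
  final boolean needsExactTotalHitCount;

  ScoringNeeds(boolean needsScores, boolean needsExactTotalHitCount) {
    this.needsScores = needsScores;
    this.needsExactTotalHitCount = needsExactTotalHitCount;
  }

  /**
   * Assumption: skipping documents that cannot make the top hits by score is only useful
   * when hits are ranked by score, and only safe when an exact total hit count is not needed.
   */
  boolean allowsScoreBasedSkipping() {
    return needsScores && !needsExactTotalHitCount;
  }
}
```

With two independent flags, each query and collector only has to answer two yes/no questions. The engine derives whether it may skip, instead of users having to pick the right enum constant.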

I don't understand why we should set totalHitCount to -1 rather than to a useful approximation, like Google does. The user said they didn't need the exact total hit count, so an approximation should be no surprise, and it's a hell of a lot more useful than a negative number.
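One way to read the "useful approximation" suggestion is shown below. The search counts matches exactly up to a threshold. Past that point it stops counting and reports the count so far together with a flag saying the real number is larger. This replaces the -1. HitCountSketch, count and its threshold parameter are hypothetical names for illustration, not Lucene API.

```java
/** Hypothetical result type: a hit count that may be a lower bound instead of -1. */
final class HitCountSketch {
  final long value;
  final boolean lowerBound; // true when the real count is known to exceed value

  private HitCountSketch(long value, boolean lowerBound) {
    this.value = value;
    this.lowerBound = lowerBound;
  }

  /**
   * Counts matching doc ids exactly until threshold is reached; if another match shows up
   * after that, it stops and reports the count so far as a lower bound.
   */
  static HitCountSketch count(int[] matchingDocIds, int threshold) {
    if (threshold < 1) {
      throw new IllegalArgumentException("threshold must be >= 1");
    }
    long counted = 0;
    for (int ignored : matchingDocIds) {
      if (counted == threshold) {
        return new HitCountSketch(counted, true); // at least one more match exists
      }
      counted++;
    }
    return new HitCountSketch(counted, false); // every match was counted
  }

  String describe() {
    return lowerBound ? "more than " + value + " hits" : value + " hits";
  }
}
```

For example, five matches with a threshold of 3 yield "more than 3 hits", while two matches yield the exact "2 hits". Either way the user gets a number they can show.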

> Maxscore - Efficient Scoring
> ----------------------------
>                 Key: LUCENE-4100
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4100
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/query/scoring, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Stefan Pohl
>              Labels: api-change, gsoc2014, patch, performance
>             Fix For: 4.9, 6.0
>         Attachments: LUCENE-4100.patch, LUCENE-4100.patch, contrib_maxscore.tgz, maxscore.patch
> At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, that I find deserves more attention among Lucene users (and developers).
> I implemented a proof of concept and did some performance measurements with example queries and lucenebench, Mike McCandless's package, resulting in very significant speedups.
> This ticket is to start the discussion on including the implementation in Lucene's codebase. Because the technique requires awareness from the Lucene user/developer, it seems best for it to become a contrib/module package so that it can be consciously chosen.
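For readers new to maxscore, here is a rough in-memory sketch of the idea as it is usually described, for a pure disjunction whose score is the sum of term scores. Each term is sorted by its maximum possible score. Terms whose combined maximum cannot beat the current k-th best score are "non-essential", so only documents from the essential terms become candidates. Non-essential terms are probed for a candidate only while it can still make the top k. This is not Stefan's patch or Lucene's implementation. TermPostings and MaxScoreSketch are hypothetical names, postings are plain arrays with precomputed per-document scores, and advance() is a linear scan for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory postings for one term: ascending doc ids, precomputed scores. */
final class TermPostings {
  final int[] docs;
  final float[] scores;
  final float maxScore; // upper bound on this term's contribution to any document
  int pos; // cursor into docs

  TermPostings(int[] docs, float[] scores) {
    this.docs = docs;
    this.scores = scores;
    float max = 0f;
    for (float s : scores) max = Math.max(max, s);
    this.maxScore = max;
  }

  /** Current doc id, or Integer.MAX_VALUE once the list is exhausted. */
  int doc() { return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; }

  /** Moves the cursor to the first doc >= target (linear scan, for brevity). */
  void advance(int target) {
    while (pos < docs.length && docs[pos] < target) pos++;
  }
}

final class MaxScoreSketch {
  private static final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  /** Top-k doc ids, best first, for a disjunction scored as the sum of term scores. */
  static int[] topK(List<TermPostings> terms, int k) {
    if (k < 1) throw new IllegalArgumentException("k must be >= 1");
    List<TermPostings> sorted = new ArrayList<>(terms);
    sorted.sort(Comparator.comparingDouble((TermPostings p) -> p.maxScore)); // ascending
    int n = sorted.size();
    float[] prefix = new float[n + 1]; // prefix[i] = sum of maxScore over sorted[0..i)
    for (int i = 0; i < n; i++) prefix[i + 1] = prefix[i] + sorted.get(i).maxScore;
    PriorityQueue<Hit> heap =
        new PriorityQueue<Hit>(Comparator.comparingDouble((Hit h) -> h.score)); // min-heap
    float threshold = Float.NEGATIVE_INFINITY; // k-th best score once the heap is full
    int firstEssential = 0; // sorted[0..firstEssential) are non-essential
    while (true) {
      // Grow the non-essential prefix while its summed max scores cannot beat the threshold.
      while (firstEssential < n && prefix[firstEssential + 1] <= threshold) firstEssential++;
      // Next candidate: the smallest current doc among the essential lists.
      int doc = Integer.MAX_VALUE;
      for (int i = firstEssential; i < n; i++) doc = Math.min(doc, sorted.get(i).doc());
      if (doc == Integer.MAX_VALUE) break; // no essential terms left, or all are exhausted
      float score = 0f;
      for (int i = firstEssential; i < n; i++) {
        TermPostings p = sorted.get(i);
        if (p.doc() == doc) {
          score += p.scores[p.pos];
          p.pos++;
        }
      }
      // Probe non-essential lists, highest max score first, while the doc can still compete.
      for (int i = firstEssential - 1; i >= 0; i--) {
        if (score + prefix[i + 1] <= threshold) break; // best case cannot beat k-th score
        TermPostings p = sorted.get(i);
        p.advance(doc);
        if (p.doc() == doc) score += p.scores[p.pos];
      }
      if (heap.size() < k) {
        heap.add(new Hit(doc, score));
      } else if (score > heap.peek().score) {
        heap.poll();
        heap.add(new Hit(doc, score));
      }
      if (heap.size() == k) threshold = heap.peek().score;
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll().doc; // lowest first
    return result;
  }
}
```

For example, take k = 2, a term A with docs {1, 2, 3} (score 1 each) and a term B with docs {2, 4} (scores 3 and 2). topK returns [2, 4]. Once the threshold reaches 1.0, A becomes non-essential, and doc 3 is skipped without ever being scored. That skipping is where the speedups come from, and it is also why the technique needs the caller to give up exact hit counts.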

This message was sent by Atlassian JIRA
