[jira] [Commented] (LUCENE-8142) Should codecs expose raw impacts?

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439805#comment-16439805 ]

Adrien Grand commented on LUCENE-8142:
--------------------------------------

I gave this a try. {{ImpactsEnum}} has a new method, {{getImpacts}}, that returns impacts on multiple levels, which makes it natural to implement on top of a skip list. This might make it more challenging to back the information with another data structure, but it also has API benefits, such as removing references to {{SimScorer}} from {{TermsEnum.impacts}}.
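
To make the idea of callers computing the maximum score from raw impacts concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Lucene classes or reflect the patch's actual signatures: {{Impact}}, {{Level}}, {{Scorer}} and {{maxScore}} are hypothetical stand-ins, and the scoring function is a toy one, picked only because it grows with freq and shrinks with norm.

```java
import java.util.Arrays;
import java.util.List;

public class ImpactsSketch {

    /** Hypothetical (freq, norm) pair; not the Lucene class. */
    static final class Impact {
        final int freq;
        final long norm;
        Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
    }

    /** Hypothetical view of one level: impacts valid up to docIdUpTo (inclusive). */
    static final class Level {
        final int docIdUpTo;
        final List<Impact> impacts;
        Level(int docIdUpTo, List<Impact> impacts) {
            this.docIdUpTo = docIdUpTo;
            this.impacts = impacts;
        }
    }

    /** Caller-supplied scoring function, standing in for a similarity scorer. */
    interface Scorer {
        float score(int freq, long norm);
    }

    /**
     * Maximum score over the (freq, norm) pairs of one level.
     * The caller does this work; the postings side only exposes raw pairs.
     * Assumes scores are non-negative, so 0 is a safe starting point.
     */
    static float maxScore(Level level, Scorer scorer) {
        float max = 0f;
        for (Impact impact : level.impacts) {
            max = Math.max(max, scorer.score(impact.freq, impact.norm));
        }
        return max;
    }

    public static void main(String[] args) {
        // Toy scorer (an assumption for illustration only).
        Scorer toy = (freq, norm) -> (float) freq / (freq + norm);
        Level fine = new Level(127,
            Arrays.asList(new Impact(3, 10), new Impact(5, 40)));
        Level coarse = new Level(1023,
            Arrays.asList(new Impact(3, 10), new Impact(5, 40), new Impact(9, 12)));
        System.out.println("level 0 max score: " + maxScore(fine, toy));
        System.out.println("level 1 max score: " + maxScore(coarse, toy));
    }
}
```

Because the pairs are returned raw, the same data could be scored with any similarity, which is why the enum itself no longer needs a scorer in this sketch.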

The wikibigall benchmark shows an improvement for term queries, since this change allows them to skip at any level, whereas they could previously only skip on the first level. However, the somewhat heavier API seems to incur a slight slowdown for conjunctions and disjunctions. I don't think it is an issue, especially because this change improves testing by making it easier to compare impacts against indexed data. This API also means that we can now speed up queries that merge frequencies and norms rather than scores, like {{SynonymQuery}} and {{BlendedTermQuery}}, which was not possible before.

{noformat}
             AndHighHigh       83.36      (3.8%)       79.45      (3.1%)   -4.7% ( -11% -    2%)
              OrHighHigh       34.42      (2.7%)       32.93      (2.0%)   -4.3% (  -8% -    0%)
              AndHighMed      115.73      (3.3%)      111.67      (3.0%)   -3.5% (  -9% -    2%)
               OrHighMed       24.44      (3.3%)       23.74      (2.1%)   -2.9% (  -8% -    2%)
               OrHighLow     1952.31      (4.7%)     1912.93      (3.6%)   -2.0% (  -9% -    6%)
              AndHighLow     1837.61      (4.1%)     1802.22      (3.9%)   -1.9% (  -9% -    6%)
                  Fuzzy1      229.31      (9.8%)      226.03      (8.9%)   -1.4% ( -18% -   19%)
                  IntNRQ       31.75     (14.0%)       31.36     (12.5%)   -1.2% ( -24% -   29%)
                  Fuzzy2      194.10      (9.6%)      192.36     (11.6%)   -0.9% ( -20% -   22%)
         MedSloppyPhrase       54.96      (4.7%)       54.62      (4.2%)   -0.6% (  -9% -    8%)
        HighSloppyPhrase        6.21      (5.9%)        6.18      (5.7%)   -0.5% ( -11% -   11%)
         LowSloppyPhrase       19.26      (4.4%)       19.19      (4.3%)   -0.4% (  -8% -    8%)
       HighTermMonthSort      180.22      (9.8%)      179.53     (10.4%)   -0.4% ( -18% -   21%)
                Wildcard       60.86      (6.0%)       60.63      (6.3%)   -0.4% ( -11% -   12%)
                 Prefix3       88.19      (8.3%)       87.89      (8.5%)   -0.3% ( -15% -   17%)
                 Respell      195.14      (2.1%)      194.57      (2.5%)   -0.3% (  -4% -    4%)
              HighPhrase       54.69      (1.6%)       54.72      (1.6%)    0.1% (  -3% -    3%)
               MedPhrase       41.52      (1.8%)       41.56      (1.9%)    0.1% (  -3% -    3%)
               LowPhrase       55.59      (1.8%)       55.68      (1.9%)    0.2% (  -3% -    3%)
             MedSpanNear       28.55      (3.8%)       28.74      (3.8%)    0.7% (  -6% -    8%)
            HighSpanNear       16.88      (4.6%)       17.03      (4.6%)    0.9% (  -7% -   10%)
             LowSpanNear       14.50      (6.3%)       14.67      (6.2%)    1.1% ( -10% -   14%)
   HighTermDayOfYearSort       61.22     (12.3%)       62.04     (12.4%)    1.3% ( -20% -   29%)
                 LowTerm     2478.52      (4.1%)     2692.79      (4.0%)    8.6% (   0% -   17%)
                 MedTerm      835.85      (5.8%)     1323.83      (6.8%)   58.4% (  43% -   75%)
                HighTerm      472.60      (6.8%)     1718.45     (15.6%)  263.6% ( 225% -  306%)
{noformat}
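
The term-query gains above come from being able to skip at any level. A rough sketch of that idea follows, assuming each level carries an upper bound on the score for documents up to its {{docIdUpTo}}, that levels are ordered from finest to coarsest, and that they all cover the current document. {{LevelBound}} and {{nextCandidate}} are hypothetical names for illustration, not the code in the patch.

```java
public class LevelSkipSketch {

    /** Hypothetical per-level bound: no doc up to docIdUpTo scores above maxScore. */
    static final class LevelBound {
        final int docIdUpTo;   // last doc ID covered by this level (inclusive)
        final float maxScore;  // upper bound on the score up to docIdUpTo
        LevelBound(int docIdUpTo, float maxScore) {
            this.docIdUpTo = docIdUpTo;
            this.maxScore = maxScore;
        }
    }

    /**
     * Returns the first doc ID at or after 'doc' that may still be competitive.
     * Coarser levels cover a superset of finer ones, so their bounds can only
     * be equal or higher; we can stop at the first level that is competitive.
     */
    static int nextCandidate(int doc, LevelBound[] levels, float minCompetitiveScore) {
        int target = doc;
        for (LevelBound level : levels) {
            if (level.maxScore < minCompetitiveScore) {
                // Nothing up to docIdUpTo can be competitive: skip the whole range.
                target = level.docIdUpTo + 1;
            } else {
                break;
            }
        }
        return target;
    }

    public static void main(String[] args) {
        LevelBound[] levels = {
            new LevelBound(127, 0.5f),
            new LevelBound(1023, 0.8f),
            new LevelBound(8191, 2.0f),
        };
        // With only the first level available, the skip would stop after doc 127.
        System.out.println(nextCandidate(5, levels, 1.0f));
    }
}
```

With a single level, the loop could never move the target past the first block, which matches the earlier behavior described above.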

> Should codecs expose raw impacts?
> ---------------------------------
>
>                 Key: LUCENE-8142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8142
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8142.patch
>
>
> Follow-up of LUCENE-4198. Currently, call-sites of TermsEnum.impacts provide a SimScorer so that the maximum score for the block can be computed. Should ImpactsEnum instead return the (freq,norm) pairs and let callers deal with max score computation?



