[jira] [Commented] (LUCENE-8011) Improve similarity explanations


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281487#comment-16281487 ]

ASF GitHub Bot commented on LUCENE-8011:
----------------------------------------

Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/280#discussion_r155453742
 
    --- Diff: lucene/core/src/java/org/apache/lucene/search/similarities/AfterEffectL.java ---
    @@ -34,11 +34,14 @@ public final double score(BasicStats stats, double tfn) {
       }
       
       @Override
    +  // TODO: add explanation for tfn
    +  // Currently not possible, as CheckHits.verifyExplanation fails because
    +  // in case of a single sub-expl the test expects
    +  // the sub-expl's score to be equal to the parent expl's score
    --- End diff --
   
    This should be possible after rebasing or merging master back: I modified CheckHits yesterday so that it allows the score to differ from the parent explanation's score if the explanation's description matches `.*, computed as .* from:`
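
    As a rough illustration of the pattern check described above, here is a minimal sketch. It is not the actual CheckHits code: the class name `ExplanationDescriptionSketch` and the method `mayDifferFromParent` are hypothetical, and only the regular expression is taken from the comment.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the real CheckHits logic.
final class ExplanationDescriptionSketch {
    // Pattern quoted in the review comment.
    private static final Pattern COMPUTED_FROM =
        Pattern.compile(".*, computed as .* from:");

    // Returns true when a description has the "X, computed as F from:" shape,
    // i.e. when a sub-explanation's score may differ from its parent's score.
    static boolean mayDifferFromParent(String description) {
        return COMPUTED_FROM.matcher(description).matches();
    }

    public static void main(String[] args) {
        // Matches the pattern: prints true
        System.out.println(mayDifferFromParent(
            "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:"));
        // Does not match: prints false
        System.out.println(mayDifferFromParent("product of:"));
    }
}
```

    Under this assumption, a `tfn` explanation whose description follows the same "computed as ... from:" wording would be exempt from the equal-score check.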


> Improve similarity explanations
> -------------------------------
>
>                 Key: LUCENE-8011
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8011
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>              Labels: newdev
>
> LUCENE-7997 improves BM25 and Classic explains to better explain:
> {noformat}
> product of:
>   2.2 = scaling factor, k1 + 1
>   9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>     1.0 = n, number of documents containing term
>     17927.0 = N, total number of documents with field
>   0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>     979.0 = freq, occurrences of term within document
>     1.2 = k1, term saturation parameter
>     0.75 = b, length normalization parameter
>     1.0 = dl, length of field
>     1.0 = avgdl, average length of field
> {noformat}
> Previously it was pretty cryptic and used confusing terminology like docCount/docFreq without explanation:
> {noformat}
> product of:
>   0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
>     449.0 = docFreq
>     456.0 = docCount
>   2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
>     113659.0 = freq=113658
>     1.2 = parameter k1
>     0.75 = parameter b
>     2300.5593 = avgFieldLength
>     1048600.0 = fieldLength
> {noformat}
> We should fix the other similarities in the same way, so that their explanations are more practical.
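
The formulas shown in the improved BM25 explanation above can be checked by hand. The sketch below recomputes the idf and tf factors from the listed inputs; the class and method names are hypothetical, and it uses double arithmetic, so the last digits may differ slightly from the values Lucene prints.

```java
// Hypothetical sketch that re-derives the numbers in the BM25 explanation;
// it is not Lucene's BM25Similarity implementation.
final class Bm25ExplanationSketch {
    // idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    static double idf(double n, double bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    // tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;
        double idf = idf(1.0, 17927.0);             // explanation lists 9.388654
        double tf = tf(979.0, k1, b, 1.0, 1.0);     // explanation lists 0.9987758
        double score = (k1 + 1) * idf * tf;         // "product of:" the three factors
        System.out.println("idf = " + idf);
        System.out.println("tf = " + tf);
        System.out.println("score = " + score);
    }
}
```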



