[jira] [Commented] (LUCENE-8477) Improve handling of inner disjunctions in intervals

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793928#comment-16793928 ]

Alan Woodward commented on LUCENE-8477:
---------------------------------------

Here's a better patch that uses term counting rather than prefix matching. Prefix matching won't work if we have stacked tokens, for example, and term counting makes things much simpler.

> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8477.patch, LUCENE-8477.patch
>
>
> The current implementation of the disjunction interval produced by {{Intervals.or}} is a direct implementation of the OR operator from the Vigna paper.  This produces minimal intervals, meaning that (a) is preferred over (a b), and (b) is likewise preferred over (a b).  This has advantages when counting intervals for scoring, but drawbacks when matching.  For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not match the document (a b c): (a) is preferred over (a b), so (a b) is never offered to the BLOCK operator, and (a) immediately followed by (c) does not occur in the document.
> This ticket is to discuss the best way of dealing with disjunctions.
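
The failure described above can be reproduced with a toy model. The sketch below is not Lucene code and does not use the Lucene API: the class DisjunctionSketch and its methods (phrase, union, minimal, block) are hypothetical names invented for illustration, and intervals are modelled as plain {start, end} token positions. It assumes "minimal" means that any interval properly containing another interval in the disjunction is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: an interval is {start, end}, inclusive token positions.
// All names here are hypothetical and are not part of Lucene.
class DisjunctionSketch {

    // Every position where the given terms occur consecutively in doc.
    static List<int[]> phrase(List<String> doc, String... terms) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i + terms.length <= doc.size(); i++) {
            if (doc.subList(i, i + terms.length).equals(Arrays.asList(terms))) {
                out.add(new int[] {i, i + terms.length - 1});
            }
        }
        return out;
    }

    // Plain disjunction: keep every interval from both sides.
    static List<int[]> union(List<int[]> x, List<int[]> y) {
        List<int[]> out = new ArrayList<>(x);
        out.addAll(y);
        return out;
    }

    // Minimal-interval disjunction (assumed semantics): drop any interval
    // that properly contains another interval in the list.
    static List<int[]> minimal(List<int[]> in) {
        List<int[]> out = new ArrayList<>();
        for (int[] x : in) {
            boolean containsOther = false;
            for (int[] y : in) {
                boolean same = x[0] == y[0] && x[1] == y[1];
                if (!same && x[0] <= y[0] && y[1] <= x[1]) {
                    containsOther = true;
                    break;
                }
            }
            if (!containsOther) {
                out.add(x);
            }
        }
        return out;
    }

    // BLOCK: a right interval must start directly after a left interval ends.
    static List<int[]> block(List<int[]> left, List<int[]> right) {
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : right) {
                if (r[0] == l[1] + 1) {
                    out.add(new int[] {l[0], r[1]});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "b", "c");
        List<int[]> all = union(phrase(doc, "a"), phrase(doc, "a", "b"));
        List<int[]> c = phrase(doc, "c");

        // With minimization, (a b) is discarded and the block finds nothing.
        System.out.println("minimal OR matches: " + block(minimal(all), c).size());
        // Without minimization, (a b) survives and the block spans positions 0..2.
        System.out.println("plain OR matches:   " + block(all, c).size());
    }
}
```

In this model the minimizing disjunction keeps only (a) at position 0, so nothing ends directly before (c) at position 2, while the non-minimizing disjunction still offers (a b) and the block matches. Whether the real fix should keep all alternatives or resolve them some other way is exactly what this ticket discusses; the sketch only illustrates the symptom.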


