[jira] [Commented] (LUCENE-5182) FVH can end in very very long running recursion on phrase highlight

Hudson (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744999#comment-13744999 ]

Robert Muir commented on LUCENE-5182:

Yeah, I'm not sure either: maybe just a Math.min and a default of Integer.MAX_VALUE. Sure, it's still trappy, but at least it's an improvement.
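
One possible reading of the Math.min suggestion, as a minimal sketch: a configurable cap that defaults to Integer.MAX_VALUE (so behavior is unchanged unless the user lowers it) and is applied with Math.min to the amount of work attempted. The class and method names here are hypothetical and are not taken from the attached patch or from Lucene.

```java
// Hypothetical illustration only; not the code in LUCENE-5182.patch.
class PhraseLimitSketch {
    // Unlimited by default, so existing users see no change in behavior.
    static final int DEFAULT_LIMIT = Integer.MAX_VALUE;

    private int limit = DEFAULT_LIMIT;

    void setLimit(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    // Number of candidates that will actually be processed:
    // never more than what is available, never more than the cap.
    int candidatesToProcess(int available) {
        return Math.min(available, limit);
    }
}
```

This stays "trappy" in the sense of the comment above: with the default in place nothing is bounded, and the user has to know to lower the cap.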

Another idea (if the user is using the IDF-weighted fragments) might be to somehow not process terms where docFreq/maxDoc > foo%, realizing they won't contribute much to the score anyway.
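
A rough sketch of that filter, assuming the per-term document frequencies are already in hand (in Lucene they could be read with IndexReader.docFreq(Term), and maxDoc with IndexReader.maxDoc()). The class name, the method name, and the maxRatio parameter standing in for "foo%" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the FVH API.
class CommonTermFilterSketch {
    // Keeps only terms whose docFreq/maxDoc ratio is at most maxRatio.
    // Terms above the threshold are very common, carry little IDF weight,
    // and are skipped.
    static List<String> keepRareTerms(Map<String, Integer> docFreqs,
                                      int maxDoc, double maxRatio) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            double ratio = (double) e.getValue() / maxDoc;
            if (ratio <= maxRatio) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

With maxDoc = 1000 and maxRatio = 0.5, a term with docFreq 950 would be dropped while one with docFreq 12 would be kept.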

But in general I feel like the problem will still exist without an algorithmic change.

Anyway, +1 to the patch.

> FVH can end in very very long running recursion on phrase highlight
> -------------------------------------------------------------------
>                 Key: LUCENE-5182
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5182
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.0, 4.4
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 5.0, 4.5
>         Attachments: LUCENE-5182.patch
> Due to the nature of the FVH extract logic, a simple phrase query can put an FVH into a super long running recursion. I had documents taking literally days to return from the extract phrases logic. I have a test that reproduces the problem and a possible fix. The reason for this is that the FVH never tries to terminate early if a phrase is already way beyond the slop coming from the phrase query. If there is a document with a lot of occurrences of two or more terms in the phrase, this literally tries to match all possible combinations of the terms in the doc.
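
To make the blow-up concrete, here is a simplified sketch, not the FVH code and not the attached patch. It enumerates combinations of term positions recursively and counts how many partial combinations are examined, with and without abandoning a candidate as soon as its accumulated distance exceeds the slop. The class name and the simplified distance measure (the sum of gaps between consecutive terms) are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical illustration only; not Lucene's phrase extraction logic.
class PhraseMatchSketch {
    long visited = 0; // number of partial combinations examined

    // positions.get(i) holds the positions of the i-th phrase term in the doc.
    // With prune == false every combination is enumerated and the slop is
    // checked only once a full combination has been built.
    int count(List<int[]> positions, int slop, boolean prune) {
        visited = 0;
        return recurse(positions, 0, -1, 0, slop, prune);
    }

    private int recurse(List<int[]> positions, int term, int prevPos,
                        int distance, int slop, boolean prune) {
        if (term == positions.size()) {
            return distance <= slop ? 1 : 0;
        }
        int matches = 0;
        for (int pos : positions.get(term)) {
            visited++;
            int d = distance;
            if (term > 0) {
                // Gap between where this term is and where it was expected.
                d += Math.abs(pos - (prevPos + 1));
            }
            if (prune && d > slop) {
                continue; // already beyond the slop: stop extending this candidate
            }
            matches += recurse(positions, term + 1, pos, d, slop, prune);
        }
        return matches;
    }
}
```

For three terms with four occurrences each (positions 0/10/20/30, 1/11/21/31 and 2/12/22/32) and a slop of 0, both modes find the same 4 matches, but the exhaustive mode examines 84 partial combinations against 36 with pruning. The exhaustive count grows with the product of the occurrence counts, which is the behavior described in the report.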
