[jira] [Commented] (LUCENE-4858) Ability to terminate queries on a per-segment basis


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607711#comment-13607711 ]

Adrien Grand commented on LUCENE-4858:

{quote} What in the patch guarantees that any segment with more than maxBufferedDocs is sorted? Perhaps I've missed it, but I looked for code which ensures every such segment gets picked up by SortingMP, however didn't find it.

I don't think that in general we should make assumptions based on a maxBufferedDocs setting because the default setting in IWC is per RAM consumption and also it seems slightly unrelated. I.e. if a segment is sorted, but has deletions such that numDocs < maxBufferedDocs, we do full collection, while we can early terminate as usual?{quote}

Indeed, I think that finding out which segments are sorted is the main issue. My idea was to say that if you want to use early query termination, you need to set maxBufferedDocs to a given limit (low values improve early query termination while high values improve indexing speed). Large segments, which are the interesting ones for early query termination since they take time to collect, would then be known to be sorted whenever they have more than maxBufferedDocs documents (deleted or not), because they can only result from a merge. Of course, this could miss some small segments that are sorted, but since they are small, they are probably not as interesting for early query termination.
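A minimal sketch of this heuristic in plain Java, with no Lucene dependency. The class and method names (SortedSegmentHeuristic, isKnownSorted) are hypothetical and are not part of the patch. The sketch assumes that maxBufferedDocs is the only flush trigger (no RAM-based flushing) and that every merge goes through a sorting merge policy:

```java
/** Hypothetical helper, not part of Lucene: illustrates the maxBufferedDocs heuristic. */
final class SortedSegmentHeuristic {
    private SortedSegmentHeuristic() {}

    /**
     * @param maxDoc          number of documents in the segment, deleted or not
     * @param maxBufferedDocs flush limit set on the writer (assumed to be the only flush trigger)
     * @return true if the segment is too large to come from a flush, so it must come from a merge
     */
    static boolean isKnownSorted(int maxDoc, int maxBufferedDocs) {
        // A freshly flushed segment holds at most maxBufferedDocs documents.
        // Anything larger must result from a merge, which is assumed to sort.
        return maxDoc > maxBufferedDocs;
    }
}
```

Because the check uses the document count including deletions, a merged segment does not stop being recognized as sorted just because some of its documents were deleted later.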

What options do we have here? I think you mentioned tagging sorted segments, do you have an idea where/how we could do that?

bq. And hopefully we can stuff the early termination logic down to IndexSearcher eventually. There are other scenarios for early termination, such as time limit, and therefore I think it's ok if we have an EarlyTerminationException which IndexSearcher responds to.

Indeed, I think this makes sense.
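To make the idea concrete, here is a rough sketch of a search loop that responds to such an exception on a per-segment basis. It is plain Java and does not use the Lucene API: segments are modelled as int arrays, and every name other than EarlyTerminationException (EarlyTerminationSketch, SegmentCollector, search, limitPerSegment) is hypothetical.

```java
import java.util.List;

/** Hypothetical sketch, not Lucene code: per-segment early termination via an exception. */
final class EarlyTerminationSketch {
    private EarlyTerminationSketch() {}

    /** Thrown by a collector to stop collecting the current segment. */
    static final class EarlyTerminationException extends RuntimeException {
        EarlyTerminationException() {
            // Used for control flow only, so skip suppression and the stack trace.
            super(null, null, false, false);
        }
    }

    /** Minimal stand-in for a per-segment collector. */
    interface SegmentCollector {
        void collect(int doc);
    }

    /** Collects at most limitPerSegment documents per segment and returns the total collected. */
    static int search(List<int[]> segments, int limitPerSegment) {
        int total = 0;
        for (int[] segmentDocs : segments) {
            int[] collected = {0};
            SegmentCollector collector = doc -> {
                if (collected[0] >= limitPerSegment) {
                    throw new EarlyTerminationException();
                }
                collected[0]++;
            };
            try {
                for (int doc : segmentDocs) {
                    collector.collect(doc);
                }
            } catch (EarlyTerminationException e) {
                // Only the current segment is abandoned; the loop moves on to the next one.
            }
            total += collected[0];
        }
        return total;
    }
}
```

The point of the design is that the collector only signals that it is done with a segment, while the search loop decides what to do next. A time limit could reuse the same exception and simply stop the whole loop instead of moving on.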

> Ability to terminate queries on a per-segment basis
> ---------------------------------------------------
>                 Key: LUCENE-4858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4858
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.3
> Spin-off of LUCENE-4752, see https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565 and https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
> When an index is sorted per-segment, queries that sort according to the index sort order could be early terminated.

