[jira] Created: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
Investigate Rewriting Constant Scoring MultiTermQueries per segment
-------------------------------------------------------------------

                 Key: LUCENE-2130
                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Mark Miller
            Priority: Minor


This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.

But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.

No biggie, not likely, but what the heck.

So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787239#action_12787239 ]

Mark Miller commented on LUCENE-2130:
-------------------------------------

Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787248#action_12787248 ]

Mark Miller commented on LUCENE-2130:
-------------------------------------

Okay - so talking to Robert in chat - the advantage when you are enumerating a lot of terms is that you avoid DirectoryReaders MultiTermEnum and its PQ.

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2130:
--------------------------------

    Attachment: LUCENE-2130.patch

The ugly patch

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787239#action_12787239 ]

Mark Miller edited comment on LUCENE-2130 at 12/8/09 2:11 AM:
--------------------------------------------------------------

Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.

* edit *

Smaller, sparser filter?

      was (Author: [hidden email]):
    Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.
 

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787250#action_12787250 ]

Mark Miller edited comment on LUCENE-2130 at 12/8/09 2:16 AM:
--------------------------------------------------------------

The ugly patch - (which doesn't yet handle the filter supplied case)

      was (Author: [hidden email]):
    The ugly patch
 

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787269#action_12787269 ]

Robert Muir commented on LUCENE-2130:
-------------------------------------

I like the idea of fixing this problem somehow, here is the background:

I noticed one very bad-performing wildcard query on an index with numbers 0000000-9999999 was the query "*NNNNNN" where N's are random numbers.
the problem is that the query will enumerate the entire term dict but only return 10 results.

in Constant Auto mode, this means it will use DirectoryReader's MultiTerms, even on an optimized index, with a lot of wasted priority queue usage, etc.

This is a corner case in my tests, but the reason I think it would be nice to fix is because things like contrib/regex behave in this way maybe more often, they frequently scan the entire term dictionary only to return a few results.


> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787239#action_12787239 ]

Mark Miller edited comment on LUCENE-2130 at 12/8/09 3:25 AM:
--------------------------------------------------------------

Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.

* edit *

Smaller, sparser filter?

*edit*

Err - in the ConstantScore mode, i guess your really just subdividing the filter - so no real benefit. I didn't realize it, but the constant booleanquery mode does still use the booleanquery (of course, why else have it) - but its only going to be with few clauses, so neither is really a benefit.

      was (Author: [hidden email]):
    Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.

* edit *

Smaller, sparser filter?
 

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2130:
--------------------------------

    Comment: was deleted

(was: Whoops - a little off in that summary - you would't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial.

* edit *

Smaller, sparser filter?

*edit*

Err - in the ConstantScore mode, i guess your really just subdividing the filter - so no real benefit. I didn't realize it, but the constant booleanquery mode does still use the booleanquery (of course, why else have it) - but its only going to be with few clauses, so neither is really a benefit.)

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2130:
--------------------------------

    Description:
This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.

But if we could rewrite constant score multi-term queries per segment, MTQ's with auto (when the heuristic doesnt cut over to constant filter), or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This also allows you to avoid DirectoryReaders MultiTermEnum and its PQ. (See Roberts comment below).

No biggie, not likely, but what the heck.

So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.

Not sure its worth the baggage for the win - but perhaps the objective can be met in another way.



  was:
This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.

But if we could rewrite constant score multi-term queries per segment, MTQ's with auto, constant, or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This way, if you have a bunch of really large segments and a lot of really small segments, you wouldn't apply a huge booleanquery against all of the small segments which don't have those terms anyway. How advantageous this is, I'm not sure yet.

No biggie, not likely, but what the heck.

So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.




I've spewed too much confusion in this issue - just going to rewrite the summary.

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto (when the heuristic doesnt cut over to constant filter), or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This also allows you to avoid DirectoryReaders MultiTermEnum and its PQ. (See Roberts comment below).
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.
> Not sure its worth the baggage for the win - but perhaps the objective can be met in another way.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2130:
--------------------------------

    Attachment: LUCENE-2130.patch

updated

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2130.patch, LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto (when the heuristic doesnt cut over to constant filter), or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This also allows you to avoid DirectoryReaders MultiTermEnum and its PQ. (See Roberts comment below).
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.
> Not sure its worth the baggage for the win - but perhaps the objective can be met in another way.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

Walter (Jira)
In reply to this post by Walter (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-2130:
--------------------------------

    Fix Version/s: Flex Branch

> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2130.patch, LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it. The only idea I have come up with is fairly ugly, and unless something better comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's with auto (when the heuristic doesnt cut over to constant filter), or constant boolean rewrite could enum terms against a single segment and then apply a boolean query against each segment with just the terms that are known to be in that segment. This also allows you to avoid DirectoryReaders MultiTermEnum and its PQ. (See Roberts comment below).
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights - lateCnstRewrite or something, that defaults to false. MTQ would return true if its in a constant score mode. On the top level rewrite, if this is detected, an empty ConstantScoreQuery is made, and its Weight is turned to lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets its boost set to the MTQ's boost. Then when we are searching per segment, if the Weight is lateCnstRewrite, we grab the orig query and actually do the rewrite against the subreader and grab the actual constantscore weight. It works I think - but its a little ugly.
> Not sure its worth the baggage for the win - but perhaps the objective can be met in another way.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]