[jira] Created: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org
FieldComparator.TermOrdValComparator compares by value unnecessarily
--------------------------------------------------------------------

                 Key: LUCENE-2531
                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Michael McCandless
            Priority: Minor
             Fix For: 3.1, 4.0


Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.

Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.

This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2531:
---------------------------------------

    Attachment: LUCENE-2504.patch
                LUCENE-2504-3x.patch

Attached patches, for 3.1 and 4.0.

This also cleans up the String/TermOrdValComparator, absorbing the convert method directly into setBottom, and doing away with some dead code.  With this fix we no longer compare by value at all if index is a single segment (I think!).

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2531:
---------------------------------------

    Attachment: LUCENE-2531.patch
                LUCENE-2531-3x.patch

Another iteration on the patches with some small further optimizing.

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887122#action_12887122 ]

Yonik Seeley commented on LUCENE-2531:
--------------------------------------

The thing that jumps out at me is the removal of "reversed"... I'd need to apply the patch to study enough to figure that out, but any clues you could give would be helpful (yes, I'm being lazy ;-)

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887126#action_12887126 ]

Michael McCandless commented on LUCENE-2531:
--------------------------------------------

I hear you; this stuff is kinda hairy ;)

The usage of reverse (and, sortPos) was only in dead code, inside the convert method.

The convert method used to be used more often (to do on-demand convert of the ord when its readerGen != current one), but we stopped doing that and only convert the bottom slot now, which mean this if:

{noformat}
      if (sortPos == 0 && bottomSlot != -1 && bottomSlot != slot) {
{noformat}

Can never be true (bottomSlot is *always* == slot), so that true clause was all dead code anyway.

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[jira] Resolved: (LUCENE-2531) FieldComparator.TermOrdValComparator compares by value unnecessarily

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2531.
----------------------------------------

    Resolution: Fixed

> FieldComparator.TermOrdValComparator compares by value unnecessarily
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2531
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2504-3x.patch, LUCENE-2504.patch, LUCENE-2531-3x.patch, LUCENE-2531.patch
>
>
> Digging on LUCENE-2504, I noticed that TermOrdValComparator's compareBottom method falls back on compare-by-value when it needn't.
> Specifically, if we know the current bottom ord "matches" the current segment, we can skip the value comparison when the ords are the same (ie, return 0) because the ords are exactly comparable.
> This is hurting string sort perf especially for optimized indices (and also unoptimized indices), and especially for highly redundant (not many unique values) fields.  This affects all releases >= 2.9.x, but trunk is likely more severely affected since looking up a value is more costly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]