[jira] Created: (SOLR-114) HashDocSet new hash(), andNot(), union()

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
HashDocSet new hash(), andNot(), union()
----------------------------------------

                 Key: SOLR-114
                 URL: https://issues.apache.org/jira/browse/SOLR-114
             Project: Solr
          Issue Type: Improvement
          Components: search
            Reporter: Yonik Seeley


Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().

While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-114:
------------------------------

    Attachment: hashdocset.patch

> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466154 ]

Yonik Seeley commented on SOLR-114:
-----------------------------------

Performance results:
  - HashDocSet.exists() is 13% faster
  - HashDocSet.intersectionSize() is thus 9% faster
  - HashDocSet.union() is 20 times faster
  - HashDocSet.andNot() is 27 times faster

Tested with Sun JDK6 -server on a P4

> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466160 ]

Hoss Man commented on SOLR-114:
-------------------------------

quick questions...

1) what test did you run to get those numbers? ... even if we don't commit it, we should attach it to this Jira issue
2) we should probably test at least the Sun 1.5 JVM as well right?




> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466166 ]

Yonik Seeley commented on SOLR-114:
-----------------------------------

The performance tests are commented out in the TestDocSet test... I  had other changes in my tree related to negative queries and only selected the two source files for diffs.

I had quickly tested Java5 to make sure it was still faster in all instances, and it was.  Numbers were about the same, some speedups larger and some smaller than Java6.

> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-114:
------------------------------

    Attachment: test.patch

> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch, test.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466176 ]

Yonik Seeley commented on SOLR-114:
-----------------------------------

tested on an AMD opteron, 64 bit mode, Java5 -server -Xbatch and exists() was 8.5% faster, intersectionSize() was 7% faster.
I didn't bother testing union(), andNot(), as they are obviously going to be much faster.


> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch, test.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Commented: (SOLR-114) HashDocSet new hash(), andNot(), union()

Mike Klaas
On 1/19/07, Yonik Seeley (JIRA) <[hidden email]> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466176 ]
>
> Yonik Seeley commented on SOLR-114:
> -----------------------------------
>
> tested on an AMD opteron, 64 bit mode, Java5 -server -Xbatch and exists() was 8.5% faster, intersectionSize() was 7% faster.
> I didn't bother testing union(), andNot(), as they are obviously going to be much faster.

Nice job!

-Mike
Reply | Threaded
Open this post in threaded view
|

[jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-114.
-------------------------------

    Resolution: Fixed

committed.

> HashDocSet new hash(), andNot(), union()
> ----------------------------------------
>
>                 Key: SOLR-114
>                 URL: https://issues.apache.org/jira/browse/SOLR-114
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yonik Seeley
>         Attachments: hashdocset.patch, test.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up with a cool new hash method that goes directly into a linear scan and is hence simpler, faster, and has fewer collisions.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

Mike Klaas
On 1/20/07, Yonik Seeley (JIRA) <[hidden email]> wrote:

> > Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().

Out of curiosity, what is your current plan for this?  Something along
the lines of storing a negated flag, which would be used to do
andNot() rather than intersection() in SolrIndexSearcher.getDocSet()?

I think it would be a great feature and can help out with devel or review.

-Mike
Reply | Threaded
Open this post in threaded view
|

Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

Yonik Seeley-2
On 1/21/07, Mike Klaas <[hidden email]> wrote:
> On 1/20/07, Yonik Seeley (JIRA) <[hidden email]> wrote:
>
> > > Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union().
>
> Out of curiosity, what is your current plan for this?  Something along
> the lines of storing a negated flag, which would be used to do
> andNot() rather than intersection() in SolrIndexSearcher.getDocSet()?
>
> I think it would be a great feature and can help out with devel or review.

There are two related things, and I'm only tackling one.  I'm *not
*looking at a generated DocSet and then choosing to try and cache it
as an inverse if it would be smaller.

I am looking at queries, and determining if they are negative (no
positive elements, currently matches nothing in Lucene).  If they are
negative, I generate and cache the positive version, and do andNot()
for operations.

Code is done but untested (no test code yet even).  I'll add a draft
for your review now.

-Yonik