[jira] Created: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
Result of HashFunction.hash() contains all identical values
-----------------------------------------------------------

                 Key: HADOOP-2365
                 URL: https://issues.apache.org/jira/browse/HADOOP-2365
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
    Affects Versions: 0.16.0
            Reporter: Andrzej Bialecki
             Fix For: 0.16.0


There is a small bug at HashFunction:112: initvalue should change between loop iterations so that the hash values are spread over the whole allowed range. Instead, the current code uses a fixed initvalue = 0, which makes every hash value in the result array identical. As a result, BloomFilters have an extremely high rate of false positives.
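
The effect described above can be reproduced with a small, self-contained sketch. Everything here is an assumption made for illustration: `seededHash` is a stand-in for the real `JenkinsHash`, the class and method names are hypothetical, the filter sizes are arbitrary, and `Math.floorMod` is used instead of the original index arithmetic.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SeedBugDemo {

    // Stand-in seeded hash (FNV-1a loop plus a 32-bit mixing step); NOT JenkinsHash.
    static int seededHash(byte[] data, int seed) {
        int h = seed ^ 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    // chainSeed = false keeps initval at 0 on every round (the reported bug);
    // chainSeed = true feeds each result into the next round as its seed.
    static int[] positions(byte[] key, int nbHash, int maxValue, boolean chainSeed) {
        int[] result = new int[nbHash];
        int initval = 0;
        for (int i = 0; i < nbHash; i++) {
            result[i] = Math.floorMod(seededHash(key, initval), maxValue);
            if (chainSeed) {
                initval = result[i];
            }
        }
        return result;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // True when every bit position for the key is set in the filter.
    static boolean mightContain(BitSet filter, int[] pos) {
        for (int p : pos) {
            if (!filter.get(p)) {
                return false;
            }
        }
        return true;
    }

    // Inserts some keys, probes keys that were never inserted,
    // and returns the fraction reported as present.
    static double falsePositiveRate(boolean chainSeed) {
        final int bits = 10_000, keys = 1_000, nbHash = 7, probes = 10_000; // hypothetical sizes
        BitSet filter = new BitSet(bits);
        for (int i = 0; i < keys; i++) {
            for (int p : positions(bytes("member-" + i), nbHash, bits, chainSeed)) {
                filter.set(p);
            }
        }
        int falsePositives = 0;
        for (int i = 0; i < probes; i++) {
            if (mightContain(filter, positions(bytes("absent-" + i), nbHash, bits, chainSeed))) {
                falsePositives++;
            }
        }
        return (double) falsePositives / probes;
    }

    public static void main(String[] args) {
        System.out.println("fixed seed:   " + falsePositiveRate(false));
        System.out.println("chained seed: " + falsePositiveRate(true));
    }
}
```

With a fixed seed all seven positions collapse into one, so the filter behaves as if it had a single hash function. For the sizes assumed here, the standard estimate (1 - e^(-kn/m))^k gives roughly 0.8% for seven independent positions and roughly 9.5% for one.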

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HADOOP-2365:
--------------------------------------

    Attachment: hash-v1.patch


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549100 ]

Jim Kellerman commented on HADOOP-2365:
---------------------------------------

-1 on patch. This section of code should read:

{code}
int[] result = new int[nbHash];
for (int i = 0, initval = 0; i < nbHash; i++) {
   initval = result[i] = Math.abs(JenkinsHash.hash(b, initval)) % maxValue;
}
return result;
{code}

However, thanks for finding my stupid mistake.




[jira] Assigned: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2365:
-------------------------------------

    Assignee: Jim Kellerman


Re: [jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

Andrzej Białecki-2
In reply to this post by JIRA jira@apache.org
Jim Kellerman (JIRA) wrote:

>     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549100 ]
>
> Jim Kellerman commented on HADOOP-2365:
> ---------------------------------------
>
> -1 on patch. This section of code should read:
>
> {code}
> int[] result = new int[nbHash];
> for (int i = 0, initval = 0; i < nbHash; i++) {
>    initval = result[i] = Math.abs(JenkinsHash.hash(b, initval)) % maxValue;
> }
> return result;
> {code}
>

Yes, this works too; it shouldn't matter in this specific case.
Jenkins' hash has very good avalanche behavior, so even a 1-bit difference
in the initvalue yields a completely different hash.
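
The avalanche property mentioned here can be measured directly: flip one bit of the seed and count how many output bits change. The sketch below does this for a stand-in seeded hash, not for `JenkinsHash` itself, so it illustrates the property and says nothing about the real implementation. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class SeedAvalancheDemo {

    // Stand-in seeded hash (FNV-1a loop plus a 32-bit mixing step); NOT JenkinsHash.
    static int seededHash(byte[] data, int seed) {
        int h = seed ^ 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    // Average number of output bits (out of 32) that change
    // when only the lowest bit of the seed is flipped.
    static double averageFlippedBits(int samples) {
        long flipped = 0;
        for (int i = 0; i < samples; i++) {
            byte[] key = ("key-" + i).getBytes(StandardCharsets.UTF_8);
            int seed = i * 31;
            int a = seededHash(key, seed);
            int b = seededHash(key, seed ^ 1);
            flipped += Integer.bitCount(a ^ b);
        }
        return (double) flipped / samples;
    }

    public static void main(String[] args) {
        System.out.println("average flipped bits: " + averageFlippedBits(1_000));
    }
}
```

A hash with good avalanche behavior changes about half of its output bits, around 16 of 32, for a one-bit change in the seed. That is why seeding with a counter or with the previous result should spread positions equally well.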


> However, thanks for finding my stupid mistake.

You're welcome. I'm using this class in a different Hadoop application,
where the problem became immediately apparent when I switched from my
home-grown BloomFilter implementation to this one.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Attachment: patch.txt

Use the previous result as the next seed.


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549165 ]

Hadoop QA commented on HADOOP-2365:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371163/patch.txt
against trunk revision r601818.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/console

This message is automatically generated.


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Status: Open  (was: Patch Available)

Fix test case


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Attachment: patch.txt

Fixed test case


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549215 ]

Hadoop QA commented on HADOOP-2365:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371179/patch.txt
against trunk revision r601845.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/console

This message is automatically generated.


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549219 ]

Jim Kellerman commented on HADOOP-2365:
---------------------------------------

The latest build failed on TestTableJoinMapReduce, which does not use Bloom filters, so the failure has no bearing on this patch.


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed change. Recent test failure was unrelated to this change.


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549385 ]

Hudson commented on HADOOP-2365:
--------------------------------

Integrated in Hadoop-Nightly #325 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/325/])


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549492 ]

Andrzej Bialecki  commented on HADOOP-2365:
-------------------------------------------

There may be other bugs lurking in BloomFilter / HashFunction. This is very hard to reproduce, but once in a while (once per hundred million keys tested) I get something like this:

java.lang.ArrayIndexOutOfBoundsException: -1215998
        at org.onelab.filter.BloomFilter.membershipTest(BloomFilter.java:134)
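
One possible source of a negative index, offered as an assumption and not as a diagnosis confirmed in this thread: in Java, `Math.abs(Integer.MIN_VALUE)` returns `Integer.MIN_VALUE`, so an expression of the form `Math.abs(hash) % maxValue` can still be negative. The sketch below shows the pitfall with a hypothetical `maxValue`; the class and method names are illustrative only.

```java
class HashIndexPitfall {

    // Mirrors the Math.abs(hash) % maxValue pattern proposed earlier in the thread.
    static int absIndex(int hash, int maxValue) {
        return Math.abs(hash) % maxValue;
    }

    // Alternative that stays in [0, maxValue) for every int hash when maxValue > 0.
    static int floorIndex(int hash, int maxValue) {
        return Math.floorMod(hash, maxValue);
    }

    public static void main(String[] args) {
        int maxValue = 1000;            // hypothetical filter size
        int hash = Integer.MIN_VALUE;   // the one int whose absolute value overflows

        System.out.println(absIndex(hash, maxValue));   // prints -648
        System.out.println(floorIndex(hash, maxValue)); // prints 352
    }
}
```

Only one of the 2^32 possible hash values triggers this, which would fit a failure that shows up very rarely. Note that `Math.floorMod` maps other negative hashes to different positions than `Math.abs(...) % maxValue` does, so switching is not transparent for filters that have already been written.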



[jira] Reopened: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reopened HADOOP-2365:
-----------------------------------


ArrayIndexOutOfBoundsException


[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549498 ]

Jim Kellerman commented on HADOOP-2365:
---------------------------------------

Andrzej,

Do you think it is occurring for the same key?

If you could provide your initialization parameters and a test example, that would be very helpful.


[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2365:
----------------------------------

    Priority: Minor  (was: Major)


[jira] Resolved: (HADOOP-2365) Result of HashFunction.hash() contains all identical values

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2365.
-----------------------------------

    Resolution: Fixed

Closing this issue. The ArrayIndexOutOfBoundsException is now tracked in HADOOP-2414.
