# Random number generators for NDFS block numbers

5 messages
Open this post in threaded view
|

## Random number generators for NDFS block numbers

 Doug Cutting expressed a concern to me about using util.Random to generate random 64 bit block numbers for NDFS.  The following is my analysis. Random number generators for NDFS block numbers    Requirements      capable of billions of block numbers      64 bit block numbers      deterministic for reproducible testing      ideally, want a generator that makes no repeats and is serializable.    Concern      Currently using built-in util.Random, is it good enough?    Background      Theory of hashing        In a hashing system, the collision rate is calculated just like the birthday problem/paradox.          rule of thumb is to have twice as many bits as number of bits needed for the total number of items.            example: 4 bits needed to count 16 items, needs 8 bit hashes.        In a non-hashing system, block numbers are not related to block contents, but the birthday problem still applies.      Theory of  random numbers        The amount of entropy that a PRNG instance has cannot be increased by concatenation.          A full 64 bits are needed to to store 4 billion items.        Diehard test suite is a good randomness benchmark.    Analysis      util.Random is a linear congruential generator (LCG) identical to drand48.        util.Random keeps 48 bits of state and gangs together 2 consecutive values to return 64 bit values.        LCGs suffer from periodicity in the low order bits which would make modular binning less than random.          "low order bits" could mean least significant byte.        LCGs have periods in the range 10^6 to 10^9 when using 32 bit words, a range of poor to fair.        seed = ( 0x5DEECE66DL * seed + 0xBL ) & ((1L << 48) - 1);          the origin of 0x5DEECE66D, a non-prime, is shrouded in the mists of time.        Results of the Birthday Spacings Test look good.       References        http://www.math.utah.edu/~beebe/java/random/README       http://www.pierssen.com/arcview/upload/esoterica/randomizer.html   Alternative      MersenneTwister replacement for Random is best choice, has a mind-bogglingly long period.        Matsumoto and Nishimura (1998)        Longest period of any known generator 2^19937 or about 10^6001        A period that exceeds the number of unique values seems ideal          (intuitively ideal; obviously a shorter period than the number of unique values is a problem).        Faster than standard Random.        Excellent result for Diehard Birthday Spacings Test.        Java Implementations.          All the implementation are derived from the free to use C source.          BSD type license            Sean Luke implementation          GNU Library General Public License            Brian O. Bush implementation          public domain           Michael L. Brundage implementation
Open this post in threaded view
|

## Re: Random number generators for NDFS block numbers

 I'm not into random generators, but this is a useful overview. I know mersenne and a few others are also implemented in Colt bundled library (although it might be one of the implementations you mention): http://hoschek.home.cern.ch/hoschek/colt/V1.0.3/doc/index.htmlD. Paul Baclace wrote: > Doug Cutting expressed a concern to me about using util.Random to generate > random 64 bit block numbers for NDFS.  The following is my analysis. > > Random number generators for NDFS block numbers >   Requirements >     capable of billions of block numbers >     64 bit block numbers >     deterministic for reproducible testing >     ideally, want a generator that makes no repeats and is serializable. >   Concern >     Currently using built-in util.Random, is it good enough? >   Background >     Theory of hashing >       In a hashing system, the collision rate is calculated just like > the birthday problem/paradox. >         rule of thumb is to have twice as many bits as number of bits > needed for the total number of items. >           example: 4 bits needed to count 16 items, needs 8 bit hashes. >       In a non-hashing system, block numbers are not related to block > contents, but the birthday problem still applies. >     Theory of  random numbers >       The amount of entropy that a PRNG instance has cannot be increased > by concatenation. >         A full 64 bits are needed to to store 4 billion items. >       Diehard test suite is a good randomness benchmark. >   Analysis >     util.Random is a linear congruential generator (LCG) identical to > drand48. >       util.Random keeps 48 bits of state and gangs together 2 > consecutive values to return 64 bit values. >       LCGs suffer from periodicity in the low order bits which would > make modular binning less than random. >         "low order bits" could mean least significant byte. >       LCGs have periods in the range 10^6 to 10^9 when using 32 bit > words, a range of poor to fair. >       seed = ( 0x5DEECE66DL * seed + 0xBL ) & ((1L << 48) - 1); >         the origin of 0x5DEECE66D, a non-prime, is shrouded in the mists > of time. >       Results of the Birthday Spacings Test look good. >      References >       http://www.math.utah.edu/~beebe/java/random/README>       http://www.pierssen.com/arcview/upload/esoterica/randomizer.html>   Alternative >     MersenneTwister replacement for Random is best choice, has a > mind-bogglingly long period. >       Matsumoto and Nishimura (1998) >       Longest period of any known generator 2^19937 or about 10^6001 >       A period that exceeds the number of unique values seems ideal >         (intuitively ideal; obviously a shorter period than the number > of unique values is a problem). >       Faster than standard Random. >       Excellent result for Diehard Birthday Spacings Test. >       Java Implementations. >         All the implementation are derived from the free to use C source. >         BSD type license >           Sean Luke implementation >         GNU Library General Public License >           Brian O. Bush implementation >         public domain >       Michael L. Brundage implementation
Open this post in threaded view
|

## Re: Random number generators for NDFS block numbers

 In reply to this post by Paul E. Baclace Thanks to archive.org, I found this additional reference:    http://www.math.sci.hiroshima-u.ac.jp/~m-mat/eindex.htmlwhich is the English home page of Matsumoto san, co-originator of Mersenne Twister. There is a faq and the original C source. Paul Paul Baclace wrote: > [...] >   Alternative >     MersenneTwister replacement for Random is best choice, has a > mind-bogglingly long period. >       Matsumoto and Nishimura (1998)  > [...]