Random testing performance (theoretical)

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Random testing performance (theoretical)

Dawid Weiss
An interesting link was given to me today:

"The Central Limit Theorem Makes Random Testing Hard"
http://blog.regehr.org/archives/660

Just recently I had a talk at a local university
(http://idss.cs.put.poznan.pl/site/idss-en.html) and the low
theoretical probability of hitting _any_ sensible bug was also raised
as a concern. This is in stark contrast to what we see in practice,
isn't it? It is an interesting phenomenon because it means that (among
other possible explanations) the parameter/ search space of random
tests is just ridden with failures and exceptions and bugs and even
with close-to-zero probability we still hit a lot of them :)

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Random testing performance (theoretical)

Uwe Schindler
Hi Dawid,

I think we do not do completely random testing at all! Random testing (with the limits mentioned in the article) would mean that we do very random testing with also randomly generated tests. In contrast our tests all have a non-random background, means our tests are written with a specific use-case in mind (there are some exemptions, I agree, see below). So with any random parameter they basically test what a non-random junit test would also test. In most cases the randomness is only added on top (to extend the space of parameters we test) - so the chances to find bugs get greater. If we would test Lucene only with also randomly generated tests, I agree we would find no bugs at all. On the other hand, classical junit tests have no randomness at all, they just test that a specific "requirement" is implemented correctly and returns the correct result. The problem with those tests is the fact, that by fixing a bug to do the correct thing in test can break something else (this is especially important for IR tests, they rely on statistics!!!). You could write a test that expects a specific output of a lucene query and you can of course fix all the scoring logic to exactly create that result. Of course any other similar query can return bogus results. So by executing random queries, we can also test statistical things like ranking function works also for a larger set of queries or larger number of documents.

There are some exceptions: The famous TestRandomChains falls exactly in the category you mention: They just build a random chain of analyzer components and the number of combinations you could create is huge! With a few Tokenizers and a list of 150 TokenFilters you can create an insane amount of TokenStream combinations (just start to multiply..., it is the formula for the binomial coefficient). In that case, the chance to find a new bug is really small. The number of failures at the beginning really shows that the TokenStreams were in a very bad state! Now we have only few failures (one more today!), but I am sure there are still billions (ore more) of combinations that break the TokenStream contracts!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of
> Dawid Weiss
> Sent: Sunday, June 24, 2012 3:42 PM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Random testing performance (theoretical)
>
> An interesting link was given to me today:
>
> "The Central Limit Theorem Makes Random Testing Hard"
> http://blog.regehr.org/archives/660
>
> Just recently I had a talk at a local university
> (http://idss.cs.put.poznan.pl/site/idss-en.html) and the low theoretical
> probability of hitting _any_ sensible bug was also raised as a concern. This is in
> stark contrast to what we see in practice, isn't it? It is an interesting
> phenomenon because it means that (among other possible explanations) the
> parameter/ search space of random tests is just ridden with failures and
> exceptions and bugs and even with close-to-zero probability we still hit a lot of
> them :)
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email] For additional
> commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Random testing performance (theoretical)

Dawid Weiss
Thanks Uwe. I think that guy actually refers to randomizing parameter/
variable space for testing, not generating tests at random in general.
I agree with you that any type of generational/ mutational/
metaheuristic-driven tests generation is not a good way to go in
practice. Lucene tests still have a very broad search space, even with
just randomized parameters/ data and for this that blog post would
apply (?).

Anyway, his is one argument in a broader discussion. I don't know
which theoretical approach would be ideal but I know for sure even
typical randomization of parameters reveals bugs pretty damn
quickly...

Dawid

On Sun, Jun 24, 2012 at 5:53 PM, Uwe Schindler <[hidden email]> wrote:

> Hi Dawid,
>
> I think we do not do completely random testing at all! Random testing (with the limits mentioned in the article) would mean that we do very random testing with also randomly generated tests. In contrast our tests all have a non-random background, means our tests are written with a specific use-case in mind (there are some exemptions, I agree, see below). So with any random parameter they basically test what a non-random junit test would also test. In most cases the randomness is only added on top (to extend the space of parameters we test) - so the chances to find bugs get greater. If we would test Lucene only with also randomly generated tests, I agree we would find no bugs at all. On the other hand, classical junit tests have no randomness at all, they just test that a specific "requirement" is implemented correctly and returns the correct result. The problem with those tests is the fact, that by fixing a bug to do the correct thing in test can break something else (this is especially important for IR tests, they rely on statistics!!!). You could write a test that expects a specific output of a lucene query and you can of course fix all the scoring logic to exactly create that result. Of course any other similar query can return bogus results. So by executing random queries, we can also test statistical things like ranking function works also for a larger set of queries or larger number of documents.
>
> There are some exceptions: The famous TestRandomChains falls exactly in the category you mention: They just build a random chain of analyzer components and the number of combinations you could create is huge! With a few Tokenizers and a list of 150 TokenFilters you can create an insane amount of TokenStream combinations (just start to multiply..., it is the formula for the binomial coefficient). In that case, the chance to find a new bug is really small. The number of failures at the beginning really shows that the TokenStreams were in a very bad state! Now we have only few failures (one more today!), but I am sure there are still billions (ore more) of combinations that break the TokenStream contracts!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
>
>> -----Original Message-----
>> From: [hidden email] [mailto:[hidden email]] On Behalf Of
>> Dawid Weiss
>> Sent: Sunday, June 24, 2012 3:42 PM
>> To: [hidden email]
>> Cc: [hidden email]
>> Subject: Random testing performance (theoretical)
>>
>> An interesting link was given to me today:
>>
>> "The Central Limit Theorem Makes Random Testing Hard"
>> http://blog.regehr.org/archives/660
>>
>> Just recently I had a talk at a local university
>> (http://idss.cs.put.poznan.pl/site/idss-en.html) and the low theoretical
>> probability of hitting _any_ sensible bug was also raised as a concern. This is in
>> stark contrast to what we see in practice, isn't it? It is an interesting
>> phenomenon because it means that (among other possible explanations) the
>> parameter/ search space of random tests is just ridden with failures and
>> exceptions and bugs and even with close-to-zero probability we still hit a lot of
>> them :)
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email] For additional
>> commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Random testing performance (theoretical)

Robert Muir
On Sun, Jun 24, 2012 at 2:46 PM, Dawid Weiss
<[hidden email]> wrote:
>
> Anyway, his is one argument in a broader discussion. I don't know
> which theoretical approach would be ideal but I know for sure even
> typical randomization of parameters reveals bugs pretty damn
> quickly...
>

Some of it might be true: e.g. on my java6 there are 152 locales and
615 timezones. But in general these parameters are independent for our
purposes: we just want to make sure we handle timezones correctly, and
we handle locales correctly. If you don't do this, usually you find
out very very fast because the code is totally broken in some way. I
don't think we need to worry about the fact that we surely havent
tested all 93,480 combinations, maybe there is a lurking bug out
there, but what we do is way better than most projects, and at least
we make a best effort to prevent bugs like
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6240963 without
actually slowing down test runs.

--
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]