Tracking down inconsistent failure in jenkins

Tracking down inconsistent failure in jenkins

Raphael Bircher
Hi all

In my welcome thread, Jason pointed me to Hoss's Jenkins failure page. I think this is a good place for me to start. I would like to help you track down these inconsistent failures. First of all, if I get something wrong, please correct me!

As far as I understand, there are tests in Jenkins that sometimes fail and sometimes pass, and it looks like nobody really knows why. Is that right?

So I need some information from you. First of all, do you see the same behavior if you run the tests locally? I want to rule out that the bug is in the Jenkins system itself.

Are there bugs that are possibly related to the tests in question?

When was this behavior first detected?

That is all for a start. Thanks for your help and your information.

Regards, Raphael

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Tracking down inconsistent failure in jenkins

Erick Erickson
Raphael:

Thanks for becoming involved!

It’s super-frustrating that some of the tests on Jenkins do (or do not) reproduce, even if you “beast” them. Hoss’ reports come from many different environments, from Windows to various Java releases to… So “does it fail locally” is a tricky question. Plus, many of the intermittent failures are timing-related, so the speed of your local machine, the other tasks running on your machine etc. can be a factor.

What I do is use Mark Miller’s “beast” script. See: https://gist.github.com/markrmiller/dbdb792216dc98b018ad

Two important parameters to the script above are
- how many separate tests you want to run in parallel. This helps when the failures are timing-related
- how many iterations of the tests you want to run. Each test puts its output in a separate subdirectory, so when a test fails you have the full logs in the corresponding subdirectory.

Then I run the failing test over and over and over. If I can get it to fail (and if you’re getting 0.5% failures, it’s _really_ hit or miss) then I can diagnose the logs in the appropriate directory, possibly add logging and run it all again.
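
A minimal sketch of the "beasting" idea described above, not Mark Miller's actual script: run one command many times, several at a time, and keep each run's output in its own subdirectory. The function names, the `run-NNNNN/output.log` layout and the stand-in flaky command are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_once(cmd, out_root, iteration):
    """Run cmd once, writing combined stdout/stderr to its own subdirectory."""
    run_dir = Path(out_root) / f"run-{iteration:05d}"
    run_dir.mkdir(parents=True, exist_ok=True)
    with open(run_dir / "output.log", "wb") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return iteration, result.returncode


def beast(cmd, iterations, parallel, out_root):
    """Run cmd `iterations` times, `parallel` at a time; return failing iterations."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(lambda i: run_once(cmd, out_root, i), range(iterations)))
    return sorted(i for i, code in results if code != 0)


if __name__ == "__main__":
    # Stand-in for a real test command: exits non-zero roughly 10% of the time.
    flaky = [sys.executable, "-c",
             "import random, sys; sys.exit(1 if random.random() < 0.1 else 0)"]
    with tempfile.TemporaryDirectory() as root:
        failed = beast(flaky, iterations=8, parallel=4, out_root=root)
        print(f"{len(failed)} of 8 runs failed: {failed}")
```

Running several copies at once also loads the machine, which is part of why parallelism helps to provoke timing-related failures.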

Unfortunately, for intermittently-failing tests, you never _quite_ know if you’ve fixed the problem because your 10,000 iterations may have just lucked out.
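
To put rough numbers on "lucked out", here is a small sketch under a simplifying assumption that is not from the thread: every run is independent and fails with the same fixed probability. Timing-dependent failures often violate this, because the rate changes with machine speed and load, so treat the results as a guide only.

```python
import math


def chance_all_pass(failure_rate, iterations):
    """Probability that every one of `iterations` independent runs passes."""
    return (1.0 - failure_rate) ** iterations


def iterations_for_confidence(failure_rate, confidence):
    """Smallest run count whose all-pass probability is at most 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    p = 0.005  # the 0.5% failure rate mentioned above
    for n in (100, 1000, 10000):
        print(f"{n:>6} clean runs happen by luck with probability {chance_all_pass(p, n):.3g}")
    print("runs needed for 95% confidence:", iterations_for_confidence(p, 0.95))
```

The catch is that the true failure rate is unknown and may be far lower on your machine than on the Jenkins machine that reported it, in which case even a long clean streak proves little.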

Welcome to the joys of distributed computing ;)

Best,
Erick


Re: Tracking down inconsistent failure in jenkins

Raphael Bircher
Hi Erick

On 2019/11/08 22:49:31, Erick Erickson <[hidden email]> wrote:
> Raphael:
>
> Thanks for becoming involved!
>
> It’s super-frustrating that some of the tests on Jenkins do (or do not) reproduce, even if you “beast” them. Hoss’ reports come from many different environments, from Windows to various Java releases to… So “does it fail locally” is a tricky question. Plus, many of the intermittent failures are timing-related, so the speed of your local machine, the other tasks running on your machine etc. can be a factor.

OK, I expected something like this. Why are some tests timing-related? Is there any information about this?

>
> What I do is use Mark Miller’s “beast” script. See: https://gist.github.com/markrmiller/dbdb792216dc98b018ad
>
> Two important parameters to the script above are
> - how many separate tests you want to run in parallel. This helps when the failures are timing-related
> - how many iterations of the tests you want to run. Each test puts its output in a separate subdirectory, so when a test fails you have the full logs in the corresponding subdirectory.
>
> Then I run the failing test over and over and over. If I can get it to fail (and if you’re getting 0.5% failures, it’s _really_ hit or miss) then I can diagnose the logs in the appropriate directory, possibly add logging and run it all again.
>
> Unfortunately, for intermittently-failing tests, you never _quite_ know if you’ve fixed the problem because your 10,000 iterations may have just lucked out.

So you never get a consistent result, even if you run the same test on one build several times? Can others confirm this behavior?

I have built Solr and am running the JUnit tests now.

Regards, Raphael



Re: Tracking down inconsistent failure in jenkins

Raphael Bircher
Hi all

On 2019/11/08 23:56:00, Raphael Bircher <[hidden email]> wrote:

> I was building solr and running the JUnit Tests now.

The tests ran, but I can't find the test logs ;-)

I got two errors with a Solr that I built myself from HEAD. I've also seen an Ubuntu machine on Jenkins with two errors. I am now running the tests a second time.

   [junit4] Execution time total: 1 hour 19 minutes 20 seconds
   [junit4] Tests summary: 888 suites (6 ignored), 4543 tests, 2 errors, 212 ignored (183 assumptions)

Regards, Raphael


Re: Tracking down inconsistent failure in jenkins

Erick Erickson
How are you running the tests? Just “ant test”? If so, the output is all written to stdout, so I usually just redirect it somewhere, e.g. “ant test > results 2>&1”.

That file should have the failures _and_ a “reproduce with” line for each failing test. One thing I didn’t mention is that there’s also a significant bit of randomization in the test harness: different locales are chosen at random, different directory implementations, different timezones, etc. For instance, we had one JVM issue that only showed up in the Turkish locale, which we’d never have found without the randomization. The “reproduce with” line echoes all that information and will run the test again with all the same bits of randomization.
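
A small sketch for pulling those lines out of the saved output. It assumes each failure is reported on a line containing the text "reproduce with:" followed by the command, with the randomization passed as `-Dname=value` arguments; the sample log line and its test name are made up for illustration, and the real prefix may differ between versions.

```python
import re

# Assumption: the re-run command follows the text "reproduce with:" on one line.
REPRO = re.compile(r"reproduce with:\s*(.+)$", re.IGNORECASE)
PROPERTY = re.compile(r"-D([\w.]+)=(\S+)")


def reproduce_lines(log_text):
    """Return the distinct re-run commands found in the log, in order of appearance."""
    seen = []
    for line in log_text.splitlines():
        match = REPRO.search(line)
        if match and match.group(1).strip() not in seen:
            seen.append(match.group(1).strip())
    return seen


def properties(command):
    """Split the -Dname=value arguments of one command into a dict."""
    return dict(PROPERTY.findall(command))


if __name__ == "__main__":
    # Hypothetical log excerpt, made up for illustration only.
    sample = (
        "   [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=ExampleTest"
        " -Dtests.seed=DEADBEEF -Dtests.locale=tr-TR\n"
        "   [junit4] Tests summary: 1 suite, 1 test, 1 error\n"
    )
    for cmd in reproduce_lines(sample):
        print(cmd)
        print(properties(cmd))
```

With the redirect shown earlier, the input would be the contents of the `results` file, e.g. `reproduce_lines(open("results").read())`.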

It’s relatively rare for a test to fail reliably even if you use the “reproduce with” line, because the failures that do reproduce get fixed. When one is reproducible, you’ll see a JIRA raised, something like “reproducible test failure”, and/or someone will jump on it and fix it.

Timing issues: Well, just that. Say a test creates a collection and _assumes_ (no, this isn’t a good practice) that it’ll finish in 5 seconds, but creation takes 6 and the test drives on anyway. Oops. Other, more subtle issues are just threading issues where some sequence of context switches happens to hit an unanticipated problem, etc.
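
The fixed-delay assumption can be contrasted with waiting on the observable state. This is a generic sketch, not code from the Solr test framework: `wait_until` is a hypothetical helper, and the timer stands in for a slow operation such as collection creation.

```python
import threading
import time


def wait_until(condition, timeout, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()


if __name__ == "__main__":
    ready = threading.Event()
    # Hypothetical slow operation standing in for collection creation.
    worker = threading.Timer(0.3, ready.set)
    worker.start()

    time.sleep(0.1)  # fragile: assumes the work is done after a fixed delay
    print("after fixed sleep, ready =", ready.is_set())

    # More robust: wait for the observable state, with a generous upper bound.
    print("after polling, ready =", wait_until(ready.is_set, timeout=5.0))
    worker.join()
```

The polling version returns as soon as the condition holds, so a generous timeout costs nothing on fast machines and only matters when something is genuinely stuck.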

It’s not that we _never_ get reproducible tests, it’s that when we do, someone fixes them. There are a _lot_ of tests in the full suite, so if timing-related tests fail 0.1% of the time…

You can confirm this yourself pretty easily, just save the output and run the “reproduce with” line.

Best,
Erick


Re: Tracking down inconsistent failure in jenkins

Raphael Bircher
Hi Erick, *

On 2019/11/09 15:15:03, Erick Erickson <[hidden email]> wrote:
> How are you running the tests? Just “ant test”? If so the output is all written to stdout so I usually just redirect it somewhere , e.g. “ant test > results 2>&1”….

Yes, I run it with “ant test”. So the output on the command line is the log, OK. Is there another way to run the tests?
>
> That file should have the failures _and_ a “reproduce with” line for each failing test. One thing I didn’t mention is that there’s also a significant bit of randomization in the test harness. Different locales are chosen at random, different directory implementations, different timezones, etc… We had one issue that was a JVM issue that only showed up in the Turkish locale for instance that we’d never have found without the randomization. The “reproduce with” line has all that information echoed and will run the test again with all the same bits of randomization.

Ok, this sounds interesting

>
> It’s relatively rare for the test to fail reliably even if you use the “reproduce with” line because it’s, well, reproducible. When it is, you’ll see a JIRA raised something like “reproducible test failure” and/or someone will jump on it and fix it.

So only reproducible issues go into JIRA? In my experience, it makes sense in some cases to file an issue for an irreproducible bug, so that you can collect all the data in one place. Sometimes this helps to track the bug down.
>
> Timing issues: Well, just that. Say a test creates a collection and _assumes_ (no, this isn’t a good practice) that it’ll finish in 5 seconds and it takes 6, then drives on. Oops. Other more subtle issues are just threading issues where some sequence of context switching happens to hit an unanticipated problem. etc.
>
> It’s not that we _never_ get reproducible tests, it’s that when we do someone fixes them. There are a _lot_ of tests in the full suite, so if timing-related tests fail 0.1% of the time…
>
> You can confirm this yourself pretty easily, just save the output and run the “reproduce with” line.
>

OK, I will now go into the source to see what some of the tests do, and try to get a test run without errors. Last time I had 1 error. If it continues like this, I will have 0 next time ;-)

Regards, Raphael



Re: Tracking down inconsistent failure in jenkins

Erick Erickson

> On Nov 10, 2019, at 7:23 PM, Raphael Bircher <[hidden email]> wrote:
>
> So just reproducible issues go into jira? In my experience, it makes sense in some cases to write an issue for an irreproducible bug. So you can collect all data in one place. Sometimes this helps to track the bugs down.

It’s a judgement call, there’s no hard and fast rule. I don’t think there’s any point in raising a JIRA just saying that something failed and you cannot reproduce it unless you also have some information to add that might help anyone who wants to work on it track it down. You’ll find a number of JIRAs like that. Or if you’re going to grab one and try to track it down, go ahead and raise a JIRA for it where we can collaborate. Whatever makes most sense...

And here’s another resource:

http://fucit.org/solr-jenkins-reports/

Hossman (Chris Hostetter) set up a system to report the failures on the various machines that run Lucene/Solr tests. I should have mentioned that I often go there when I see a failing test that doesn’t reproduce easily to see if it fails other places too. Especially when I see failures when testing locally after code changes. Each week I try to produce the “BadApple” report with a summary of Hoss’ results over the last 4 weeks.

Yes, this is a bit awkward. It’d be nice if all tests passed 100% of the time, but we’re not there. It’s a long story…..

Erick



Re: Tracking down inconsistent failure in jenkins

Chris Hostetter-3
In reply to this post by Raphael Bircher

: Ok, I expected something like this. Why are some test timing related? Are there any informations about this.

multi-threaded code.  

In the extreme case, test failures may be timing related due to buggy race
conditions between multiple threads in "real code" (ie: very problematic
for end users).

More typically what we see is race conditions between a thread in "real
code" and a thread in "test code" (ie: a badly written test that doesn't
harm any production user but is annoying to track down).

Timing factors between machines can also cause reproducibility problems in
randomization logic -- the test framework tries to ensure every thread
gets its own consistent 'Random', but if there are multiple test threads
that both interact with something (ie: a single SolrCore in a stress
test), the order those threads are scheduled by the VM determines the
order that the random values from each of those threads come into play --
ie: the order that 2 diff (random) updates from 2 diff threads hit solr,
potentially triggering a bug in some situations (ie: maybe a certain
ordering of updates).
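
A toy illustration of that last point, not the actual test framework: each thread gets its own Random derived from one master seed (the derivation scheme here is an assumption), so every thread's own sequence is repeatable, while the order in which the values reach the shared target is left to the scheduler.

```python
import random
import threading


def run_updates(seed, thread_count=2, updates_per_thread=5):
    """Each thread draws from its own seeded Random and appends to a shared log."""
    shared_log = []
    lock = threading.Lock()

    def worker(thread_id):
        # Per-thread Random, derived from the master seed (hypothetical scheme).
        rng = random.Random(f"{seed}-{thread_id}")
        for _ in range(updates_per_thread):
            value = rng.randrange(1000)
            with lock:
                shared_log.append((thread_id, value))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_log


def per_thread(log, thread_id):
    """Values contributed by one thread, in the order that thread produced them."""
    return [value for tid, value in log if tid == thread_id]


if __name__ == "__main__":
    first, second = run_updates("DEADBEEF"), run_updates("DEADBEEF")
    # Each thread's own sequence repeats exactly for the same seed ...
    print(per_thread(first, 0) == per_thread(second, 0))
    # ... but the interleaving in the shared log is up to the scheduler.
    print(first == second)
```

On a lightly loaded machine the two logs will often come out identical, which is exactly why failures that depend on the interleaving are rare and hard to reproduce.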

Another example of how "timing" can cause reproducibility problems is
unpredictable ordering in lists which are then used to select things at
random...

Example: maybe there is a bug with shutting down nodes that only manifests
if the "leader" replica of a collection is being shut down -- a
(correctly written) test might be guaranteed that the seed DEADBEEF
will cause it to always spin up exactly 5 nodes and shut down the 2nd
node in the list -- but thread scheduling during collection creation may
cause the 1st node in the list to _usually_ host the leader, while on rare
occasions that 2nd node might host the leader because of which thread
joined the election first.
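
The leader example can be modelled in a few lines. This is a toy model, not Solr code: the seed fixes which list position gets shut down, while the election winner is treated as an outside input because it depends on thread scheduling rather than on the seed. The function names and the use of `randrange` are assumptions for illustration.

```python
import random

NODE_COUNT = 5  # as in the example above


def position_to_stop(seed):
    """Seeded pick of a position in the node list; identical on every run."""
    return random.Random(seed).randrange(NODE_COUNT)


def stops_leader(seed, election_winner):
    """True when the seeded pick happens to land on the node hosting the leader."""
    return position_to_stop(seed) == election_winner


if __name__ == "__main__":
    seed = 0xDEADBEEF
    print("seed always stops position", position_to_stop(seed))
    # The election winner is decided by thread scheduling, not by the seed,
    # so it is modelled here as an outside input.
    for winner in range(NODE_COUNT):
        print(f"leader on position {winner}: leader shut down = {stops_leader(seed, winner)}")
```

The same seed gives the same shutdown position every time, yet whether that shutdown hits the leader, and so whether the bug fires, changes with an input the seed does not control.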


...these are just some examples off the top of my head based on years of
experience reading test logs ... and that's before we even talk about the
possibility/annoyance of actual race conditions/deadlocks in code, that
might only trigger on slow machines (or fast machines) based on thread
scheduling ... assuming we even have a test that causes those particular
threads to try and run at the same time :)


-Hoss
http://www.lucidworks.com/
