Status of solr tests


Status of solr tests

Simon Willnauer-4
folks,

I got more active working on IndexWriter and Soft-Deletes etc. in the
last couple of weeks. It's a blast again and I really enjoy it. The
one thing that is IMO not acceptable is the status of solr tests. I
have tried many times to get them passing on several different OSs,
but it seems this is pretty hopeless. It gets even worse: the
Lucene/Solr QA job literally marks every ticket I attach a patch to
with a `-1` because of arbitrary solr test failures, here is an example:

|| Reason || Tests ||
| Failed junit tests | solr.rest.TestManagedResourceStorage |
|   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
|   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
|   | solr.client.solrj.impl.CloudSolrClientTest |
|   | solr.common.util.TestJsonRecordReader |

Speaking to other committers I hear we should just disable this job. Sorry, WTF?

These tests seem to fail all the time, randomly, over and over again.
This renders them entirely useless to me. I even invest time (wrong, I
invested) looking into whether they are caused by me or whether I can
do something about them. Yet, someone could call me out for being
responsible for them as a committer - yes I am, hence this email. But
I don't think I am obliged to fix them. These projects have 50+
committers, and having a shared codebase doesn't mean everybody has to
take care of everything. I think we are at the point where, if I work
on Lucene, I won't run solr tests at all, otherwise there won't be any
progress. On the other hand, if solr tests never pass, I wonder whether
the solr code-base gets changed regardless? That is again a terrible
situation.

I spoke to varun and anshum during buzzwords about whether they could
give me some hints on what I am doing wrong, but it seems this is just
the way it is. I feel terrible pushing stuff to our repo while still
seeing our tests fail. I get ~15 build failure mails from solr tests a
day, and I am not the only one who has mail filters to archive them
unless there is a lucene test among the failures.

This is a terrible state folks, how do we fix it? It's the lucene land
that gets much love on the testing end, but that also requires putting
work into it, and I expect solr to do the same. That in turn requires
that we stop pushing new stuff until the situation is under control.
The effort of marking stuff as bad apples isn't the answer by itself;
this requires effort from the drivers behind this project.

simon



Re: Status of solr tests

david.w.smiley@gmail.com
(Sigh) I sympathize with your points Simon.  I'm +1 to modify the Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and are trying to improve the stability of the Solr tests but even optimistically the practical reality is that it won't be good enough anytime soon.  When we get there, we can reverse this.

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker

Re: Status of solr tests

Erick Erickson
(Siiiiggggghhhh) All very true. You're not alone in your frustration.

I've been trying to at least BadApple tests that fail consistently, so
another option could be to disable BadApple'd tests. My hope has been
to get to the point of being able to reliably get clean runs, at least
when BadApple'd tests are disabled.

From that point I want to draw a line in the sand and immediately
address tests that fail that are _not_ BadApple'd. At least then we'll
stop getting _worse_. And then we can work on the BadApple'd tests.
But as David says, that's not going to be any time soon. It's been a
couple of months that I've been trying to just get the tests
BadApple'd without even trying to fix any of them.

It's particularly pernicious because with all the noise we don't see
failures we _should_ see.

So I don't have any good short-term answer either. We've built up a
very large technical debt in the testing. The first step is to stop
adding more debt, which is what I've been working on so far. And
that's the easy part....

Siiiiiiiiiiiiiigggggggggghhhhhhhhhh

Erick





Re: Status of solr tests

Mark Miller-3
There is an okay chance I'm going to start making some improvements here as well. I've been working on a very stable set of tests on my starburst branch and will slowly bring in test fixes over time (I've already been making some on that branch for important tests). We should currently be defaulting to tests.badapples=false on all solr test runs - it's a joke to try to get a clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat commonly have so far avoided Erick's @BadApple hack and slash. They are bad apple'd on my dev branch now, but that is currently where any time I have is spent, rather than on the main dev branches.
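
For anyone who hasn't used the mechanism, here is a minimal sketch of
what bad apple'ing a test looks like - the test class name and the JIRA
issue in the bugUrl are only placeholders, and the annotation comes from
LuceneTestCase:

  import org.apache.lucene.util.LuceneTestCase.BadApple;
  import org.apache.solr.cloud.SolrCloudTestCase;
  import org.junit.Test;

  public class SomeFlakyTest extends SolrCloudTestCase {

    // bugUrl should reference the JIRA issue tracking the flaky failure
    // (SOLR-XXXXX is just a placeholder here).
    @Test
    @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
    public void testSomethingFlaky() throws Exception {
      // ...
    }
  }

With that in place, a run that skips the known offenders should be as
simple as `ant test -Dtests.badapples=false` (property name from memory;
`ant test-help` lists the exact options).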

Also, too many flaky tests are introduced because devs are not beasting, or not beasting well, before committing new heavy tests. Perhaps we could add some docs around that.

We have built-in beasting support; we need to emphasize that a couple of passes on a new test is not sufficient to establish its quality.
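
Something along these lines before committing a new test would catch a
lot of this (exact property names from memory, so treat it as a sketch
and double check against `ant test-help` / common-build.xml):

  # repeat a single suite many times within one run
  ant test -Dtestcase=SomeFlakyTest -Dtests.iters=20

  # or use the beast target to hammer it across repeated forked runs
  ant beast -Dtestcase=SomeFlakyTest -Dbeast.iters=10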

- Mark


Re: Status of solr tests

Erick Erickson
Mark (and everyone).

I'm trying to be somewhat conservative about what I BadApple; at this
point it's only things that have failed every week for the last four weeks.
Part of that conservatism is to avoid BadApple'ing tests that are
failing and _should_ fail.

I'm explicitly _not_ delving into any of the causes at all at this
point, it's overwhelming until we reduce the noise as everyone knows.

So please feel totally free to BadApple anything you know is flakey,
it won't intrude on my turf ;)

And since I realized I can also report tests that have _not_ failed in
a month that _are_ BadApple'd, we can be a little freer with
BadApple'ing tests since there's a mechanism for un-annotating them
without a lot of tedious effort.

FWIW.



Re: Status of solr tests

Martin Gainty

Erick-

It appears that style mis-applications, which may be categorised as
INFO, are mixed in with SEVERE errors.

Would it make sense to filter the errors based on severity?


https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html

If you know the severity, you can triage the SEVERE errors before working down to the INFO-level ones.
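
A minimal java.util.logging sketch of the idea (the class name here is
just illustrative, and Solr's own logging actually goes through
slf4j/log4j, so this is only to show the shape of it):

  import java.util.logging.Filter;
  import java.util.logging.Level;
  import java.util.logging.LogRecord;
  import java.util.logging.Logger;

  // Drop everything below SEVERE so the real errors stand out.
  public class SevereOnlyFilter implements Filter {
    @Override
    public boolean isLoggable(LogRecord record) {
      return record.getLevel().intValue() >= Level.SEVERE.intValue();
    }
  }

  // e.g. attach it to the root logger:
  //   Logger.getLogger("").setFilter(new SevereOnlyFilter());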

WDYT?

Martin 


Re: Status of solr tests

Robert Muir
can we disable this bot already?



Re: Status of solr tests

Erick Erickson
Martin:

I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures.




Re: Status of solr tests

Simon Willnauer-4
Thanks folks, I appreciate you sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been just as relevant as it is today.

I am convinced Mark can make some progress, but this isn't fixable by a single person; this is a structural problem, or rather a cultural one. I am not sure everybody is aware of how terrible it is. I took a screenshot of my inbox the other day showing what I have to dig through on a constant basis every time I commit a change to lucene to make sure I am not missing something.

<image.png>

I don't even know how we can attract any new contributors, or how many contributors have been scared away by this in the past. This is not good, and bad-appling these tests isn't the answer unless we put a lot of effort into it; sorry, I don't see that happening. I would have expected more than about 4 people from this PMC to reply to something like this. From my perspective a lot of harm is done to the project by this, and we have to figure out what we want to do. This also affects our ability to release - guys, our smoke-test builds never pass [1]. I don't know what I would do if I were the RM for 7.4 (thanks adrien for doing it). I can not tell what is serious and what is not on a solr build. And it's not just the smoke tester, it's basically everything that runs after solr that is skipped on a regular basis.

I don't have a good answer, but we have to get this under control; it's burdensome for lucene to carry this load, and it has been carrying it for quite some time. It wasn't very obvious to me how much this weighs, since I hadn't worked on lucene internals for quite a while, but speaking to many folks around here, it is on their shoulders too and it just isn't brought up for discussion. I think we have to.

simon

[1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

sarowe
Hi Simon,

Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E

If not, I suggest you go take a look.  Some of your questions are answered there.

--
Steve
www.lucidworks.com

> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[hidden email]> wrote:
>
> Thanks folks, I appreciate you are sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been as relevant as today.
>
> I am convinced mark can make some progress but this isn't fixable by a single person this is a structural problem or rather a cultural. I am not sure if everybody is aware of how terrible it is. I took a screenshot of my inbox the other day what I have to dig through on a constant basis everytime I commit a change to lucene to make sure I am not missing something.
>
> <image.png>
>
> I don't even know how we can attract any new contributors or how many contributors have been scared away by this in the past. This is not good and bad-appeling these test isn't the answer unless we put a lot of effort into it, sorry I don't see it happening. I would have expected more than like 4 people from this PMC to reply to something like this. From my perspective there is a lot of harm done by this to the project and we have to figure out what we wanna do. This also affects our ability to release, guys our smoke-test builds never pass [1]. I don't know what to do if I were a RM for 7.4 (thanks adrien for doing it) Like I can not tell what is serious and what not on a solr build. It's also not just be smoke tester it's basically everything that runs after solr that is skipped on a regular basis.
>
> I don't have a good answer but we have to get this under control it's burdensome for lucene to carry this load and it's carrying it a quite some time. It wasn't very obvious how big this weights since I wasn't working on lucene internals for quite a while and speaking to many folks around here this is on their shoulders but it's not brought up for discussion, i think we have to.
>
> simon
>
> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>
>
> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[hidden email]> wrote:
> Martin:
>
> I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures.
>
>
>
> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[hidden email]> wrote:
> Erick-
>
> appears that style mis-application may be categorised as INFO
> are mixed in with SEVERE errors
>
> Would it make sense to filter the errors based on severity ?
>
> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
> if you know Severity you can triage the SEVERE errors before working down to INFO errors
>
>
> WDYT?
> Martin
> ______________________________________________
>
>
>
> From: Erick Erickson <[hidden email]>
> Sent: Friday, June 15, 2018 1:05 PM
> To: [hidden email]; Mark Miller
> Subject: Re: Status of solr tests
>  
> Mark (and everyone).
>
> I'm trying to be somewhat conservative about what I BadApple, at this
> point it's only things that have failed every week for the last 4.
> Part of that conservatism is to avoid BadApple'ing tests that are
> failing and _should_ fail.
>
> I'm explicitly _not_ delving into any of the causes at all at this
> point, it's overwhelming until we reduce the noise as everyone knows.
>
> So please feel totally free to BadApple anything you know is flakey,
> it won't intrude on my turf ;)
>
> And since I realized I can also report tests that have _not_ failed in
> a month that _are_ BadApple'd, we can be a little freer with
> BadApple'ing tests since there's a mechanism for un-annotating them
> without a lot of tedious effort.
>
> FWIW.
>
> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[hidden email]> wrote:
> > There is an okay chance I'm going to start making some improvements here as
> > well. I've been working on a very stable set of tests on my starburst branch
> > and will slowly bring in test fixes over time (I've already been making some
> > on that branch for important tests). We should currently be defaulting to
> > tests.badapples=false on all solr test runs - it's a joke to try and get a
> > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
> > commonly have so far avoided Erick's @BadApple hack and slash. They are bad
> > appled on my dev branch now, but that is currently where any time I have is
> > spent rather than on the main dev branches.
> >
> > Also, too many flakey tests are introduced because devs are not beasting or
> > beasting well before committing new heavy tests. Perhaps we could add some
> > docs around that.
> >
> > We have built in beasting support, we need to emphasize that a couple passes
> > on a new test is not sufficient to test it's quality.
> >
> > - Mark
> >
> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[hidden email]>
> > wrote:
> >>
> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
> >>
> >> I've been trying to at least BadApple tests that fail consistently, so
> >> another option could be to disable BadApple'd tests. My hope has been
> >> to get to the point of being able to reliably get clean runs, at least
> >> when BadApple'd tests are disabled.
> >>
> >> From that point I want to draw a line in the sand and immediately
> >> address tests that fail that are _not_ BadApple'd. At least then we'll
> >> stop getting _worse_. And then we can work on the BadApple'd tests.
> >> But as David says, that's not going to be any time soon. It's been a
> >> couple of months that I've been trying to just get the tests
> >> BadApple'd without even trying to fix any of them.
> >>
> >> It's particularly pernicious because with all the noise we don't see
> >> failures we _should_ see.
> >>
> >> So I don't have any good short-term answer either. We've built up a
> >> very large technical debt in the testing. The first step is to stop
> >> adding more debt, which is what I've been working on so far. And
> >> that's the easy part....
> >>
> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
> >>
> >> Erick
> >>
> >>
> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[hidden email]>
> >> wrote:
> >> > (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
> >> > are
> >> > trying to improve the stability of the Solr tests but even
> >> > optimistically
> >> > the practical reality is that it won't be good enough anytime soon.
> >> > When we
> >> > get there, we can reverse this.
> >> >
> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
> >> > <[hidden email]>
> >> > wrote:
> >> >>
> >> >> folks,
> >> >>
> >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the
> >> >> last couple of weeks. It's a blast again and I really enjoy it. The
> >> >> one thing that is IMO not acceptable is the status of solr tests. I
> >> >> tried so many times to get them passing on several different OSs but
> >> >> it seems this is pretty hopepless. It's get's even worse the
> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as
> >> >> `-1` because of arbitrary solr tests, here is an example:
> >> >>
> >> >> || Reason || Tests ||
> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
> >> >> |   | solr.common.util.TestJsonRecordReader |
> >> >>
> >> >> Speaking to other committers I hear we should just disable this job.
> >> >> Sorry, WTF?
> >> >>
> >> >> These tests seem to fail all the time, randomly and over and over
> >> >> again. This renders the test as entirely useless to me. I even invest
> >> >> time (wrong, I invested) looking into it if they are caused by me or
> >> >> if I can do something about it. Yet, someone could call me out for
> >> >> being responsible for them as a commiter, yes I am hence this email. I
> >> >> don't think I am obliged to fix them. These projects have 50+
> >> >> committers and having a shared codebase doesn't mean everybody has to
> >> >> take care of everything. I think we are at the point where if I work
> >> >> on Lucene I won't run solr tests at all otherwise there won't be any
> >> >> progress. On the other hand solr tests never pass I wonder if the solr
> >> >> code-base gets changes nevertheless? That is again a terrible
> >> >> situation.
> >> >>
> >> >> I spoke to varun and  anshum during buzzwords if they can give me some
> >> >> hints what I am doing wrong but it seems like the way it is. I feel
> >> >> terrible pushing stuff to our repo still seeing our tests fail. I get
> >> >> ~15 build failures from solr tests a day I am not the only one that
> >> >> has mail filters to archive them if there isn't a lucene tests in the
> >> >> failures.
> >> >>
> >> >> This is a terrible state folks, how do we fix it? It's the lucene land
> >> >> that get much love on the testing end but that also requires more work
> >> >> on it, I expect solr to do the same. That at the same time requires
> >> >> stop pushing new stuff until the situation is under control. The
> >> >> effort of marking stuff as bad apples isn't the answer, this requires
> >> >> effort from the drivers behind this project.
> >> >>
> >> >> simon
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [hidden email]
> >> >> For additional commands, e-mail: [hidden email]
> >> >>
> >> > --
> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>
>
>
> >> > http://www.solrenterprisesearchserver.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> > --
> > - Mark
> > about.me/markrmiller
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

Simon Willnauer-4
Hi Steve, I saw and followed that thread, but the only outcome I can
see is stuff being bad-appled? I might be missing something, and I
could go and argue specifics on that thread, like:

> Testing distributed systems requires, well, distributed systems which is what starting clusters is all about.

I have worked on distributed systems for several years and I am
convinced that's a false statement. I didn't want to go down that
route, which I think boils down to the cultural disconnect. If I
missed anything that is answered there, I am sorry; I will go and
re-read it.

simon

On Tue, Jun 19, 2018 at 4:29 PM, Steve Rowe <[hidden email]> wrote:

> Hi Simon,
>
> Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>
> If not, I suggest you go take a look.  Some of your questions are answered there.
>
> --
> Steve
> www.lucidworks.com
>
>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[hidden email]> wrote:
>>
>> Thanks folks, I appreciate you are sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been as relevant as today.
>>
>> I am convinced mark can make some progress but this isn't fixable by a single person this is a structural problem or rather a cultural. I am not sure if everybody is aware of how terrible it is. I took a screenshot of my inbox the other day what I have to dig through on a constant basis everytime I commit a change to lucene to make sure I am not missing something.
>>
>> <image.png>
>>
>> I don't even know how we can attract any new contributors or how many contributors have been scared away by this in the past. This is not good and bad-appeling these test isn't the answer unless we put a lot of effort into it, sorry I don't see it happening. I would have expected more than like 4 people from this PMC to reply to something like this. From my perspective there is a lot of harm done by this to the project and we have to figure out what we wanna do. This also affects our ability to release, guys our smoke-test builds never pass [1]. I don't know what to do if I were a RM for 7.4 (thanks adrien for doing it) Like I can not tell what is serious and what not on a solr build. It's also not just be smoke tester it's basically everything that runs after solr that is skipped on a regular basis.
>>
>> I don't have a good answer but we have to get this under control it's burdensome for lucene to carry this load and it's carrying it a quite some time. It wasn't very obvious how big this weights since I wasn't working on lucene internals for quite a while and speaking to many folks around here this is on their shoulders but it's not brought up for discussion, i think we have to.
>>
>> simon
>>
>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>
>>
>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[hidden email]> wrote:
>> Martin:
>>
>> I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures.
>>
>>
>>
>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[hidden email]> wrote:
>> Erick-
>>
>> appears that style mis-application may be categorised as INFO
>> are mixed in with SEVERE errors
>>
>> Would it make sense to filter the errors based on severity ?
>>
>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>> if you know Severity you can triage the SEVERE errors before working down to INFO errors
>>
>>
>> WDYT?
>> Martin
>> ______________________________________________
>>
>>
>>
>> From: Erick Erickson <[hidden email]>
>> Sent: Friday, June 15, 2018 1:05 PM
>> To: [hidden email]; Mark Miller
>> Subject: Re: Status of solr tests
>>
>> Mark (and everyone).
>>
>> I'm trying to be somewhat conservative about what I BadApple, at this
>> point it's only things that have failed every week for the last 4.
>> Part of that conservatism is to avoid BadApple'ing tests that are
>> failing and _should_ fail.
>>
>> I'm explicitly _not_ delving into any of the causes at all at this
>> point, it's overwhelming until we reduce the noise as everyone knows.
>>
>> So please feel totally free to BadApple anything you know is flakey,
>> it won't intrude on my turf ;)
>>
>> And since I realized I can also report tests that have _not_ failed in
>> a month that _are_ BadApple'd, we can be a little freer with
>> BadApple'ing tests since there's a mechanism for un-annotating them
>> without a lot of tedious effort.
>>
>> FWIW.
>>
>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[hidden email]> wrote:
>> > There is an okay chance I'm going to start making some improvements here as
>> > well. I've been working on a very stable set of tests on my starburst branch
>> > and will slowly bring in test fixes over time (I've already been making some
>> > on that branch for important tests). We should currently be defaulting to
>> > tests.badapples=false on all solr test runs - it's a joke to try and get a
>> > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
>> > commonly have so far avoided Erick's @BadApple hack and slash. They are bad
>> > appled on my dev branch now, but that is currently where any time I have is
>> > spent rather than on the main dev branches.
>> >
>> > Also, too many flakey tests are introduced because devs are not beasting or
>> > beasting well before committing new heavy tests. Perhaps we could add some
>> > docs around that.
>> >
>> > We have built in beasting support, we need to emphasize that a couple passes
>> > on a new test is not sufficient to test it's quality.
>> >
>> > - Mark
>> >
>> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[hidden email]>
>> > wrote:
>> >>
>> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>> >>
>> >> I've been trying to at least BadApple tests that fail consistently, so
>> >> another option could be to disable BadApple'd tests. My hope has been
>> >> to get to the point of being able to reliably get clean runs, at least
>> >> when BadApple'd tests are disabled.
>> >>
>> >> From that point I want to draw a line in the sand and immediately
>> >> address tests that fail that are _not_ BadApple'd. At least then we'll
>> >> stop getting _worse_. And then we can work on the BadApple'd tests.
>> >> But as David says, that's not going to be any time soon. It's been a
>> >> couple of months that I've been trying to just get the tests
>> >> BadApple'd without even trying to fix any of them.
>> >>
>> >> It's particularly pernicious because with all the noise we don't see
>> >> failures we _should_ see.
>> >>
>> >> So I don't have any good short-term answer either. We've built up a
>> >> very large technical debt in the testing. The first step is to stop
>> >> adding more debt, which is what I've been working on so far. And
>> >> that's the easy part....
>> >>
>> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>> >>
>> >> Erick
>> >>
>> >>
>> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[hidden email]>
>> >> wrote:
>> >> > (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
>> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>> >> > are
>> >> > trying to improve the stability of the Solr tests but even
>> >> > optimistically
>> >> > the practical reality is that it won't be good enough anytime soon.
>> >> > When we
>> >> > get there, we can reverse this.
>> >> >
>> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>> >> > <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> folks,
>> >> >>
>> >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the
>> >> >> last couple of weeks. It's a blast again and I really enjoy it. The
>> >> >> one thing that is IMO not acceptable is the status of solr tests. I
>> >> >> tried so many times to get them passing on several different OSs but
>> >> >> it seems this is pretty hopepless. It's get's even worse the
>> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as
>> >> >> `-1` because of arbitrary solr tests, here is an example:
>> >> >>
>> >> >> || Reason || Tests ||
>> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
>> >> >> |   | solr.common.util.TestJsonRecordReader |
>> >> >>
>> >> >> Speaking to other committers I hear we should just disable this job.
>> >> >> Sorry, WTF?
>> >> >>
>> >> >> These tests seem to fail all the time, randomly and over and over
>> >> >> again. This renders the test as entirely useless to me. I even invest
>> >> >> time (wrong, I invested) looking into it if they are caused by me or
>> >> >> if I can do something about it. Yet, someone could call me out for
>> >> >> being responsible for them as a commiter, yes I am hence this email. I
>> >> >> don't think I am obliged to fix them. These projects have 50+
>> >> >> committers and having a shared codebase doesn't mean everybody has to
>> >> >> take care of everything. I think we are at the point where if I work
>> >> >> on Lucene I won't run solr tests at all otherwise there won't be any
>> >> >> progress. On the other hand solr tests never pass I wonder if the solr
>> >> >> code-base gets changes nevertheless? That is again a terrible
>> >> >> situation.
>> >> >>
>> >> >> I spoke to varun and  anshum during buzzwords if they can give me some
>> >> >> hints what I am doing wrong but it seems like the way it is. I feel
>> >> >> terrible pushing stuff to our repo still seeing our tests fail. I get
>> >> >> ~15 build failures from solr tests a day I am not the only one that
>> >> >> has mail filters to archive them if there isn't a lucene tests in the
>> >> >> failures.
>> >> >>
>> >> >> This is a terrible state folks, how do we fix it? It's the lucene land
>> >> >> that get much love on the testing end but that also requires more work
>> >> >> on it, I expect solr to do the same. That at the same time requires
>> >> >> stop pushing new stuff until the situation is under control. The
>> >> >> effort of marking stuff as bad apples isn't the answer, this requires
>> >> >> effort from the drivers behind this project.
>> >> >>
>> >> >> simon
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: [hidden email]
>> >> >> For additional commands, e-mail: [hidden email]
>> >> >>
>> >> > --
>> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>
>>
>>
>> >> > http://www.solrenterprisesearchserver.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [hidden email]
>> >> For additional commands, e-mail: [hidden email]
>> >>
>> > --
>> > - Mark
>> > about.me/markrmiller
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

sarowe
I was thinking of Cassandra’s reply on that thread: https://lists.apache.org/thread.html/f8d84a669fc009429fcc51873fdd36b1e2b7f6c44b2e7abd9d8cf4fa@%3Cdev.lucene.apache.org%3E , which I’ll quote here:

> From: Cassandra Targett <[hidden email]>
> To: [hidden email]
> Subject: Re: Test failures are out of control......
> Date: 2018/02/21 23:13:15
> List: [hidden email]
>
> This issue is hugely important.
>
> At Lucidworks we have implemented a "Test Confidence" role that focuses on improving the ability of all members of the community to trust that reported failures from any of the Jenkins systems are actual failures and not flakey tests. This role rotates among the committers on our Solr Team, and a committer is assigned to the role for 2-week periods of time. Our goal is to have at least one committer on our team focused full-time on improving test confidence at all times. (Just a note on timing, we started this last summer, but we only recently reconfirmed our commitment to having someone assigned to it at all times.)
>
> One of the guidelines we've agreed to is that the person in the role should not look (only) at tests he has worked on. Instead, he should focus on tests that fail less than 100% of the time and/or are hard to reproduce *even if he didn't write the test or the code*.
>
> Another aspect of the Test Confidence role is to try to develop tools that can help the community overall in improving this situation. Two things have grown out of this effort so far:
>
> * Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106)
> * Hoss has worked on aggregating all test failures from the 3 Jenkins systems (ASF, Policeman, and Steve's), downloading the test results & logs, and running some reports/stats on failures. He should be ready to share this more publicly soon.
>
> I think it's important to understand that flakey tests will *never* go away. There will always be a new flakey test to review/fix. Our goal should be to make it so most of the time, you can assume the test is broken and only discover it's flakey as part of digging.
>
> The idea of @BadApple marking (or some other notation) is an OK idea, but the problem is so bad today I worry it does nothing to find a way to ensure they get fixed. Lots of JIRAs get filed for problems with tests - I count about 180 open issues today - and many just sit there forever.
>
> The biggest thing I want to to avoid is making it even easier to avoid/ignore them. We should try to make it easier to highlight them, and we need a concerted effort to fix the tests once they've been identified as flakey.
>


--
Steve
www.lucidworks.com

> On Jun 19, 2018, at 11:15 AM, Simon Willnauer <[hidden email]> wrote:
>
> Hi steve, I saw and followed that thread but the only outcome that I
> can see it stuff being bad appled? I might miss something and I can go
> and argue on specifics on that thread like:
>
>> Testing distributed systems requires, well, distributed systems which is what starting clusters is all about.
>
> which I have worked on for several years and I am convinced it's a
> false statement. I didn't wanna go down that route which I think boils
> down to the cultural disconnect. If I missed anything that is answered
> I am sorry I will go and re-read it.
>
> simon
>
> On Tue, Jun 19, 2018 at 4:29 PM, Steve Rowe <[hidden email]> wrote:
>> Hi Simon,
>>
>> Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>>
>> If not, I suggest you go take a look.  Some of your questions are answered there.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[hidden email]> wrote:
>>>
>>> Thanks folks, I appreciate you are sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been as relevant as today.
>>>
>>> I am convinced mark can make some progress but this isn't fixable by a single person this is a structural problem or rather a cultural. I am not sure if everybody is aware of how terrible it is. I took a screenshot of my inbox the other day what I have to dig through on a constant basis everytime I commit a change to lucene to make sure I am not missing something.
>>>
>>> <image.png>
>>>
>>> I don't even know how we can attract any new contributors or how many contributors have been scared away by this in the past. This is not good and bad-appeling these test isn't the answer unless we put a lot of effort into it, sorry I don't see it happening. I would have expected more than like 4 people from this PMC to reply to something like this. From my perspective there is a lot of harm done by this to the project and we have to figure out what we wanna do. This also affects our ability to release, guys our smoke-test builds never pass [1]. I don't know what to do if I were a RM for 7.4 (thanks adrien for doing it) Like I can not tell what is serious and what not on a solr build. It's also not just be smoke tester it's basically everything that runs after solr that is skipped on a regular basis.
>>>
>>> I don't have a good answer but we have to get this under control it's burdensome for lucene to carry this load and it's carrying it a quite some time. It wasn't very obvious how big this weights since I wasn't working on lucene internals for quite a while and speaking to many folks around here this is on their shoulders but it's not brought up for discussion, i think we have to.
>>>
>>> simon
>>>
>>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>>
>>>
>>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[hidden email]> wrote:
>>> Martin:
>>>
>>> I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures.
>>>
>>>
>>>
>>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[hidden email]> wrote:
>>> Erick-
>>>
>>> appears that style mis-application may be categorised as INFO
>>> are mixed in with SEVERE errors
>>>
>>> Would it make sense to filter the errors based on severity ?
>>>
>>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>>> if you know Severity you can triage the SEVERE errors before working down to INFO errors
>>>
>>>
>>> WDYT?
>>> Martin
>>> ______________________________________________
>>>
>>>
>>>
>>> From: Erick Erickson <[hidden email]>
>>> Sent: Friday, June 15, 2018 1:05 PM
>>> To: [hidden email]; Mark Miller
>>> Subject: Re: Status of solr tests
>>>
>>> Mark (and everyone).
>>>
>>> I'm trying to be somewhat conservative about what I BadApple, at this
>>> point it's only things that have failed every week for the last 4.
>>> Part of that conservatism is to avoid BadApple'ing tests that are
>>> failing and _should_ fail.
>>>
>>> I'm explicitly _not_ delving into any of the causes at all at this
>>> point, it's overwhelming until we reduce the noise as everyone knows.
>>>
>>> So please feel totally free to BadApple anything you know is flakey,
>>> it won't intrude on my turf ;)
>>>
>>> And since I realized I can also report tests that have _not_ failed in
>>> a month that _are_ BadApple'd, we can be a little freer with
>>> BadApple'ing tests since there's a mechanism for un-annotating them
>>> without a lot of tedious effort.
>>>
>>> FWIW.
>>>
>>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[hidden email]> wrote:
>>>> There is an okay chance I'm going to start making some improvements here as
>>>> well. I've been working on a very stable set of tests on my starburst branch
>>>> and will slowly bring in test fixes over time (I've already been making some
>>>> on that branch for important tests). We should currently be defaulting to
>>>> tests.badapples=false on all solr test runs - it's a joke to try and get a
>>>> clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
>>>> commonly have so far avoided Erick's @BadApple hack and slash. They are bad
>>>> appled on my dev branch now, but that is currently where any time I have is
>>>> spent rather than on the main dev branches.
>>>>
>>>> Also, too many flakey tests are introduced because devs are not beasting or
>>>> beasting well before committing new heavy tests. Perhaps we could add some
>>>> docs around that.
>>>>
>>>> We have built in beasting support, we need to emphasize that a couple passes
>>>> on a new test is not sufficient to test it's quality.
>>>>
>>>> - Mark
>>>>
>>>> On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[hidden email]>
>>>> wrote:
>>>>>
>>>>> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>>>>>
>>>>> I've been trying to at least BadApple tests that fail consistently, so
>>>>> another option could be to disable BadApple'd tests. My hope has been
>>>>> to get to the point of being able to reliably get clean runs, at least
>>>>> when BadApple'd tests are disabled.
>>>>>
>>>>> From that point I want to draw a line in the sand and immediately
>>>>> address tests that fail that are _not_ BadApple'd. At least then we'll
>>>>> stop getting _worse_. And then we can work on the BadApple'd tests.
>>>>> But as David says, that's not going to be any time soon. It's been a
>>>>> couple of months that I've been trying to just get the tests
>>>>> BadApple'd without even trying to fix any of them.
>>>>>
>>>>> It's particularly pernicious because with all the noise we don't see
>>>>> failures we _should_ see.
>>>>>
>>>>> So I don't have any good short-term answer either. We've built up a
>>>>> very large technical debt in the testing. The first step is to stop
>>>>> adding more debt, which is what I've been working on so far. And
>>>>> that's the easy part....
>>>>>
>>>>> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>>>>>
>>>>> Erick
>>>>>
>>>>>
>>>>> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[hidden email]>
>>>>> wrote:
>>>>>> (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
>>>>>> Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>>>>>> are
>>>>>> trying to improve the stability of the Solr tests but even
>>>>>> optimistically
>>>>>> the practical reality is that it won't be good enough anytime soon.
>>>>>> When we
>>>>>> get there, we can reverse this.
>>>>>>
>>>>>> On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>> folks,
>>>>>>>
>>>>>>> I got more active working on IndexWriter and Soft-Deletes etc. in the
>>>>>>> last couple of weeks. It's a blast again and I really enjoy it. The
>>>>>>> one thing that is IMO not acceptable is the status of solr tests. I
>>>>>>> tried so many times to get them passing on several different OSs but
>>>>>>> it seems this is pretty hopepless. It's get's even worse the
>>>>>>> Lucene/Solr QA job literally marks every ticket I attach a patch to as
>>>>>>> `-1` because of arbitrary solr tests, here is an example:
>>>>>>>
>>>>>>> || Reason || Tests ||
>>>>>>> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>>>>>>> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>>>>>>> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>>>>>>> |   | solr.client.solrj.impl.CloudSolrClientTest |
>>>>>>> |   | solr.common.util.TestJsonRecordReader |
>>>>>>>
>>>>>>> Speaking to other committers I hear we should just disable this job.
>>>>>>> Sorry, WTF?
>>>>>>>
>>>>>>> These tests seem to fail all the time, randomly and over and over
>>>>>>> again. This renders the test as entirely useless to me. I even invest
>>>>>>> time (wrong, I invested) looking into it if they are caused by me or
>>>>>>> if I can do something about it. Yet, someone could call me out for
>>>>>>> being responsible for them as a commiter, yes I am hence this email. I
>>>>>>> don't think I am obliged to fix them. These projects have 50+
>>>>>>> committers and having a shared codebase doesn't mean everybody has to
>>>>>>> take care of everything. I think we are at the point where if I work
>>>>>>> on Lucene I won't run solr tests at all otherwise there won't be any
>>>>>>> progress. On the other hand solr tests never pass I wonder if the solr
>>>>>>> code-base gets changes nevertheless? That is again a terrible
>>>>>>> situation.
>>>>>>>
>>>>>>> I spoke to varun and  anshum during buzzwords if they can give me some
>>>>>>> hints what I am doing wrong but it seems like the way it is. I feel
>>>>>>> terrible pushing stuff to our repo still seeing our tests fail. I get
>>>>>>> ~15 build failures from solr tests a day I am not the only one that
>>>>>>> has mail filters to archive them if there isn't a lucene tests in the
>>>>>>> failures.
>>>>>>>
>>>>>>> This is a terrible state folks, how do we fix it? It's the lucene land
>>>>>>> that get much love on the testing end but that also requires more work
>>>>>>> on it, I expect solr to do the same. That at the same time requires
>>>>>>> stop pushing new stuff until the situation is under control. The
>>>>>>> effort of marking stuff as bad apples isn't the answer, this requires
>>>>>>> effort from the drivers behind this project.
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: [hidden email]
>>>>>>> For additional commands, e-mail: [hidden email]
>>>>>>>
>>>>>> --
>>>>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>>
>>>
>>>
>>>>>> http://www.solrenterprisesearchserver.com
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [hidden email]
>>>>> For additional commands, e-mail: [hidden email]
>>>>>
>>>> --
>>>> - Mark
>>>> about.me/markrmiller
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

Erick Erickson
In reply to this post by sarowe
" This is not good and bad-appeling these test isn't the answer unless
we put a lot of effort into it, sorry I don't see it happening."

This ticks me off. I've spent considerable time over the last 4 months
trying to get to a point where we can stop getting _worse_ as a
necessary _first_ step to getting better. Lucidworks is putting effort
in that direction too. What other concrete actions do you recommend
going forward?

Of course just BadApple-ing tests isn't a satisfactory answer. And the
e-mail filters I've arranged that allow me to only see failures that
do _not_ run BadApple tests are dangerous and completely crappy.
Unfortunately I don't have a magic wand to make it all better so this
stop-gap (I hope) allows progress.

We'll know progress is being made when the weekly BadApple reports
show a declining number of tests that are annotated. Certainly not
there yet, but working on it.

Perhaps you missed the point of the BadApple exercise. Reasonably soon
I hope to be at a point where we can draw a line in the sand where we
can say "This is a new failure, fix it or roll back the changes". Then
can we get persnickety about not adding _new_ failures. Then we can
reduce the backlog.

And the result of these efforts may be me curling into a ball and
sucking my thumb because the problem is intractable. We'll see.

One temporary-but-maybe-necessary option is to run with BadApple
enabled. I don't like that either, but it's better than not running
tests at all.
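
For anyone who hasn't looked at the mechanics, the annotation side is
roughly like the minimal sketch below. I'm assuming the BadApple
annotation and its bugUrl element live in LuceneTestCase the way I
remember them; the class name and JIRA id are made up for
illustration:

    import org.apache.lucene.util.LuceneTestCase;
    import org.apache.lucene.util.LuceneTestCase.BadApple;

    public class SomeFlakyCloudTest extends LuceneTestCase {

      // Hypothetical flaky test. Whether @BadApple tests run is
      // controlled by the tests.badapples property on the build; check
      // the test framework for the exact property name and default.
      @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-XXXXX")
      public void testReplicaRecovery() throws Exception {
        // the flaky assertions live here until the underlying issue is fixed
      }
    }

Un-annotating is then just deleting that one line once the weekly
reports show the test has stopped failing.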

Unfortunately, when I'm working on code I have to do another crappy
work-around: run the tests, then re-run any failing tests and assume
that if the re-run succeeds it was a flaky test. The BadApple
annotations are helpful for that too, since soon I hope to have
confidence that we've annotated nearly all the flaky tests, and if I
can run failing tests successfully _and_ they're annotated, it's
probably OK. Horrible process, no question about that, but I have to
start somewhere.

Again, what additional steps do you recommend?

Erick

On Tue, Jun 19, 2018 at 7:29 AM, Steve Rowe <[hidden email]> wrote:

> Hi Simon,
>
> Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>
> If not, I suggest you go take a look.  Some of your questions are answered there.
>
> --
> Steve
> www.lucidworks.com
>
>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[hidden email]> wrote:
>>
>> Thanks folks, I appreciate you are sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been as relevant as today.
>>
>> I am convinced mark can make some progress but this isn't fixable by a single person this is a structural problem or rather a cultural. I am not sure if everybody is aware of how terrible it is. I took a screenshot of my inbox the other day what I have to dig through on a constant basis everytime I commit a change to lucene to make sure I am not missing something.
>>
>> <image.png>
>>
>> I don't even know how we can attract any new contributors or how many contributors have been scared away by this in the past. This is not good and bad-appeling these test isn't the answer unless we put a lot of effort into it, sorry I don't see it happening. I would have expected more than like 4 people from this PMC to reply to something like this. From my perspective there is a lot of harm done by this to the project and we have to figure out what we wanna do. This also affects our ability to release, guys our smoke-test builds never pass [1]. I don't know what to do if I were a RM for 7.4 (thanks adrien for doing it) Like I can not tell what is serious and what not on a solr build. It's also not just be smoke tester it's basically everything that runs after solr that is skipped on a regular basis.
>>
>> I don't have a good answer but we have to get this under control it's burdensome for lucene to carry this load and it's carrying it a quite some time. It wasn't very obvious how big this weights since I wasn't working on lucene internals for quite a while and speaking to many folks around here this is on their shoulders but it's not brought up for discussion, i think we have to.
>>
>> simon
>>
>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>
>>
>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[hidden email]> wrote:
>> Martin:
>>
>> I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs, it's a matter of Jenkins junit test runs reporting failures.
>>
>>
>>
>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[hidden email]> wrote:
>> Erick-
>>
>> appears that style mis-application may be categorised as INFO
>> are mixed in with SEVERE errors
>>
>> Would it make sense to filter the errors based on severity ?
>>
>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>> if you know Severity you can triage the SEVERE errors before working down to INFO errors
>>
>>
>> WDYT?
>> Martin
>> ______________________________________________
>>
>>
>>
>> From: Erick Erickson <[hidden email]>
>> Sent: Friday, June 15, 2018 1:05 PM
>> To: [hidden email]; Mark Miller
>> Subject: Re: Status of solr tests
>>
>> Mark (and everyone).
>>
>> I'm trying to be somewhat conservative about what I BadApple, at this
>> point it's only things that have failed every week for the last 4.
>> Part of that conservatism is to avoid BadApple'ing tests that are
>> failing and _should_ fail.
>>
>> I'm explicitly _not_ delving into any of the causes at all at this
>> point, it's overwhelming until we reduce the noise as everyone knows.
>>
>> So please feel totally free to BadApple anything you know is flakey,
>> it won't intrude on my turf ;)
>>
>> And since I realized I can also report tests that have _not_ failed in
>> a month that _are_ BadApple'd, we can be a little freer with
>> BadApple'ing tests since there's a mechanism for un-annotating them
>> without a lot of tedious effort.
>>
>> FWIW.
>>
>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[hidden email]> wrote:
>> > There is an okay chance I'm going to start making some improvements here as
>> > well. I've been working on a very stable set of tests on my starburst branch
>> > and will slowly bring in test fixes over time (I've already been making some
>> > on that branch for important tests). We should currently be defaulting to
>> > tests.badapples=false on all solr test runs - it's a joke to try and get a
>> > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
>> > commonly have so far avoided Erick's @BadApple hack and slash. They are bad
>> > appled on my dev branch now, but that is currently where any time I have is
>> > spent rather than on the main dev branches.
>> >
>> > Also, too many flakey tests are introduced because devs are not beasting or
>> > beasting well before committing new heavy tests. Perhaps we could add some
>> > docs around that.
>> >
>> > We have built in beasting support, we need to emphasize that a couple passes
>> > on a new test is not sufficient to test it's quality.
>> >
>> > - Mark
>> >
>> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[hidden email]>
>> > wrote:
>> >>
>> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>> >>
>> >> I've been trying to at least BadApple tests that fail consistently, so
>> >> another option could be to disable BadApple'd tests. My hope has been
>> >> to get to the point of being able to reliably get clean runs, at least
>> >> when BadApple'd tests are disabled.
>> >>
>> >> From that point I want to draw a line in the sand and immediately
>> >> address tests that fail that are _not_ BadApple'd. At least then we'll
>> >> stop getting _worse_. And then we can work on the BadApple'd tests.
>> >> But as David says, that's not going to be any time soon. It's been a
>> >> couple of months that I've been trying to just get the tests
>> >> BadApple'd without even trying to fix any of them.
>> >>
>> >> It's particularly pernicious because with all the noise we don't see
>> >> failures we _should_ see.
>> >>
>> >> So I don't have any good short-term answer either. We've built up a
>> >> very large technical debt in the testing. The first step is to stop
>> >> adding more debt, which is what I've been working on so far. And
>> >> that's the easy part....
>> >>
>> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>> >>
>> >> Erick
>> >>
>> >>
>> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[hidden email]>
>> >> wrote:
>> >> > (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
>> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>> >> > are
>> >> > trying to improve the stability of the Solr tests but even
>> >> > optimistically
>> >> > the practical reality is that it won't be good enough anytime soon.
>> >> > When we
>> >> > get there, we can reverse this.
>> >> >
>> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>> >> > <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >> folks,
>> >> >>
>> >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the
>> >> >> last couple of weeks. It's a blast again and I really enjoy it. The
>> >> >> one thing that is IMO not acceptable is the status of solr tests. I
>> >> >> tried so many times to get them passing on several different OSs but
>> >> >> it seems this is pretty hopepless. It's get's even worse the
>> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as
>> >> >> `-1` because of arbitrary solr tests, here is an example:
>> >> >>
>> >> >> || Reason || Tests ||
>> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
>> >> >> |   | solr.common.util.TestJsonRecordReader |
>> >> >>
>> >> >> Speaking to other committers I hear we should just disable this job.
>> >> >> Sorry, WTF?
>> >> >>
>> >> >> These tests seem to fail all the time, randomly and over and over
>> >> >> again. This renders the test as entirely useless to me. I even invest
>> >> >> time (wrong, I invested) looking into it if they are caused by me or
>> >> >> if I can do something about it. Yet, someone could call me out for
>> >> >> being responsible for them as a commiter, yes I am hence this email. I
>> >> >> don't think I am obliged to fix them. These projects have 50+
>> >> >> committers and having a shared codebase doesn't mean everybody has to
>> >> >> take care of everything. I think we are at the point where if I work
>> >> >> on Lucene I won't run solr tests at all otherwise there won't be any
>> >> >> progress. On the other hand solr tests never pass I wonder if the solr
>> >> >> code-base gets changes nevertheless? That is again a terrible
>> >> >> situation.
>> >> >>
>> >> >> I spoke to varun and  anshum during buzzwords if they can give me some
>> >> >> hints what I am doing wrong but it seems like the way it is. I feel
>> >> >> terrible pushing stuff to our repo still seeing our tests fail. I get
>> >> >> ~15 build failures from solr tests a day I am not the only one that
>> >> >> has mail filters to archive them if there isn't a lucene tests in the
>> >> >> failures.
>> >> >>
>> >> >> This is a terrible state folks, how do we fix it? It's the lucene land
>> >> >> that get much love on the testing end but that also requires more work
>> >> >> on it, I expect solr to do the same. That at the same time requires
>> >> >> stop pushing new stuff until the situation is under control. The
>> >> >> effort of marking stuff as bad apples isn't the answer, this requires
>> >> >> effort from the drivers behind this project.
>> >> >>
>> >> >> simon
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: [hidden email]
>> >> >> For additional commands, e-mail: [hidden email]
>> >> >>
>> >> > --
>> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>
>>
>>
>> >> > http://www.solrenterprisesearchserver.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [hidden email]
>> >> For additional commands, e-mail: [hidden email]
>> >>
>> > --
>> > - Mark
>> > about.me/markrmiller
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

Simon Willnauer-4
Erick and Steve, I am trying to answer both of you in one email.

Erick, I didn't want to tick you off. I appreciate you being on it and
staying on top of these failures. I am sorry if you read it that way.

Steve, regarding the "Test Confidence" role email from Cassandra: I
can appreciate the effort here, but it only fights symptoms. It's as
if your mattress is so old that it gives you a headache every day and
you take Advil to fix it. These issues are a fundamental problem and
it needs a fundamental change to fix it. There must be a mind shift
towards reproducible software testing that doesn't rely on spinning
up nodes for fun and profit. You will always have that problem if you
run significantly complex tests against them.
I can take a step back and tell you we had the same issue in
Elasticsearch, and when we couldn't cope with it anymore we put a
significant amount of work into changing our approach to testing.
There are many, many features, like the entire sequence ID layer,
that were blocked on a unit-testing framework that allows us to
simulate all kinds of networking issues. It took several months for
that one framework to be built, and we didn't work on the feature
until it happened. This requires a ton of discipline, and it will
also cause you to not add features unless you can actually test them
effectively. We added more than 1k tests to the unit test suites in a
year and it slowed us down in the beginning. Now we are in a much
better place.
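
To make that concrete, here is a minimal sketch of the kind of thing
I mean. The names are invented for illustration and this is not the
actual Elasticsearch test framework; the shape is what matters: the
code under test talks to an abstraction the test can make misbehave
deterministically, instead of spawning real nodes.

    import java.util.Random;

    // Hypothetical sketch: a transport the test can disrupt on demand.
    interface Transport {
      void send(String node, byte[] message);
    }

    final class DroppingTransport implements Transport {
      private final Transport delegate;
      private final Random random;
      private volatile boolean dropping;

      DroppingTransport(Transport delegate, long seed) {
        this.delegate = delegate;
        this.random = new Random(seed);
      }

      void startDropping() { dropping = true; }
      void heal()          { dropping = false; }

      @Override
      public void send(String node, byte[] message) {
        // While "dropping" is on, messages are randomly discarded based
        // on the seeded Random, so a failing run reproduces from the
        // seed alone; no sockets, no sleeps, no real cluster.
        if (dropping && random.nextBoolean()) {
          return;
        }
        delegate.send(node, message);
      }
    }

A test then asserts that whatever sits on top converges again once
heal() is called, and a failing seed can be replayed on any machine.
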
I am happy to share my experience with this, and to explain why most
of our integration tests use a declarative language that essentially
prevents them from being too crazy. There are also some tests that
exercise the distributed system heavily, and there are ongoing
debates about how much they buy us; they also fail, and they are a
pain in the ass. Yet if they do fail we block releases; there is a
massive responsibility we have here.

I also think we can't rely on a test-triage cadence from a single
company; if that is the only way it works, we have a PMC issue here,
but diversity-wise I think we are in a good place, which is great. We
have to fix this so it works without a company sponsoring test
triage. I hope we are on the same page here.

simon

On Tue, Jun 19, 2018 at 5:44 PM, Erick Erickson <[hidden email]> wrote:

> " This is not good and bad-appeling these test isn't the answer unless
> we put a lot of effort into it, sorry I don't see it happening."
>
> This ticks me off. I've spent considerable time over the last 4 months
> trying to get to a point where we can stop getting _worse_ as a
> necessary _first_ step to getting better. Lucidworks is putting effort
> in that direction too. What other concrete actions do you recommend
> going forward?
>
> Of course just BadApple-ing tests isn't a satisfactory answer. And the
> e-mail filters I've arranged that allow me to only see failures that
> do _not_ run BadApple tests are dangerous and completely crappy.
> Unfortunately I don't have a magic wand to make it all better so this
> stop-gap (I hope) allows progress.
>
> We'll know progress is being made when the weekly BadApple reports
> show a declining number of tests that are annotated. Certainly not
> there yet, but working on it.
>
> Perhaps you missed the point of the BadApple exercise. Reasonably soon
> I hope to be at a point where we can draw a line in the sand where we
> can say "This is a new failure, fix it or roll back the changes". Then
> can we get persnickety about not adding _new_ failures. Then we can
> reduce the backlog.
>
> And the result of these efforts may be me curling into a ball and
> sucking my thumb because the problem is intractable. We'll see.
>
> One temporary-but-maybe-necessary option is to run with BadApple
> enabled. I don't like that either, but it's better than not running
> tests at all.
>
> Unfortunately when I'm working on code I have to do another crappy
> work-around; run the tests and then re-run any failing tests and
> assume if the run is successful that it was a flaky test. The BadApple
> annotations are helpful for that too since soon I hope to have
> confidence that we've annotated most all the flaky tests and if I can
> run failing tests successfully _and_ they're annotated it's probably
> OK. Horrible process, no question about that but I have to start
> somewhere.
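>
> Concretely, the re-run I mean is roughly this (from memory, so the
> exact property names may differ slightly):
>
>     cd solr/core
>     ant test -Dtests.class="*.CloudSolrClientTest"
>
> optionally adding -Dtests.seed=DEADBEEF (the seed from the failure's
> "reproduce with" line; DEADBEEF is just a placeholder) to see whether
> the failure reproduces deterministically. If the re-run passes and the
> test is already annotated, it's probably OK to move on.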
>
> Again, what additional steps do you recommend?
>
> Erick
>
> On Tue, Jun 19, 2018 at 7:29 AM, Steve Rowe <[hidden email]> wrote:
>> Hi Simon,
>>
>> Have you seen the late-February thread “Test failures are out of control….”? : https://lists.apache.org/thread.html/b783a9d7c22f518b07355e8e4f2c6f56020a7c32f36a58a86d51a3b7@%3Cdev.lucene.apache.org%3E
>>
>> If not, I suggest you go take a look.  Some of your questions are answered there.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jun 19, 2018, at 9:41 AM, Simon Willnauer <[hidden email]> wrote:
>>>
>>> Thanks folks, I appreciate you sharing some thoughts about this. My biggest issue is that this is a permanent condition. I could have sent this mail 2, 4 or 6 years ago and it would have been just as relevant as it is today.
>>>
>>> I am convinced Mark can make some progress, but this isn't fixable by a single person; this is a structural problem, or rather a cultural one. I am not sure everybody is aware of how terrible it is. I took a screenshot the other day of what I have to dig through in my inbox on a constant basis, every time I commit a change to Lucene, to make sure I am not missing something.
>>>
>>> <image.png>
>>>
>>> I don't even know how we can attract any new contributors, or how many contributors have been scared away by this in the past. This is not good, and bad-appling these tests isn't the answer unless we put a lot of effort into it; sorry, I don't see it happening. I would have expected more than about 4 people from this PMC to reply to something like this. From my perspective there is a lot of harm done by this to the project, and we have to figure out what we want to do. This also affects our ability to release: our smoke-test builds never pass [1]. I don't know what I would do if I were the RM for 7.4 (thanks Adrien for doing it); I cannot tell what is serious and what is not on a Solr build. And it's not just the smoke tester; it's basically everything that runs after Solr that is skipped on a regular basis.
>>>
>>> I don't have a good answer, but we have to get this under control; it's burdensome for Lucene to carry this load, and it has been carrying it for quite some time. It wasn't very obvious how much weight this carries, since I hadn't worked on Lucene internals for quite a while; speaking to many folks around here, this is on their shoulders, but it isn't brought up for discussion. I think we have to.
>>>
>>> simon
>>>
>>> [1] https://builds.apache.org/job/Lucene-Solr-SmokeRelease-7.4/
>>>
>>>
>>> On Sat, Jun 16, 2018 at 6:40 AM, Erick Erickson <[hidden email]> wrote:
>>> Martin:
>>>
>>> I have no idea how logging severity levels apply to unit tests that fail. It's not a question of triaging logs; it's a matter of Jenkins JUnit test runs reporting failures.
>>>
>>>
>>>
>>> On Fri, Jun 15, 2018 at 4:25 PM, Martin Gainty <[hidden email]> wrote:
>>> Erick-
>>>
>>> It appears that style mis-applications may be categorised as INFO
>>> and are mixed in with SEVERE errors.
>>>
>>> Would it make sense to filter the errors based on severity?
>>>
>>> https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
>>> If you know the severity, you can triage the SEVERE errors before working down to the INFO errors.
>>>
>>>
>>> WDYT?
>>> Martin
>>> ______________________________________________
>>>
>>>
>>>
>>> From: Erick Erickson <[hidden email]>
>>> Sent: Friday, June 15, 2018 1:05 PM
>>> To: [hidden email]; Mark Miller
>>> Subject: Re: Status of solr tests
>>>
>>> Mark (and everyone).
>>>
>>> I'm trying to be somewhat conservative about what I BadApple; at this
>>> point it's only things that have failed every week for the last 4 weeks.
>>> Part of that conservatism is to avoid BadApple'ing tests that are
>>> failing and _should_ fail.
>>>
>>> I'm explicitly _not_ delving into any of the causes at all at this
>>> point, it's overwhelming until we reduce the noise as everyone knows.
>>>
>>> So please feel totally free to BadApple anything you know is flakey,
>>> it won't intrude on my turf ;)
>>>
>>> And since I realized I can also report tests that have _not_ failed in
>>> a month that _are_ BadApple'd, we can be a little freer with
>>> BadApple'ing tests since there's a mechanism for un-annotating them
>>> without a lot of tedious effort.
>>>
>>> FWIW.
>>>
>>> On Fri, Jun 15, 2018 at 9:09 AM, Mark Miller <[hidden email]> wrote:
>>> > There is an okay chance I'm going to start making some improvements here as
>>> > well. I've been working on a very stable set of tests on my starburst branch
>>> > and will slowly bring in test fixes over time (I've already been making some
>>> > on that branch for important tests). We should currently be defaulting to
>>> > tests.badapples=false on all solr test runs - it's a joke to try and get a
>>> > clean run otherwise, and even then somehow 4 or 5 tests that fail somewhat
>>> > commonly have so far avoided Erick's @BadApple hack and slash. They are bad
>>> > appled on my dev branch now, but that is currently where any time I have is
>>> > spent rather than on the main dev branches.
>>> >
>>> > Also, too many flakey tests are introduced because devs are not beasting, or not
>>> > beasting well, before committing new heavy tests. Perhaps we could add some
>>> > docs around that.
>>> >
>>> > We have built-in beasting support; we need to emphasize that a couple of
>>> > passes on a new test is not sufficient to establish its quality.
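>>> >
>>> > As a rough sketch (from memory, so the target and property names may be
>>> > slightly off), beasting a new test before committing looks something like:
>>> >
>>> >     cd solr/core
>>> >     ant beast -Dbeast.iters=10 -Dtests.class="*.MyNewCloudTest"
>>> >
>>> > which runs the one suite ten times with different seeds; MyNewCloudTest is
>>> > just a placeholder name here.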
>>> >
>>> > - Mark
>>> >
>>> > On Fri, Jun 15, 2018 at 9:46 AM Erick Erickson <[hidden email]>
>>> > wrote:
>>> >>
>>> >> (Siiiiggggghhhh) All very true. You're not alone in your frustration.
>>> >>
>>> >> I've been trying to at least BadApple tests that fail consistently, so
>>> >> another option could be to disable BadApple'd tests. My hope has been
>>> >> to get to the point of being able to reliably get clean runs, at least
>>> >> when BadApple'd tests are disabled.
>>> >>
>>> >> From that point I want to draw a line in the sand and immediately
>>> >> address tests that fail that are _not_ BadApple'd. At least then we'll
>>> >> stop getting _worse_. And then we can work on the BadApple'd tests.
>>> >> But as David says, that's not going to be any time soon. It's been a
>>> >> couple of months that I've been trying to just get the tests
>>> >> BadApple'd without even trying to fix any of them.
>>> >>
>>> >> It's particularly pernicious because with all the noise we don't see
>>> >> failures we _should_ see.
>>> >>
>>> >> So I don't have any good short-term answer either. We've built up a
>>> >> very large technical debt in the testing. The first step is to stop
>>> >> adding more debt, which is what I've been working on so far. And
>>> >> that's the easy part....
>>> >>
>>> >> Siiiiiiiiiiiiiigggggggggghhhhhhhhhh
>>> >>
>>> >> Erick
>>> >>
>>> >>
>>> >> On Fri, Jun 15, 2018 at 5:29 AM, David Smiley <[hidden email]>
>>> >> wrote:
>>> >> > (Sigh) I sympathize with your points Simon.  I'm +1 to modify the
>>> >> > Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.  We can and
>>> >> > are
>>> >> > trying to improve the stability of the Solr tests but even
>>> >> > optimistically
>>> >> > the practical reality is that it won't be good enough anytime soon.
>>> >> > When we
>>> >> > get there, we can reverse this.
>>> >> >
>>> >> > On Fri, Jun 15, 2018 at 3:32 AM Simon Willnauer
>>> >> > <[hidden email]>
>>> >> > wrote:
>>> >> >>
>>> >> >> folks,
>>> >> >>
>>> >> >> I got more active working on IndexWriter and Soft-Deletes etc. in the
>>> >> >> last couple of weeks. It's a blast again and I really enjoy it. The
>>> >> >> one thing that is IMO not acceptable is the status of solr tests. I
>>> >> >> tried so many times to get them passing on several different OSs but
>>> >> >> it seems this is pretty hopepless. It's get's even worse the
>>> >> >> Lucene/Solr QA job literally marks every ticket I attach a patch to as
>>> >> >> `-1` because of arbitrary solr tests, here is an example:
>>> >> >>
>>> >> >> || Reason || Tests ||
>>> >> >> | Failed junit tests | solr.rest.TestManagedResourceStorage |
>>> >> >> |   | solr.cloud.autoscaling.SearchRateTriggerIntegrationTest |
>>> >> >> |   | solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest |
>>> >> >> |   | solr.client.solrj.impl.CloudSolrClientTest |
>>> >> >> |   | solr.common.util.TestJsonRecordReader |
>>> >> >>
>>> >> >> Speaking to other committers I hear we should just disable this job.
>>> >> >> Sorry, WTF?
>>> >> >>
>>> >> >> These tests seem to fail all the time, randomly and over and over
>>> >> >> again. This renders the test as entirely useless to me. I even invest
>>> >> >> time (wrong, I invested) looking into it if they are caused by me or
>>> >> >> if I can do something about it. Yet, someone could call me out for
>>> >> >> being responsible for them as a commiter, yes I am hence this email. I
>>> >> >> don't think I am obliged to fix them. These projects have 50+
>>> >> >> committers and having a shared codebase doesn't mean everybody has to
>>> >> >> take care of everything. I think we are at the point where if I work
>>> >> >> on Lucene I won't run solr tests at all otherwise there won't be any
>>> >> >> progress. On the other hand solr tests never pass I wonder if the solr
>>> >> >> code-base gets changes nevertheless? That is again a terrible
>>> >> >> situation.
>>> >> >>
>>> >> >> I spoke to varun and  anshum during buzzwords if they can give me some
>>> >> >> hints what I am doing wrong but it seems like the way it is. I feel
>>> >> >> terrible pushing stuff to our repo still seeing our tests fail. I get
>>> >> >> ~15 build failures from solr tests a day I am not the only one that
>>> >> >> has mail filters to archive them if there isn't a lucene tests in the
>>> >> >> failures.
>>> >> >>
>>> >> >> This is a terrible state folks, how do we fix it? It's the lucene land
>>> >> >> that get much love on the testing end but that also requires more work
>>> >> >> on it, I expect solr to do the same. That at the same time requires
>>> >> >> stop pushing new stuff until the situation is under control. The
>>> >> >> effort of marking stuff as bad apples isn't the answer, this requires
>>> >> >> effort from the drivers behind this project.
>>> >> >>
>>> >> >> simon
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: [hidden email]
>>> >> >> For additional commands, e-mail: [hidden email]
>>> >> >>
>>> >> > --
>>> >> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> >> > http://www.solrenterprisesearchserver.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [hidden email]
>>> >> For additional commands, e-mail: [hidden email]
>>> >>
>>> > --
>>> > - Mark
>>> > about.me/markrmiller
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Status of solr tests

sarowe
In reply to this post by david.w.smiley@gmail.com

> On Jun 15, 2018, at 8:29 AM, David Smiley <[hidden email]> wrote:
>
> I'm +1 to modify the Lucene-side JIRA QA bot (Yetus) to not execute Solr tests.

Right now, Yetus only executes Solr tests when there is a Solr change in the patch; otherwise only Lucene tests are executed.

I just committed a modification to the Lucene/Solr Yetus personality that adds "-Dtests.badapples=false" to the per-modified-module “ant test” cmdline.  This should reduce the noise appreciably.
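
For a patch that touches, say, solr/core, the per-module invocation should now look roughly like this (the module path is illustrative and other flags the personality adds are omitted):

    cd solr/core
    ant test -Dtests.badapples=false

With that switch, @BadApple-annotated tests are skipped, so in theory only non-annotated failures should show up in the QA report.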

--
Steve
www.lucidworks.com


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]