Test failures are out of control......

Test failures are out of control......

Erick Erickson
There's an elephant in the room, and it's that failing tests are being
ignored. Mind you, Solr and Lucene are progressing at a furious pace
with lots of great functionality being added. That said, we're
building up a considerable "technical debt" when it comes to testing.

And I should say up front that major new functionality is expected to
take a while to shake out (e.g. autoscaling, streaming, V2 API etc.),
and noise from tests of new functionality is expected while things
bake.

Below is a list of tests that have failed at least once since just
last night. This has been getting worse as time passes: the broken
window problem. Some e-mails have 10 failing tests (+/-), so unless I
go through each and every one I don't know whether something I've done
is causing a problem or not.

I'm as guilty of letting things slide as anyone else; there's been a
long-standing issue with TestLazyCores, for instance, that I work on
sporadically and that's _probably_ "something in the test framework"....

Several folks have spent some time digging into test failures and
identifying at least some of the causes, kudos to them. It seems
they're voices crying out in the wilderness though.

There is so much noise at this point that tests are becoming
irrelevant. I'm trying to work on SOLR-10809 for instance, where
there's a pretty good possibility that I'll close at least one thing
that shouldn't be closed. So I ran the full suite 10 times and
gathered all the failures. Now I have to try to separate the failures
caused by that JIRA from the ones that aren't related to it so I beast
each of the failing tests 100 times against master. If I get a failure
on master too for a particular test, I'll assume it's "not my problem"
and drive on.
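
For reference, when I say "beast" I just mean the beast target in our
build, invoked something like this (swap in whichever test is failing;
I'm quoting the property names from memory, so double-check them):

    ant beast -Dbeast.iters=100 -Dtestcase=TestLazyCores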

I freely acknowledge that this is poor practice. It's driven by
frustration and the desire to make progress. While it's poor practice,
it's not as bad as only looking at tests that I _think_ are related or
ignoring all test failures I can't instantly recognize as "my fault".

So what's our stance on this? Mark Miller had a terrific program at
one point that allowed failing tests to be categorized at a glance,
but it hasn't been updated in a while.  Steve Rowe is working on the
problem too. Hoss and Cassandra have both added to the efforts as
well. And I'm sure I'm leaving out others.

Then there's the @Ignore and @BadApple annotations....

So, as a community, are we going to devote some energy to this? Or
shall we just start ignoring all of the frequently failing tests?
Frankly we'd be farther ahead at this point marking failing tests that
aren't getting any work with @Ignore or @BadApple and getting
compulsive about not letting any _new_ tests fail than continuing our
current path. I don't _like_ this option mind you, but it's better
than letting these accumulate forever and tests become more and more
difficult to use. As tests become more difficult to use, they're used
less and the problem gets worse.
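
For anyone who hasn't looked recently, marking a test is a one-line
annotation from LuceneTestCase, roughly like the sketch below (the JIRA
number and test name are placeholders):

    // Known flakey failure; whether it runs is then controlled by the
    // tests.badapples build property.
    @BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-NNNNN")
    public void testSomethingFlakey() throws Exception {
      ...
    }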

Note, I made no effort to separate suite vs. individual reports here.....

Erick

FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestBagOfPositions
FAILED:  junit.framework.TestSuite.org.apache.lucene.index.TestIndexWriterDeleteByQuery
FAILED:  junit.framework.TestSuite.org.apache.lucene.store.TestSleepingLockWrapper
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyFieldFacetExtrasCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.analytics.legacy.facet.LegacyQueryFacetCloudTest
FAILED:  junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.TestSolrCloudWithSecureImpersonation
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.AutoAddReplicasPlanActionTest
FAILED:  junit.framework.TestSuite.org.apache.solr.core.AlternateDirectoryTest
FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores
FAILED:  junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotSmallAdvancedTest
FAILED:  junit.framework.TestSuite.org.apache.solr.ltr.TestSelectiveWeightCreation
FAILED:  junit.framework.TestSuite.org.apache.solr.ltr.store.rest.TestModelManager
FAILED:  junit.framework.TestSuite.org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory
FAILED:  junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
FAILED:  junit.framework.TestSuite.org.apache.solr.security.TestAuthorizationFramework
FAILED:  junit.framework.TestSuite.org.apache.solr.update.processor.TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory
FAILED:  org.apache.lucene.index.TestStressNRT.test
FAILED:  org.apache.solr.cloud.AddReplicaTest.test
FAILED:  org.apache.solr.cloud.DeleteShardTest.test
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test
FAILED:  org.apache.solr.cloud.ReplaceNodeNoTargetTest.test
FAILED:  org.apache.solr.cloud.TestUtilizeNode.test
FAILED:  org.apache.solr.cloud.api.collections.CollectionsAPIDistributedZkTest.testCollectionsAPI
FAILED:  org.apache.solr.cloud.api.collections.ShardSplitTest.testSplitAfterFailedSplit
FAILED:  org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest.testSimple
FAILED:  org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
FAILED:  org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest.testSimple
FAILED:  org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
FAILED:  org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testSearchRate
FAILED:  org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate
FAILED:  org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication
FAILED:  org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest.testHistory
FAILED:  org.apache.solr.rest.schema.analysis.TestManagedSynonymFilterFactory.testCanHandleDecodingAndEncodingForSynonyms

Re: Test failures are out of control......

Adrien Grand
Thanks for bringing this up Erick. I agree with you that we should silence those frequent failures. Like you said, the side effects of not silencing them are even worse. I'll add that these flaky tests also make releasing harder: it took me three runs last time (Lucene/Solr 7.2) for the release build to succeed because of failed tests.

Re: Test failures are out of control......

Tommaso Teofili
+1, agree with Adrien, thanks for bringing this up Erick!



Re: Test failures are out of control......

Dawid Weiss-2
It's a recurring theme, huh, Erick? :)

http://markmail.org/message/7eykbuyyaxbxn364

I agree with your opinion and I have expressed it more than once -- a
test that is failing for longer while and cannot be identified or
fixed is a candidate for removal. The noise in Solr tests has
increased to a degree that I stopped looking (a long time ago), unless
somebody explicitly pings me about something. I tried to fix some of
those tests, but it's beyond my capabilities in many cases (and my
time budget in others).

I also recall some folks had a different take on the subject; see Mark
Miller's opinion in the thread above, for example (there were other
threads too, but I can't find them now).

Dawid


Re: Test failures are out of control......

Erick Erickson
In reply to this post by Tommaso Teofili
Dawid:
Yep, definitely a recurring theme. But this time I may actually, you
know, do something about it ;)

Mark is one of the advocates of this theme, perhaps he got exhausted
trying to push that stone up the hill ;). Maybe it's my turn to pick
up the baton.... Comments about there being value to seeing these is
well taken, but outweighed IMO by the harm in there being so much
noise that failures that _should_ get attention are so easy to
overlook.

bq: The noise in Solr tests has increased to a degree that I stopped looking.

Exactly. To one degree or another I think this has happened to a _lot_
of people, myself certainly included.

And you've certainly done more than your share of fixing things in the
infrastructure, many thanks!

----------------

I'm not sure blanket @BadApple-ing these is The Right Thing To Do for
_all_ of them though as I know lots of active work is being done in
some areas. I'd hate for someone who's actively trying to fix
something to have the failures disappear and think they were fixed
when in reality the tests just weren't run.

Straw-man proposal:

> I'll volunteer to gather failing tests through the next few days from the dev e-mails. I'll create yet another umbrella JIRA that proposes to @BadApple _all_ of them unless someone steps up and volunteers to actively work on a particular test failure. Since I brought it up I'll get aggressive about @BadApple-ing failing tests in future. I'll link the current JIRAs for failing tests in as well (on a cursory glance there are 16 open ones)...

> If someone objects to @BadApple-ing a particular test, they should create a JIRA, assign it to themselves and actively work on it. Short shrift given to "I don't think we should @BadApple that test because someday someone might want to try to fix it".... In this proposal, it's perfectly acceptable to remove the @BadApple notation and push it, as long as it's being actively worked on.

> Would someone who knows the test infrastructure better than I do be willing to volunteer to set up a periodic run with the BadApple annotations disabled? Perhaps weekly? Even nightly? That way interested parties can still see these failures, but the rest of us would only have _one_ e-mail to ignore, not 10-20 a day. It'd be great if the subject line made the WithBadApple runs identifiable at a glance..... Then errors that didn't have BadApple annotations would stand out from the noise since they would be in _other_ e-mails.

> It's easy enough to find all the BadApple-labeled tests, I'll also volunteer to post a weekly list.
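
By "easy enough" I mean nothing fancier than a grep over the source
tree, along the lines of:

    grep -rl "@BadApple" lucene/ solr/ --include="*.java"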

Getting e-mails for flakey tests is acceptable IMO only if people are
working on them. I've certainly been in situations where I can't get
something to fail locally and have to rely on Jenkins etc to gather
logging info or see if my fixes really work. I _do_ care that we are
accumulating more and more failures and it's getting harder and harder
to know when failures are a function of new code or not.

WDYT?
Erick

Re: Test failures are out of control......

Yonik Seeley
We should be careful not to conflate running of unit tests with
automated reporting, and the differing roles that flakey tests play in
different scenarios.
For example, I no longer pay attention to automated failure reports,
esp if I haven't committed anything recently.
However, when I'm making code changes and do "ant test", I certainly
pay attention to failures and re-run any failing tests.  It sucks to
have to re-run a test just because it's flakey, but it's better than
accidentally committing a bug because test coverage was reduced.

I'd suggest:
1) fix/tweak automated test reporting to increase relevancy for developers
2) open a JIRA for each flakey test and evaluate impact of removal on
test coverage
3) If a new feature is added, and the test turns out to be flakey,
then the feature itself should be disabled before release.  This
prevents new flakey tests without losing test coverage, and it
motivates those who care about the feature to fix the tests.
4) fix flakey tests ;-)

-Yonik

Re: Test failures are out of control......

Erick Erickson
Yonik:

What I'm frustrated by now is that variations on these themes haven't
cured the problem, and it's spun out of control and is getting worse.
It's the "getting worse" part that is most disturbing. Continuing as
we have in the past isn't working, it's time to try something else.

There are 17 open JIRAs for tests right now. Some as far back as 2010,
listed below. Since last night I've collected 49 distinct failures
from the dev e-mails (haven't triaged them completely to see if some
are contained in others, but "sort < all_of_them | uniq > my_file"
results in a file 49 lines long).

What I'm after here is a way to keep from backsliding further, and a
path to getting better. That's what's behind the straw-man proposal
that we get the tests to be clean, even if that means disabling the
ones that are flakey. I should have emphasized more strongly that the
corollary to disabling flakey tests is that we need to get aggressive
about not tolerating _new_ flaky tests. I'm volunteering (I guess) to
be the enforcer here, as much as public comments can be construed to
be "enforcing" ;)

If someone has a favorite test that they think adds value even if it
fails fairly frequently, we can un-BadApple it provided someone is
actively working on it. Or it can be un-BadAppled locally and/or
temporarily. I'm perfectly fine with flakey tests being run and
reported _if_ that helps resolve it.

Also I'm volunteering to produce a "weekly BadApple" list so people
can work on them as they see fit expressly to keep them from getting
lost.

(1) I have no idea how to do this, or even if it's possible. What do
you have in mind?

(2) doesn't seem to be working based on the open JIRAs below and the
number of failing tests that are accumulating.

(3a) Well, the first part is what my "enforcing" comment is about
above I think ;)

(3b) I'd argue that the part about "without losing test coverage" is
partly addressed by the notion of running periodically with BadApple
disabled. Which each individual can also do at their discretion if
they care to. More importantly though, the test coverage isn't very
useful when failures are ignored anyway.

(4) I thoroughly applaud that one long term, but I'll settle in the
short term for doing something to keep even _more_ from accumulating.

I should also emphasize that disabling tests is certainly _NOT_ my
preference, fixing them all is. I doubt that declaring a moratorium on
all commits until all the tests were fixed is practical though ;) And
without something changing in our approach, I don't see much progress
being made.

SOLR-10053
SOLR-10070
SOLR-10071
SOLR-10139
SOLR-10287
SOLR-10815
SOLR-11911
SOLR-2175
SOLR-4147
SOLR-5880
SOLR-6423
SOLR-6944
SOLR-6961
SOLR-6974
SOLR-8122
SOLR-8182
SOLR-9869


Re: Test failures are out of control......

Yonik Seeley
On Wed, Feb 21, 2018 at 3:26 PM, Erick Erickson <[hidden email]> wrote:
> Yonik:
>
> What I'm frustrated by now is that variations on these themes haven't
> cured the problem, and it's spun out of control and is getting worse.

I understand, but what problem(s) are you trying to solve?  Just
because we are frustrated doesn't mean that *any* change is positive.
Some changes can have a definite negative effect on software quality.

You didn't respond to the main thrust of my message, so let me try to
explain it again more succinctly:

Flakey Test Problems:
a) Flakey tests create so much noise that people no longer pay
attention to the automated reporting via email.
b) When running unit tests manually before a commit (i.e. "ant test")
a flakey test can fail.

Solutions:
We could fix (a) by marking those tests as flakey and having a new
target "non-flakey" that is run by the Jenkins jobs that currently run
continuously.
For (b) "ant test" should still include the flakey tests since it's
better to have to re-run a seemingly unrelated test to determine if
one broke something rather than increase committed bugs due to loss of
test coverage.  It's a pain, but perhaps it should be.  It's a real
problem that needs fixing, and @Ignore-ing it is not a better
mechanism for getting it fixed.  Sweeping it under the rug would seem to
ensure that it gets less attention.

And we can *always* decide to prevent new flakey tests, regardless of
what we do about the existing flakey tests.  Mark's tool is a good way
to see what the current list of flakey tests is.

-Yonik

Re: Test failures are out of control......

Erick Erickson
Yonik:

Good discussion. I'm not wedded to a particular solution; it's just
that the current direction is not sustainable.

I'll back up a bit and see if I can state my goals more clearly, it
looks like we're arguing for much the same thing.

> I want e-mail messages with test failures to be worth looking at. When I see a test fail, I don't want to waste time trying to figure out whether it's something newly introduced or not. I also want some less painful way to say "this change broke tests" rather than "this change may or may not have broken tests. Could somebody beast the old and new versions 100 times and hope that's enough to make a determination?". This looks like your (a).

> When I make a change, I want to be able to quickly determine whether my changes are likely the cause of test failures or not. This looks like your (b). If we annotated all flakey tests, that would be a significant help since it would be easy to glance at a failing test to see if it's a known flakey test or not. Armed with that knowledge I can be more comfortable with having it succeed a few times and chalking it up to flakey tests.

> I want to stop the downward trend we've been experiencing lately with more and more tests failing.

An annotation makes that possible I think, although I'm not clear on
why a @Flakey annotation would be superior to @BadApple. There are
exactly three BadApple annotations in the entire code base at present;
is there enough value in introducing another annotation to make it
worthwhile?
Or could we just figure out whether any of those three tests that use
@BadApple should be changed to, say, @Ignore and then use @BadApple
for the rest? Perhaps we change the build system to enable BadApple by
default when running locally (or, conversely, enabling BadApple on
Jenkins).

Alternatively would it be possible to turn off e-mail notifications of
failures for @Flakey (or @BadApple, whatever) tests? That would do
too. That probably has the added advantage of allowing some reporting
tools to continue to function.

bq: And we can *always* decide to prevent new flakey tests, regardless
of what we do about the existing flakey tests...

We haven't been doing this though; flakey tests have been
proliferating. Mark's tool hasn't been run since last August unless
there's a newer URL than I'm looking at:
http://solr-tests.bitballoon.com/. I'm not so interested in what we
_could_ do as what we _are_ doing. And even tools such as this require
someone to monitor/complain/whimper. And I don't see volunteers
stepping forward. It's much easier to have a system where any failure
is unusual than to count on people to wade through voluminous output.

bq: Just because we are frustrated doesn't mean that *any* change is positive.

Of course not. But nobody else seems to be bringing the topic up so I
thought I would.

RE: Test failures are out of control......

Uwe Schindler
In reply to this post by Yonik Seeley
Hi,

> Flakey Test Problems:
> a) Flakey tests create so much noise that people no longer pay
> attention to the automated reporting via email.
> b) When running unit tests manually before a commit (i.e. "ant test")
> a flakey test can fail.
>
> Solutions:
> We could fix (a) by marking those tests as flakey and having a new
> target "non-flakey" that is run by the Jenkins jobs that currently run
> continuously.

We have a solution for this already: Mark all those tests with @AwaitsFix or @BadApple
By default those aren't executed in Jenkins runs or for developers, but devs can enable/disable them using -Dtests.awaitsfix=true and -Dtests.badapples=true:

     [help] # Test groups. ----------------------------------------------------
     [help] #
     [help] # test groups can be enabled or disabled (true/false). Default
     [help] # value provided below in [brackets].
     [help]
     [help] ant -Dtests.nightly=[false]   - nightly test group (@Nightly)
     [help] ant -Dtests.weekly=[false]    - weekly tests (@Weekly)
     [help] ant -Dtests.awaitsfix=[false] - known issue (@AwaitsFix)
     [help] ant -Dtests.slow=[true]       - slow tests (@Slow)

We can of course also add a weekly Jenkins job that enables those tests (like the nightly stuff). We have "tests.badapples" and "tests.awaitsfix" - I don't know what the difference between the two is.
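
For a weekly Jenkins job that would presumably boil down to an invocation along these lines (using the properties above; the exact job wiring is a separate question):

     ant test -Dtests.badapples=true -Dtests.awaitsfix=true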

So we have two options to classify tests; let's choose one and apply it to all flakey tests!

Uwe


Re: Test failures are out of control......

Jason Gerlowski
I don't have strong opinions about what we do with our existing flaky
tests.  I think re-running failures before commit might theoretically
catch more bugs than ignoring the test outright, but with all the
noise and how standard it is to need to rerun tests I'd be surprised
if the numbers are all that different.

Where I see some potential common ground though is in preventing new
flaky tests.  And in the long run, I think what we do to prevent new
flakes is going to be much more important than how we handle the
BadApples we have at this particular instant.  If we can put our
finger in the dam, the existing flakiness becomes much easier to put a
dent in.

I'm curious what you guys had in mind when you mentioned preventing
new flaky tests from popping up.  What are our options for "enforcing"
that?  Were you imagining reopening JIRAs and asking the original
committer to investigate?  Or outright reverting commits that
introduce flaky tests?  Or something in between (like disabling
features with flaky tests prior to releases)?

Best,

Jason

Re: Test failures are out of control......

Yonik Seeley
In reply to this post by Erick Erickson
On Wed, Feb 21, 2018 at 5:52 PM, Erick Erickson <[hidden email]> wrote:

> There are exactly three
> BadApple annotations in the entire code base at present, is there
> enough value in introducing another annotation to make it worthwhile?

If we change BadApple tests to be executed by default for "ant test"
(but not for the most frequent jenkins jobs), then that would be fine.
Basically, add a -Dtests.disable-badapples and use that for the
jenkins jobs that email the list all the time.

-Yonik
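
Concretely, and assuming the existing tests.badapples property Uwe mentions (rather than a new inverse one), the split might look something like:

     # developers running the suite before a commit: flakey tests included
     ant -Dtests.badapples=true test

     # the frequently-run Jenkins jobs that mail the list: flakey tests skipped
     ant -Dtests.badapples=false test

Which value each side gets by default is exactly the question on the table.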


Re: Test failures are out of control......

Cassandra Targett
In reply to this post by Uwe Schindler
This issue is hugely important.

At Lucidworks we have implemented a "Test Confidence" role that focuses on improving the ability of all members of the community to trust that reported failures from any of the Jenkins systems are actual failures and not flakey tests. This role rotates among the committers on our Solr Team, and a committer is assigned to the role for two-week periods. Our goal is to have at least one committer on our team focused full-time on improving test confidence at all times. (Just a note on timing: we started this last summer, but we only recently reconfirmed our commitment to having someone assigned to it at all times.)

One of the guidelines we've agreed to is that the person in the role should not look (only) at tests he has worked on. Instead, he should focus on tests that fail less than 100% of the time and/or are hard to reproduce *even if he didn't write the test or the code*.

Another aspect of the Test Confidence role is to try to develop tools that can help the community overall in improving this situation. Two things have grown out of this effort so far:

* Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106) 
* Hoss has worked on aggregating all test failures from the 3 Jenkins systems (ASF, Policeman, and Steve's), downloading the test results & logs, and running some reports/stats on failures. He should be ready to share this more publicly soon.

I think it's important to understand that flakey tests will *never* go away. There will always be a new flakey test to review/fix. Our goal should be to make it so that, most of the time, you can assume a failing test really is broken, and only discover it's flakey as part of digging.

The idea of @BadApple marking (or some other notation) is OK, but the problem is so bad today that I worry it does nothing to ensure the tests actually get fixed. Lots of JIRAs get filed for problems with tests - I count about 180 open issues today - and many just sit there forever.

The biggest thing I want to avoid is making it even easier to avoid/ignore them. We should try to make it easier to highlight them, and we need a concerted effort to fix the tests once they've been identified as flakey.


RE: Test failures are out of control......

Uwe Schindler
In reply to this post by Yonik Seeley
> > There are exactly three
> > BadApple annotations in the entire code base at present, is there
> > enough value in introducing another annotation to make it worthwhile?
>
> If we change BadApple tests to be executed by default for "ant test"
> (but not for the most frequent jenkins jobs), then that would be fine.
> Basically, add a -Dtests.disable-badapples and use that for the
> jenkins jobs that email the list all the time.

No need for a new sysprop. It's already there, just inverted! Configuring Jenkins to enable or disable them is trivial.

Uwe



Re: Test failures are out of control......

Yonik Seeley
On Wed, Feb 21, 2018 at 6:13 PM, Uwe Schindler <[hidden email]> wrote:

>> > There are exactly three
>> > BadApple annotations in the entire code base at present, is there
>> > enough value in introducing another annotation to make it worthwhile?
>>
>> If we change BadApple tests to be executed by default for "ant test"
>> (but not for the most frequent jenkins jobs), then that would be fine.
>> Basically, add a -Dtests.disable-badapples and use that for the
>> jenkins jobs that email the list all the time.
>
> No need for a new sysprop. It's already there, just inverted! Configuring Jenkins to enable or disable them is trivial.

The issue is that flakey tests should not be ignored by developers
running unit tests before committing new changes.  That's the most
important point in time for test coverage.

-Yonik


RE: Test failures are out of control......

Uwe Schindler
Hey Yonik,

Have you read my e-mail? I just said that there is no need to add another sysprop, as it's already there! Changing the default value for the sysprop is just a one-line change in common-build.xml.

BTW, as I don't care about Solr tests most of the time, I disabled them completely on my local machine using lucene.build.properties in my user's home directory. Every developer can do the same in their own lucene.build.properties file (e.g. enable/disable bad apples). Only the default needs to be decided here.
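
For example, a per-developer override could look roughly like this (a sketch only; the two property names come from the build help quoted earlier, and the property Uwe uses to skip the Solr tests entirely isn't named in this thread, so it is left out):

     # ~/lucene.build.properties
     tests.badapples=false
     tests.awaitsfix=false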

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Yonik Seeley [mailto:[hidden email]]
> Sent: Thursday, February 22, 2018 12:17 AM
> To: Solr/Lucene Dev <[hidden email]>
> Subject: Re: Test failures are out of control......
>
> On Wed, Feb 21, 2018 at 6:13 PM, Uwe Schindler <[hidden email]> wrote:
> >> > There are exactly three
> >> > BadApple annotations in the entire code base at present, is there
> >> > enough value in introducing another annotation to make it worthwhile?
> >>
> >> If we change BadApple tests to be executed by default for "ant test"
> >> (but not for the most frequent jenkins jobs), then that would be fine.
> >> Basically, add a -Dtests.disable-badapples and use that for the
> >> jenkins jobs that email the list all the time.
> >
> > No need for a new sysprop. It's already there, just inverted! Configuring
> Jenkins to enable disable them is trivial.
>
> The issue is that flakey tests should not be ignored by developers
> running unit tests before committing new changes.  That's the most
> important point in time for test coverage.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]



Re: Test failures are out of control......

Chris Hostetter-3
In reply to this post by Cassandra Targett

: * Hoss has worked on aggregating all test failures from the 3 Jenkins
: systems (ASF, Policeman, and Steve's), downloading the test results & logs,
: and running some reports/stats on failures. He should be ready to share
: this more publicly soon.

I think Steve's linked to some of this before from jira comments, but it
was only recently that I realized I've never explicitly said to the list
"Hey folks, here's a thing I've been working on" ...

  http://fucit.org/solr-jenkins-reports/
  https://github.com/hossman/jenkins-reports/

The most interesting bit is probably here...

  http://fucit.org/solr-jenkins-reports/failure-report.html

...but there are currently a few caveats:

1) there's some noise in the '7days' data because I wasn't accounting for
the way jenkins reports some types of failure -- that will gradually clean
itself up

2) I think I've been blocked by builds.apache.org, so at the moment
the data seems to just be from the sarowe & policeman jenkins failures.

3) although the system is archiving the past 7 days' worth of jenkins logs
for any jobs with failures, there is currently no easy way to download
the relevant log(s) from that failure report -- you currently have to
download a CSV file like this one to correlate the test failures to the
jenkins job, and then go look for that job in the job-data dirs...

  http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
  http://fucit.org/solr-jenkins-reports/job-data/

(My hope is to make #3 trivial from failure-report.html -- so you can say
"hey weird, this test has failed X times, let's go download those logs"
right from a single screen in your browser.)
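
In the meantime the manual correlation step can be scripted in a pinch, e.g. something like the following, where the test name is only a placeholder and the assumption is that rows in that CSV can be matched on the test/suite name:

     curl -s http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv \
       | grep SomeFlakeyTest
     # then look up the jenkins job(s) named in the matching rows under:
     #   http://fucit.org/solr-jenkins-reports/job-data/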




-Hoss
http://www.lucidworks.com/


Re: Test failures are out of control......

Dawid Weiss-2

Don't know, Yonik... If I make a change, I am interested in regressions from the starting state; running flaky tests makes that impossible and frustrating (and pointless, in my opinion). I don't think my expectations are that far off from the average - you may wake up being the only person who has the defaults enabled, which seems wrong.

Dawid




RE: Test failures are out of control......

Uwe Schindler
In reply to this post by Chris Hostetter-3
Hi Hoss,

Great and very helpful! Does it only cover Solr, or are Lucene tests also reported?

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Chris Hostetter [mailto:[hidden email]]
> Sent: Thursday, February 22, 2018 1:56 AM
> To: [hidden email]
> Subject: Re: Test failures are out of control......
>
>
> : * Hoss has worked on aggregating all test failures from the 3 Jenkins
> : systems (ASF, Policeman, and Steve's), downloading the test results & logs,
> : and running some reports/stats on failures. He should be ready to share
> : this more publicly soon.
>
> I think Steve's linked to some of this before from jira comments, but it
> was only recently I realized i've never explicitly said to the list "Hey
> folks, here's a thing i've been working on" ...
>
>   http://fucit.org/solr-jenkins-reports/
>   https://github.com/hossman/jenkins-reports/
>
> The most interesting bit is probably here...
>
>   http://fucit.org/solr-jenkins-reports/failure-report.html
>
> ...but there are currently a few caveats:
>
> 1) there's some noise inthe '7days' data because I wasn't accounting for
> the way jenkins reports some types of failure -- that will gradually clean
> itself up
>
> 2) I think i've been been blocked by builds.apache.org, so at the moment
> the data seems to just be from the sarowe & policeman jenkins failures.
>
> 3) allthough the system is archiving the past 7 days worth of jenkins logs
> for any jobs with failures, there is currently no easy way to download
> the relevant log(s) from that failure report -- you currently have to
> download a CSV file like this one to corrolate the test failures to the
> jenkins job, and then go look for that job in the job-data dirs...
>
>   http://fucit.org/solr-jenkins-reports/reports/7days-method-failures.csv
>   http://fucit.org/solr-jenkins-reports/job-data/
>
> (My hope is to make #3 trivial from failure-report.html -- so you can say
> "hey weird, this test has failed X times, let's go download those logs."
> right from a single screen in your browser)
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]



Re: Test failures are out of control......

Adrien Grand
In reply to this post by Dawid Weiss-2
+1 Dawid

I understand your point, Yonik, but the practical consequences are worse than those of disabling these tests, as Erick pointed out in his initial emails.

If we are concerned about forgetting these disabled tests, which is a concern I share, then I think Uwe's idea of adding a weekly job that runs with -Dtests.awaitsfix=true is a good compromise.

