I'd like some discussion about the problem outlined in SOLR-14861

Erick Erickson

The test framework, and perhaps all of Solr, has a disorderly shutdown process. I’ve seen at least one case where this is responsible for “bogus” test failures, bogus in the sense that the test failed with unreleased objects purely because of race conditions. The short form is that our test harness can call CoreContainer.shutdown() directly, and in this case it did so while reload() operations were in flight and had already gotten past the check of CoreContainer.isShutdown(). The reload() thread is then time-sliced out, the shutdown() thread gets partway through, and when the reload() thread picks up again, CoreContainer is partly shut down and things go wonky.
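
To make that interleaving concrete, here is a minimal sketch of the check-then-act race. Everything in it is hypothetical: ToyContainer is a stand-in written for this post, not Solr's CoreContainer, and the two latches exist only to force the unlucky scheduling so the outcome is repeatable.

```java
import java.util.concurrent.CountDownLatch;

public class CheckThenActRace {

    // Hypothetical stand-in for a container; NOT Solr's CoreContainer.
    static class ToyContainer {
        private volatile boolean shutdownStarted = false;
        private volatile boolean resourcesReleased = false;

        boolean isShutdown() {
            return shutdownStarted;
        }

        void shutdown() {
            shutdownStarted = true;
            resourcesReleased = true; // stands in for closing cores, executors, etc.
        }

        // Returns true if the resources were still live when reload acted on them.
        boolean reload(CountDownLatch passedCheck, CountDownLatch resume)
                throws InterruptedException {
            if (isShutdown()) {
                throw new IllegalStateException("container is shut down");
            }
            passedCheck.countDown(); // the check has passed...
            resume.await();          // ...and the thread is "time-sliced out" here
            return !resourcesReleased; // acts on state that may have changed
        }
    }

    // Forces the interleaving: reload passes its check, shutdown runs, reload resumes.
    public static boolean reloadSawLiveContainer() {
        ToyContainer container = new ToyContainer();
        CountDownLatch passedCheck = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        boolean[] sawLive = new boolean[1];

        Thread reloader = new Thread(() -> {
            try {
                sawLive[0] = container.reload(passedCheck, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reloader.start();
        try {
            passedCheck.await();  // wait until reload is past the isShutdown() check
            container.shutdown(); // shutdown runs while reload is parked
            resume.countDown();   // let reload continue
            reloader.join();      // join makes the write to sawLive visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return sawLive[0];
    }

    public static void main(String[] args) {
        System.out.println("reload saw a live container: " + reloadSawLiveContainer());
    }
}
```

The point of the sketch is that the isShutdown() check and the work that follows it are not atomic, so a passed check says nothing about the state the work will actually see.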

The focus on CoreContainer.isShutdown is just for illustration; it is somewhat of a legacy problem, since the test harness manipulates things at this level.

Looking through the code, I found a number of places outside CoreContainer that check the isShutdown flag in CoreContainer, so the problem is more widespread than just CoreContainer.

Don’t look at the patch on that JIRA; the more I think about it, the more it looks like a totally bad approach.

Generically, we need a mechanism that, when we shut Solr down, will

1> stop any new requests from being processed. IMO they should be rejected immediately.
2> wait for all in-flight operations to complete. This could get tricky if one of the operations is, say, optimize.
3> then shut down.
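
One possible shape for such a mechanism, purely as a sketch: a small gate that every request passes through. ShutdownGate, tryEnter(), exit() and closeAndAwait() are all names invented here, not anything that exists in Solr, and the timeout is my assumption about how a long-running operation such as optimize might be bounded. The key property is that the "are we closed?" check and the in-flight count update happen under the same lock, so an operation cannot slip past the check after shutdown has started.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical shutdown gate; not part of Solr.
public class ShutdownGate {
    private final Object lock = new Object();
    private boolean closed = false;
    private int inFlight = 0;

    // Step 1: admit an operation only while the gate is open.
    // The check and the increment are atomic with respect to closeAndAwait().
    public boolean tryEnter() {
        synchronized (lock) {
            if (closed) {
                return false; // reject immediately
            }
            inFlight++;
            return true;
        }
    }

    // Must be called exactly once for every successful tryEnter().
    public void exit() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll(); // wake a waiting closeAndAwait()
            }
        }
    }

    // Steps 1 and 2: close the gate, then wait up to timeoutMillis for
    // in-flight operations to drain. Returns true if nothing is left in flight,
    // false on timeout or interruption.
    public boolean closeAndAwait(long timeoutMillis) {
        synchronized (lock) {
            closed = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (inFlight > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out with operations still running
                }
                try {
                    TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true; // step 3 (the real shutdown) can now proceed
        }
    }

    public static void main(String[] args) {
        ShutdownGate gate = new ShutdownGate();
        System.out.println("admitted while open: " + gate.tryEnter());
        System.out.println("drained with one op in flight: " + gate.closeAndAwait(50));
        System.out.println("admitted after close: " + gate.tryEnter());
        gate.exit(); // the in-flight operation finishes
        System.out.println("drained after exit: " + gate.closeAndAwait(50));
    }
}
```

A caller would wrap each operation as "if tryEnter() then do the work in a try block and call exit() in finally, else reject". What to do when closeAndAwait() times out (keep waiting, interrupt the operation, or shut down anyway) is exactly the open question in 2> and is not answered by this sketch.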

Then perhaps rework the test harness to use that mechanism rather than calling CoreContainer.shutdown() directly.

That said, I don’t have a clue how to make that happen.
