We are running Solr 7.2.1 in production on 2 nodes (245 GB RAM each) with
a 3-node ZooKeeper cluster. We use Java 8 with the default GC settings
(NewRatio=3) and a 15 GB heap, which we raised to 16 GB after the
performance issue described below.
We have about 90 collections (~8 shards each), of which about 50 are
actively used. About 3 collections are actively updated through SolrJ
update requests with a soft commit of 30 seconds. The other collections
are updated through batch CSV uploads to the update handler.
We had read timeouts and slowness when young generation usage peaked, as
the GC graph below shows for the problem period. After that we increased
the overall heap size from 15 GB to 16 GB, and since then we have not
seen any read issues.
1. Our heap is very large and we are seeing high young generation usage.
Is this due to the SolrJ updates (concurrent single-record updates)?
2. Should we change NewRatio to 2 so that the young generation gets
larger? We are seeing only 58% usage of the old generation.
3. We are also seeing that if we restart Solr in production while updates
are happening, one server starts up but does not have all collections and
shards up. When we restart both servers, everything comes up fine. Is this
behavior also related to the SolrJ updates?
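
For reference on question 2, this is a rough sketch of how we understand
NewRatio to split the heap. It assumes a generational collector with a
fixed heap size, where the young generation is heap / (NewRatio + 1) and
the old generation is the rest; it ignores survivor-space details and any
adaptive resizing the JVM may do. The class and method names are
hypothetical and only there for illustration.

```java
class NewRatioSplit {
    // -XX:NewRatio is the old:young ratio, so young = heap / (NewRatio + 1).
    // Sizes are in MB; integer division rounds down.
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // Whatever is not young generation is treated as old generation here.
    static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long[] heapsMb = {15L * 1024, 16L * 1024}; // 15 GB before, 16 GB after
        int[] ratios = {3, 2};                     // current value, proposed value
        for (long heap : heapsMb) {
            for (int ratio : ratios) {
                System.out.printf("heap=%d MB NewRatio=%d young=%d MB old=%d MB%n",
                        heap, ratio, youngMb(heap, ratio), oldMb(heap, ratio));
            }
        }
    }
}
```

By this arithmetic, going from 15 GB to 16 GB at NewRatio=3 would only move
the young generation from about 3.75 GB to 4 GB, while NewRatio=2 on a
16 GB heap would give roughly 5.3 GB.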