SolrCloud upgrade process

David Smiley
I was considering the process of upgrading a SolrCloud cluster across a minor version (e.g. 8.3 -> 8.4).

One concern is the effect of new Lucene index formats.  Lucene 8.4 bumped the Codec version because postings are now written differently to be more SIMD-friendly -- https://issues.apache.org/jira/browse/LUCENE-9027
Lucene 8.4 will read an index created with Lucene 8.3 -- great; but Lucene 8.3 obviously can't read an index created with Lucene 8.4.  I'm not picking on this specific JIRA/change; other changes could have the same effect, and there's another coming in 8.6.
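
To make the asymmetry concrete, here is a minimal sketch of the compatibility rule described above. It is deliberately simplified: it assumes any newer minor version may have changed the format, it only compares versions within the same major release, and the function names are hypothetical (this is not Lucene's actual check).

```python
def parse_version(text):
    """Turn '8.4' or '8.4.1' into a padded, comparable tuple such as (8, 4, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def can_read(reader_version, index_version):
    """Simplified assumption: a reader can open an index written by the
    same major version at the same or an older minor version."""
    reader = parse_version(reader_version)
    index = parse_version(index_version)
    same_major = reader[0] == index[0]
    index_not_newer = index[:2] <= reader[:2]
    return same_major and index_not_newer


if __name__ == "__main__":
    print(can_read("8.4", "8.3"))  # upgraded node reading an old index
    print(can_read("8.3", "8.4"))  # old node reading a new index
```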

The instructions describe a rolling upgrade of one node at a time.  Makes sense.  However, a replica on an already-upgraded node can become shard leader and have some documents written to it, and then a replica on a non-upgraded node might end up replicating segments from that leader that it cannot read.  This is possible with all replica types, though I think it is more likely with TLOG & PULL.  I am not sure whether there are any protections against this (e.g. in the replication handler / index fetcher); there should be.  I think SolrCloud should prevent a replica from becoming leader if another replica of the same shard runs a lower Solr version.
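
The proposed rule could be sketched roughly as follows. This only illustrates the idea and is not how Solr's leader election is implemented; the function names, the replica names, and the replica-to-version mapping are all hypothetical.

```python
def parse_version(text):
    """Turn '8.4' or '8.4.1' into a padded, comparable tuple such as (8, 4, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def may_become_leader(candidate, shard_replicas):
    """Return True only if no other replica of the shard runs a lower version.

    shard_replicas maps a (hypothetical) replica name to the Solr version
    of the node hosting it, e.g. {"replica1": "8.4.0", "replica2": "8.3.1"}.
    """
    candidate_version = parse_version(shard_replicas[candidate])
    return all(
        candidate_version <= parse_version(version)
        for name, version in shard_replicas.items()
        if name != candidate
    )


if __name__ == "__main__":
    mid_upgrade = {"replica1": "8.4.0", "replica2": "8.3.1"}
    for name in mid_upgrade:
        print(name, may_become_leader(name, mid_upgrade))
```

Under this rule the lowest-versioned replica stays eligible throughout the rolling upgrade, so any segments it writes remain readable by every other replica of the shard.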

I can think of two work-arounds:
(A) shut down the whole cluster to do the upgrade (forced downtime)
(B) set all collections to read-only https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/collection-management.adoc#L258 and be careful not to create new collections during this time.  Then do the rolling upgrade as described in the docs, and then remove the read-only status.
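
Work-around (B) might be scripted along these lines. This is a sketch that only builds the Collections API request URLs, using the MODIFYCOLLECTION action with the readOnly property; the base URL, the collection names, and the helper names are assumptions for illustration, and actually sending the requests (and the rolling restart itself) is left out.

```python
from urllib.parse import urlencode


def read_only_url(solr_base_url, collection, read_only):
    """Build a MODIFYCOLLECTION request that sets or clears readOnly."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,
        "readOnly": "true" if read_only else "false",
    }
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def upgrade_plan(solr_base_url, collections):
    """Return the ordered steps: freeze writes, upgrade, unfreeze."""
    steps = [read_only_url(solr_base_url, name, True) for name in collections]
    steps.append("ROLLING UPGRADE (one node at a time, done outside this script)")
    steps.extend(read_only_url(solr_base_url, name, False) for name in collections)
    return steps


if __name__ == "__main__":
    # Hypothetical host and collection names.
    for step in upgrade_plan("http://localhost:8983/solr", ["films", "books"]):
        print(step)
```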

~ David Smiley
Apache Lucene/Solr Search Developer

Re: SolrCloud upgrade process

Ilan Ginzburg
If there could be a way to force the new version to continue writing in the previous format for a while, that would allow switching to writing the new format once all nodes have been upgraded (or more likely when the cluster admin decides so).
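
As a rough illustration of that idea, the decision could look like the sketch below. It is purely hypothetical: nothing here corresponds to an existing Solr or Lucene setting, and the function and parameter names are invented for the example.

```python
def parse_version(text):
    """Turn '8.4' or '8.4.1' into a padded, comparable tuple such as (8, 4, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def choose_write_format(previous_format, node_versions, admin_switched):
    """Keep writing the previous format until every node runs the newest
    version AND the cluster admin has explicitly switched over."""
    versions = [parse_version(v) for v in node_versions]
    newest = max(versions)
    all_upgraded = all(v == newest for v in versions)
    if admin_switched and all_upgraded:
        return newest
    return parse_version(previous_format)


if __name__ == "__main__":
    print(choose_write_format("8.3", ["8.4", "8.3"], admin_switched=True))
    print(choose_write_format("8.3", ["8.4", "8.4"], admin_switched=False))
    print(choose_write_format("8.3", ["8.4", "8.4"], admin_switched=True))
```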

Ilan

On Tue, Jun 30, 2020 at 21:34, David Smiley <[hidden email]> wrote:

Re: SolrCloud upgrade process

david.w.smiley@gmail.com
At the Lucene level, it was deemed too much of a PITA to retain the older writing code in future versions.  I think maybe this was re-discussed in the last year.  I sympathize.

~ David


On Tue, Jun 30, 2020 at 6:30 PM Ilan Ginzburg <[hidden email]> wrote: