Seeking advice on SolrCloud production architecture with CDCR

Cody Burleson
Hi, all. We’re upgrading an old Solr 3.5 setup (master/slave replication) to SolrCloud (v7 or v8) and adding a second data center at the same time. I’ve done a lot of homework, but could still use some advice. While the documentation explains ZooKeeper and SolrCloud pretty well, I don’t have a comfortable sense of how to lay everything out physically in the architecture.

At present, we have planned the same physical hardware as what we had for our master/slave setup (basically, 2 servers). Now, however, we’re going to duplicate that so that we have the same in a second data center: one in the US and one in Europe. For this, bi-directional Cross Data Center Replication (CDCR) seems appropriate, but I’m not confident. Also, for the best fault tolerance and high availability, I’m not really sure how to lay out my ZooKeeper nodes and my Solr instances/shards/replicas physically across the servers. I’d like to start with the simplest possible setup and scale up only if necessary. Our index is relatively small, I guess: ~150,000 documents.

I’m worried, for example, about spreading the ZooKeeper cluster between the two data centers because of potential latency across the pond. Maybe we keep the ZK ensemble on one side of the pond only? I imagined, for instance, 2 ZK nodes on one server, and one on the other (in at least one data center). But maybe we need 5 ZKs, with 1 on each server in the other data center? Then how about the Solr nodes, shards, and replicas? If anybody has done a remotely similar setup for production purposes, I would be grateful for any tips (and downright giddy for a diagram).

I know I’m probably not even providing enough information to begin with, but perhaps someone will entertain a conversation?

Thanks, in advance, for sharing some of your valuable time and experience.

Cody

Re: Seeking advice on SolrCloud production architecture with CDCR

Shawn Heisey-2
On 5/14/2019 4:55 PM, Cody Burleson wrote:
> I’m worried, for example, about spreading the ZooKeeper cluster between the two data centers because of potential latency across the pond. Maybe we keep the ZK ensemble on one side of the pond only? I imagined, for instance, 2 ZK nodes on one server, and one on the other (in at least one data center). But maybe we need 5 ZKs, with 1 on each server in the other data center? Then how about the Solr nodes, shards, and replicas? If anybody has done a remotely similar setup for production purposes, I would be grateful for any tips (and downright giddy for a diagram).

If you're planning a geographically diverse ZooKeeper setup, you cannot
do it with only two data centers.  You need at least three.  This is
inherent to the design of ZK and cannot be changed.  ZK needs a majority
of its nodes to be up, so with two data centers there will always be at
least one DC whose failure causes ZK to lose quorum.  When ZK loses
quorum, SolrCloud loses the ability to react to failures and goes
read-only.
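
The quorum arithmetic behind this can be checked with a small sketch.
It assumes only that ZK needs a strict majority of the ensemble to be
reachable; the function names are made up for illustration and are not
part of any ZooKeeper or Solr API.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-node ensemble."""
    return n // 2 + 1


def survives_any_dc_loss(nodes_per_dc: list[int]) -> bool:
    """True if quorum is kept no matter which single data center fails.

    nodes_per_dc holds the number of ZK nodes placed in each data center.
    """
    total = sum(nodes_per_dc)
    needed = quorum_size(total)
    # Losing a data center removes all of the ZK nodes hosted there.
    return all(total - lost >= needed for lost in nodes_per_dc)


# Every way of splitting a 3-node or 5-node ensemble over two data
# centers leaves at least one DC whose loss breaks quorum.
for total in (3, 5):
    for in_first_dc in range(total + 1):
        assert not survives_any_dc_loss([in_first_dc, total - in_first_dc])

# With three data centers a 1+1+1 or 2+2+1 split tolerates the loss of
# any single data center.
assert survives_any_dc_loss([1, 1, 1])
assert survives_any_dc_loss([2, 2, 1])
```

An even split such as 2+2 is no better: quorum for four nodes is three,
so losing either side stops the ensemble.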

You mentioned CDCR.  This involves two completely separate SolrCloud
clusters -- a full ZK ensemble in each location.  So you would have 3 ZK
servers and at least two Solr servers in one data center, and 3 ZK
servers plus at least two Solr servers in the other data center.
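
As a rough sketch of that layout, the two clusters can be written down
as plain data.  Every hostname below is hypothetical, 2181 is assumed
as the ZK client port, and the helper names are invented for this
example.  The point is only that each data center has its own complete
ensemble, and that each cluster refers to the other by that cluster's
ZK address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One self-contained SolrCloud cluster with its own ZK ensemble."""
    name: str
    zk_hosts: tuple[str, ...]
    solr_hosts: tuple[str, ...]
    zk_port: int = 2181  # assumed client port

    def zk_connect_string(self) -> str:
        # Comma-separated host:port list, the usual ZK address form.
        return ",".join(f"{h}:{self.zk_port}" for h in self.zk_hosts)


# Hypothetical hostnames: 3 ZK servers and 2 Solr servers per data center.
US = Cluster(
    name="us",
    zk_hosts=("zk1.us.example.com", "zk2.us.example.com", "zk3.us.example.com"),
    solr_hosts=("solr1.us.example.com", "solr2.us.example.com"),
)
EU = Cluster(
    name="eu",
    zk_hosts=("zk1.eu.example.com", "zk2.eu.example.com", "zk3.eu.example.com"),
    solr_hosts=("solr1.eu.example.com", "solr2.eu.example.com"),
)


def replication_targets(clusters: list[Cluster]) -> dict[str, list[str]]:
    """For each cluster, the ZK addresses of the other clusters it would
    replicate to in a bi-directional setup."""
    return {
        c.name: [o.zk_connect_string() for o in clusters if o is not c]
        for c in clusters
    }


for name, targets in replication_targets([US, EU]).items():
    print(name, "->", targets)
```

No ZK node in one data center is a member of the other ensemble, so
the transatlantic link carries only replication traffic between the
clusters, never ZK quorum traffic.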

Thanks,
Shawn