Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)


THADC
Hello,

I am having trouble getting a clear understanding of the relationship
between my 3-node zookeeper cluster and how those 3 nodes relate to solr
replicas (if at all). Since the replicas exist for failover purposes
(correct?) as opposed to for load balancing (which is what the sharding
strategy addresses), I was assuming that there should be as many replicas
per shard as there are zookeeper nodes. So in my case, one zookeeper node is
the leader for a given shard, while the other two are followers. Is this
correct?

Any insights are appreciated. thanks!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

Erick Erickson
Not at all. ZooKeeper is just the record-keeper for the _states_ of
the replicas, i.e. whether they are active, recovering, down and the
like, as well as the config sets (schema, solrconfig.xml etc).

There is no relationship between these two counts. Well, if you have a
zillion collections with a zillion replicas you may want to partition
things, but I've seen 100K replicas hosted on 3 ZK nodes.

And a misconception you have is the shard/replica usage. Replicas
exist for two reasons:
1> HA. If a replica goes down, the other replicas pick up the load
2> increasing QPS. If I have 5 replicas/shard and can serve X QPS,
increasing to 10 replicas/shard should give me close to 2X QPS.

_Shards_ only come into play when you want to have more documents
than you can comfortably fit in a one-shard (perhaps many replicas)
setup.
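
A rough sketch of the arithmetic above, in Python. The function names are
hypothetical, and the QPS figure assumes ideal linear scaling with enough
hardware behind the extra replicas, so treat it as an upper bound rather
than a prediction:

```python
def total_cores(num_shards: int, replicas_per_shard: int) -> int:
    """Every shard carries the same number of replicas, so the
    collection hosts num_shards * replicas_per_shard cores in total."""
    return num_shards * replicas_per_shard


def estimated_qps(measured_qps: float, measured_replicas: int,
                  new_replicas: int) -> float:
    """Idealised estimate: query throughput grows in proportion to the
    number of replicas per shard. Real clusters land somewhat below this."""
    return measured_qps * new_replicas / measured_replicas


# 5 replicas/shard serving 100 QPS, scaled out to 10 replicas/shard.
print(estimated_qps(100.0, 5, 10))
# A 2-shard collection with 3 replicas per shard.
print(total_cores(2, 3))
```

Note that the number of ZooKeeper nodes does not appear anywhere in either
calculation.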

Best,
Erick

On Fri, Jun 8, 2018 at 11:13 AM, THADC
<[hidden email]> wrote:

> Hello,
>
> I am having trouble getting a clear understanding of the relationship
> between my 3-node zookeeper cluster and how those 3 nodes relate to solr
> replicas (if at all). Since the replicas exist for failover purposes
> (correct?) as opposed to for load balancing (which is what the sharding
> strategy addresses), I was assuming that  there should be as many replicas
> per shard as there are zookeeper nodes. So in my case, one zookeeper node is
> the leader for a given shard, while the other two are followers. Is this
> correct?
>
> Any insights are appreciated. thanks!

Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

THADC
Thanks Erick, that was helpful. But what if you want to proactively replicate
across multiple servers, either at the VM or even physical server level? It
seems that we have control over the zookeeper locations and the solr server
locations since we explicitly define these when we configure the instances
(at least at the port level), but for replicas, in my limited experience, we
only specify the quantity without regard to physical/logical location for
each replica. Obviously at least for HA, location is relevant if you are
trying to address physical server failure or VM failure as opposed to
application-level (higher-level) failures.

Maybe a better question is, if I have multiple VMs that can span multiple
physical servers, how do I come up with appropriate distribution strategies
for both HA and performance?

Thank you again!

Tim Clotworthy




Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

Erick Erickson
There are at least two ways of going about this:

1> when you create your collection, create it with the special "EMPTY"
node set (createNodeSet=EMPTY), then use ADDREPLICA to place each replica where you want it,
applying your knowledge of where the VMs are hosted.

2> use the replica placement rules, see:
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
(but check your individual version; you can download the PDF for your
version from the "Other formats" drop-down at the top of this link).
What you're looking for I think is "rack awareness". Search for "rack"
at the link.
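
As an illustration of option 1, here is a minimal Python sketch that only
builds the Collections API URLs (it does not send them). The host names,
collection name, and config set name are made up for the example; the
parameters used (action=CREATE with createNodeSet=EMPTY, and
action=ADDREPLICA with collection, shard and node) are the standard
Collections API ones, but check them against the Reference Guide for your
Solr version:

```python
from urllib.parse import urlencode

# Hypothetical Solr node used as the entry point for admin requests.
SOLR_ADMIN = "http://solr-a.example.com:8983/solr/admin/collections"


def create_empty_collection_url(name, num_shards, config_name):
    """URL that creates the collection without placing any replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
        "createNodeSet": "EMPTY",
    }
    return f"{SOLR_ADMIN}?{urlencode(params)}"


def add_replica_url(collection, shard, node):
    """URL that adds one replica of `shard` on an explicitly chosen node."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # node name as listed in live_nodes, e.g. host:8983_solr
    }
    return f"{SOLR_ADMIN}?{urlencode(params)}"


def placement_urls(collection, layout):
    """layout maps each shard name to the nodes that should host a replica."""
    return [add_replica_url(collection, shard, node)
            for shard, nodes in layout.items()
            for node in nodes]


# Two shards, each with one replica on each of two physical hosts.
layout = {
    "shard1": ["solr-a.example.com:8983_solr", "solr-b.example.com:8983_solr"],
    "shard2": ["solr-a.example.com:8983_solr", "solr-b.example.com:8983_solr"],
}
print(create_empty_collection_url("mycoll", 2, "myconf"))
for url in placement_urls("mycoll", layout):
    print(url)
```

The layout dictionary is where the knowledge of which VM lives on which
physical server gets applied: list nodes on different hosts for each shard.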

Best,
Erick

On Sun, Jun 10, 2018 at 5:25 AM, THADC
<[hidden email]> wrote:

> Thanks Eric, that was helpful. But what if you want to proactively replicate
> across multiple servers either at the VM or even physical server level. It
> seems that we have control over the zookeeper locations and the solr server
> locations since we explicitly define these when we configure the instances
> (at least at the port level), but for replicas, in my limited experience, we
> only specify the quantity without regard to physical/logical location for
> each replica. Obviously at least for HA, location is relevant if you are
> trying to address physical server failure or VM failure as opposed to
> application-level (higher-level) failures.
>
> Maybe a better question is, if I have multiple VMs that can span multiple
> physical servers, how do I come up with appropriate distribution strategies
> for both HA and performance?
>
> Thank you again!
>
> Tim Clotworthy

Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

Shawn Heisey-2
In reply to this post by THADC
On 6/8/2018 12:13 PM, THADC wrote:
> I am having trouble getting a clear understanding of the relationship
> between my 3-node zookeeper cluster and how those 3 nodes relate to solr
> replicas (if at all). Since the replicas exist for failover purposes
> (correct?) as opposed to for load balancing (which is what the sharding
> strategy addresses), I was assuming that  there should be as many replicas
> per shard as there are zookeeper nodes. So in my case, one zookeeper node is
> the leader for a given shard, while the other two are followers. Is this
> correct?

There is no relationship at all between the number of zookeeper nodes
and the number of SolrCloud nodes, shards, or replicas.

Within the zookeeper ensemble there is an election to determine the
leader for the entire ensemble.  But that is for zookeeper -- it has
absolutely no connection to the leader elections that Solr conducts for
its shard replicas.
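
To make the independence concrete, here is a small Python sketch of the
majority-quorum arithmetic that governs a ZooKeeper ensemble. The function
names are hypothetical; the point is that the only input is the ensemble
size, never the number of Solr shards or replicas:

```python
def zk_quorum_size(ensemble_size: int) -> int:
    """A ZooKeeper ensemble needs a strict majority of its servers up."""
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """How many ZooKeeper servers can be lost while keeping a majority."""
    return ensemble_size - zk_quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, zk_quorum_size(n), zk_tolerated_failures(n))
```

A 3-node ensemble tolerates one failure, and a 4-node ensemble still only
tolerates one, which is why odd ensemble sizes are the usual choice.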

As Erick already said:

The reason to have multiple shards in a collection is so that Solr can
handle a larger index.  By sharing the index across additional servers,
adding shards CAN increase performance, but if the number of machines
doesn't increase, fewer shards is usually (but not always) better.

At least two replicas per shard are required for high availability.
Some prefer to have three replicas for extra reliability.  Load
balancing is handled by additional replicas, not additional shards.

Thanks,
Shawn

Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

THADC
Shawn, thanks. You say "at least two replicas per shard are required for high
availability". So that would be a total of three nodes for that shard,
correct?

Thanks, Tim Clotworthy




Re: Relationship Between Number of Solr Replicas and Number of Zookeeper Nodes (if any)

Shawn Heisey-2
On 6/11/2018 5:47 AM, THADC wrote:
> Shawn, thanks. You say "at least two replicas per shard are required for high
> availability". So that would be a total of three nodes for that shard,
> correct?

The smallest possible fault-tolerant Solr install is a total of three
servers.  Two of them will run Solr and ZooKeeper, one of them will run
ZooKeeper only, and might be a server with lower specs than the other
two.  Each collection will be built so that each shard has one replica
on one Solr node, and the other replica on the other Solr node.

So I think the answer to your question is no; there only need to be two
Solr nodes.  If you want to have three nodes and three replicas per shard,
you can certainly set it up that way.
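
The three-server layout described above can be checked with a short Python
sketch. The server names and the function are hypothetical, and the model
is deliberately simplified: a layout "survives" if, after losing any one
server, ZooKeeper still has a majority and every shard still has at least
one replica on a surviving server:

```python
from typing import Dict, Set


def survives_any_single_server_loss(zk_servers: Set[str],
                                    shard_replicas: Dict[str, Set[str]]) -> bool:
    """zk_servers: servers running ZooKeeper.
    shard_replicas: shard name -> servers hosting a replica of that shard."""
    all_servers = set(zk_servers)
    for hosts in shard_replicas.values():
        all_servers |= hosts

    quorum = len(zk_servers) // 2 + 1
    for failed in all_servers:
        # ZooKeeper must keep a strict majority of its ensemble.
        if len(zk_servers - {failed}) < quorum:
            return False
        # Every shard must keep at least one replica somewhere else.
        for hosts in shard_replicas.values():
            if not hosts - {failed}:
                return False
    return True


# Servers A and B run Solr and ZooKeeper; C runs ZooKeeper only.
minimal = survives_any_single_server_loss(
    {"A", "B", "C"},
    {"shard1": {"A", "B"}},
)
# Same Solr layout, but ZooKeeper only on the two Solr servers.
two_zk = survives_any_single_server_loss(
    {"A", "B"},
    {"shard1": {"A", "B"}},
)
print(minimal, two_zk)
```

Dropping the third ZooKeeper server is what breaks fault tolerance here,
not the number of Solr replicas, which stays at two per shard in both cases.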

Thanks,
Shawn