Deciding on the number of Shards and Replica

Sourav Moitra
Hello all,

I am a Solr newbie. I am trying to set up three servers, each running both
a ZooKeeper ensemble member and Solr in cloud mode. Each server has 4 cores
and 16 GB of RAM. To start with, I have set Xmx to 6144M for ZooKeeper
and Xmx to 2048M for Solr. We have created 3 shards with 3 replicas
each. Each replica turned out to be 3 GB in size, and I am
planning to host multiple such collections.

My questions:
What problems do you see with this kind of setup?
How can I improve the setup?
What is the rule of thumb for the number of shards and replicas?
Is there any correlation between the number of servers and the number of
shards and replicas?

Thank you for looking into this.

Sourav Moitra
https://souravmoitra.com

Re: Deciding on the number of Shards and Replica

Shawn Heisey-2
On 10/7/2018 7:28 PM, Sourav Moitra wrote:

> I am a Solr newbie. I am trying to set up three servers, each running both
> a ZooKeeper ensemble member and Solr in cloud mode. Each server has 4 cores
> and 16 GB of RAM. To start with, I have set Xmx to 6144M for ZooKeeper
> and Xmx to 2048M for Solr. We have created 3 shards with 3 replicas
> each. Each replica turned out to be 3 GB in size, and I am
> planning to host multiple such collections.
>
> My questions:
> What problems do you see with this kind of setup?
> How can I improve the setup?
> What is the rule of thumb for the number of shards and replicas?
> Is there any correlation between the number of servers and the number of
> shards and replicas?

In a nutshell: There are no generic answers, no rule of thumb.  None at all.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

With some more detailed information, we can provide a GUESS about how
you should size things.  But that's all it will be, and it could be a
completely wrong guess.

Why are you giving 6GB of memory to ZooKeeper?  Unless you're going to
have a LOT of shard replicas and servers in your cloud, I can't imagine
each ZK server needing more than about 512MB, and it might even run with
far less.
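
To make the heap question concrete, here is a rough sketch of the
arithmetic in Python.  It uses the numbers from the original post
(16GB of RAM, Xmx 6144M for ZK, Xmx 2048M for Solr) and the 512MB figure
above.  The function name is made up for illustration, and it assumes
each JVM uses exactly its max heap, which understates real usage because
a JVM also needs memory outside the heap.

```python
def memory_left_for_os_mb(total_ram_mb, zk_heap_mb, solr_heap_mb):
    """RAM not claimed by either Java heap (hypothetical helper).

    Assumption: each JVM uses exactly its max heap and nothing else.
    Whatever is left is what the OS can use for itself and for
    caching index files.
    """
    return total_ram_mb - zk_heap_mb - solr_heap_mb


TOTAL_RAM_MB = 16 * 1024  # 16GB per server, from the original post

# Heap sizes as described in the original post.
current_mb = memory_left_for_os_mb(TOTAL_RAM_MB, zk_heap_mb=6144, solr_heap_mb=2048)

# Same Solr heap, ZK heap reduced to the 512MB suggested above.
reduced_mb = memory_left_for_os_mb(TOTAL_RAM_MB, zk_heap_mb=512, solr_heap_mb=2048)

print(f"left for OS with 6144M ZK heap: {current_mb} MB")
print(f"left for OS with 512M ZK heap:  {reduced_mb} MB")
print(f"memory freed by shrinking ZK:   {reduced_mb - current_mb} MB")
```

Whether the Solr heap itself is the right size is a separate question
that this arithmetic cannot answer.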

Some questions that will be important to answer:

How many documents are in that 3GB shard replica?  How much index data
(both document count and size on disk) do you expect each machine to be
handling?  16GB might be nowhere near enough total memory for the
system, but without more information I can't even guess about that.
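
As a starting point for the "how much index data per machine" question,
here is a small Python sketch.  It assumes that "3 shards and 3 replica
each" means 9 cores per collection in total, that every replica is about
3GB, and that replicas are spread evenly over the 3 servers.  None of
that is confirmed, and the function name is made up for illustration.

```python
def index_gb_per_node(shards, replicas_per_shard, replica_size_gb, nodes, collections=1):
    """Estimated on-disk index size each node hosts (hypothetical helper).

    Assumptions: all replicas are the same size and are distributed
    evenly across the nodes; every collection looks the same.
    """
    total_replicas = shards * replicas_per_shard * collections
    return total_replicas * replica_size_gb / nodes


# One collection as described in the original post.
one_collection = index_gb_per_node(shards=3, replicas_per_shard=3,
                                   replica_size_gb=3, nodes=3)

# "Multiple such collections": 3 is an arbitrary example count.
three_collections = index_gb_per_node(shards=3, replicas_per_shard=3,
                                      replica_size_gb=3, nodes=3, collections=3)

print(f"index per node, 1 collection:  {one_collection} GB")
print(f"index per node, 3 collections: {three_collections} GB")
```

The result is only the on-disk size each machine would host under those
assumptions.  It says nothing about document counts or query load, which
still have to be answered separately.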

Do you know how many queries per second the cloud is likely to receive?

I saw a nearly identical question on the IRC channel a couple of hours
ago.  I had to leave, and when I made it back, the person asking the
question had left.

Thanks,
Shawn