EC2 instance type recommended for SOLR?

Costi Muraru
Hi folks,

I'm trying to decide on the EC2 instance type
<https://aws.amazon.com/ec2/pricing/> to use for a Solr cluster. Some
details about the cluster:
1) The total index size is 89.9GB (somewhere around 20 mil records).
2) The number of requests that reach Solr is pretty low (thousands per
day), but they are heavy (long queries with frange and stuff like that).
3) Running Solr 4.10
4) The focus is on quick response time

What I'm thinking is that:
- The entire index should fit into memory
- Limit the number of nodes to reduce inter-node network communication in
order to have a faster response time
- Have a replication factor of at least 2

So far, I'm leaning towards using:
- 6 x c3.4xlarge (each with 16 CPU and 30GB RAM)
or
- 3 x c3.8xlarge (each with 32 CPU and 60GB RAM)

Which one do you think would yield better results (faster response
time)?
Feedback is greatly appreciated.

Thanks,
Costi

Re: EC2 instance type recommended for SOLR?

Toke Eskildsen
Costi Muraru wrote:
> 1) The total index size is 89.9GB (somewhere around 20 mil records).
> 2) The number of requests that reach Solr is pretty low (thousands per
> day), but they are heavy (long queries with frange and stuff like that).
> 3) Running Solr 4.10
> 4) The focus is on quick response time

> What I'm thinking is that:
> - The entire index should fit into memory

Doable without breaking the bank with that index size.

> - Limit the number of nodes to reduce inter-node network communication in
> order to have a faster response time

Unless you have large result sets (thousands of rows or facet entries), the network impact is unlikely to differ much for 3 vs. 6 machines. Normally Solr does not send that much over the network, and since your queries are heavy (presumably calculation-heavy), the raw query time will dwarf network traffic even more.

> So far, I'm leaning towards using:
> - 6 x c3.4xlarge (each with 16 CPU and 30GB RAM)
> or
> - 3 x c3.8xlarge (each with 32 CPU and 60GB RAM)

Those two setups are practically identical. I doubt there will be any real difference. If you have the money, then it looks fine from a no-kill-like-overkill viewpoint. Lots of horsepower.
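
As a rough check of the "practically identical" point, here is a small Python sketch that totals the CPU and RAM figures quoted above for both options. The `Setup` class and its field names are made up for illustration; only the node counts, per-node specs and the 89.9GB index size come from the thread.

```python
from dataclasses import dataclass

INDEX_GB = 89.9  # total index size quoted in the thread


@dataclass(frozen=True)
class Setup:
    """Hypothetical helper describing one cluster option."""
    name: str
    nodes: int
    cpus_per_node: int
    ram_gb_per_node: int

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


SETUPS = [
    Setup("6 x c3.4xlarge", nodes=6, cpus_per_node=16, ram_gb_per_node=30),
    Setup("3 x c3.8xlarge", nodes=3, cpus_per_node=32, ram_gb_per_node=60),
]

for s in SETUPS:
    # Ratio of cluster-wide RAM to the size of a single copy of the index.
    ratio = s.total_ram_gb / INDEX_GB
    print(f"{s.name}: {s.total_cpus} CPUs, {s.total_ram_gb} GB RAM, "
          f"{ratio:.2f}x the index size")
```

Both options come out at 96 CPUs and 180GB of RAM in total, which is roughly twice the size of one copy of the index.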

Are you planning to have about 2*50 shards to take advantage of the many CPU cores? If you only have a few shards (let's say 2*9) and your requests are typically one at a time, most of your CPU cores will be idle most of the time.
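
To put rough numbers on the idle-core concern, here is a minimal sketch under simplifying assumptions that are not from the thread: "2*9" is read as 9 distinct shards with 2 replicas each, a single query touches one replica of every distinct shard, and each per-shard sub-request keeps exactly one core busy. The function name `busy_core_fraction` is hypothetical.

```python
def busy_core_fraction(distinct_shards: int, total_cores: int,
                       concurrent_queries: int = 1) -> float:
    """Estimate the fraction of cores doing work.

    Assumes one core per shard sub-request (a simplification) and caps
    the number of busy cores at the number of cores available.
    """
    busy = min(distinct_shards * concurrent_queries, total_cores)
    return busy / total_cores


TOTAL_CORES = 96  # 6 x 16 or 3 x 32, from the two proposed setups

for shards in (9, 50):
    fraction = busy_core_fraction(shards, TOTAL_CORES)
    print(f"{shards} distinct shards, 1 query at a time: "
          f"{fraction:.0%} of cores busy")
```

Under these assumptions, 9 distinct shards keep about 9% of the 96 cores busy for a single query, while 50 distinct shards keep about 52% busy. Real numbers will depend on how Solr schedules the work and on how many queries arrive at once.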

- Toke Eskildsen