Moving to solrcloud from single instance

Moving to solrcloud from single instance

Erie Data Systems
I am starting the planning stages of moving from a single instance of Solr
8 to a SolrCloud implementation.

Currently I have a 148GB index on a single dedicated server with 96GB RAM,
16 cores at 2.4GHz each, and SSD disk. The search is fast, but the index
size is greater than the physical memory, which to my understanding is not
a good thing.

I have a lot of experience with single instance but none with SolrCloud. I
have 3 machines (other than my main one) with the exact same hardware, so
essentially 96GB * 3, which should be plenty.

My issue is that I'm not sure where to go to learn how to set this up (how
many shards, how many replicas, etc.), and I would rather hire somebody, or
find something (a detailed video or document), to guide me through the
process and help make decisions along the way. For example, I think a shard
is a piece of the index, but I don't even know how to decide how many
replicas to use, or what they are.

Thanks everyone.
-Craig

Re: Moving to solrcloud from single instance

David Hastings
I actually never had a problem with the index being larger than the memory
for a standalone instance, but the entire index is on an SSD, at least on
my end.


Re: Moving to solrcloud from single instance

Erick Erickson
Unless you expect your index to grow, as long as performance is satisfactory there’s no reason to shard. _Replicate_ perhaps, if you need to sustain a higher QPS.

Here’s a sizing blog I wrote a long time ago, but it still pertains. The short form is for you to load test one of your machines and find out how many docs you can put on it before it falls over. _Then_ decide whether you need to shard.

And by “performance is satisfactory”, I mean the time it takes to serve up a query. If you need to serve more queries, simply add more replicas (i.e. have a single-shard collection). Each replica has the entire index in that case, so if 1 machine can serve 30 QPS, replicating twice (three copies in total) will let you serve 90 QPS.
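
The replica arithmetic above can be sketched in a few lines of Python. This is only a rough estimate: it assumes throughput scales linearly with the number of full index copies, which real clusters only approximate, and the function name `replicas_needed` is made up for illustration. The per-machine QPS figure has to come from your own load test.

```python
import math


def replicas_needed(target_qps: float, qps_per_replica: float) -> int:
    """Estimate how many full copies of a single-shard index are needed.

    Assumption: throughput scales linearly with the number of copies.
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    # At least one copy of the index is always required.
    return max(1, math.ceil(target_qps / qps_per_replica))


# The example from the post: 30 QPS per machine, 90 QPS wanted.
print(replicas_needed(90, 30))
```

Rounding up matters: a target of 100 QPS at 30 QPS per machine would need four copies, not three.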

If you do decide to shard, two things will happen. First, some operations aren’t well supported when you shard, group.func to name one.

Second, you’ll introduce a certain amount of overhead (balanced against each shard doing less work to be sure).

SolrCloud (in the one-shard, replicated case) will give you some good stuff, HA/DR, failover, expandability, etc. so I’m not discouraging moving to that. Just don’t shard etc. until you know you need to ;)

Best,
Erick

> On Aug 12, 2019, at 3:44 PM, David Hastings <[hidden email]> wrote:
>
> I actually never had a problem with the index being larger than the memory
> for a standalone instance, but the entire index is on an SSD at least one
> my end

Re: Moving to solrcloud from single instance

Shawn Heisey-2
In reply to this post by Erie Data Systems
On 8/12/2019 1:42 PM, Erie Data Systems wrote:
> I am starting the planning stages of moving from a single instance of solr
> 8 to a solrcloud implementation.
>
> Currently I have a 148GB index on a single dedicated server w 96gb ram @ 16
> cores /2.4ghz ea. + SSD disk. The search is fast but obviously the index
> size is greater than the physical memory, which to my understanding is not
> a good thing.

An *IDEAL* setup would have enough memory available (not assigned to
programs) to be able to fit the entire index in the disk cache.
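
As a rough illustration of that rule of thumb, the sketch below estimates how much of an index could sit in the OS disk cache. The 96GB of RAM and the 148GB index come from the original post; the 16GB Java heap and the 4GB for other processes are assumed values chosen only for the example, and the function name is hypothetical.

```python
def disk_cache_coverage(ram_gb, heap_gb, other_gb, index_gb):
    """Return (memory left for the disk cache, fraction of the index it covers).

    heap_gb and other_gb are memory assigned to programs, so they are
    not available to the OS for caching index files.
    """
    available = max(0.0, ram_gb - heap_gb - other_gb)
    fraction = min(1.0, available / index_gb)
    return available, fraction


# 96 GB RAM and 148 GB index are from the post; 16 GB heap and 4 GB for
# other processes are assumptions for illustration only.
available, fraction = disk_cache_coverage(
    ram_gb=96, heap_gb=16, other_gb=4, index_gb=148
)
print(f"{available:.0f} GB free for disk cache, about {fraction:.0%} of the index")
```

A fraction below 100% means the setup is not ideal by this measure, which, as noted in the reply, does not by itself mean performance will be unacceptable.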

Lots of people run systems that aren't ideal and have perfectly
acceptable performance.  I did that for several years.  I would have
loved to have more memory, but the budget wasn't there, and the machines
I was using were already maxed out at 64GB.

If performance is acceptable already, I think that not being able to fit
the entire index into available memory is not enough of a reason to make
significant changes that might require significant development time for
your systems that keep Solr operational.  Switching to SolrCloud could
require changes to your other software.

> My issue is that im not sure where to go to learn how to set this up, how
> many shards, how many replicas, etc and would rather hire somebody or
> something (detailed video or document)  to guide me through the process,
> and make decisions along the way...For example I think a shard is a piece
> of the index... but I dont even know how to decide how many replicas or
> what they are .....

There are no standardized rules for making these decisions.  Typically
you have to make an educated guess and try it to see whether it works.

https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If it's done in the typical way, telling a SolrCloud setup to create a
collection with 3 shards and 2 replicas will create six individual
indexes that make up the whole collection.  The index will be split into
three pieces (shards), and each of those pieces will have two copies
(replicas).  For each shard, an election will be done that will elect
one of the replicas as leader.
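
The shard and replica counts above can be sketched as follows. The request uses the Collections API CREATE action with its `numShards` and `replicationFactor` parameters; the base URL and collection name are placeholders, and the labels printed by `index_layout` are illustrative only and are not the core names Solr actually generates. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def index_layout(num_shards, replication_factor):
    """List one illustrative label per individual index in the collection."""
    return [
        f"shard{s}_copy{r}"
        for s in range(1, num_shards + 1)
        for r in range(1, replication_factor + 1)
    ]


# Placeholder host and collection name, assumed for the example.
url = create_collection_url("http://localhost:8983/solr", "mycollection", 3, 2)
layout = index_layout(3, 2)
print(url)
print(len(layout), "individual indexes:", layout)
```

With 3 shards and 2 replicas the layout has six entries, matching the six individual indexes described above. Passing 1 shard gives the single-shard, replicated case, where every copy holds the entire index.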

Sharding adds overhead.  In some cases with extremely large indexes, the
overhead is less than the performance gained by splitting the index onto
separate machines and letting those machines work in parallel.  In other
cases, the overhead may result in things actually getting slower.

Thanks,
Shawn