Hardware-Aware Solr Cloud Sharding?

Michael Braun
We have a SolrCloud cluster with different kinds of nodes; some differ
significantly in hardware specs (50-100% more disk/RAM/CPU, etc.). Ideally,
nodes with more resources could take on more shard replicas.

It looks like the Collections API
(https://lucene.apache.org/solr/guide/6_6/collections-api.html) supports
only even splitting of shards when using compositeId routing.

The way to handle this right now looks to be running additional Solr
instances on nodes with increased resources to balance the load (so if the
machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4
instances, respectively). Has anyone looked into other ways of handling
this that don't require the additional Solr instance deployments?

-Michael

Re: Hardware-Aware Solr Cloud Sharding?

Deepak Goel
What does your base hardware configuration look like?

You could run several VMs on the machines with higher specs.



Deepak

Re: Hardware-Aware Solr Cloud Sharding?

Shawn Heisey-2
In reply to this post by Michael Braun
On 6/12/2018 9:12 AM, Michael Braun wrote:
> The way to handle this right now looks to be running additional Solr
> instances on nodes with increased resources to balance the load (so if the
> machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4
> instances, respectively). Has anyone looked into other ways of handling
> this that don't require the additional Solr instance deployments?

Usually, no.  In most cases, you only want to run one Solr instance per
server.  One Solr instance can handle many individual shard replicas. 
If there are more individual indexes on a Solr instance, then it is
likely to be able to take advantage of additional system resources
without running another Solr instance.

The only time you should run multiple Solr instances is when the heap
requirements for running the required indexes with one instance would be
way too big.  Splitting the indexes between two instances with smaller
heaps might end up with much better garbage collection efficiency.

https://lucene.apache.org/solr/guide/7_3/taking-solr-to-production.html#running-multiple-solr-nodes-per-host

Thanks,
Shawn

Re: Hardware-Aware Solr Cloud Sharding?

Erick Erickson
In a mixed-hardware situation you can certainly place replicas as you
choose. Create a minimal collection, or create it with the special node
set EMPTY (createNodeSet=EMPTY), and then place your replicas one by one
with ADDREPLICA.
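
A minimal sketch of that approach, assuming a Solr base URL of
http://localhost:8983/solr and made-up node names and shard layout (all
of these are placeholders, not values from this thread). It only builds
the Collections API URLs for a CREATE with createNodeSet=EMPTY followed
by one ADDREPLICA per shard; sending them is left to whatever HTTP client
you use.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to your own cluster.
SOLR_URL = "http://localhost:8983/solr"


def create_empty_collection_url(name, num_shards):
    """CREATE call that defines the shards but places no replicas yet."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "createNodeSet": "EMPTY",
    }
    return f"{SOLR_URL}/admin/collections?{urlencode(params)}"


def add_replica_url(collection, shard, node):
    """ADDREPLICA call that puts one replica of `shard` on a specific node."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
    }
    return f"{SOLR_URL}/admin/collections?{urlencode(params)}"


# Hypothetical layout: the bigger the box, the more shards it hosts.
PLACEMENT = {
    "small-host:8983_solr": ["shard1", "shard2"],
    "medium-host:8983_solr": ["shard3", "shard4", "shard5"],
    "large-host:8983_solr": ["shard6", "shard7", "shard8", "shard9"],
}


def placement_urls(collection, placement):
    """Return the CREATE URL followed by one ADDREPLICA URL per shard."""
    num_shards = sum(len(shards) for shards in placement.values())
    urls = [create_empty_collection_url(collection, num_shards)]
    for node, shards in placement.items():
        for shard in shards:
            urls.append(add_replica_url(collection, shard, node))
    return urls


if __name__ == "__main__":
    for url in placement_urls("mycollection", PLACEMENT):
        print(url)
```

The layout above gives each shard a single replica; for more replicas per
shard, list the same shard under more than one node.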

You can also consider "replica placement rules"; see
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
I _think_ this would be a variant of "rack aware". In this case you'd
provide a "snitch" that says something about the hardware
characteristics, and the rules you'd define would be sensitive to that.
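
An untested sketch of what such a request might look like, built as a
CREATE URL with repeated `rule` parameters. It assumes each node was
started with a hypothetical system property such as -Dhwclass=large and
relies on the `sysprop.` tag described in the rule-based placement guide;
the property name, its values, and the base URL are illustrative
assumptions, not something confirmed in this thread.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to your own cluster.
SOLR_URL = "http://localhost:8983/solr"

# Example rules. "hwclass" is a hypothetical system property that you
# would set yourself on each node at startup (-Dhwclass=large).
RULES = [
    # At most one replica of any given shard on a single node.
    "shard:*,replica:<2,node:*",
    # Keep every replica of shard1 on nodes tagged as large.
    "shard:shard1,replica:*,sysprop.hwclass:large",
]


def create_with_rules_url(name, num_shards, replication_factor, rules):
    """Build a CREATE URL that carries one `rule` parameter per rule."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
    ]
    params += [("rule", rule) for rule in rules]
    return f"{SOLR_URL}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_with_rules_url("mycollection", 3, 2, RULES))
```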

WARNING: I haven't done this myself, so I don't have any examples to point to.

Best,
Erick

Re: Hardware-Aware Solr Cloud Sharding?

Jan Høydahl / Cominvent
You could also look into the Autoscaling framework in 7.x, which can be programmed to move shards around based on system load and the hardware specs of the various nodes. In theory that framework (although still a bit unstable) will suggest moving some replicas from weak nodes over to more powerful ones. If you "overshard" your system, e.g. create a collection with 9 shards on three nodes, there will be three shards per node, and Solr can suggest moving one of them off to another server.
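
To make the oversharding arithmetic concrete, here is a small sketch that splits a shard count across nodes in proportion to a relative capacity weight, using the largest-remainder method. The node names and the 1x/1.5x/2x weights are assumptions taken from the original question; nothing here calls Solr or the Autoscaling API, it only computes a target layout.

```python
from fractions import Fraction


def allocate(total_shards, weights):
    """Split total_shards across nodes in proportion to their weights.

    Uses the largest-remainder method: every node first gets the floor of
    its exact share, then leftover shards go to the nodes with the largest
    fractional parts.
    """
    exact_weights = {node: Fraction(str(w)) for node, w in weights.items()}
    total_weight = sum(exact_weights.values())
    exact = {
        node: w * total_shards / total_weight
        for node, w in exact_weights.items()
    }
    # int() truncates, which equals floor for these non-negative shares.
    counts = {node: int(share) for node, share in exact.items()}
    leftover = total_shards - sum(counts.values())
    by_remainder = sorted(
        exact, key=lambda node: exact[node] - counts[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical nodes with relative capacities 1x, 1.5x and 2x.
    print(allocate(9, {"weak": 1, "medium": 1.5, "strong": 2}))
```

With 9 shards and those weights the target is 2, 3 and 4 shards, so starting from an even 3/3/3 layout a single move from the weakest node to the strongest one reaches it.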

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Re: Hardware-Aware Solr Cloud Sharding?

Michael Braun
Ended up working well with nodeset EMPTY and placing all replicas manually.
Thank you all for the assistance!
