Autoscaling in 8.0

Gus Heck
I'm a little worried about the state of Autoscaling. It looks like it has the potential to create bad first experiences. Granted, 8.0 isn't supposed to be stable yet, but I'm seeing things that were documented for 7.6 not working in 8.x.

TLDR:
  1. Default settings didn't distribute cores evenly across a brand-new 50-node cluster.
  2. I can't seem to write rules that produce suggestions to distribute them evenly.
  3. Suggestions are made that then fail, even though the cluster is quiet and nothing has changed.
Long version:

My client and I did something that seems very vanilla, but it didn't work out well, and the observed behavior contradicts what's published in https://lucene.apache.org/solr/guide/7_6/solr-upgrade-notes.html#solr-7-6 with respect to default core placement.
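For reference, here is what "even" should mean. The sketch below is a deliberately simplified Python model that assumes the default cluster preference minimizes cores per node: each new replica goes greedily to the node with the fewest cores, with ties broken by node order. It is not Solr's Policy engine, and `place_replicas` and the node names are hypothetical. Under that assumption, two 50-shard, single-replica collections on 50 empty nodes should end up with exactly two cores on every node.

```python
from collections import Counter


def place_replicas(nodes, num_shards, existing=None):
    """Greedily place one replica per shard on the node with the fewest cores.

    Hypothetical, simplified model of a "minimize cores" preference; it is
    not Solr's Policy engine. Ties go to the earliest node in `nodes`.
    """
    load = Counter({node: 0 for node in nodes})
    if existing:
        load.update(existing)  # start from cores already on each node
    placement = {}
    for shard in range(1, num_shards + 1):
        target = min(nodes, key=lambda n: load[n])
        placement[f"shard{shard}"] = target
        load[target] += 1
    return placement, load


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 51)]
    _, load = place_replicas(nodes, num_shards=50)                 # collection A
    _, load = place_replicas(nodes, num_shards=50, existing=load)  # collection B
    print(sorted(set(load.values())))  # prints [2]
```

The observed layout (one node with none, others with three) is far from this idealized outcome, which is the point of the complaint.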

The cluster is a 50-node AWS cluster that the client freshly set up to test 8.0.0 (8.0.0-SNAPSHOT 69cbe29e78c400db22aab2f918405ce627d2d65d - solr - 2019-01-11 15:41:35).

They created a collection (A) with 50 shards and one replica each (a total of 50 cores). They specified maxShardsPerNode=1 and nothing relating to autoscaling. They indexed a small amount of data (33,438,861 docs is small for them) for initial testing. They then handed it over to me, and, not yet noticing anything wrong, I added a second collection (B), configured similarly but with schema changes for comparison. At that point, however, I noticed that the nodes page was showing a very strange result for this seemingly vanilla set of steps. Most nodes got one core of each collection, but not all:

Node 1 got 2 cores from A
Node 2 got 0 cores
Node 8 got 3 cores from B
Node 21 got 2 cores from A and 1 from B
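The same skew can be read from the API rather than the nodes page by counting replicas per node and collection in the output of `/admin/collections?action=CLUSTERSTATUS`. A minimal Python sketch, assuming the usual `cluster` → `collections` → `shards` → `replicas` → `node_name` layout; it seeds every entry in `live_nodes` so that an empty node (like node 2) still shows up. The helper names are hypothetical.

```python
from collections import Counter


def cores_per_node(clusterstatus):
    """Map node_name -> Counter({collection: replica count}) from CLUSTERSTATUS."""
    cluster = clusterstatus["cluster"]
    # Seed every live node so that nodes hosting nothing still appear.
    counts = {node: Counter() for node in cluster.get("live_nodes", [])}
    for coll_name, coll in cluster["collections"].items():
        for shard in coll["shards"].values():
            for replica in shard["replicas"].values():
                counts.setdefault(replica["node_name"], Counter())[coll_name] += 1
    return counts


def unbalanced_nodes(counts, collections, expected=1):
    """Return {node: {collection: count}} for nodes that deviate from `expected`."""
    bad = {}
    for node, per_coll in sorted(counts.items()):
        observed = {c: per_coll[c] for c in collections}  # Counter gives 0 if absent
        if any(n != expected for n in observed.values()):
            bad[node] = observed
    return bad
```

With maxShardsPerNode=1 and 50 shards per collection on 50 nodes, `unbalanced_nodes(cores_per_node(status), ["A", "B"])` would be expected to return an empty dict.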

I've spent all morning fiddling with rules, trying to get a configuration that produces suggestions via /api/cluster/autoscaling/suggestions to equalize things, and I just can't do it. In particular, I can never get a suggestion to move anything to node 2; it's as if autoscaling is missing or unable to see node 2. A couple of times I got suggestions with green buttons in the UI (mostly I'm using Postman, however), but when I clicked the green button it errored out saying no node can satisfy.... Nothing is changing and no data is coming in, so why is it suggesting things that don't work?
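The experiment can also be scripted. The Python sketch below assumes a node reachable at the hypothetical `SOLR` base URL and a cluster-policy rule (chosen here for illustration) that keeps every node below three cores, i.e. at most two, which is what 100 cores on 50 nodes should allow. It flattens the suggestions response into (type, path, command, arguments) tuples; the `suggestions` → `operation` → `command` layout is an assumption based on the 7.x response format and should be checked against real output.

```python
import json
import urllib.request

# Hypothetical base URL; any live Solr node should do.
SOLR = "http://localhost:8983"

# Assumed rule for this experiment: every node must hold fewer than 3 cores,
# i.e. at most 2 (100 cores spread over 50 nodes).
CLUSTER_POLICY = {"set-cluster-policy": [{"cores": "<3", "node": "#ANY"}]}


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_json(url):
    """GET a URL and return the decoded JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_suggestions(response):
    """Flatten a suggestions response into (type, path, command, args) tuples.

    Assumes entries shaped like
    {"type": ..., "operation": {"path": ..., "command": {name: args}}}.
    """
    rows = []
    for suggestion in response.get("suggestions", []):
        op = suggestion.get("operation", {})
        for name, args in op.get("command", {}).items():
            rows.append((suggestion.get("type"), op.get("path"), name, args))
    return rows
```

Against a live cluster this would be `post_json(f"{SOLR}/api/cluster/autoscaling", CLUSTER_POLICY)` followed by `summarize_suggestions(get_json(f"{SOLR}/api/cluster/autoscaling/suggestions"))`, then checking whether any row targets node 2. Note that posting a cluster policy changes live placement behavior, so this belongs on a test cluster.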

When I look at /autoscaling/diagnostics I get this seemingly impossible result:
            {
                "node": "solr-2.customer.redacted.com:8983_solr",
                "isLive": true,
                "cores": 2,
                "freedisk": 140.03918838500977,
                "totaldisk": 147.5209503173828,
                "replicas": {}
            },

2 cores but no replicas? I looked on disk, and there's no data there representing a core.
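This contradiction can be checked for every node at once. A minimal Python sketch, assuming the diagnostics response nests per-node entries under `diagnostics` → `sortedNodes` and that `replicas` maps collection → shard → list of replicas; the function names are hypothetical. Any node it reports, such as the entry above, has a `cores` count that its own replica listing does not account for.

```python
def count_listed_replicas(replicas):
    """Count replicas in a diagnostics 'replicas' map.

    Assumed shape: {collection: {shard: [{replica_name: {...}}, ...]}}.
    """
    return sum(len(entries) for shards in replicas.values() for entries in shards.values())


def inconsistent_nodes(diagnostics):
    """Return (node, reported cores, listed replicas) wherever the two disagree."""
    mismatches = []
    for entry in diagnostics.get("diagnostics", {}).get("sortedNodes", []):
        listed = count_listed_replicas(entry.get("replicas", {}))
        if entry.get("cores") != listed:
            mismatches.append((entry.get("node"), entry.get("cores"), listed))
    return mismatches
```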

-Gus

Re: Autoscaling in 8.0

Andrzej Białecki-2
This merits a JIRA issue - things should not behave this way, that’s for sure.

Please create one, attach the ZK:/autoscaling.json and the output of `/admin/collections?action=CLUSTERSTATUS` and the outputs from /autoscaling/suggestions and /autoscaling/diagnostics - you can anonymize actual node names as long as the data stays consistent (i.e. the same node names across all files).
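Keeping names consistent across several files is easiest with one shared mapping. A minimal Python sketch, assuming node names look like `<host>:<port>_solr` (as in the diagnostics snippet above); it discovers hosts from node names and then rewrites every occurrence of each host, including inside `base_url`. The `node1.example.com` replacement scheme and the function names are hypothetical.

```python
import re

# Assumed node-name shape: "<host>:<port>_solr".
NODE_NAME = re.compile(r"([A-Za-z0-9][A-Za-z0-9.-]*):\d+_solr")


def build_host_map(texts):
    """Assign node1.example.com, node2.example.com, ... in order of first appearance."""
    mapping = {}
    for text in texts:
        for match in NODE_NAME.finditer(text):
            host = match.group(1)
            if host not in mapping:
                mapping[host] = f"node{len(mapping) + 1}.example.com"
    return mapping


def anonymize(texts):
    """Rewrite every occurrence of each discovered host using one shared mapping."""
    mapping = build_host_map(texts)
    if not mapping:
        return list(texts), mapping
    # Longest hosts first, and no partial matches inside longer host names.
    alternation = "|".join(re.escape(h) for h in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(rf"(?<![A-Za-z0-9.-])(?:{alternation})(?![A-Za-z0-9.-])")
    return [pattern.sub(lambda m: mapping[m.group(0)], t) for t in texts], mapping
```

Passing the contents of autoscaling.json, CLUSTERSTATUS, suggestions and diagnostics in a single call is what keeps the names consistent across files.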

See also SOLR-13155.

> On 18 Jan 2019, at 20:27, Gus Heck <[hidden email]> wrote:
> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Autoscaling in 8.0

Gus Heck
I'm wondering if this is the multi-node manifestation of https://issues.apache.org/jira/browse/SOLR-13142?

On Mon, Jan 21, 2019 at 10:20 AM Andrzej Białecki <[hidden email]> wrote:
> [...]

Re: Autoscaling in 8.0

Gus Heck

On Mon, Jan 21, 2019 at 11:16 AM Gus Heck <[hidden email]> wrote:
I'm wondering if this is the multi-node manifestation of https://issues.apache.org/jira/browse/SOLR-13142 ?
