Ideas for debugging poor SolrCloud scalability

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

Ideas for debugging poor SolrCloud scalability

Ian Rose
Howdy all -

The short version is: We are not seeing Solr Cloud performance scale (event
close to) linearly as we add nodes. Can anyone suggest good diagnostics for
finding scaling bottlenecks? Are there known 'gotchas' that make Solr Cloud
fail to scale?

In detail:

We have used Solr (in non-Cloud mode) for over a year and are now beginning
a transition to SolrCloud.  To this end I have been running some basic load
tests to figure out what kind of capacity we should expect to provision.
In short, I am seeing very poor scalability (increase in effective QPS) as
I add Solr nodes.  I'm hoping to get some ideas on where I should be
looking to debug this.  Apologies in advance for the length of this email;
I'm trying to be comprehensive and provide all relevant information.

Our setup:

1 load generating client
 - generates tiny, fake documents with unique IDs
 - performs only writes (no queries at all)
 - chooses a random solr server for each ADD request (with 1 doc per add
request)

N collections spread over K solr servers
 - every collection is sharded K times (so every solr instance has 1 shard
from every collection)
 - no replicas
 - external zookeeper server (not using zkRun)
 - autoCommit maxTime=60000
 - autoSoftCommit maxTime =15000

Everything is running within a single zone on Google Compute Engine, so
high quality gigabit network links between all machines (ping times < 1ms).

My methodology is as follows.
1. Start up a K solr servers.
2. Remove all existing collections.
3. Create N collections, with numShards=K for each.
4. Start load testing.  Every minute, print the number of successful
updates and the number of failed updates.
5. Keep increasing the offered load (via simulated users) until the qps
flatlines.

In brief (more detailed results at the bottom of email), I find that for
any number of nodes between 2 and 5, the QPS always caps out at ~3000.
Obviously something must be wrong here, as there should be a trend of the
QPS scaling (roughly) linearly with the number of nodes.  Or at the very
least going up at all!

So my question is what else should I be looking at here?

* CPU on the loadtest client is well under 100%
* No other obvious bottlenecks on loadtest client (running 2 clients leads
to ~1/2 qps on each)
* In many cases, CPU on the solr servers is quite low as well (e.g. with
100 users hitting 5 solr nodes, all nodes are >50% idle)
* Network bandwidth is a few MB/s, well under the gigabit capacity of our
network
* Disk bandwidth (< 2 MB/s) and iops (< 20/s) are low.

Any ideas?  Thanks very much!
- Ian


p.s. Here is my raw data broken out by number of nodes and number of
simulated users:


Num NodesNum UsersQPS111020153180110382511539001204050140410021472251790210
229021528502202900240321026032002803210210031803138535158031020903152560320
27603252890380305041375451560410220041525004202700425280043028505152450520
2640525279053028405100290052002810
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Shawn Heisey-2
On 10/30/2014 2:23 PM, Ian Rose wrote:
> My methodology is as follows.
> 1. Start up a K solr servers.
> 2. Remove all existing collections.
> 3. Create N collections, with numShards=K for each.
> 4. Start load testing.  Every minute, print the number of successful
> updates and the number of failed updates.
> 5. Keep increasing the offered load (via simulated users) until the qps
> flatlines.

If you want to increase QPS, you should not be increasing numShards.
You need to increase replicationFactor.  When your numShards matches the
number of servers, every single server will be doing part of the work
for every query.  If you increase replicationFactor instead, then each
server can be doing a different query in parallel.

Sharding the index is what you need to do when you need to scale the
size of the index, so each server does not get overwhelmed by dealing
with every document for every query.

Getting a high QPS with a big index requires increasing both numShards
*AND* replicationFactor.

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Ian Rose
>
> If you want to increase QPS, you should not be increasing numShards.
> You need to increase replicationFactor.  When your numShards matches the
> number of servers, every single server will be doing part of the work
> for every query.



I think this is true only for actual queries, right?  I am not issuing any
queries, only writes (document inserts).  In the case of writes, increasing
the number of shards should increase my throughput (in ops/sec) more or
less linearly, right?


On Thu, Oct 30, 2014 at 4:50 PM, Shawn Heisey <[hidden email]> wrote:

> On 10/30/2014 2:23 PM, Ian Rose wrote:
> > My methodology is as follows.
> > 1. Start up a K solr servers.
> > 2. Remove all existing collections.
> > 3. Create N collections, with numShards=K for each.
> > 4. Start load testing.  Every minute, print the number of successful
> > updates and the number of failed updates.
> > 5. Keep increasing the offered load (via simulated users) until the qps
> > flatlines.
>
> If you want to increase QPS, you should not be increasing numShards.
> You need to increase replicationFactor.  When your numShards matches the
> number of servers, every single server will be doing part of the work
> for every query.  If you increase replicationFactor instead, then each
> server can be doing a different query in parallel.
>
> Sharding the index is what you need to do when you need to scale the
> size of the index, so each server does not get overwhelmed by dealing
> with every document for every query.
>
> Getting a high QPS with a big index requires increasing both numShards
> *AND* replicationFactor.
>
> Thanks,
> Shawn
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Matt Hilt
If you are issuing writes to shard non-leaders, then there is a large overhead for the eventual redirect to the leader. I noticed a 3-5 times performance increase by making my write client leader aware.


On Oct 30, 2014, at 2:56 PM, Ian Rose <[hidden email]> wrote:

>>
>> If you want to increase QPS, you should not be increasing numShards.
>> You need to increase replicationFactor.  When your numShards matches the
>> number of servers, every single server will be doing part of the work
>> for every query.
>
>
>
> I think this is true only for actual queries, right?  I am not issuing any
> queries, only writes (document inserts).  In the case of writes, increasing
> the number of shards should increase my throughput (in ops/sec) more or
> less linearly, right?
>
>
> On Thu, Oct 30, 2014 at 4:50 PM, Shawn Heisey <[hidden email]> wrote:
>
>> On 10/30/2014 2:23 PM, Ian Rose wrote:
>>> My methodology is as follows.
>>> 1. Start up a K solr servers.
>>> 2. Remove all existing collections.
>>> 3. Create N collections, with numShards=K for each.
>>> 4. Start load testing.  Every minute, print the number of successful
>>> updates and the number of failed updates.
>>> 5. Keep increasing the offered load (via simulated users) until the qps
>>> flatlines.
>>
>> If you want to increase QPS, you should not be increasing numShards.
>> You need to increase replicationFactor.  When your numShards matches the
>> number of servers, every single server will be doing part of the work
>> for every query.  If you increase replicationFactor instead, then each
>> server can be doing a different query in parallel.
>>
>> Sharding the index is what you need to do when you need to scale the
>> size of the index, so each server does not get overwhelmed by dealing
>> with every document for every query.
>>
>> Getting a high QPS with a big index requires increasing both numShards
>> *AND* replicationFactor.
>>
>> Thanks,
>> Shawn
>>
>>


smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Shawn Heisey-2
In reply to this post by Ian Rose
On 10/30/2014 2:56 PM, Ian Rose wrote:
> I think this is true only for actual queries, right? I am not issuing
> any queries, only writes (document inserts). In the case of writes,
> increasing the number of shards should increase my throughput (in
> ops/sec) more or less linearly, right?

No, that won't affect indexing speed all that much.  The way to increase
indexing speed is to increase the number of processes or threads that
are indexing at the same time.  Instead of having one client sending
update requests, try five of them.  Also, index many documents with each
update request.  Sending one document at a time is very inefficient.

You didn't say how you're doing commits, but those need to be as
infrequent as you can manage.  Ideally, you would use autoCommit with
openSearcher=false on an interval of about five minutes, and send an
explicit commit (with the default openSearcher=true) after all the
indexing is done.

You may have requirements regarding document visibility that this won't
satisfy, but try to avoid doing commits with openSearcher=true (soft
commits qualify for this) extremely frequently, like once a second.
Once a minute is much more realistic.  Opening a new searcher is an
expensive operation, especially if you have cache warming configured.

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
Your indexing client, if written in SolrJ, should use CloudSolrServer
which is, in Matt's terms "leader aware". It divides up the
documents to be indexed into packets that where each doc in
the packet belongs on the same shard, and then sends the packet
to the shard leader. This avoids a lot of re-routing and should
scale essentially linearly. You may have to add more clients
though, depending upon who hard the document-generator is
working.

Also, make sure that you send batches of documents as Shawn
suggests, I use 1,000 as a starting point.

Best,
Erick

On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]> wrote:

> On 10/30/2014 2:56 PM, Ian Rose wrote:
>> I think this is true only for actual queries, right? I am not issuing
>> any queries, only writes (document inserts). In the case of writes,
>> increasing the number of shards should increase my throughput (in
>> ops/sec) more or less linearly, right?
>
> No, that won't affect indexing speed all that much.  The way to increase
> indexing speed is to increase the number of processes or threads that
> are indexing at the same time.  Instead of having one client sending
> update requests, try five of them.  Also, index many documents with each
> update request.  Sending one document at a time is very inefficient.
>
> You didn't say how you're doing commits, but those need to be as
> infrequent as you can manage.  Ideally, you would use autoCommit with
> openSearcher=false on an interval of about five minutes, and send an
> explicit commit (with the default openSearcher=true) after all the
> indexing is done.
>
> You may have requirements regarding document visibility that this won't
> satisfy, but try to avoid doing commits with openSearcher=true (soft
> commits qualify for this) extremely frequently, like once a second.
> Once a minute is much more realistic.  Opening a new searcher is an
> expensive operation, especially if you have cache warming configured.
>
> Thanks,
> Shawn
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Ian Rose
Thanks for the suggestions so for, all.

1) We are not using SolrJ on the client (not using Java at all) but I am
working on writing a "smart" router so that we can always send to the
correct node.  I am certainly curious to see how that changes things.
Nonetheless even with the overhead of extra routing hops, the observed
behavior (no increase in performance with more nodes) doesn't make any
sense to me.

2) Commits: we are using autoCommit with openSearcher=false (maxTime=60000)
and autoSoftCommit (maxTime=15000).

3) Suggestions to batch documents certainly make sense for production code
but in this case I am not real concerned with absolute performance; I just
want to see the *relative* performance as we use more Solr nodes.  So I
don't think batching or not really matters.

4) "No, that won't affect indexing speed all that much.  The way to
increase indexing speed is to increase the number of processes or threads
that are indexing at the same time.  Instead of having one client
sending update
requests, try five of them."

Can you elaborate on this some?  I'm worried I might be misunderstanding
something fundamental.  A cluster of 3 shards over 3 Solr nodes
*should* support
a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole idea
behind sharding.  Regarding your comment of "increase the number of
processes or threads", note that for each value of K (number of Solr nodes)
I measured with several different numbers of simulated users so that I
could find a "saturation point".  For example, take a look at my data for
K=2:

Num NodesNum UsersQPS214722517902102290215285022029002403210260320028032102
1003180

It's clear that once the load test client has ~40 simulated users, the Solr
cluster is saturated.  Creating more users just increases the average
request latency, such that the total QPS remained (nearly) constant.  So I
feel pretty confident that a cluster of size 2 *maxes out* at ~3200 qps.
The problem is that I am finding roughly this same "max point", no matter
how many simulated users the load test client created, for any value of K
(> 1).

Cheers,
- Ian


On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <[hidden email]>
wrote:

> Your indexing client, if written in SolrJ, should use CloudSolrServer
> which is, in Matt's terms "leader aware". It divides up the
> documents to be indexed into packets that where each doc in
> the packet belongs on the same shard, and then sends the packet
> to the shard leader. This avoids a lot of re-routing and should
> scale essentially linearly. You may have to add more clients
> though, depending upon who hard the document-generator is
> working.
>
> Also, make sure that you send batches of documents as Shawn
> suggests, I use 1,000 as a starting point.
>
> Best,
> Erick
>
> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]> wrote:
> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> >> I think this is true only for actual queries, right? I am not issuing
> >> any queries, only writes (document inserts). In the case of writes,
> >> increasing the number of shards should increase my throughput (in
> >> ops/sec) more or less linearly, right?
> >
> > No, that won't affect indexing speed all that much.  The way to increase
> > indexing speed is to increase the number of processes or threads that
> > are indexing at the same time.  Instead of having one client sending
> > update requests, try five of them.  Also, index many documents with each
> > update request.  Sending one document at a time is very inefficient.
> >
> > You didn't say how you're doing commits, but those need to be as
> > infrequent as you can manage.  Ideally, you would use autoCommit with
> > openSearcher=false on an interval of about five minutes, and send an
> > explicit commit (with the default openSearcher=true) after all the
> > indexing is done.
> >
> > You may have requirements regarding document visibility that this won't
> > satisfy, but try to avoid doing commits with openSearcher=true (soft
> > commits qualify for this) extremely frequently, like once a second.
> > Once a minute is much more realistic.  Opening a new searcher is an
> > expensive operation, especially if you have cache warming configured.
> >
> > Thanks,
> > Shawn
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
I'm really confused:

bq: I am not issuing any queries, only writes (document inserts)

bq: It's clear that once the load test client has ~40 simulated users

bq: A cluster of 3 shards over 3 Solr nodes *should* support
a higher QPS than 2 shards over 2 Solr nodes, right

QPS is usually used to mean "Queries Per Second", which is different from
the statement that "I am not issuing any queries....". And what do the
number of users have to do with inserting documents?

You also state: " In many cases, CPU on the solr servers is quite low as well"

So let's talk about indexing first. Indexing should scale nearly
linearly as long as
1> you are routing your docs to the correct leader, which happens with SolrJ
and the CloudSolrSever automatically. Rather than rolling your own, I strongly
suggest you try this out.
2> you have enough clients feeding the cluster to push CPU utilization
on them all.
Very often "slow indexing", or in your case "lack of scaling" is a
result of document
acquisition or, in your case, your doc generator is spending all it's
time waiting for
the individual documents to get to Solr and come back.

bq: "chooses a random solr server for each ADD request (with 1 doc per add
request)"

Probably your culprit right there. Each and every document requires that you
have to cross the network (and forward that doc to the correct leader). So given
that you're not seeing high CPU utilization, I suspect that you're not sending
enough docs to SolrCloud fast enough to see scaling. You need to batch up
multiple docs, I generally send 1,000 docs at a time.

But even if you do solve this, the inter-node routing will prevent
linear scaling.
When a doc (or a batch of docs) goes to a random Solr node, here's what
happens:
1> the docs are re-packaged into groups based on which shard they're
destined for
2> the sub-packets are forwarded to the leader for each shard
3> the responses are gathered back and returned to the client.

This set of operations will eventually degrade the scaling.

bq:  A cluster of 3 shards over 3 Solr nodes *should* support
a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole idea
behind sharding.

If we're talking search requests, the answer is no. Sharding is
what you do when your collection no longer fits on a single node.
If it _does_ fit on a single node, then you'll usually get better query
performance by adding a bunch of replicas to a single shard. When
the number of  docs on each shard grows large enough that you
no longer get good query performance, _then_ you shard. And
take the query hit.

If we're talking about inserts, then see above. I suspect your problem is
that you're _not_ "saturating the SolrCloud cluster", you're sending
docs to Solr very inefficiently and waiting on I/O. Batching docs and
sending them to the right leader should scale pretty linearly until you
start saturating your network.

Best,
Erick

On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]> wrote:

> Thanks for the suggestions so for, all.
>
> 1) We are not using SolrJ on the client (not using Java at all) but I am
> working on writing a "smart" router so that we can always send to the
> correct node.  I am certainly curious to see how that changes things.
> Nonetheless even with the overhead of extra routing hops, the observed
> behavior (no increase in performance with more nodes) doesn't make any
> sense to me.
>
> 2) Commits: we are using autoCommit with openSearcher=false (maxTime=60000)
> and autoSoftCommit (maxTime=15000).
>
> 3) Suggestions to batch documents certainly make sense for production code
> but in this case I am not real concerned with absolute performance; I just
> want to see the *relative* performance as we use more Solr nodes.  So I
> don't think batching or not really matters.
>
> 4) "No, that won't affect indexing speed all that much.  The way to
> increase indexing speed is to increase the number of processes or threads
> that are indexing at the same time.  Instead of having one client
> sending update
> requests, try five of them."
>
> Can you elaborate on this some?  I'm worried I might be misunderstanding
> something fundamental.  A cluster of 3 shards over 3 Solr nodes
> *should* support
> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole idea
> behind sharding.  Regarding your comment of "increase the number of
> processes or threads", note that for each value of K (number of Solr nodes)
> I measured with several different numbers of simulated users so that I
> could find a "saturation point".  For example, take a look at my data for
> K=2:
>
> Num NodesNum UsersQPS214722517902102290215285022029002403210260320028032102
> 1003180
>
> It's clear that once the load test client has ~40 simulated users, the Solr
> cluster is saturated.  Creating more users just increases the average
> request latency, such that the total QPS remained (nearly) constant.  So I
> feel pretty confident that a cluster of size 2 *maxes out* at ~3200 qps.
> The problem is that I am finding roughly this same "max point", no matter
> how many simulated users the load test client created, for any value of K
> (> 1).
>
> Cheers,
> - Ian
>
>
> On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <[hidden email]>
> wrote:
>
>> Your indexing client, if written in SolrJ, should use CloudSolrServer
>> which is, in Matt's terms "leader aware". It divides up the
>> documents to be indexed into packets that where each doc in
>> the packet belongs on the same shard, and then sends the packet
>> to the shard leader. This avoids a lot of re-routing and should
>> scale essentially linearly. You may have to add more clients
>> though, depending upon who hard the document-generator is
>> working.
>>
>> Also, make sure that you send batches of documents as Shawn
>> suggests, I use 1,000 as a starting point.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]> wrote:
>> > On 10/30/2014 2:56 PM, Ian Rose wrote:
>> >> I think this is true only for actual queries, right? I am not issuing
>> >> any queries, only writes (document inserts). In the case of writes,
>> >> increasing the number of shards should increase my throughput (in
>> >> ops/sec) more or less linearly, right?
>> >
>> > No, that won't affect indexing speed all that much.  The way to increase
>> > indexing speed is to increase the number of processes or threads that
>> > are indexing at the same time.  Instead of having one client sending
>> > update requests, try five of them.  Also, index many documents with each
>> > update request.  Sending one document at a time is very inefficient.
>> >
>> > You didn't say how you're doing commits, but those need to be as
>> > infrequent as you can manage.  Ideally, you would use autoCommit with
>> > openSearcher=false on an interval of about five minutes, and send an
>> > explicit commit (with the default openSearcher=true) after all the
>> > indexing is done.
>> >
>> > You may have requirements regarding document visibility that this won't
>> > satisfy, but try to avoid doing commits with openSearcher=true (soft
>> > commits qualify for this) extremely frequently, like once a second.
>> > Once a minute is much more realistic.  Opening a new searcher is an
>> > expensive operation, especially if you have cache warming configured.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Ian Rose
Hi Erick -

Thanks for the detailed response and apologies for my confusing
terminology.  I should have said "WPS" (writes per second) instead of QPS
but I didn't want to introduce a weird new acronym since QPS is well
known.  Clearly a bad decision on my part.  To clarify: I am doing
*only* writes
(document adds).  Whenever I wrote "QPS" I was referring to writes.

It seems clear at this point that I should wrap up the code to do "smart"
routing rather than choose Solr nodes randomly.  And then see if that
changes things.  I must admit that although I understand that random node
selection will impose a performance hit, theoretically it seems to me that
the system should still scale up as you add more nodes (albeit at lower
absolute level of performance than if you used a smart router).
Nonetheless, I'm just theorycrafting here so the better thing to do is just
try it experimentally.  I hope to have that working today - will report
back on my findings.

Cheers,
- Ian

p.s. To clarify why we are rolling our own smart router code, we use Go
over here rather than Java.  Although if we still get bad performance with
our custom Go router I may try a pure Java load client using
CloudSolrServer to eliminate the possibility of bugs in our implementation.


On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <[hidden email]>
wrote:

> I'm really confused:
>
> bq: I am not issuing any queries, only writes (document inserts)
>
> bq: It's clear that once the load test client has ~40 simulated users
>
> bq: A cluster of 3 shards over 3 Solr nodes *should* support
> a higher QPS than 2 shards over 2 Solr nodes, right
>
> QPS is usually used to mean "Queries Per Second", which is different from
> the statement that "I am not issuing any queries....". And what do the
> number of users have to do with inserting documents?
>
> You also state: " In many cases, CPU on the solr servers is quite low as
> well"
>
> So let's talk about indexing first. Indexing should scale nearly
> linearly as long as
> 1> you are routing your docs to the correct leader, which happens with
> SolrJ
> and the CloudSolrSever automatically. Rather than rolling your own, I
> strongly
> suggest you try this out.
> 2> you have enough clients feeding the cluster to push CPU utilization
> on them all.
> Very often "slow indexing", or in your case "lack of scaling" is a
> result of document
> acquisition or, in your case, your doc generator is spending all it's
> time waiting for
> the individual documents to get to Solr and come back.
>
> bq: "chooses a random solr server for each ADD request (with 1 doc per add
> request)"
>
> Probably your culprit right there. Each and every document requires that
> you
> have to cross the network (and forward that doc to the correct leader). So
> given
> that you're not seeing high CPU utilization, I suspect that you're not
> sending
> enough docs to SolrCloud fast enough to see scaling. You need to batch up
> multiple docs, I generally send 1,000 docs at a time.
>
> But even if you do solve this, the inter-node routing will prevent
> linear scaling.
> When a doc (or a batch of docs) goes to a random Solr node, here's what
> happens:
> 1> the docs are re-packaged into groups based on which shard they're
> destined for
> 2> the sub-packets are forwarded to the leader for each shard
> 3> the responses are gathered back and returned to the client.
>
> This set of operations will eventually degrade the scaling.
>
> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole idea
> behind sharding.
>
> If we're talking search requests, the answer is no. Sharding is
> what you do when your collection no longer fits on a single node.
> If it _does_ fit on a single node, then you'll usually get better query
> performance by adding a bunch of replicas to a single shard. When
> the number of  docs on each shard grows large enough that you
> no longer get good query performance, _then_ you shard. And
> take the query hit.
>
> If we're talking about inserts, then see above. I suspect your problem is
> that you're _not_ "saturating the SolrCloud cluster", you're sending
> docs to Solr very inefficiently and waiting on I/O. Batching docs and
> sending them to the right leader should scale pretty linearly until you
> start saturating your network.
>
> Best,
> Erick
>
> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]> wrote:
> > Thanks for the suggestions so for, all.
> >
> > 1) We are not using SolrJ on the client (not using Java at all) but I am
> > working on writing a "smart" router so that we can always send to the
> > correct node.  I am certainly curious to see how that changes things.
> > Nonetheless even with the overhead of extra routing hops, the observed
> > behavior (no increase in performance with more nodes) doesn't make any
> > sense to me.
> >
> > 2) Commits: we are using autoCommit with openSearcher=false
> (maxTime=60000)
> > and autoSoftCommit (maxTime=15000).
> >
> > 3) Suggestions to batch documents certainly make sense for production
> code
> > but in this case I am not real concerned with absolute performance; I
> just
> > want to see the *relative* performance as we use more Solr nodes.  So I
> > don't think batching or not really matters.
> >
> > 4) "No, that won't affect indexing speed all that much.  The way to
> > increase indexing speed is to increase the number of processes or threads
> > that are indexing at the same time.  Instead of having one client
> > sending update
> > requests, try five of them."
> >
> > Can you elaborate on this some?  I'm worried I might be misunderstanding
> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
> > *should* support
> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
> idea
> > behind sharding.  Regarding your comment of "increase the number of
> > processes or threads", note that for each value of K (number of Solr
> nodes)
> > I measured with several different numbers of simulated users so that I
> > could find a "saturation point".  For example, take a look at my data for
> > K=2:
> >
> > Num NodesNum
> UsersQPS214722517902102290215285022029002403210260320028032102
> > 1003180
> >
> > It's clear that once the load test client has ~40 simulated users, the
> Solr
> > cluster is saturated.  Creating more users just increases the average
> > request latency, such that the total QPS remained (nearly) constant.  So
> I
> > feel pretty confident that a cluster of size 2 *maxes out* at ~3200 qps.
> > The problem is that I am finding roughly this same "max point", no matter
> > how many simulated users the load test client created, for any value of K
> > (> 1).
> >
> > Cheers,
> > - Ian
> >
> >
> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <[hidden email]
> >
> > wrote:
> >
> >> Your indexing client, if written in SolrJ, should use CloudSolrServer
> >> which is, in Matt's terms "leader aware". It divides up the
> >> documents to be indexed into packets that where each doc in
> >> the packet belongs on the same shard, and then sends the packet
> >> to the shard leader. This avoids a lot of re-routing and should
> >> scale essentially linearly. You may have to add more clients
> >> though, depending upon who hard the document-generator is
> >> working.
> >>
> >> Also, make sure that you send batches of documents as Shawn
> >> suggests, I use 1,000 as a starting point.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]>
> wrote:
> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> >> >> I think this is true only for actual queries, right? I am not issuing
> >> >> any queries, only writes (document inserts). In the case of writes,
> >> >> increasing the number of shards should increase my throughput (in
> >> >> ops/sec) more or less linearly, right?
> >> >
> >> > No, that won't affect indexing speed all that much.  The way to
> increase
> >> > indexing speed is to increase the number of processes or threads that
> >> > are indexing at the same time.  Instead of having one client sending
> >> > update requests, try five of them.  Also, index many documents with
> each
> >> > update request.  Sending one document at a time is very inefficient.
> >> >
> >> > You didn't say how you're doing commits, but those need to be as
> >> > infrequent as you can manage.  Ideally, you would use autoCommit with
> >> > openSearcher=false on an interval of about five minutes, and send an
> >> > explicit commit (with the default openSearcher=true) after all the
> >> > indexing is done.
> >> >
> >> > You may have requirements regarding document visibility that this
> won't
> >> > satisfy, but try to avoid doing commits with openSearcher=true (soft
> >> > commits qualify for this) extremely frequently, like once a second.
> >> > Once a minute is much more realistic.  Opening a new searcher is an
> >> > expensive operation, especially if you have cache warming configured.
> >> >
> >> > Thanks,
> >> > Shawn
> >> >
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
NP, just making sure.

I suspect you'll get lots more bang for the buck, and
results much more closely matching your expectations if

1> you batch up a bunch of docs at once rather than
sending them one at a time. That's probably the easiest
thing to try. Sending docs one at a time is something of
an anti-pattern. I usually start with batches of 1,000.

And just to check.. You're not issuing any commits from the
client, right? Performance will be terrible if you issue commits
after every doc, that's totally an anti-pattern. Doubly so for
optimizes.... Since you showed us your solrconfig  autocommit
settings I'm assuming not but want to be sure.

2> use a leader-aware client. I'm totally unfamiliar with Go,
so I have no suggestions whatsoever to offer there.... But you'll
want to batch in this case too.

On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <[hidden email]> wrote:

> Hi Erick -
>
> Thanks for the detailed response and apologies for my confusing
> terminology.  I should have said "WPS" (writes per second) instead of QPS
> but I didn't want to introduce a weird new acronym since QPS is well
> known.  Clearly a bad decision on my part.  To clarify: I am doing
> *only* writes
> (document adds).  Whenever I wrote "QPS" I was referring to writes.
>
> It seems clear at this point that I should wrap up the code to do "smart"
> routing rather than choose Solr nodes randomly.  And then see if that
> changes things.  I must admit that although I understand that random node
> selection will impose a performance hit, theoretically it seems to me that
> the system should still scale up as you add more nodes (albeit at lower
> absolute level of performance than if you used a smart router).
> Nonetheless, I'm just theorycrafting here so the better thing to do is just
> try it experimentally.  I hope to have that working today - will report
> back on my findings.
>
> Cheers,
> - Ian
>
> p.s. To clarify why we are rolling our own smart router code, we use Go
> over here rather than Java.  Although if we still get bad performance with
> our custom Go router I may try a pure Java load client using
> CloudSolrServer to eliminate the possibility of bugs in our implementation.
>
>
> On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <[hidden email]>
> wrote:
>
>> I'm really confused:
>>
>> bq: I am not issuing any queries, only writes (document inserts)
>>
>> bq: It's clear that once the load test client has ~40 simulated users
>>
>> bq: A cluster of 3 shards over 3 Solr nodes *should* support
>> a higher QPS than 2 shards over 2 Solr nodes, right
>>
>> QPS is usually used to mean "Queries Per Second", which is different from
>> the statement that "I am not issuing any queries....". And what do the
>> number of users have to do with inserting documents?
>>
>> You also state: " In many cases, CPU on the solr servers is quite low as
>> well"
>>
>> So let's talk about indexing first. Indexing should scale nearly
>> linearly as long as
>> 1> you are routing your docs to the correct leader, which happens with
>> SolrJ
>> and the CloudSolrSever automatically. Rather than rolling your own, I
>> strongly
>> suggest you try this out.
>> 2> you have enough clients feeding the cluster to push CPU utilization
>> on them all.
>> Very often "slow indexing", or in your case "lack of scaling" is a
>> result of document
>> acquisition or, in your case, your doc generator is spending all it's
>> time waiting for
>> the individual documents to get to Solr and come back.
>>
>> bq: "chooses a random solr server for each ADD request (with 1 doc per add
>> request)"
>>
>> Probably your culprit right there. Each and every document requires that
>> you
>> have to cross the network (and forward that doc to the correct leader). So
>> given
>> that you're not seeing high CPU utilization, I suspect that you're not
>> sending
>> enough docs to SolrCloud fast enough to see scaling. You need to batch up
>> multiple docs, I generally send 1,000 docs at a time.
>>
>> But even if you do solve this, the inter-node routing will prevent
>> linear scaling.
>> When a doc (or a batch of docs) goes to a random Solr node, here's what
>> happens:
>> 1> the docs are re-packaged into groups based on which shard they're
>> destined for
>> 2> the sub-packets are forwarded to the leader for each shard
>> 3> the responses are gathered back and returned to the client.
>>
>> This set of operations will eventually degrade the scaling.
>>
>> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
>> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole idea
>> behind sharding.
>>
>> If we're talking search requests, the answer is no. Sharding is
>> what you do when your collection no longer fits on a single node.
>> If it _does_ fit on a single node, then you'll usually get better query
>> performance by adding a bunch of replicas to a single shard. When
>> the number of  docs on each shard grows large enough that you
>> no longer get good query performance, _then_ you shard. And
>> take the query hit.
>>
>> If we're talking about inserts, then see above. I suspect your problem is
>> that you're _not_ "saturating the SolrCloud cluster", you're sending
>> docs to Solr very inefficiently and waiting on I/O. Batching docs and
>> sending them to the right leader should scale pretty linearly until you
>> start saturating your network.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]> wrote:
>> > Thanks for the suggestions so for, all.
>> >
>> > 1) We are not using SolrJ on the client (not using Java at all) but I am
>> > working on writing a "smart" router so that we can always send to the
>> > correct node.  I am certainly curious to see how that changes things.
>> > Nonetheless even with the overhead of extra routing hops, the observed
>> > behavior (no increase in performance with more nodes) doesn't make any
>> > sense to me.
>> >
>> > 2) Commits: we are using autoCommit with openSearcher=false
>> (maxTime=60000)
>> > and autoSoftCommit (maxTime=15000).
>> >
>> > 3) Suggestions to batch documents certainly make sense for production
>> code
>> > but in this case I am not real concerned with absolute performance; I
>> just
>> > want to see the *relative* performance as we use more Solr nodes.  So I
>> > don't think batching or not really matters.
>> >
>> > 4) "No, that won't affect indexing speed all that much.  The way to
>> > increase indexing speed is to increase the number of processes or threads
>> > that are indexing at the same time.  Instead of having one client
>> > sending update
>> > requests, try five of them."
>> >
>> > Can you elaborate on this some?  I'm worried I might be misunderstanding
>> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
>> > *should* support
>> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
>> idea
>> > behind sharding.  Regarding your comment of "increase the number of
>> > processes or threads", note that for each value of K (number of Solr
>> nodes)
>> > I measured with several different numbers of simulated users so that I
>> > could find a "saturation point".  For example, take a look at my data for
>> > K=2:
>> >
>> > Num NodesNum
>> UsersQPS214722517902102290215285022029002403210260320028032102
>> > 1003180
>> >
>> > It's clear that once the load test client has ~40 simulated users, the
>> Solr
>> > cluster is saturated.  Creating more users just increases the average
>> > request latency, such that the total QPS remained (nearly) constant.  So
>> I
>> > feel pretty confident that a cluster of size 2 *maxes out* at ~3200 qps.
>> > The problem is that I am finding roughly this same "max point", no matter
>> > how many simulated users the load test client created, for any value of K
>> > (> 1).
>> >
>> > Cheers,
>> > - Ian
>> >
>> >
>> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <[hidden email]
>> >
>> > wrote:
>> >
>> >> Your indexing client, if written in SolrJ, should use CloudSolrServer
>> >> which is, in Matt's terms "leader aware". It divides up the
>> >> documents to be indexed into packets that where each doc in
>> >> the packet belongs on the same shard, and then sends the packet
>> >> to the shard leader. This avoids a lot of re-routing and should
>> >> scale essentially linearly. You may have to add more clients
>> >> though, depending upon who hard the document-generator is
>> >> working.
>> >>
>> >> Also, make sure that you send batches of documents as Shawn
>> >> suggests, I use 1,000 as a starting point.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]>
>> wrote:
>> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
>> >> >> I think this is true only for actual queries, right? I am not issuing
>> >> >> any queries, only writes (document inserts). In the case of writes,
>> >> >> increasing the number of shards should increase my throughput (in
>> >> >> ops/sec) more or less linearly, right?
>> >> >
>> >> > No, that won't affect indexing speed all that much.  The way to
>> increase
>> >> > indexing speed is to increase the number of processes or threads that
>> >> > are indexing at the same time.  Instead of having one client sending
>> >> > update requests, try five of them.  Also, index many documents with
>> each
>> >> > update request.  Sending one document at a time is very inefficient.
>> >> >
>> >> > You didn't say how you're doing commits, but those need to be as
>> >> > infrequent as you can manage.  Ideally, you would use autoCommit with
>> >> > openSearcher=false on an interval of about five minutes, and send an
>> >> > explicit commit (with the default openSearcher=true) after all the
>> >> > indexing is done.
>> >> >
>> >> > You may have requirements regarding document visibility that this
>> won't
>> >> > satisfy, but try to avoid doing commits with openSearcher=true (soft
>> >> > commits qualify for this) extremely frequently, like once a second.
>> >> > Once a minute is much more realistic.  Opening a new searcher is an
>> >> > expensive operation, especially if you have cache warming configured.
>> >> >
>> >> > Thanks,
>> >> > Shawn
>> >> >
>> >>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Peter Keegan
Regarding batch indexing:
When I send batches of 1000 docs to a standalone Solr server, the log file
reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the
leader of a replicated index, the leader log file reports much smaller
numbers, usually "(12 adds)". Why do the batches appear to be broken up?

Peter

On Fri, Oct 31, 2014 at 10:40 AM, Erick Erickson <[hidden email]>
wrote:

> NP, just making sure.
>
> I suspect you'll get lots more bang for the buck, and
> results much more closely matching your expectations if
>
> 1> you batch up a bunch of docs at once rather than
> sending them one at a time. That's probably the easiest
> thing to try. Sending docs one at a time is something of
> an anti-pattern. I usually start with batches of 1,000.
>
> And just to check.. You're not issuing any commits from the
> client, right? Performance will be terrible if you issue commits
> after every doc, that's totally an anti-pattern. Doubly so for
> optimizes.... Since you showed us your solrconfig  autocommit
> settings I'm assuming not but want to be sure.
>
> 2> use a leader-aware client. I'm totally unfamiliar with Go,
> so I have no suggestions whatsoever to offer there.... But you'll
> want to batch in this case too.
>
> On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <[hidden email]> wrote:
> > Hi Erick -
> >
> > Thanks for the detailed response and apologies for my confusing
> > terminology.  I should have said "WPS" (writes per second) instead of QPS
> > but I didn't want to introduce a weird new acronym since QPS is well
> > known.  Clearly a bad decision on my part.  To clarify: I am doing
> > *only* writes
> > (document adds).  Whenever I wrote "QPS" I was referring to writes.
> >
> > It seems clear at this point that I should wrap up the code to do "smart"
> > routing rather than choose Solr nodes randomly.  And then see if that
> > changes things.  I must admit that although I understand that random node
> > selection will impose a performance hit, theoretically it seems to me
> that
> > the system should still scale up as you add more nodes (albeit at lower
> > absolute level of performance than if you used a smart router).
> > Nonetheless, I'm just theorycrafting here so the better thing to do is
> just
> > try it experimentally.  I hope to have that working today - will report
> > back on my findings.
> >
> > Cheers,
> > - Ian
> >
> > p.s. To clarify why we are rolling our own smart router code, we use Go
> > over here rather than Java.  Although if we still get bad performance
> with
> > our custom Go router I may try a pure Java load client using
> > CloudSolrServer to eliminate the possibility of bugs in our
> implementation.
> >
> >
> > On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <[hidden email]
> >
> > wrote:
> >
> >> I'm really confused:
> >>
> >> bq: I am not issuing any queries, only writes (document inserts)
> >>
> >> bq: It's clear that once the load test client has ~40 simulated users
> >>
> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support
> >> a higher QPS than 2 shards over 2 Solr nodes, right
> >>
> >> QPS is usually used to mean "Queries Per Second", which is different
> from
> >> the statement that "I am not issuing any queries....". And what do the
> >> number of users have to do with inserting documents?
> >>
> >> You also state: " In many cases, CPU on the solr servers is quite low as
> >> well"
> >>
> >> So let's talk about indexing first. Indexing should scale nearly
> >> linearly as long as
> >> 1> you are routing your docs to the correct leader, which happens with
> >> SolrJ
> >> and the CloudSolrSever automatically. Rather than rolling your own, I
> >> strongly
> >> suggest you try this out.
> >> 2> you have enough clients feeding the cluster to push CPU utilization
> >> on them all.
> >> Very often "slow indexing", or in your case "lack of scaling" is a
> >> result of document
> >> acquisition or, in your case, your doc generator is spending all it's
> >> time waiting for
> >> the individual documents to get to Solr and come back.
> >>
> >> bq: "chooses a random solr server for each ADD request (with 1 doc per
> add
> >> request)"
> >>
> >> Probably your culprit right there. Each and every document requires that
> >> you
> >> have to cross the network (and forward that doc to the correct leader).
> So
> >> given
> >> that you're not seeing high CPU utilization, I suspect that you're not
> >> sending
> >> enough docs to SolrCloud fast enough to see scaling. You need to batch
> up
> >> multiple docs, I generally send 1,000 docs at a time.
> >>
> >> But even if you do solve this, the inter-node routing will prevent
> >> linear scaling.
> >> When a doc (or a batch of docs) goes to a random Solr node, here's what
> >> happens:
> >> 1> the docs are re-packaged into groups based on which shard they're
> >> destined for
> >> 2> the sub-packets are forwarded to the leader for each shard
> >> 3> the responses are gathered back and returned to the client.
> >>
> >> This set of operations will eventually degrade the scaling.
> >>
> >> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
> >> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
> idea
> >> behind sharding.
> >>
> >> If we're talking search requests, the answer is no. Sharding is
> >> what you do when your collection no longer fits on a single node.
> >> If it _does_ fit on a single node, then you'll usually get better query
> >> performance by adding a bunch of replicas to a single shard. When
> >> the number of  docs on each shard grows large enough that you
> >> no longer get good query performance, _then_ you shard. And
> >> take the query hit.
> >>
> >> If we're talking about inserts, then see above. I suspect your problem
> is
> >> that you're _not_ "saturating the SolrCloud cluster", you're sending
> >> docs to Solr very inefficiently and waiting on I/O. Batching docs and
> >> sending them to the right leader should scale pretty linearly until you
> >> start saturating your network.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]>
> wrote:
> >> > Thanks for the suggestions so for, all.
> >> >
> >> > 1) We are not using SolrJ on the client (not using Java at all) but I
> am
> >> > working on writing a "smart" router so that we can always send to the
> >> > correct node.  I am certainly curious to see how that changes things.
> >> > Nonetheless even with the overhead of extra routing hops, the observed
> >> > behavior (no increase in performance with more nodes) doesn't make any
> >> > sense to me.
> >> >
> >> > 2) Commits: we are using autoCommit with openSearcher=false
> >> (maxTime=60000)
> >> > and autoSoftCommit (maxTime=15000).
> >> >
> >> > 3) Suggestions to batch documents certainly make sense for production
> >> code
> >> > but in this case I am not real concerned with absolute performance; I
> >> just
> >> > want to see the *relative* performance as we use more Solr nodes.  So
> I
> >> > don't think batching or not really matters.
> >> >
> >> > 4) "No, that won't affect indexing speed all that much.  The way to
> >> > increase indexing speed is to increase the number of processes or
> threads
> >> > that are indexing at the same time.  Instead of having one client
> >> > sending update
> >> > requests, try five of them."
> >> >
> >> > Can you elaborate on this some?  I'm worried I might be
> misunderstanding
> >> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
> >> > *should* support
> >> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
> >> idea
> >> > behind sharding.  Regarding your comment of "increase the number of
> >> > processes or threads", note that for each value of K (number of Solr
> >> nodes)
> >> > I measured with several different numbers of simulated users so that I
> >> > could find a "saturation point".  For example, take a look at my data
> for
> >> > K=2:
> >> >
> >> > Num NodesNum
> >> UsersQPS214722517902102290215285022029002403210260320028032102
> >> > 1003180
> >> >
> >> > It's clear that once the load test client has ~40 simulated users, the
> >> Solr
> >> > cluster is saturated.  Creating more users just increases the average
> >> > request latency, such that the total QPS remained (nearly) constant.
> So
> >> I
> >> > feel pretty confident that a cluster of size 2 *maxes out* at ~3200
> qps.
> >> > The problem is that I am finding roughly this same "max point", no
> matter
> >> > how many simulated users the load test client created, for any value
> of K
> >> > (> 1).
> >> >
> >> > Cheers,
> >> > - Ian
> >> >
> >> >
> >> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <
> [hidden email]
> >> >
> >> > wrote:
> >> >
> >> >> Your indexing client, if written in SolrJ, should use CloudSolrServer
> >> >> which is, in Matt's terms "leader aware". It divides up the
> >> >> documents to be indexed into packets that where each doc in
> >> >> the packet belongs on the same shard, and then sends the packet
> >> >> to the shard leader. This avoids a lot of re-routing and should
> >> >> scale essentially linearly. You may have to add more clients
> >> >> though, depending upon who hard the document-generator is
> >> >> working.
> >> >>
> >> >> Also, make sure that you send batches of documents as Shawn
> >> >> suggests, I use 1,000 as a starting point.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]>
> >> wrote:
> >> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> >> >> >> I think this is true only for actual queries, right? I am not
> issuing
> >> >> >> any queries, only writes (document inserts). In the case of
> writes,
> >> >> >> increasing the number of shards should increase my throughput (in
> >> >> >> ops/sec) more or less linearly, right?
> >> >> >
> >> >> > No, that won't affect indexing speed all that much.  The way to
> >> increase
> >> >> > indexing speed is to increase the number of processes or threads
> that
> >> >> > are indexing at the same time.  Instead of having one client
> sending
> >> >> > update requests, try five of them.  Also, index many documents with
> >> each
> >> >> > update request.  Sending one document at a time is very
> inefficient.
> >> >> >
> >> >> > You didn't say how you're doing commits, but those need to be as
> >> >> > infrequent as you can manage.  Ideally, you would use autoCommit
> with
> >> >> > openSearcher=false on an interval of about five minutes, and send
> an
> >> >> > explicit commit (with the default openSearcher=true) after all the
> >> >> > indexing is done.
> >> >> >
> >> >> > You may have requirements regarding document visibility that this
> >> won't
> >> >> > satisfy, but try to avoid doing commits with openSearcher=true
> (soft
> >> >> > commits qualify for this) extremely frequently, like once a second.
> >> >> > Once a minute is much more realistic.  Opening a new searcher is an
> >> >> > expensive operation, especially if you have cache warming
> configured.
> >> >> >
> >> >> > Thanks,
> >> >> > Shawn
> >> >> >
> >> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
Internally, the docs are batched up into smaller buckets (10 as I
remember) and forwarded to the correct shard leader. I suspect that's
what you're seeing.

Erick

On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan <[hidden email]> wrote:

> Regarding batch indexing:
> When I send batches of 1000 docs to a standalone Solr server, the log file
> reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the
> leader of a replicated index, the leader log file reports much smaller
> numbers, usually "(12 adds)". Why do the batches appear to be broken up?
>
> Peter
>
> On Fri, Oct 31, 2014 at 10:40 AM, Erick Erickson <[hidden email]>
> wrote:
>
>> NP, just making sure.
>>
>> I suspect you'll get lots more bang for the buck, and
>> results much more closely matching your expectations if
>>
>> 1> you batch up a bunch of docs at once rather than
>> sending them one at a time. That's probably the easiest
>> thing to try. Sending docs one at a time is something of
>> an anti-pattern. I usually start with batches of 1,000.
>>
>> And just to check.. You're not issuing any commits from the
>> client, right? Performance will be terrible if you issue commits
>> after every doc, that's totally an anti-pattern. Doubly so for
>> optimizes.... Since you showed us your solrconfig  autocommit
>> settings I'm assuming not but want to be sure.
>>
>> 2> use a leader-aware client. I'm totally unfamiliar with Go,
>> so I have no suggestions whatsoever to offer there.... But you'll
>> want to batch in this case too.
>>
>> On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <[hidden email]> wrote:
>> > Hi Erick -
>> >
>> > Thanks for the detailed response and apologies for my confusing
>> > terminology.  I should have said "WPS" (writes per second) instead of QPS
>> > but I didn't want to introduce a weird new acronym since QPS is well
>> > known.  Clearly a bad decision on my part.  To clarify: I am doing
>> > *only* writes
>> > (document adds).  Whenever I wrote "QPS" I was referring to writes.
>> >
>> > It seems clear at this point that I should wrap up the code to do "smart"
>> > routing rather than choose Solr nodes randomly.  And then see if that
>> > changes things.  I must admit that although I understand that random node
>> > selection will impose a performance hit, theoretically it seems to me
>> that
>> > the system should still scale up as you add more nodes (albeit at lower
>> > absolute level of performance than if you used a smart router).
>> > Nonetheless, I'm just theorycrafting here so the better thing to do is
>> just
>> > try it experimentally.  I hope to have that working today - will report
>> > back on my findings.
>> >
>> > Cheers,
>> > - Ian
>> >
>> > p.s. To clarify why we are rolling our own smart router code, we use Go
>> > over here rather than Java.  Although if we still get bad performance
>> with
>> > our custom Go router I may try a pure Java load client using
>> > CloudSolrServer to eliminate the possibility of bugs in our
>> implementation.
>> >
>> >
>> > On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <[hidden email]
>> >
>> > wrote:
>> >
>> >> I'm really confused:
>> >>
>> >> bq: I am not issuing any queries, only writes (document inserts)
>> >>
>> >> bq: It's clear that once the load test client has ~40 simulated users
>> >>
>> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support
>> >> a higher QPS than 2 shards over 2 Solr nodes, right
>> >>
>> >> QPS is usually used to mean "Queries Per Second", which is different
>> from
>> >> the statement that "I am not issuing any queries....". And what do the
>> >> number of users have to do with inserting documents?
>> >>
>> >> You also state: " In many cases, CPU on the solr servers is quite low as
>> >> well"
>> >>
>> >> So let's talk about indexing first. Indexing should scale nearly
>> >> linearly as long as
>> >> 1> you are routing your docs to the correct leader, which happens with
>> >> SolrJ
>> >> and the CloudSolrSever automatically. Rather than rolling your own, I
>> >> strongly
>> >> suggest you try this out.
>> >> 2> you have enough clients feeding the cluster to push CPU utilization
>> >> on them all.
>> >> Very often "slow indexing", or in your case "lack of scaling" is a
>> >> result of document
>> >> acquisition or, in your case, your doc generator is spending all it's
>> >> time waiting for
>> >> the individual documents to get to Solr and come back.
>> >>
>> >> bq: "chooses a random solr server for each ADD request (with 1 doc per
>> add
>> >> request)"
>> >>
>> >> Probably your culprit right there. Each and every document requires that
>> >> you
>> >> have to cross the network (and forward that doc to the correct leader).
>> So
>> >> given
>> >> that you're not seeing high CPU utilization, I suspect that you're not
>> >> sending
>> >> enough docs to SolrCloud fast enough to see scaling. You need to batch
>> up
>> >> multiple docs, I generally send 1,000 docs at a time.
>> >>
>> >> But even if you do solve this, the inter-node routing will prevent
>> >> linear scaling.
>> >> When a doc (or a batch of docs) goes to a random Solr node, here's what
>> >> happens:
>> >> 1> the docs are re-packaged into groups based on which shard they're
>> >> destined for
>> >> 2> the sub-packets are forwarded to the leader for each shard
>> >> 3> the responses are gathered back and returned to the client.
>> >>
>> >> This set of operations will eventually degrade the scaling.
>> >>
>> >> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
>> >> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
>> idea
>> >> behind sharding.
>> >>
>> >> If we're talking search requests, the answer is no. Sharding is
>> >> what you do when your collection no longer fits on a single node.
>> >> If it _does_ fit on a single node, then you'll usually get better query
>> >> performance by adding a bunch of replicas to a single shard. When
>> >> the number of  docs on each shard grows large enough that you
>> >> no longer get good query performance, _then_ you shard. And
>> >> take the query hit.
>> >>
>> >> If we're talking about inserts, then see above. I suspect your problem
>> is
>> >> that you're _not_ "saturating the SolrCloud cluster", you're sending
>> >> docs to Solr very inefficiently and waiting on I/O. Batching docs and
>> >> sending them to the right leader should scale pretty linearly until you
>> >> start saturating your network.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]>
>> wrote:
>> >> > Thanks for the suggestions so for, all.
>> >> >
>> >> > 1) We are not using SolrJ on the client (not using Java at all) but I
>> am
>> >> > working on writing a "smart" router so that we can always send to the
>> >> > correct node.  I am certainly curious to see how that changes things.
>> >> > Nonetheless even with the overhead of extra routing hops, the observed
>> >> > behavior (no increase in performance with more nodes) doesn't make any
>> >> > sense to me.
>> >> >
>> >> > 2) Commits: we are using autoCommit with openSearcher=false
>> >> (maxTime=60000)
>> >> > and autoSoftCommit (maxTime=15000).
>> >> >
>> >> > 3) Suggestions to batch documents certainly make sense for production
>> >> code
>> >> > but in this case I am not real concerned with absolute performance; I
>> >> just
>> >> > want to see the *relative* performance as we use more Solr nodes.  So
>> I
>> >> > don't think batching or not really matters.
>> >> >
>> >> > 4) "No, that won't affect indexing speed all that much.  The way to
>> >> > increase indexing speed is to increase the number of processes or
>> threads
>> >> > that are indexing at the same time.  Instead of having one client
>> >> > sending update
>> >> > requests, try five of them."
>> >> >
>> >> > Can you elaborate on this some?  I'm worried I might be
>> misunderstanding
>> >> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
>> >> > *should* support
>> >> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the whole
>> >> idea
>> >> > behind sharding.  Regarding your comment of "increase the number of
>> >> > processes or threads", note that for each value of K (number of Solr
>> >> nodes)
>> >> > I measured with several different numbers of simulated users so that I
>> >> > could find a "saturation point".  For example, take a look at my data
>> for
>> >> > K=2:
>> >> >
>> >> > Num NodesNum
>> >> UsersQPS214722517902102290215285022029002403210260320028032102
>> >> > 1003180
>> >> >
>> >> > It's clear that once the load test client has ~40 simulated users, the
>> >> Solr
>> >> > cluster is saturated.  Creating more users just increases the average
>> >> > request latency, such that the total QPS remained (nearly) constant.
>> So
>> >> I
>> >> > feel pretty confident that a cluster of size 2 *maxes out* at ~3200
>> qps.
>> >> > The problem is that I am finding roughly this same "max point", no
>> matter
>> >> > how many simulated users the load test client created, for any value
>> of K
>> >> > (> 1).
>> >> >
>> >> > Cheers,
>> >> > - Ian
>> >> >
>> >> >
>> >> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <
>> [hidden email]
>> >> >
>> >> > wrote:
>> >> >
>> >> >> Your indexing client, if written in SolrJ, should use CloudSolrServer
>> >> >> which is, in Matt's terms "leader aware". It divides up the
>> >> >> documents to be indexed into packets that where each doc in
>> >> >> the packet belongs on the same shard, and then sends the packet
>> >> >> to the shard leader. This avoids a lot of re-routing and should
>> >> >> scale essentially linearly. You may have to add more clients
>> >> >> though, depending upon who hard the document-generator is
>> >> >> working.
>> >> >>
>> >> >> Also, make sure that you send batches of documents as Shawn
>> >> >> suggests, I use 1,000 as a starting point.
>> >> >>
>> >> >> Best,
>> >> >> Erick
>> >> >>
>> >> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <[hidden email]>
>> >> wrote:
>> >> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
>> >> >> >> I think this is true only for actual queries, right? I am not
>> issuing
>> >> >> >> any queries, only writes (document inserts). In the case of
>> writes,
>> >> >> >> increasing the number of shards should increase my throughput (in
>> >> >> >> ops/sec) more or less linearly, right?
>> >> >> >
>> >> >> > No, that won't affect indexing speed all that much.  The way to
>> >> increase
>> >> >> > indexing speed is to increase the number of processes or threads
>> that
>> >> >> > are indexing at the same time.  Instead of having one client
>> sending
>> >> >> > update requests, try five of them.  Also, index many documents with
>> >> each
>> >> >> > update request.  Sending one document at a time is very
>> inefficient.
>> >> >> >
>> >> >> > You didn't say how you're doing commits, but those need to be as
>> >> >> > infrequent as you can manage.  Ideally, you would use autoCommit
>> with
>> >> >> > openSearcher=false on an interval of about five minutes, and send
>> an
>> >> >> > explicit commit (with the default openSearcher=true) after all the
>> >> >> > indexing is done.
>> >> >> >
>> >> >> > You may have requirements regarding document visibility that this
>> >> won't
>> >> >> > satisfy, but try to avoid doing commits with openSearcher=true
>> (soft
>> >> >> > commits qualify for this) extremely frequently, like once a second.
>> >> >> > Once a minute is much more realistic.  Opening a new searcher is an
>> >> >> > expensive operation, especially if you have cache warming
>> configured.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Shawn
>> >> >> >
>> >> >>
>> >>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Peter Keegan
Yes, I was inadvertently sending them to a replica. When I sent them to the
leader, the leader reported (1000 adds) and the replica reported only 1 add
per document. So, it looks like the leader forwards the batched jobs
individually to the replicas.

On Fri, Oct 31, 2014 at 3:26 PM, Erick Erickson <[hidden email]>
wrote:

> Internally, the docs are batched up into smaller buckets (10 as I
> remember) and forwarded to the correct shard leader. I suspect that's
> what you're seeing.
>
> Erick
>
> On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan <[hidden email]>
> wrote:
> > Regarding batch indexing:
> > When I send batches of 1000 docs to a standalone Solr server, the log
> file
> > reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the
> > leader of a replicated index, the leader log file reports much smaller
> > numbers, usually "(12 adds)". Why do the batches appear to be broken up?
> >
> > Peter
> >
> > On Fri, Oct 31, 2014 at 10:40 AM, Erick Erickson <
> [hidden email]>
> > wrote:
> >
> >> NP, just making sure.
> >>
> >> I suspect you'll get lots more bang for the buck, and
> >> results much more closely matching your expectations if
> >>
> >> 1> you batch up a bunch of docs at once rather than
> >> sending them one at a time. That's probably the easiest
> >> thing to try. Sending docs one at a time is something of
> >> an anti-pattern. I usually start with batches of 1,000.
> >>
> >> And just to check.. You're not issuing any commits from the
> >> client, right? Performance will be terrible if you issue commits
> >> after every doc, that's totally an anti-pattern. Doubly so for
> >> optimizes.... Since you showed us your solrconfig  autocommit
> >> settings I'm assuming not but want to be sure.
> >>
> >> 2> use a leader-aware client. I'm totally unfamiliar with Go,
> >> so I have no suggestions whatsoever to offer there.... But you'll
> >> want to batch in this case too.
> >>
> >> On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <[hidden email]>
> wrote:
> >> > Hi Erick -
> >> >
> >> > Thanks for the detailed response and apologies for my confusing
> >> > terminology.  I should have said "WPS" (writes per second) instead of
> QPS
> >> > but I didn't want to introduce a weird new acronym since QPS is well
> >> > known.  Clearly a bad decision on my part.  To clarify: I am doing
> >> > *only* writes
> >> > (document adds).  Whenever I wrote "QPS" I was referring to writes.
> >> >
> >> > It seems clear at this point that I should wrap up the code to do
> "smart"
> >> > routing rather than choose Solr nodes randomly.  And then see if that
> >> > changes things.  I must admit that although I understand that random
> node
> >> > selection will impose a performance hit, theoretically it seems to me
> >> that
> >> > the system should still scale up as you add more nodes (albeit at
> lower
> >> > absolute level of performance than if you used a smart router).
> >> > Nonetheless, I'm just theorycrafting here so the better thing to do is
> >> just
> >> > try it experimentally.  I hope to have that working today - will
> report
> >> > back on my findings.
> >> >
> >> > Cheers,
> >> > - Ian
> >> >
> >> > p.s. To clarify why we are rolling our own smart router code, we use
> Go
> >> > over here rather than Java.  Although if we still get bad performance
> >> with
> >> > our custom Go router I may try a pure Java load client using
> >> > CloudSolrServer to eliminate the possibility of bugs in our
> >> implementation.
> >> >
> >> >
> >> > On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <
> [hidden email]
> >> >
> >> > wrote:
> >> >
> >> >> I'm really confused:
> >> >>
> >> >> bq: I am not issuing any queries, only writes (document inserts)
> >> >>
> >> >> bq: It's clear that once the load test client has ~40 simulated users
> >> >>
> >> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support
> >> >> a higher QPS than 2 shards over 2 Solr nodes, right
> >> >>
> >> >> QPS is usually used to mean "Queries Per Second", which is different
> >> from
> >> >> the statement that "I am not issuing any queries....". And what do
> the
> >> >> number of users have to do with inserting documents?
> >> >>
> >> >> You also state: " In many cases, CPU on the solr servers is quite
> low as
> >> >> well"
> >> >>
> >> >> So let's talk about indexing first. Indexing should scale nearly
> >> >> linearly as long as
> >> >> 1> you are routing your docs to the correct leader, which happens
> with
> >> >> SolrJ
> >> >> and the CloudSolrSever automatically. Rather than rolling your own, I
> >> >> strongly
> >> >> suggest you try this out.
> >> >> 2> you have enough clients feeding the cluster to push CPU
> utilization
> >> >> on them all.
> >> >> Very often "slow indexing", or in your case "lack of scaling" is a
> >> >> result of document
> >> >> acquisition or, in your case, your doc generator is spending all it's
> >> >> time waiting for
> >> >> the individual documents to get to Solr and come back.
> >> >>
> >> >> bq: "chooses a random solr server for each ADD request (with 1 doc
> per
> >> add
> >> >> request)"
> >> >>
> >> >> Probably your culprit right there. Each and every document requires
> that
> >> >> you
> >> >> have to cross the network (and forward that doc to the correct
> leader).
> >> So
> >> >> given
> >> >> that you're not seeing high CPU utilization, I suspect that you're
> not
> >> >> sending
> >> >> enough docs to SolrCloud fast enough to see scaling. You need to
> batch
> >> up
> >> >> multiple docs, I generally send 1,000 docs at a time.
> >> >>
> >> >> But even if you do solve this, the inter-node routing will prevent
> >> >> linear scaling.
> >> >> When a doc (or a batch of docs) goes to a random Solr node, here's
> what
> >> >> happens:
> >> >> 1> the docs are re-packaged into groups based on which shard they're
> >> >> destined for
> >> >> 2> the sub-packets are forwarded to the leader for each shard
> >> >> 3> the responses are gathered back and returned to the client.
> >> >>
> >> >> This set of operations will eventually degrade the scaling.
> >> >>
> >> >> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
> >> >> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the
> whole
> >> idea
> >> >> behind sharding.
> >> >>
> >> >> If we're talking search requests, the answer is no. Sharding is
> >> >> what you do when your collection no longer fits on a single node.
> >> >> If it _does_ fit on a single node, then you'll usually get better
> query
> >> >> performance by adding a bunch of replicas to a single shard. When
> >> >> the number of  docs on each shard grows large enough that you
> >> >> no longer get good query performance, _then_ you shard. And
> >> >> take the query hit.
> >> >>
> >> >> If we're talking about inserts, then see above. I suspect your
> problem
> >> is
> >> >> that you're _not_ "saturating the SolrCloud cluster", you're sending
> >> >> docs to Solr very inefficiently and waiting on I/O. Batching docs and
> >> >> sending them to the right leader should scale pretty linearly until
> you
> >> >> start saturating your network.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]>
> >> wrote:
> >> >> > Thanks for the suggestions so for, all.
> >> >> >
> >> >> > 1) We are not using SolrJ on the client (not using Java at all)
> but I
> >> am
> >> >> > working on writing a "smart" router so that we can always send to
> the
> >> >> > correct node.  I am certainly curious to see how that changes
> things.
> >> >> > Nonetheless even with the overhead of extra routing hops, the
> observed
> >> >> > behavior (no increase in performance with more nodes) doesn't make
> any
> >> >> > sense to me.
> >> >> >
> >> >> > 2) Commits: we are using autoCommit with openSearcher=false
> >> >> (maxTime=60000)
> >> >> > and autoSoftCommit (maxTime=15000).
> >> >> >
> >> >> > 3) Suggestions to batch documents certainly make sense for
> production
> >> >> code
> >> >> > but in this case I am not real concerned with absolute
> performance; I
> >> >> just
> >> >> > want to see the *relative* performance as we use more Solr nodes.
> So
> >> I
> >> >> > don't think batching or not really matters.
> >> >> >
> >> >> > 4) "No, that won't affect indexing speed all that much.  The way to
> >> >> > increase indexing speed is to increase the number of processes or
> >> threads
> >> >> > that are indexing at the same time.  Instead of having one client
> >> >> > sending update
> >> >> > requests, try five of them."
> >> >> >
> >> >> > Can you elaborate on this some?  I'm worried I might be
> >> misunderstanding
> >> >> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
> >> >> > *should* support
> >> >> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the
> whole
> >> >> idea
> >> >> > behind sharding.  Regarding your comment of "increase the number of
> >> >> > processes or threads", note that for each value of K (number of
> Solr
> >> >> nodes)
> >> >> > I measured with several different numbers of simulated users so
> that I
> >> >> > could find a "saturation point".  For example, take a look at my
> data
> >> for
> >> >> > K=2:
> >> >> >
> >> >> > Num NodesNum
> >> >> UsersQPS214722517902102290215285022029002403210260320028032102
> >> >> > 1003180
> >> >> >
> >> >> > It's clear that once the load test client has ~40 simulated users,
> the
> >> >> Solr
> >> >> > cluster is saturated.  Creating more users just increases the
> average
> >> >> > request latency, such that the total QPS remained (nearly)
> constant.
> >> So
> >> >> I
> >> >> > feel pretty confident that a cluster of size 2 *maxes out* at ~3200
> >> qps.
> >> >> > The problem is that I am finding roughly this same "max point", no
> >> matter
> >> >> > how many simulated users the load test client created, for any
> value
> >> of K
> >> >> > (> 1).
> >> >> >
> >> >> > Cheers,
> >> >> > - Ian
> >> >> >
> >> >> >
> >> >> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <
> >> [hidden email]
> >> >> >
> >> >> > wrote:
> >> >> >
> >> >> >> Your indexing client, if written in SolrJ, should use
> CloudSolrServer
> >> >> >> which is, in Matt's terms "leader aware". It divides up the
> >> >> >> documents to be indexed into packets that where each doc in
> >> >> >> the packet belongs on the same shard, and then sends the packet
> >> >> >> to the shard leader. This avoids a lot of re-routing and should
> >> >> >> scale essentially linearly. You may have to add more clients
> >> >> >> though, depending upon who hard the document-generator is
> >> >> >> working.
> >> >> >>
> >> >> >> Also, make sure that you send batches of documents as Shawn
> >> >> >> suggests, I use 1,000 as a starting point.
> >> >> >>
> >> >> >> Best,
> >> >> >> Erick
> >> >> >>
> >> >> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <
> [hidden email]>
> >> >> wrote:
> >> >> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> >> >> >> >> I think this is true only for actual queries, right? I am not
> >> issuing
> >> >> >> >> any queries, only writes (document inserts). In the case of
> >> writes,
> >> >> >> >> increasing the number of shards should increase my throughput
> (in
> >> >> >> >> ops/sec) more or less linearly, right?
> >> >> >> >
> >> >> >> > No, that won't affect indexing speed all that much.  The way to
> >> >> increase
> >> >> >> > indexing speed is to increase the number of processes or threads
> >> that
> >> >> >> > are indexing at the same time.  Instead of having one client
> >> sending
> >> >> >> > update requests, try five of them.  Also, index many documents
> with
> >> >> each
> >> >> >> > update request.  Sending one document at a time is very
> >> inefficient.
> >> >> >> >
> >> >> >> > You didn't say how you're doing commits, but those need to be as
> >> >> >> > infrequent as you can manage.  Ideally, you would use autoCommit
> >> with
> >> >> >> > openSearcher=false on an interval of about five minutes, and
> send
> >> an
> >> >> >> > explicit commit (with the default openSearcher=true) after all
> the
> >> >> >> > indexing is done.
> >> >> >> >
> >> >> >> > You may have requirements regarding document visibility that
> this
> >> >> won't
> >> >> >> > satisfy, but try to avoid doing commits with openSearcher=true
> >> (soft
> >> >> >> > commits qualify for this) extremely frequently, like once a
> second.
> >> >> >> > Once a minute is much more realistic.  Opening a new searcher
> is an
> >> >> >> > expensive operation, especially if you have cache warming
> >> configured.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Shawn
> >> >> >> >
> >> >> >>
> >> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Ian Rose
Erick,

Just to make sure I am thinking about this right: batching will certainly
make a big difference in performance, but it should be more or less a
constant factor no matter how many Solr nodes you are using, right?  Right
now in my load tests, I'm not actually that concerned about the absolute
performance numbers; instead I'm just trying to figure out why relative
performance (no matter how bad it is since I am not batching) does not go
up with more Solr nodes.  Once I get that part figured out and we are
seeing more writes per sec when we add nodes, then I'll turn on batching in
the client to see what kind of additional performance gain that gets us.

Cheers,
Ian


On Fri, Oct 31, 2014 at 3:43 PM, Peter Keegan <[hidden email]>
wrote:

> Yes, I was inadvertently sending them to a replica. When I sent them to the
> leader, the leader reported (1000 adds) and the replica reported only 1 add
> per document. So, it looks like the leader forwards the batched jobs
> individually to the replicas.
>
> On Fri, Oct 31, 2014 at 3:26 PM, Erick Erickson <[hidden email]>
> wrote:
>
> > Internally, the docs are batched up into smaller buckets (10 as I
> > remember) and forwarded to the correct shard leader. I suspect that's
> > what you're seeing.
> >
> > Erick
> >
> > On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan <[hidden email]>
> > wrote:
> > > Regarding batch indexing:
> > > When I send batches of 1000 docs to a standalone Solr server, the log
> > file
> > > reports "(1000 adds)" in LogUpdateProcessor. But when I send them to
> the
> > > leader of a replicated index, the leader log file reports much smaller
> > > numbers, usually "(12 adds)". Why do the batches appear to be broken
> up?
> > >
> > > Peter
> > >
> > > On Fri, Oct 31, 2014 at 10:40 AM, Erick Erickson <
> > [hidden email]>
> > > wrote:
> > >
> > >> NP, just making sure.
> > >>
> > >> I suspect you'll get lots more bang for the buck, and
> > >> results much more closely matching your expectations if
> > >>
> > >> 1> you batch up a bunch of docs at once rather than
> > >> sending them one at a time. That's probably the easiest
> > >> thing to try. Sending docs one at a time is something of
> > >> an anti-pattern. I usually start with batches of 1,000.
> > >>
> > >> And just to check.. You're not issuing any commits from the
> > >> client, right? Performance will be terrible if you issue commits
> > >> after every doc, that's totally an anti-pattern. Doubly so for
> > >> optimizes.... Since you showed us your solrconfig  autocommit
> > >> settings I'm assuming not but want to be sure.
> > >>
> > >> 2> use a leader-aware client. I'm totally unfamiliar with Go,
> > >> so I have no suggestions whatsoever to offer there.... But you'll
> > >> want to batch in this case too.
> > >>
> > >> On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <[hidden email]>
> > wrote:
> > >> > Hi Erick -
> > >> >
> > >> > Thanks for the detailed response and apologies for my confusing
> > >> > terminology.  I should have said "WPS" (writes per second) instead
> of
> > QPS
> > >> > but I didn't want to introduce a weird new acronym since QPS is well
> > >> > known.  Clearly a bad decision on my part.  To clarify: I am doing
> > >> > *only* writes
> > >> > (document adds).  Whenever I wrote "QPS" I was referring to writes.
> > >> >
> > >> > It seems clear at this point that I should wrap up the code to do
> > "smart"
> > >> > routing rather than choose Solr nodes randomly.  And then see if
> that
> > >> > changes things.  I must admit that although I understand that random
> > node
> > >> > selection will impose a performance hit, theoretically it seems to
> me
> > >> that
> > >> > the system should still scale up as you add more nodes (albeit at
> > lower
> > >> > absolute level of performance than if you used a smart router).
> > >> > Nonetheless, I'm just theorycrafting here so the better thing to do
> is
> > >> just
> > >> > try it experimentally.  I hope to have that working today - will
> > report
> > >> > back on my findings.
> > >> >
> > >> > Cheers,
> > >> > - Ian
> > >> >
> > >> > p.s. To clarify why we are rolling our own smart router code, we use
> > Go
> > >> > over here rather than Java.  Although if we still get bad
> performance
> > >> with
> > >> > our custom Go router I may try a pure Java load client using
> > >> > CloudSolrServer to eliminate the possibility of bugs in our
> > >> implementation.
> > >> >
> > >> >
> > >> > On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <
> > [hidden email]
> > >> >
> > >> > wrote:
> > >> >
> > >> >> I'm really confused:
> > >> >>
> > >> >> bq: I am not issuing any queries, only writes (document inserts)
> > >> >>
> > >> >> bq: It's clear that once the load test client has ~40 simulated
> users
> > >> >>
> > >> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support
> > >> >> a higher QPS than 2 shards over 2 Solr nodes, right
> > >> >>
> > >> >> QPS is usually used to mean "Queries Per Second", which is
> different
> > >> from
> > >> >> the statement that "I am not issuing any queries....". And what do
> > the
> > >> >> number of users have to do with inserting documents?
> > >> >>
> > >> >> You also state: " In many cases, CPU on the solr servers is quite
> > low as
> > >> >> well"
> > >> >>
> > >> >> So let's talk about indexing first. Indexing should scale nearly
> > >> >> linearly as long as
> > >> >> 1> you are routing your docs to the correct leader, which happens
> > with
> > >> >> SolrJ
> > >> >> and the CloudSolrSever automatically. Rather than rolling your
> own, I
> > >> >> strongly
> > >> >> suggest you try this out.
> > >> >> 2> you have enough clients feeding the cluster to push CPU
> > utilization
> > >> >> on them all.
> > >> >> Very often "slow indexing", or in your case "lack of scaling" is a
> > >> >> result of document
> > >> >> acquisition or, in your case, your doc generator is spending all
> it's
> > >> >> time waiting for
> > >> >> the individual documents to get to Solr and come back.
> > >> >>
> > >> >> bq: "chooses a random solr server for each ADD request (with 1 doc
> > per
> > >> add
> > >> >> request)"
> > >> >>
> > >> >> Probably your culprit right there. Each and every document requires
> > that
> > >> >> you
> > >> >> have to cross the network (and forward that doc to the correct
> > leader).
> > >> So
> > >> >> given
> > >> >> that you're not seeing high CPU utilization, I suspect that you're
> > not
> > >> >> sending
> > >> >> enough docs to SolrCloud fast enough to see scaling. You need to
> > batch
> > >> up
> > >> >> multiple docs, I generally send 1,000 docs at a time.
> > >> >>
> > >> >> But even if you do solve this, the inter-node routing will prevent
> > >> >> linear scaling.
> > >> >> When a doc (or a batch of docs) goes to a random Solr node, here's
> > what
> > >> >> happens:
> > >> >> 1> the docs are re-packaged into groups based on which shard
> they're
> > >> >> destined for
> > >> >> 2> the sub-packets are forwarded to the leader for each shard
> > >> >> 3> the responses are gathered back and returned to the client.
> > >> >>
> > >> >> This set of operations will eventually degrade the scaling.
> > >> >>
> > >> >> bq:  A cluster of 3 shards over 3 Solr nodes *should* support
> > >> >> a higher QPS than 2 shards over 2 Solr nodes, right?  That's the
> > whole
> > >> idea
> > >> >> behind sharding.
> > >> >>
> > >> >> If we're talking search requests, the answer is no. Sharding is
> > >> >> what you do when your collection no longer fits on a single node.
> > >> >> If it _does_ fit on a single node, then you'll usually get better
> > query
> > >> >> performance by adding a bunch of replicas to a single shard. When
> > >> >> the number of  docs on each shard grows large enough that you
> > >> >> no longer get good query performance, _then_ you shard. And
> > >> >> take the query hit.
> > >> >>
> > >> >> If we're talking about inserts, then see above. I suspect your
> > problem
> > >> is
> > >> >> that you're _not_ "saturating the SolrCloud cluster", you're
> sending
> > >> >> docs to Solr very inefficiently and waiting on I/O. Batching docs
> and
> > >> >> sending them to the right leader should scale pretty linearly until
> > you
> > >> >> start saturating your network.
> > >> >>
> > >> >> Best,
> > >> >> Erick
> > >> >>
> > >> >> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <[hidden email]>
> > >> wrote:
> > >> >> > Thanks for the suggestions so for, all.
> > >> >> >
> > >> >> > 1) We are not using SolrJ on the client (not using Java at all)
> > but I
> > >> am
> > >> >> > working on writing a "smart" router so that we can always send to
> > the
> > >> >> > correct node.  I am certainly curious to see how that changes
> > things.
> > >> >> > Nonetheless even with the overhead of extra routing hops, the
> > observed
> > >> >> > behavior (no increase in performance with more nodes) doesn't
> make
> > any
> > >> >> > sense to me.
> > >> >> >
> > >> >> > 2) Commits: we are using autoCommit with openSearcher=false
> > >> >> (maxTime=60000)
> > >> >> > and autoSoftCommit (maxTime=15000).
> > >> >> >
> > >> >> > 3) Suggestions to batch documents certainly make sense for
> > production
> > >> >> code
> > >> >> > but in this case I am not real concerned with absolute
> > performance; I
> > >> >> just
> > >> >> > want to see the *relative* performance as we use more Solr nodes.
> > So
> > >> I
> > >> >> > don't think batching or not really matters.
> > >> >> >
> > >> >> > 4) "No, that won't affect indexing speed all that much.  The way
> to
> > >> >> > increase indexing speed is to increase the number of processes or
> > >> threads
> > >> >> > that are indexing at the same time.  Instead of having one client
> > >> >> > sending update
> > >> >> > requests, try five of them."
> > >> >> >
> > >> >> > Can you elaborate on this some?  I'm worried I might be
> > >> misunderstanding
> > >> >> > something fundamental.  A cluster of 3 shards over 3 Solr nodes
> > >> >> > *should* support
> > >> >> > a higher QPS than 2 shards over 2 Solr nodes, right?  That's the
> > whole
> > >> >> idea
> > >> >> > behind sharding.  Regarding your comment of "increase the number
> of
> > >> >> > processes or threads", note that for each value of K (number of
> > Solr
> > >> >> nodes)
> > >> >> > I measured with several different numbers of simulated users so
> > that I
> > >> >> > could find a "saturation point".  For example, take a look at my
> > data
> > >> for
> > >> >> > K=2:
> > >> >> >
> > >> >> > Num NodesNum
> > >> >> UsersQPS214722517902102290215285022029002403210260320028032102
> > >> >> > 1003180
> > >> >> >
> > >> >> > It's clear that once the load test client has ~40 simulated
> users,
> > the
> > >> >> Solr
> > >> >> > cluster is saturated.  Creating more users just increases the
> > average
> > >> >> > request latency, such that the total QPS remained (nearly)
> > constant.
> > >> So
> > >> >> I
> > >> >> > feel pretty confident that a cluster of size 2 *maxes out* at
> ~3200
> > >> qps.
> > >> >> > The problem is that I am finding roughly this same "max point",
> no
> > >> matter
> > >> >> > how many simulated users the load test client created, for any
> > value
> > >> of K
> > >> >> > (> 1).
> > >> >> >
> > >> >> > Cheers,
> > >> >> > - Ian
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <
> > >> [hidden email]
> > >> >> >
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Your indexing client, if written in SolrJ, should use
> > CloudSolrServer
> > >> >> >> which is, in Matt's terms "leader aware". It divides up the
> > >> >> >> documents to be indexed into packets that where each doc in
> > >> >> >> the packet belongs on the same shard, and then sends the packet
> > >> >> >> to the shard leader. This avoids a lot of re-routing and should
> > >> >> >> scale essentially linearly. You may have to add more clients
> > >> >> >> though, depending upon who hard the document-generator is
> > >> >> >> working.
> > >> >> >>
> > >> >> >> Also, make sure that you send batches of documents as Shawn
> > >> >> >> suggests, I use 1,000 as a starting point.
> > >> >> >>
> > >> >> >> Best,
> > >> >> >> Erick
> > >> >> >>
> > >> >> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <
> > [hidden email]>
> > >> >> wrote:
> > >> >> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> > >> >> >> >> I think this is true only for actual queries, right? I am not
> > >> issuing
> > >> >> >> >> any queries, only writes (document inserts). In the case of
> > >> writes,
> > >> >> >> >> increasing the number of shards should increase my throughput
> > (in
> > >> >> >> >> ops/sec) more or less linearly, right?
> > >> >> >> >
> > >> >> >> > No, that won't affect indexing speed all that much.  The way
> to
> > >> >> increase
> > >> >> >> > indexing speed is to increase the number of processes or
> threads
> > >> that
> > >> >> >> > are indexing at the same time.  Instead of having one client
> > >> sending
> > >> >> >> > update requests, try five of them.  Also, index many documents
> > with
> > >> >> each
> > >> >> >> > update request.  Sending one document at a time is very
> > >> inefficient.
> > >> >> >> >
> > >> >> >> > You didn't say how you're doing commits, but those need to be
> as
> > >> >> >> > infrequent as you can manage.  Ideally, you would use
> autoCommit
> > >> with
> > >> >> >> > openSearcher=false on an interval of about five minutes, and
> > send
> > >> an
> > >> >> >> > explicit commit (with the default openSearcher=true) after all
> > the
> > >> >> >> > indexing is done.
> > >> >> >> >
> > >> >> >> > You may have requirements regarding document visibility that
> > this
> > >> >> won't
> > >> >> >> > satisfy, but try to avoid doing commits with openSearcher=true
> > >> (soft
> > >> >> >> > commits qualify for this) extremely frequently, like once a
> > second.
> > >> >> >> > Once a minute is much more realistic.  Opening a new searcher
> > is an
> > >> >> >> > expensive operation, especially if you have cache warming
> > >> configured.
> > >> >> >> >
> > >> >> >> > Thanks,
> > >> >> >> > Shawn
> > >> >> >> >
> > >> >> >>
> > >> >>
> > >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Shawn Heisey-2
On 11/1/2014 9:52 AM, Ian Rose wrote:
> Just to make sure I am thinking about this right: batching will certainly
> make a big difference in performance, but it should be more or less a
> constant factor no matter how many Solr nodes you are using, right?  Right
> now in my load tests, I'm not actually that concerned about the absolute
> performance numbers; instead I'm just trying to figure out why relative
> performance (no matter how bad it is since I am not batching) does not go
> up with more Solr nodes.  Once I get that part figured out and we are
> seeing more writes per sec when we add nodes, then I'll turn on batching in
> the client to see what kind of additional performance gain that gets us.

The basic problem I see with your methodology is that you are sending an
update request and waiting for it to complete before sending another.
No matter how big the batches are, this is an inefficient use of resources.

If you send many such requests at the same time, then they will be
handled in parallel.  Lucene (and by extension, Solr) has the thread
synchronization required to keep multiple simultaneous update requests
from stomping on each other and corrupting the index.

If you have enough CPU cores, such handling will *truly* be in parallel,
otherwise the operating system will just take turns giving each thread
CPU time.  This results in a pretty good facsimile of parallel
operation, but because it splits the available CPU resources, isn't as
fast as true parallel operation.

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
bq: but it should be more or less a constant factor no matter how many
Solr nodes you are using, right?

Not really. You've stated that you're not driving Solr very hard in
your tests. Therefore you're waiting on I/O. Therefore your tests just
aren't going to scale linearly with the number of shards. This is a
simplification, but....

Your network utilization is pretty much irrelevant. I send a packet
somewhere. "somewhere" does some stuff and sends me back an
acknowledgement. While I'm waiting, the network is getting no traffic,
so..... If the network traffic was in the 90% range that would be
different, so it's a good thing to monitor.

Really, use a "leader aware" client and rack enough clients together
that you're driving Solr hard. Then double the number of shards. Then
rack enough _more_ clients to drive Solr at the same level. In this
case I'll go out on a limb and predict near 2x throughput increases.

One additional note, though. When you add _replicas_ to shards expect
to see a drop in throughput that may be quite significant, 20-40%
anecdotally...

Best,
Erick

On Sat, Nov 1, 2014 at 9:23 AM, Shawn Heisey <[hidden email]> wrote:

> On 11/1/2014 9:52 AM, Ian Rose wrote:
>> Just to make sure I am thinking about this right: batching will certainly
>> make a big difference in performance, but it should be more or less a
>> constant factor no matter how many Solr nodes you are using, right?  Right
>> now in my load tests, I'm not actually that concerned about the absolute
>> performance numbers; instead I'm just trying to figure out why relative
>> performance (no matter how bad it is since I am not batching) does not go
>> up with more Solr nodes.  Once I get that part figured out and we are
>> seeing more writes per sec when we add nodes, then I'll turn on batching in
>> the client to see what kind of additional performance gain that gets us.
>
> The basic problem I see with your methodology is that you are sending an
> update request and waiting for it to complete before sending another.
> No matter how big the batches are, this is an inefficient use of resources.
>
> If you send many such requests at the same time, then they will be
> handled in parallel.  Lucene (and by extension, Solr) has the thread
> synchronization required to keep multiple simultaneous update requests
> from stomping on each other and corrupting the index.
>
> If you have enough CPU cores, such handling will *truly* be in parallel,
> otherwise the operating system will just take turns giving each thread
> CPU time.  This results in a pretty good facsimile of parallel
> operation, but because it splits the available CPU resources, isn't as
> fast as true parallel operation.
>
> Thanks,
> Shawn
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Ian Rose
Hi again, all -

Since several people were kind enough to jump in to offer advice on this
thread, I wanted to follow up in case anyone finds this useful in the
future.

*tl;dr: *Routing updates to a random Solr node (and then letting it forward
the docs to where they need to go) is very expensive, more than I
expected.  Using a "smart" router that uses the cluster config to route
documents directly to their shard results in (near) linear scaling for us.

*Expository version:*

We use Go on our client, for which (to my knowledge) there is no SolrCloud
router implementation.  So we started by just routing updates to a random
Solr node and letting it forward the docs to where they need to go.  My
theory was that this would lead to a constant amount of additional work
(and thus still linear scaling).  This was based on the observation that if
you send an update of K documents to a Solr node in a N node cluster, in
the worst case scenario, all K documents will need to be forwarded on to
other nodes.  Since Solr nodes have perfect knowledge of where docs belong,
each doc would only take 1 additional hop to get to its replica.  So random
routing (in the limit) imposes 1 additional network hop for each document.

In practice, however, we find that (for small networks, at least) per-node
performance falls as you add shards.  In fact, the client performance (in
writes/sec) was essentially constant no matter how many shards we added.  I
do have a working theory as to why this might be (i.e. where the flaw is in
my logic above) but as this is merely an unverified theory I don't want to
lead anyone astray by writing it up here.

However, by writing a "smart" router that retrieves the clusterstate.json
file from Zookeeper and uses that to "perfectly" route documents to their
proper shard, we were able to achieve much better scalability.  Using a
synthetic workload, we were able to achieve 141.7 writes/sec to a cluster
of size 1 and 2506 writes/sec to a cluster of size 20 (125
writes/sec/node).  So a dropoff of ~12% which is not too bad.  We are
hoping to continue our tests with larger clusters to ensure that the
per-node write performance levels off and does not continue to drop as the
cluster scales.

I will also note that we initially had several bugs in our "smart" router
implementation so if you follow a similar path and see bad performance look
to your router implementation as you might not be routing correctly.  We
ended up writing a simple proxy that we ran in front of Solr to observe all
requests which helped immensely when verifying and debugging our router.
Yes tcpdump does something similar but viewing HTTP-level traffic is way
more convenient than TCP-level.  Plus Go makes little proxies like this
super easy to do.

Hope all that is useful to someone.  Thanks again to the posters above for
providing suggestions!

- Ian



On Sat, Nov 1, 2014 at 7:13 PM, Erick Erickson <[hidden email]>
wrote:

> bq: but it should be more or less a constant factor no matter how many
> Solr nodes you are using, right?
>
> Not really. You've stated that you're not driving Solr very hard in
> your tests. Therefore you're waiting on I/O. Therefore your tests just
> aren't going to scale linearly with the number of shards. This is a
> simplification, but....
>
> Your network utilization is pretty much irrelevant. I send a packet
> somewhere. "somewhere" does some stuff and sends me back an
> acknowledgement. While I'm waiting, the network is getting no traffic,
> so..... If the network traffic was in the 90% range that would be
> different, so it's a good thing to monitor.
>
> Really, use a "leader aware" client and rack enough clients together
> that you're driving Solr hard. Then double the number of shards. Then
> rack enough _more_ clients to drive Solr at the same level. In this
> case I'll go out on a limb and predict near 2x throughput increases.
>
> One additional note, though. When you add _replicas_ to shards expect
> to see a drop in throughput that may be quite significant, 20-40%
> anecdotally...
>
> Best,
> Erick
>
> On Sat, Nov 1, 2014 at 9:23 AM, Shawn Heisey <[hidden email]> wrote:
> > On 11/1/2014 9:52 AM, Ian Rose wrote:
> >> Just to make sure I am thinking about this right: batching will
> certainly
> >> make a big difference in performance, but it should be more or less a
> >> constant factor no matter how many Solr nodes you are using, right?
> Right
> >> now in my load tests, I'm not actually that concerned about the absolute
> >> performance numbers; instead I'm just trying to figure out why relative
> >> performance (no matter how bad it is since I am not batching) does not
> go
> >> up with more Solr nodes.  Once I get that part figured out and we are
> >> seeing more writes per sec when we add nodes, then I'll turn on
> batching in
> >> the client to see what kind of additional performance gain that gets us.
> >
> > The basic problem I see with your methodology is that you are sending an
> > update request and waiting for it to complete before sending another.
> > No matter how big the batches are, this is an inefficient use of
> resources.
> >
> > If you send many such requests at the same time, then they will be
> > handled in parallel.  Lucene (and by extension, Solr) has the thread
> > synchronization required to keep multiple simultaneous update requests
> > from stomping on each other and corrupting the index.
> >
> > If you have enough CPU cores, such handling will *truly* be in parallel,
> > otherwise the operating system will just take turns giving each thread
> > CPU time.  This results in a pretty good facsimile of parallel
> > operation, but because it splits the available CPU resources, isn't as
> > fast as true parallel operation.
> >
> > Thanks,
> > Shawn
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Shawn Heisey-2
On 11/7/2014 7:17 AM, Ian Rose wrote:
> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
> the docs to where they need to go) is very expensive, more than I
> expected.  Using a "smart" router that uses the cluster config to route
> documents directly to their shard results in (near) linear scaling for us.

I will admit that I do not know everything that has to happen in order
to bounce updates to the proper shard leader, but I would have expected
the overhead involved to be relatively small.

I have opened an issue so we can see whether this situation can be improved.

https://issues.apache.org/jira/browse/SOLR-6717

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: Ideas for debugging poor SolrCloud scalability

Erick Erickson
Ian:

Thanks much for the writeup! It's always good to have real-world documentation!

Best,
Erick

On Fri, Nov 7, 2014 at 8:26 AM, Shawn Heisey <[hidden email]> wrote:

> On 11/7/2014 7:17 AM, Ian Rose wrote:
>> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
>> the docs to where they need to go) is very expensive, more than I
>> expected.  Using a "smart" router that uses the cluster config to route
>> documents directly to their shard results in (near) linear scaling for us.
>
> I will admit that I do not know everything that has to happen in order
> to bounce updates to the proper shard leader, but I would have expected
> the overhead involved to be relatively small.
>
> I have opened an issue so we can see whether this situation can be improved.
>
> https://issues.apache.org/jira/browse/SOLR-6717
>
> Thanks,
> Shawn
>