Some performance questions....

Re: Some performance questions....

Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
[hidden email]

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sun, Mar 25, 2018 at 2:24 PM, Shawn Heisey <[hidden email]> wrote:

> On 3/25/2018 1:45 AM, Shawn Heisey wrote:
>
>> I have written a little test program that can pound the system harder,
>> need a little more time to gather what I learned with it.
>>
>
> Here's the code and three results with different threadcounts:
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
> I ran the program several times while writing it.  Once I had it finished,
> I did the 20 thread run first, then the 100 thread run, and then the 200
> thread run.  Gist re-ordered my files, wasn't expecting that.
>
>
$ Why is the 'qps' not increasing with increase in threads? (If I
understand the qps parameter right?)

$ Is it possible to run with 10 & 5 & 2 threads?

$ What was the server utilisation (CPU, Memory) when you ran the test?

$ The 'query median' increases from 35 to 470 as you increase threads from
20 to 200 (You had mentioned earlier that QTime for Banjo query was 11 when
you had hit it the second time around)

$ Can you please give Linux server configuration if possible?


> It was executed inside eclipse on a Windows 7 system.  The Solr servers
> are running Linux.  This is a distributed index with 7 total shards running
> on two servers.  The "shards" parameter is defined on the server side in
> the 'ncmain' core, which has an empty index.  The servers are NOT running
> in SolrCloud mode.
>
> As you can see in the code, I was using exactly the same query every time
> -- that "banjo" query that I tried earlier.
>
> I have to try and remember how to build a simple program like this on the
> commandline before I can try it in Linux.  I don't know if it would see a
> performance improvement running on Linux.
>
> Thanks,
> Shawn
>
>

Re: Some performance questions....

Shawn Heisey-2
On 3/25/2018 7:15 AM, Deepak Goel wrote:
> $ Why is the 'qps' not increasing with increase in threads? (If I
> understand the qps parameter right?)

Likely because I sent all these queries to a single copy of the index. 
We only have two copies of the index in production, plus a third copy on
a dev server running a newer version of Solr. I sent the queries from
the test program to the production server pair that's designated
"standby" -- not receiving queries unless the other pair is down.

Our Solr servers do not handle a high query load.  It's usually less
than two queries per second.

Handling a very high query load requires load balancing to multiple
copies of the index (replicas in SolrCloud terminology). We don't need
that, so we don't have a bunch of copies.  The only reason we have two
copies is so we can handle hardware failure gracefully.  I bypassed the
load balancer for these tests.

> $ Is it possible to run with 10 & 5 & 2 threads?

Sure.

I have updated the gist with those results.

https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379

> $ What was the server utilisation (CPU, Memory) when you ran the test?

I actually never looked when I was running the tests before.  I ran
additional tests so I could gather that data.  The updated gist has
vmstat information (while running a 20 thread test, and while running a
200 thread test) for the server side. The server named idxa1 has a
higher CPU load because it is aggregating the shard data and replying to
the query, in addition to serving three out of the seven shards.  The
server named idxa2 has four shards.  The extra shard on idxa2 is very
small - a little over 321000 docs, a little over 500MB disk used.  This
is where new docs are written.

The CPU load on idxa2 is similar for both thread levels.  I think this is
because all queries are served from cache.  But idxa1 shows a higher
load, because even when the cache is used, that server must still
aggregate the shard data (which was pulled from cache) and create
responses.  The aggregation is not cached, because Solr has no way to
know that what it is receiving from the shards is cached data.

Here's the benchmark output from the 200 thread test when I was getting
the CPU information:

query count: 200000
elapsed count: 200000
query median: 488.0
elapsed median: 500.0
query 75th: 674.0
elapsed 75th: 686.0
query 95th: 1006.0
elapsed 95th: 1018.0
query 99th: 1283.01
elapsed 99th: 1299.0
total time in seconds: 542
numThreads: 200
queries per thread: 1000
qps: 369
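
For readers without the gist open, here is a minimal sketch of how a
closed-loop test of this shape could produce fields like the ones above. It
is not the program from the gist: the class name LoadTestSketch, the
QueryAction hook, and the assumption that qps is total queries divided by
whole elapsed seconds are all illustrative. Under that assumption,
200000 / 542 gives 369 with integer division, which matches the reported qps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class LoadTestSketch {
    /** Hypothetical hook for one query; a real test would send a Solr request here. */
    interface QueryAction { void run() throws Exception; }

    /** Queries per second, assuming whole seconds and integer division. */
    static long qps(long queryCount, long totalSeconds) {
        return queryCount / Math.max(1, totalSeconds);
    }

    /** Median of the samples; averages the two middle values for an even count. */
    static double median(List<Long> samples) {
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n == 0) throw new IllegalArgumentException("no samples");
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Each thread issues its queries back to back; returns all elapsed times in ms. */
    static List<Long> run(int numThreads, int queriesPerThread, QueryAction action)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                futures.add(pool.submit(() -> {
                    List<Long> elapsed = new ArrayList<>(queriesPerThread);
                    for (int i = 0; i < queriesPerThread; i++) {
                        long begin = System.nanoTime();
                        action.run();
                        elapsed.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin));
                    }
                    return elapsed;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get()); // waits for that thread to finish
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // Stand-in for a query: sleep 5 ms instead of contacting a server.
        List<Long> elapsed = run(4, 50, () -> Thread.sleep(5));
        long seconds = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
        System.out.println("elapsed count: " + elapsed.size());
        System.out.println("elapsed median: " + median(elapsed));
        System.out.println("qps: " + qps(elapsed.size(), seconds));
    }
}
```

Because every thread waits for its previous query before sending the next,
adding threads to a loop like this raises the number of requests in flight
rather than forcing a higher request rate.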

> $ The 'query median' increases from 35 to 470 as you increase threads from
> 20 to 200 (You had mentioned earlier that QTime for Banjo query was 11 when
> you had hit it the second time around)

When I got 11 ms, that was doing *one* query.  This program does a lot
of them, so I'm not surprised by the increase.  I did the one-off
queries on the dev server, not the standby production servers that
received the load test.  The hardware specs are similar, except that in
dev, the entire index is on one server running Solr 6.6.2.  That server
also contains other indexes not being handled by the production pair I
used for the load test.

> $ Can you please give Linux server configuration if possible?

What *exactly* are you looking for here?  I've got some information
below, but I do not know if it's what you are after.

High level, first server (idxa1):
Dell PowerEdge 2950 III
Two 4-core CPUs.
model name      : Intel(R) Xeon(R) CPU           E5440  @ 2.83GHz
64GB memory
Solr is version 4.7.2, with an 8GB heap
About 140GB of index data
CentOS 6, kernel 2.6.32-431.11.2.el6.centos.plus.x86_64
Oracle Java:
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

Differences on the second server (idxa2):
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
Slightly more (about 500MB) index data.
2.6.32-504.12.2.el6.centos.plus.x86_64.

The whole production index is in the ballpark of 280GB, and contains
over 187 million docs.  The dev server has more than 188 million docs. 
I think the reason that the counts are different is because we very
recently deleted a bunch of data from the database, but skipped the
update of the Solr index for the deletion.  The production indexes have
been rebuilt since the delete, but the dev index hasn't.

The network between the client running the test and the Solr servers
includes a layer 3 switch, some layer 2 switches, and a firewall.  All
network hardware is made by Cisco.  The entire path (including the
firewall) is gigabit.

Thanks,
Shawn


Re: Some performance questions....

Deepak Goel
Some observations:

*#* The CPU load on idxa1 mostly stays below the 91% mark, even if you
increase the load (by increasing the number of threads). This is similar to
my environment (I can never cross 90% on Linux even if I increase the load.
On Windows I can never cross 65% for some reason)

*#* Similarly the CPU Load on idxa2 never crosses 50% (I guess this follows
from the above point)

*#* Your system saturates at 10 threads (The qps hits the highest mark at
this load). Increasing the load further (number of threads - 20, 100, 200)
only worsens the response time, while the qps remains the same

*#* The Query-Time is anywhere between 25-100ms. For 200 threads, the
Query-Time is between 500-1400ms. This is for a load of 'Static-Query'.

A 'Dynamic-Query' load would only worsen the Query-Time (It will also
probably bring down the qps and max-cpu-utilisation)

*#* The author has a similar hardware configuration as yours (idxa1). The
author has not specified the OS though.

If it is Windows, then I would believe it might be a good idea to have 2
VM's on his box

If it is Linux, it might be a good idea to decide once someone does the
test with Dynamic-Query Load. If the author has a load of Static-Query,
then having one VM on his box should be fine as 90% of CPU resources can be
consumed (However, he would lose Reliability and Availability compared
to 2 VM's)
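
The saturation point above (qps stays flat while response time grows with the
thread count) is what one would expect from a closed-loop test once the
server is at capacity: the mean time per query is roughly threads divided by
qps. A rough sketch of that arithmetic follows, using the 200 thread figures
quoted in this thread (369 qps). The class and method names are made up for
illustration, and the second line of output rests on the assumption that
throughput would stay at 369 qps with fewer threads.

```java
class LittlesLawSketch {
    /** Mean time per query in ms for a closed loop: threads / throughput. */
    static double meanMsPerQuery(int threads, double qps) {
        return threads / qps * 1000.0;
    }

    public static void main(String[] args) {
        // Figures from the 200 thread run quoted earlier in the thread.
        System.out.printf("200 threads, 369 qps: %.0f ms per query%n",
                meanMsPerQuery(200, 369));
        // Assumption for illustration only: throughput stays at 369 qps.
        System.out.printf("20 threads, 369 qps: %.0f ms per query%n",
                meanMsPerQuery(20, 369));
    }
}
```

For the 200 thread run this gives about 542 ms per query, which agrees with
the reported total time (542 seconds for 1000 queries per thread) and sits
close to the reported elapsed median of 500 ms.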

Some other points:

*@* I would have liked to have the vmstat information for 5, 7, 8 and 10
threads

*@* Could you also run the test for 7 and 8 threads? (At 10 threads the
system saturates, and at 5 threads the load is light)

*@* Can you please also do a Load-Test for Dynamic-Queries with 5-10
threads (I am sorry for asking too much. You can please ignore these
demands if it is too much). I will do the same on my environment





Re: Some performance questions....

Walter Underwood
In reply to this post by Deepak Goel
> On Mar 24, 2018, at 5:21 PM, Deepak Goel <[hidden email]> wrote:
>
> My first test was to test with static queries. Does Solr scale-up as we
> increase the load of same query?
>
> The second test would be to check with 'Different Queries'.
>
> And then finally check with 80% similar queries and 20% different queries.

You insulted me when I gave a clear explanation about how to run a meaningful benchmark.

Now you give results from a totally invalid benchmark. If you are getting slow responses from a one-query “benchmark”, you have serious system or configuration mistakes. That should be returning in a few milliseconds.

Also, 80/20 isn’t even close to a realistic query load.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)
