Some performance questions....

Re: Some performance questions....

Walter Underwood
> On Mar 17, 2018, at 3:23 AM, Deepak Goel <[hidden email]> wrote:
>
> Sorry for being rude. But the ' results ' please, not the ' road to the
> results '

We have 15 different search collections, all different sizes and all with different kinds of queries. Here are the two major ones.

22 million docs
32 server Solr Cloud cluster, EC2 c4.8xlarge instances (36 CPU, 59 GB RAM)
Solr 6.6.2
4 shards
24,000 requests/minute
95th percentile query response time 5 to 7 seconds

250,000 docs
4 server Solr master/slave cluster, EC2 c4.4xlarge (16 CPU, 30 GB RAM)
Solr 4.10.4
60,000 requests/minute
95th percentile 100 ms
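
For a rough per-server comparison, the cluster-wide rates above can be divided down. The sketch below is only arithmetic on the quoted figures. It assumes traffic is spread evenly across every server listed, which the post does not state, and `ClusterRates` is a made-up name.

```java
// Hypothetical helper: converts a cluster-wide request rate into a rough
// per-server rate. Assumes load is spread evenly over all listed servers,
// which is an assumption and not something the post states.
final class ClusterRates {

    static double perServerPerSecond(double requestsPerMinute, int servers) {
        if (servers <= 0) {
            throw new IllegalArgumentException("servers must be positive");
        }
        return requestsPerMinute / 60.0 / servers;
    }

    public static void main(String[] args) {
        System.out.println("32-server cluster: "
                + perServerPerSecond(24_000, 32) + " req/s per server");
        System.out.println("4-server cluster:  "
                + perServerPerSecond(60_000, 4) + " req/s per server");
    }
}
```

Under that assumption the first cluster works out to 12.5 requests per second per server and the second to 250. A sharded query fans out to several nodes, so the first figure understates the internal work per request.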

That should make everything crystal clear.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

Re: Some performance questions....

Shawn Heisey-2
In reply to this post by Deepak Goel
On 3/16/2018 4:24 PM, Deepak Goel wrote:
> It is taking less than 100ms to create a HttpSolrClient Object

"Less than 100ms" is vague.  Let's say by that you mean it takes at
least 50 milliseconds.  This is a lot slower than I expected it to be,
but if you've measured it, I'll accept that.

If every single thread you're running has to spend 50 milliseconds or
more creating a client before it can actually send a request, then the
application is going to be spending a lot of time NOT sending requests,
but creating and destroying clients.  (You didn't indicate how long the
close() takes)

Your numbers indicated a response time of 1426 milliseconds for Solr. 
If this is an average or a median, then that is not a fast query.  These
numbers make me question the entire benchmark setup.  Based on the code
provided, I don't see how the numbers can be that bad, even if we assume
that up to 100 milliseconds is spent creating every client.
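
Whether a figure like 1426 ms is an average, a median, or a high percentile changes how it should be read. The following is a minimal sketch of a nearest-rank percentile over recorded response times. The class name `LatencyStats` and the sample values are illustrative and not taken from the benchmark.

```java
import java.util.Arrays;

// Hypothetical helper; the sample latencies in main() are made up.
final class LatencyStats {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Nine fast responses and one slow outlier.
        long[] samples = {10, 12, 11, 9, 10, 13, 11, 10, 12, 1400};
        System.out.println("mean   = " + mean(samples));
        System.out.println("median = " + percentile(samples, 50));
        System.out.println("p95    = " + percentile(samples, 95));
    }
}
```

With these made-up samples the median is 11 ms while the mean is pulled up to 149.8 ms by the single outlier, which is why a benchmark report should say which statistic it is quoting.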

Because the ES numbers are so much worse than the Solr numbers, I'm
betting that creating an ES client is even less efficient than creating
a Solr client.  If that's the case, I do not know why ... maybe that
client runs through more startup checks than a Solr client does. 
Creation time for the client shouldn't matter, since it should only be
done once for every benchmark run, and the time spent creating the
client shouldn't be counted in the benchmark numbers.

Thanks,
Shawn

Re: Some performance questions....

Deepak Goel
In reply to this post by Shawn Heisey-2
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
[hidden email]

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey <[hidden email]> wrote:

> On 3/16/2018 2:21 PM, Deepak Goel wrote:
> > I wanted to test how many max connections can Solr handle concurrently.
> > Also I would have to implement a 'connection pooling' of the client-object
> > connections rather than a single connection thread
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> Handling thousands of simultaneous queries is NOT something you can
> expect a single Solr server to do.  It's not going to happen.  It
> wouldn't happen with ES, either.  Handling that much load requires load
> balancing to a LOT of servers.  The server would be much more of a
> bottleneck than the client.
>
> > The problem is the max throughput which I can get on the machine is around
> > 28 tps, even though I increase the load further & only 65% CPU is utilised
> > (there is still 35% which is not being used). This clearly indicates the
> > software is a problem as there are enough hardware resources.
>
> If your code is creating a client object before every single query, that
> could be part of the issue.  The benchmark code should be using the same
> client for all requests.  I really don't know how long it takes to
> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>
> What version of SolrJ were you using?
>
> Depending on the SolrJ version you may need to create the client with a
> custom HttpClient object in order to allow it to handle plenty of
> threads.  This is how I create client objects in my SolrJ code:
>
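>   // 2 s connect timeout, 60 s socket (read) timeout
>   RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000)
>     .setSocketTimeout(60000).build();
>   // Raise the connection limits so many threads can share one client
>   CloseableHttpClient httpClient = HttpClients.custom()
>     .setDefaultRequestConfig(rc).setMaxConnPerRoute(1024)
>     .setMaxConnTotal(4096).disableAutomaticRetries().build();
>
>   // Build the Solr client once on top of that HttpClient and reuse it
>   SolrClient sc = new HttpSolrClient.Builder().withBaseSolrUrl(solrUrl)
>     .withHttpClient(httpClient).build();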
>
I tried the above suggestion. The throughput and utilisation remain the
same (they don't increase even if I increase the load). The response time
comes down.

Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)          27.8               1426                  65
UnTuned (Linux)
Partially Tuned (Linux)
Partially Tuned (Windows)  28.1               1.105                 60

I am going to give your suggestion a spin on Linux next (this might take a
day or two).



> Thanks,
> Shawn
>

Re: Some performance questions....

Deepak Goel
In reply to this post by Walter Underwood
On Mon, Mar 19, 2018 at 2:40 AM, Walter Underwood <[hidden email]>
wrote:

> > On Mar 17, 2018, at 3:23 AM, Deepak Goel <[hidden email]> wrote:
> >
> > Sorry for being rude. But the ' results ' please, not the ' road to the
> > results '
>
> We have 15 different search collections, all different sizes and all with
> different kinds of queries. Here are the two major ones.
>
> 22 million docs
> 32 server Solr Cloud cluster, EC2 c4.8xlarge instances (36 CPU, 59 GB RAM)
> Solr 6.6.2
> 4 shards
> 24,000 requests/minute
> 95th percentile query response time 5 to 7 seconds
>
> 250,000 docs
> 4 server Solr master/slave cluster, EC2 c4.4xlarge (16 CPU, 30 GB RAM)
> Solr 4.10.4
> 60,000 requests/minute
> 95th percentile 100 ms
>
This does not help at all. If you look at the author's question, I think
it is about a single server. You will have to post your results (25% CPU,
50% CPU, 75% CPU, 100% CPU) for a single server (how does the server scale
with an increase in load?)


> That should make everything crystal clear.
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)
>

Re: Some performance questions....

Deepak Goel
In reply to this post by Deepak Goel
On Thu, Mar 22, 2018 at 1:25 AM, Deepak Goel <[hidden email]> wrote:

> On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey <[hidden email]> wrote:
>
>> On 3/16/2018 2:21 PM, Deepak Goel wrote:
>> > I wanted to test how many max connections can Solr handle concurrently.
>> > Also I would have to implement a 'connection pooling' of the client-object
>> > connections rather than a single connection thread
>> >
>> > However a single client object with thousands of queries coming in would
>> > surely become a bottleneck. I can test this scenario too.
>>
>> Handling thousands of simultaneous queries is NOT something you can
>> expect a single Solr server to do.  It's not going to happen.  It
>> wouldn't happen with ES, either.  Handling that much load requires load
>> balancing to a LOT of servers.  The server would be much more of a
>> bottleneck than the client.
>>
>> > The problem is the max throughput which I can get on the machine is around
>> > 28 tps, even though I increase the load further & only 65% CPU is utilised
>> > (there is still 35% which is not being used). This clearly indicates the
>> > software is a problem as there are enough hardware resources.
>>
>> If your code is creating a client object before every single query, that
>> could be part of the issue.  The benchmark code should be using the same
>> client for all requests.  I really don't know how long it takes to
>> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>>
>> What version of SolrJ were you using?
>>
>> Depending on the SolrJ version you may need to create the client with a
>> custom HttpClient object in order to allow it to handle plenty of
>> threads.  This is how I create client objects in my SolrJ code:
>>
>>   RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000)
>>     .setSocketTimeout(60000).build();
>>   CloseableHttpClient httpClient = HttpClients.custom()
>>     .setDefaultRequestConfig(rc).setMaxConnPerRoute(1024)
>>     .setMaxConnTotal(4096).disableAutomaticRetries().build();
>>
>>   SolrClient sc = new HttpSolrClient.Builder().withBaseSolrUrl(solrUrl)
>>     .withHttpClient(httpClient).build();
>>
> I tried the above suggestion. The throughput and utilisation remain the
> same (they don't increase even if I increase the load). The response time
> comes down.
>
> Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
> UnTuned (Windows)          27.8               1426                  65
> UnTuned (Linux)
> Partially Tuned (Linux)
> Partially Tuned (Windows)  28.1               1.105                 60
>
> I am going to give your suggestion a spin on Linux next (this might take a
> day or two).
>

This is what the Linux results look like:

Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)          27.8               1426                  65
UnTuned (Linux)            345                280                   91
Partially Tuned (Linux)    564                172                   90
Partially Tuned (Windows)  28.1               1.105                 60


Re: Some performance questions....

Deepak Goel
In reply to this post by Shawn Heisey-2
On Tue, Mar 20, 2018 at 3:32 AM, Shawn Heisey <[hidden email]> wrote:

> On 3/16/2018 4:24 PM, Deepak Goel wrote:
> > It is taking less than 100ms to create a HttpSolrClient Object
>
> "Less than 100ms" is vague.  Let's say by that you mean it takes at
> least 50 milliseconds.  This is a lot slower than I expected it to be,
> but if you've measured it, I'll accept that.
>
>
The results were a bit volatile from test to test. It sometimes took
75ms and sometimes around 95ms, so I have stated the upper bound
on the results (100ms).

(Sorry for being rude.) However, you don't need to accept my results. May I
suggest that you measure it yourself (or anyone else can also do it)?


> If every single thread you're running has to spend 50 milliseconds or
> more creating a client before it can actually send a request, then the
> application is going to be spending a lot of time NOT sending requests,
> but creating and destroying clients.  (You didn't indicate how long the
> close() takes)
>

I did implement your solution (on Windows it does not make a difference; on
Linux it improves things by a margin of at least two times)


>
> Your numbers indicated a response time of 1426 milliseconds for Solr.
> If this is an average or a median, then that is not a fast query.  These
> numbers make me question the entire benchmark setup.


Do you have any specific questions about the benchmark setup?


> Based on the code
> provided, I don't see how the numbers can be that bad, even if we assume
> that up to 100 milliseconds is spent creating every client.
>
>
I have stated the numbers which I found during my test. The best way to
verify them is for someone else to run the same test. Otherwise I don't see
how we can verify the results


> Because the ES numbers are so much worse than the Solr numbers, I'm
> betting that creating an ES client is even less efficient than creating
> a Solr client.  If that's the case, I do not know why ... maybe that
> client runs through more startup checks than a Solr client does.
> Creation time for the client shouldn't matter, since it should only be
> done once for every benchmark run, and the time spent creating the
> client shouldn't be counted in the benchmark numbers.
>
>
I can check and optimise the ES code. However, that will take me a couple of
weeks.


> Thanks,
> Shawn
>
>

Re: Some performance questions....

Shawn Heisey-2
In reply to this post by Deepak Goel
On 3/23/2018 11:21 AM, Deepak Goel wrote:
>> I tried the above suggestion. The throughput and utilisation remain the
>> same (they don't increase even if I increase the load). The response time
>> comes down.
>>

Are you still creating a new client object for every query?  Changing
how the client object is created won't improve anything if you're still
making a new one every time.

You're going to need to move the client creation somewhere else in your
code that only gets run once at startup, and then use the already-built
client object in the code that does the query.  The different way of
creating the client object that I gave you will ensure that it is
actually capable of running concurrently with many threads. (With some
older versions, this is not guaranteed)
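
The build-once, share-everywhere structure described above can be sketched as follows. This is only an outline of the pattern: `SearchClient` is a hypothetical stand-in for a thread-safe client such as a SolrJ client, so that the example runs without SolrJ on the classpath, and its `query` method does no real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: SearchClient is a hypothetical stand-in for a thread-safe
// client such as a SolrJ client, so the example runs without SolrJ.
final class SharedClientBenchmark {

    static final AtomicInteger CLIENTS_CREATED = new AtomicInteger();

    static final class SearchClient {
        SearchClient() {
            CLIENTS_CREATED.incrementAndGet(); // count constructions
        }

        int query(String q) {
            return q.length(); // placeholder for a real request
        }
    }

    // Runs threads * queriesPerThread queries against ONE shared client
    // and returns the number of queries completed.
    static int run(int threads, int queriesPerThread) {
        SearchClient client = new SearchClient(); // built once, before any worker starts
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Integer> worker = () -> {
                    int done = 0;
                    for (int i = 0; i < queriesPerThread; i++) {
                        client.query("example query"); // every worker reuses the same client
                        done++;
                    }
                    return done;
                };
                results.add(pool.submit(worker));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int completed = run(20, 1000);
        System.out.println(completed + " queries, clients created: " + CLIENTS_CREATED.get());
    }
}
```

In a real benchmark the timer would start after the client is built, so that construction cost is not counted against the queries.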

Thanks,
Shawn



Re: Some performance questions....

Deepak Goel
On Fri, Mar 23, 2018 at 11:38 PM, Shawn Heisey <[hidden email]> wrote:

> On 3/23/2018 11:21 AM, Deepak Goel wrote:
> >> I tried the above suggestion. The throughput and utilisation remain the
> >> same (they don't increase even if I increase the load). The response time
> >> comes down.
> >>
>
> Are you still creating a new client object for every query?  Changing
> how the client object is created won't improve anything if you're still
> making a new one every time.
>
> You're going to need to move the client creation somewhere else in your
> code that only gets run once at startup, and then use the already-built
> client object in the code that does the query.  The different way of
> creating the client object that I gave you will ensure that it is
> actually capable of running concurrently with many threads. (With some
> older versions, this is not guaranteed)
>
>
Yes I am now creating a client object only once. On Linux it has superb
results (performance improves by around two times). However on Windows it
has no improvement


Software                                           Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)                                  27.8               1426                  65
UnTuned (Linux)                                    345                280                   91
Partially Tuned with Shawn's suggestions (Linux)   564                172                   90
Partially Tuned with Shawn's suggestions (Windows) 28.1               1.105                 60





> Thanks,
> Shawn
>
>
>
>

Re: Some performance questions....

Shawn Heisey-2
In reply to this post by Deepak Goel
On 3/23/2018 11:31 AM, Deepak Goel wrote:
> Do you have any specific questions about the benchmark setup?

How many docs are in the Solr index?  How much disk space does it
consume?  How much total memory is in the machine?  How much memory is
allocated to Java heaps?  Is there any other software running besides
the Solr server and the benchmark program?  If it's a virtual machine,
do you know anything about how many virtual machines are on the physical
hardware, and whether resources are oversubscribed on the physical hardware?

> I have stated the numbers which I found during my test. The best way to
> verify them is for someone else to run the same test. Otherwise I don't see
> how we can verify the results

You have provided a code fragment, not complete code that can be used to
compile exactly what you're running.  There is no information about
exactly what you're doing with JMeter.  There are no version numbers for
any of the software that you're using.  When I look at what's available,
I don't have enough information to replicate your test.

Your code fragment has a hard-coded query in it.  Running the same query
over and over won't provide meaningful results, and definitely shouldn't
show an average query time of nearly 1.5 seconds.

Thanks,
Shawn

Re: Some performance questions....

Rick Leir-2
In reply to this post by Deepak Goel


Deep,
What is the test, so I can try it?

75 or 90 ms .. is that the JVM startup time?
Cheers -- Rick
>>
>>
>I have stated the numbers which I found during my test. The best way to
>verify them is for someone else to run the same test. Otherwise I don't
>see
>how we can verify the results


--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Some performance questions....

Shawn Heisey-2
In reply to this post by Deepak Goel
On 3/23/2018 1:13 PM, Deepak Goel wrote:
> Yes I am now creating a client object only once. On Linux it has superb
> results (performance improves by around two times). However on Windows it
> has no improvement
>
> *SoftwareThroughput (/sec)Response Time (msec)Utilization (%CPU)UnTuned
> (Windows)27.8142665UnTuned (Linux)34528091Partially Tuned with Shawn's
> suggestions (Linux)56417290Partially Tuned with Shawn's suggestions
> (Windows)28.11.10560*

This information is unreadable.  All the whitespace between the columns
is missing.

Thanks,
Shawn

Re: Some performance questions....

Deepak Goel
On Sat, Mar 24, 2018 at 6:18 AM, Shawn Heisey <[hidden email]> wrote:

> On 3/23/2018 1:13 PM, Deepak Goel wrote:
> > Yes I am now creating a client object only once. On Linux it has superb
> > results (performance improves by around two times). However on Windows it
> > has no improvement
> >
> > *SoftwareThroughput (/sec)Response Time (msec)Utilization (%CPU)UnTuned
> > (Windows)27.8142665UnTuned (Linux)34528091Partially Tuned with Shawn's
> > suggestions (Linux)56417290Partially Tuned with Shawn's suggestions
> > (Windows)28.11.10560*
>
> This information is unreadable.  All the whitespace between the columns
> is missing.
>
Please check this document:
https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


> Thanks,
> Shawn
>
>

Re: Some performance questions....

Deepak Goel
In reply to this post by Shawn Heisey-2
On Sat, Mar 24, 2018 at 5:16 AM, Shawn Heisey <[hidden email]> wrote:

> On 3/23/2018 11:31 AM, Deepak Goel wrote:
> > Do you have any specific questions about the benchmark setup?
>
> How many docs are in the Solr index?  How much disk space does it
> consume?  How much total memory is in the machine?  How much memory is
> allocated to Java heaps?  Is there any other software running besides
> the Solr server and the benchmark program?  If it's a virtual machine,
> do you know anything about how many virtual machines are on the physical
> hardware, and whether resources are oversubscribed on the physical
> hardware?
>
> > I have stated the numbers which I found during my test. The best way to
> > verify them is for someone else to run the same test. Otherwise I don't
> see
> > how we can verify the results
>
> You have provided a code fragment, not complete code that can be used to
> compile exactly what you're running.  There is no information about
> exactly what you're doing with JMeter.  There are no version numbers for
> any of the software that you're using.  When I look at what's available,
> I don't have enough information to replicate your test.
>
> Your code fragment has a hard-coded query in it.  Running the same query
> over and over won't provide meaningful results, and definitely shouldn't
> show an average query time of nearly 1.5 seconds.
>
>
Please check the section *Questions from ‘Around the World’* in the
following doc for answers to your questions:

https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


> Thanks,
> Shawn
>
>

Re: Some performance questions....

Deepak Goel
In reply to this post by Rick Leir-2
On Sat, Mar 24, 2018 at 6:03 AM, Rick Leir <[hidden email]> wrote:

>
>
> Deep,
> What is the test so I can try it.
>
>
*The test goal now according to me is to check:*

'How does Solr scale up on a single server (with varying OS if possible -
Linux, Windows) at 25%, 50%, 75%, 100% utilisation?'

*The original question from the Author was:*

Let's say I have a dual CPU with a total of 8 cores and 24 GB RAM for my
Solr and some other stuff.

Would it be more beneficial to only run 1 instance of Solr with the
collection stored on 4 HD's in RAID 0?? Or.... Have several Virtual
Machines each running off its own HD, ie: Have 4 VM's running Solr?


> 75 or 90 ms .. is that the JVM startup time?
>

This is the time taken by my code to create a 'Client Object' in Solr
in the Windows environment


> Cheers -- Rick
> >>
> >>
> >I have stated the numbers which I found during my test. The best way to
> >verify them is for someone else to run the same test. Otherwise I don't
> >see
> >how we can verify the results
>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>

Re: Some performance questions....

Shawn Heisey
In reply to this post by Deepak Goel
On 3/24/2018 1:25 PM, Deepak Goel wrote:
> Please check the section *Questions from ‘Around the World’* in the
> following doc for answers to your questions:
>
> https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing

The document says that 80 percent of the time it's the same query and 20
percent it's a different one.  But the code does not have any facility
for changing the query, as far as I can see.  It appears to be always
the same.

If the query is always the same, or if it's the same 80 percent of the
time, I would expect response time on the vast majority of the queries
to be about one to five milliseconds, no matter how big the index is,
but your document says it's 280 on Linux, and 1426 on Windows.

If all settings such as heap are at their defaults, then I suspect you
may be running Solr with a heap size that's FAR too small.  If this is
what's happening, then the JVM is going to be spending a very large
amount of time performing garbage collection, instead of running the
application.

The default heap size when starting Solr using the included scripts is
512 megabytes.  This is VERY small, to ensure that Solr will
successfully start on any system.  Nearly all users must increase the
heap size before they go to production.  I would set it to 2GB for your
index.  If starting Solr with the bin\solr or bin/solr command, add a
"-m 2g" parameter to the start command. 2GB should be a lot more than
Solr needs to handle that index, but it isn't a HUGE amount.  Be aware
that you may need to adjust the heap size for your Tomcat installation,
and possibly JMeter as well, to be sure that those processes are
allocating reasonable amounts of memory.  I do not know what the
recommended sizes for those programs will be, you would need to ask
those communities.

Thanks,
Shawn

Re: Some performance questions....

Deepak Goel
On Sun, Mar 25, 2018 at 4:00 AM, Shawn Heisey <[hidden email]> wrote:

> On 3/24/2018 1:25 PM, Deepak Goel wrote:
>
>> Please check the section *Questions from ‘Around the World’* in the
>> following doc for answers to your questions:
>>
>> https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing
>>
>
> The document says that 80 percent of the time it's the same query and 20
> percent it's a different one.  But the code does not have any facility for
> changing the query, as far as I can see.  It appears to be always the same.
>
>
My first test was to test with static queries. Does Solr scale up as we
increase the load of the same query?

The second test would be to check with 'Different Queries'.

And then finally check with 80% similar queries and 20% different queries.
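
A query mix like the 80/20 case above can be generated with a seeded random source so that runs stay repeatable. This is a minimal sketch, not the benchmark's actual code: the class name `QueryMix`, the search terms, and the seed are all illustrative assumptions.

```java
import java.util.List;
import java.util.Random;

// Sketch of an 80/20 query mix. The term list and class name are
// illustrative assumptions, not taken from the benchmark in this thread.
final class QueryMix {
    private final String hotQuery;
    private final List<String> otherQueries;
    private final double hotFraction;
    private final Random random;

    QueryMix(String hotQuery, List<String> otherQueries, double hotFraction, long seed) {
        if (otherQueries.isEmpty()) {
            throw new IllegalArgumentException("need at least one other query");
        }
        this.hotQuery = hotQuery;
        this.otherQueries = otherQueries;
        this.hotFraction = hotFraction;
        this.random = new Random(seed); // fixed seed keeps runs repeatable
    }

    // Returns the hot query with probability hotFraction, otherwise a
    // uniformly chosen query from the remaining list.
    String next() {
        if (random.nextDouble() < hotFraction) {
            return hotQuery;
        }
        return otherQueries.get(random.nextInt(otherQueries.size()));
    }

    public static void main(String[] args) {
        QueryMix mix = new QueryMix("piano",
                List.of("guitar", "violin", "drums", "flute"), 0.8, 42L);
        int hot = 0;
        int total = 10_000;
        for (int i = 0; i < total; i++) {
            if (mix.next().equals("piano")) {
                hot++;
            }
        }
        System.out.println("hot query share: " + (100.0 * hot / total) + "%");
    }
}
```

Setting `hotFraction` to 1.0 gives the static-query test and 0.0 gives the all-different test, so the three planned runs can share one generator.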


> If the query is always the same, or if it's the same 80 percent of the
> time, I would expect response time on the vast majority of the queries to
> be about one to five milliseconds


Do you have any documented proof of the same (1 to 5ms)? Or is it an
educated guess?


> , no matter how big the index is, but your document says it's 280 on
> Linux, and 1426 on Windows.
>
>
At peak loads on Linux, the response-time is 172ms. If I decrease the load
by half, the response time is around 50ms


> If all settings such as heap are at their defaults, then I suspect you may
> be running Solr with a heap size that's FAR too small.  If this is what's
> happening, then the JVM is going to be spending a very large amount of time
> performing garbage collection, instead of running the application.
>
>
I don't think the JVM heap is a problem. But I will bump it up and test
again


> The default heap size when starting Solr using the included scripts is 512
> megabytes.  This is VERY small, to ensure that Solr will successfully start
> on any system.  Nearly all users must increase the heap size before they go
> to production.  I would set it to 2GB for your index.  If starting Solr
> with the bin\solr or bin/solr command, add a "-m 2g" parameter to the start
> command. 2GB should be a lot more than Solr needs to handle that index, but
> it isn't a HUGE amount.  Be aware that you may need to adjust the heap size
> for your Tomcat installation, and possibly JMeter as well, to be sure that
> those processes are allocating reasonable amounts of memory.


I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
heap size of them too


> I do not know what the recommended sizes for those programs will be, you
> would need to ask those communities.
>
>
The problem I am facing: on Windows, the tps is 28 while on Linux, the tps
is 564 (all the configuration and hardware is the same). The other problem is
that even though there is plenty of hardware available, the Windows environment
does not scale. I wonder why this is so?
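
One way to cross-check throughput and response-time figures like these is Little's Law: average requests in flight equals throughput multiplied by response time. The sketch below applies it to the numbers quoted in this thread. It assumes a steady state and that the quoted response times are averages, neither of which the thread confirms, and `LittlesLaw` is a hypothetical name.

```java
// Little's Law sketch: average requests in flight = throughput * response time.
// Assumes a steady state and that the quoted response times are averages,
// which the thread does not confirm.
final class LittlesLaw {

    static double requestsInFlight(double throughputPerSec, double responseTimeMs) {
        return throughputPerSec * (responseTimeMs / 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("Windows: %.1f in flight%n", requestsInFlight(28, 1426));
        System.out.printf("Linux:   %.1f in flight%n", requestsInFlight(564, 172));
    }
}
```

If the result is far from the number of active load-generator threads, some of each thread's time is going somewhere other than waiting on Solr, for example client setup or think time.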


> Thanks,
> Shawn
>
>

Re: Some performance questions....

Shawn Heisey-2
On 3/24/2018 6:21 PM, Deepak Goel wrote:
> Do you have any documented proof of the same (1 to 5ms)? Or is it an
> educated guess

Just now, I did a test.  I did a "*:*" query (all docs), the QTime was
194 milliseconds, numFound was 188635489.  Then I did the exact same
query again.  QTime dropped to 39 milliseconds.

Next, I did a query for "banjo" ... something I don't think a lot of
people are searching for.  The QTime on this was 2395 milliseconds,
numFound was 737280.  Running the same query again, QTime dropped to 11
milliseconds.

My index is big and distributed.  Your index is very small, and likely
contained in one core, so it should have far better performance than my
index.
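
Part of the drop on the second run likely comes from caching, both inside Solr and in the operating system's disk cache. The toy example below shows the general effect with a small LRU cache. It is an illustration only: Solr's own caches are more elaborate, and `QueryCacheDemo` and the stand-in search function are made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy illustration only: Solr's own caches are more elaborate than this.
final class QueryCacheDemo {

    // Small LRU cache built on LinkedHashMap's access-order mode.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // true = order entries by most recent access
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict once the cache is over capacity
        }
    }

    private final LruCache<String, Integer> cache;
    private final Function<String, Integer> slowSearch;
    int slowSearches = 0; // how many times the expensive path ran

    QueryCacheDemo(int maxEntries, Function<String, Integer> slowSearch) {
        this.cache = new LruCache<>(maxEntries);
        this.slowSearch = slowSearch;
    }

    // Returns a cached result when present, otherwise runs the slow search once.
    int search(String query) {
        Integer cached = cache.get(query);
        if (cached != null) {
            return cached;
        }
        slowSearches++;
        int result = slowSearch.apply(query);
        cache.put(query, result);
        return result;
    }

    public static void main(String[] args) {
        // String::length stands in for an expensive index lookup.
        QueryCacheDemo demo = new QueryCacheDemo(2, String::length);
        demo.search("banjo");
        demo.search("banjo"); // served from the cache
        System.out.println("slow searches: " + demo.slowSearches);
    }
}
```

This is also why a benchmark that repeats one query mostly measures the cache-hit path rather than the index.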

> I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
> heap size of them too

I was actually thinking that if these are run *without* a max heap
setting, that you might want to explicitly set the heap size so that
it's not too big.  Those programs probably don't need a very big heap at
all.  If Java were to choose a big default heap size, the server might
start swapping, and that would REALLY make performance bad, especially
on Windows.
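
To see what heap a JVM actually picked, it can simply be asked. This is a minimal sketch using the standard `Runtime` API; `HeapReport` is a hypothetical name, and the numbers only mean something when the program is run with the same Java installation and flags as the process being checked.

```java
// Prints the heap limits the current JVM is actually running with.
final class HeapReport {

    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most heap the JVM will attempt to use (roughly -Xmx).
        System.out.println("max heap:  " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory() is the heap the JVM has currently reserved.
        System.out.println("committed: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory() is the unused part of the committed heap.
        System.out.println("free:      " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it once with no flags and once with an explicit `-Xmx` value shows how far the default differs from the intended limit.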

> The problem I am facing: On Windows, the tps is 28 while on Linux, the tps
> is 564 (All the configuration and hardware is same). The other problem is,
> Even if there is plenty of hardware available, the Windows environment does
> not scale. And I wonder why is this so?

My first guess would be the 512MB heap, possibly causing even more
problems on Windows.

And then there's my general bias against Microsoft.  I have witnessed
deficiencies in their memory management, their filesystem performance,
and other things.  Linux just does a better job in almost every category
that I care about for a server.

Which version of Windows are you running it on?  You would only want to
do a test like this on a Server OS, and I'd hope that it's at least
Server 2008.  The client operating systems do not handle server programs
very well.  And it should be a 64-bit OS, with 64-bit Java.

Thanks,
Shawn

Re: Some performance questions....

Deepak Goel
On 25 Mar 2018 6:49 am, "Shawn Heisey" <[hidden email]> wrote:

> On 3/24/2018 6:21 PM, Deepak Goel wrote:
>
>> Do you have any documented proof of the same (1 to 5ms)? Or is it an
>> educated guess
>
> Just now, I did a test.  I did a "*:*" query (all docs), the QTime was 194
> milliseconds, numFound was 188635489.  Then I did the exact same query
> again.  QTime dropped to 39 milliseconds.
>
> Next, I did a query for "banjo" ... something I don't think a lot of people
> are searching for.  The QTime on this was 2395 milliseconds, numFound was
> 737280.  Running the same query again, QTime dropped to 11 milliseconds.

I believe you ran this query with a 1 user load. Or was it a multi-user
load test? If it was a multi-user load test, how many users did you test for?
And what were the utilisations and tps?

> My index is big and distributed.  Your index is very small, and likely
> contained in one core, so it should have far better performance than my
> index.
>
>> I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
>> heap size of them too
>
> I was actually thinking that if these are run *without* a max heap setting,
> that you might want to explicitly set the heap size so that it's not too
> big.  Those programs probably don't need a very big heap at all.  If Java
> were to choose a big default heap size, the server might start swapping,
> and that would REALLY make performance bad, especially on Windows.
>
>> The problem I am facing: On Windows, the tps is 28 while on Linux, the tps
>> is 564 (All the configuration and hardware is same). The other problem is,
>> Even if there is plenty of hardware available, the Windows environment does
>> not scale. And I wonder why is this so?
>
> My first guess would be the 512MB heap, possibly causing even more problems
> on Windows.
>
> And then there's my general bias against Microsoft.  I have witnessed
> deficiencies in their memory management, their filesystem performance, and
> other things.  Linux just does a better job in almost every category that I
> care about for a server.
>
> Which version of Windows are you running it on?  You would only want to do
> a test like this on a Server OS, and I'd hope that it's at least Server
> 2008.  The client operating systems do not handle server programs very
> well.  And it should be a 64-bit OS, with 64-bit Java.
>
> Thanks,
> Shawn

Re: Some performance questions....

Shawn Heisey-2
On 3/24/2018 10:42 PM, Deepak Goel wrote:
> I believe you ran this query with a 1 user load. Or was it a multi-user
> load test? If it was multi-user load test, how many users did you test for?
> And what were the utilisations and tps?

It was late Saturday night when I did that.  There's almost no load on
the system.

I literally did just the four queries I mentioned, using the admin UI.

I have written a little test program that can pound the system harder,
need a little more time to gather what I learned with it.

Thanks,
Shawn

Re: Some performance questions....

Shawn Heisey-2
On 3/25/2018 1:45 AM, Shawn Heisey wrote:
> I have written a little test program that can pound the system harder,
> need a little more time to gather what I learned with it.

Here's the code and three results with different threadcounts:

https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379

I ran the program several times while writing it.  Once I had it
finished, I did the 20 thread run first, then the 100 thread run, and
then the 200 thread run.  Gist re-ordered my files, wasn't expecting that.

It was executed inside eclipse on a Windows 7 system.  The Solr servers
are running Linux.  This is a distributed index with 7 total shards
running on two servers.  The "shards" parameter is defined on the server
side in the 'ncmain' core, which has an empty index.  The servers are
NOT running in SolrCloud mode.

As you can see in the code, I was using exactly the same query every
time -- that "banjo" query that I tried earlier.

I have to try and remember how to build a simple program like this on
the commandline before I can try it in Linux.  I don't know if it would
see a performance improvement running on Linux.

Thanks,
Shawn
