strange performance issue with many shards on one server


Frederik Kraus
Hi,

I am seeing a strange issue while running load tests. Our setup:

- 2 servers, each with 24 CPU cores and 130 GB of RAM
- 10 shards per server (needed for response times), all running in a single Tomcat instance
- each query hits all 20 shards (distributed search)
- each shard holds about 1.5 million documents (small shards are needed because the queries are rather complex)
- all caches are warmed, with high cache hit rates (99%)

For some reason we cannot fully utilize the available CPU power (there is no disk I/O). Beyond a certain point, adding concurrent users no longer increases CPU load; instead, throughput drops and the response times of individual queries go up.

Also, 1-2% of the queries take significantly longer: the average is around 100 ms, while those 1-2% take 1.5 s or longer.

Any ideas are greatly appreciated :)

Fred.
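
To put a number on the slow 1-2% during a load test, it can help to report percentiles next to the average. The following is only a minimal sketch, assuming the per-query response times have already been collected in milliseconds. The class name `LatencyPercentiles` and the sample values are made up for illustration, and the nearest-rank method is just one of several common percentile definitions.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    /** Nearest-rank percentile: the smallest sample with at least p percent of all samples <= it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to keep whole-number cases exact.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical load-test result: 98 fast queries and 2 slow outliers.
        long[] samples = new long[100];
        Arrays.fill(samples, 0, 98, 100L);
        samples[98] = 1500L;
        samples[99] = 1800L;
        System.out.println("p50 = " + percentile(samples, 50) + " ms");
        System.out.println("p99 = " + percentile(samples, 99) + " ms");
    }
}
```

Comparing p50 with p99 at different request rates would show whether the outliers only appear once the system is saturated.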


Re: strange performance issue with many shards on one server

Federico Fissore
Frederik Kraus wrote on 28/09/2011 12:58:
>   Hi,
>
>
> I am experiencing a strange issue doing some load tests. Our setup:
>

I only mention this because some JUG mates were talking about it at the
last meeting: could it be that your CPUs are spending their time moving
data from RAM into the CPU cache?

Maybe, say, 10% of the CPU power is spent on the bus.

federico

Re: strange performance issue with many shards on one server

Vadim Kisselmann
In reply to this post by Frederik Kraus
Hi Fred,
analyze the queries which take longer.
We monitor our queries and see QTime problems with queries that are
complex, with phrase queries, and with queries that contain numbers or
special characters.
In case you don't know it yet:
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
Regards
Vadim



Re: strange performance issue with many shards on one server

Frederik Kraus
Hi Vadim,

the thing is that the exact same queries that take longer during a load test perform just fine when executed at a lower request rate. The slow ones are also random, i.e. there is no pattern in which queries are slow.

My first thought was some kind of contention and/or connection starvation in the internal shard communication.

Fred.


On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:

> Hi Fred,
> analyze the queries which take longer.
> We observe our queries and see the problems with q-time with queries which
> are complex, with phrase queries or queries which contains numbers or
> special characters.
> if you don't know it:
> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> Regards
> Vadim

Re: strange performance issue with many shards on one server

Vadim Kisselmann
Hi Fred,

OK, that is strange behavior for identical queries.
A few more questions:
- Which Solr version are you running?
- Are you indexing during your load test? (because of index rebuilds)
- Do you replicate your index?

Regards
Vadim



2011/9/28 Frederik Kraus <[hidden email]>

> Hi Vadim,
>
> the thing is, that those exact same queries, that take longer during a load
> test, perform just fine when executed at a slower request rate and are also
> random, i.e. there is no pattern in bad/slow queries.
>
> My first thought was some kind of contention and/or connection starvation
> for the internal shard communication?
>
> Fred.

Re: strange performance issue with many shards on one server

Frederik Kraus


On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:

> Hi Fred,
>
> ok, it's a strange behavior with same queries.
> Another questions:
> -which solr version?

3.3 (might the NIOFSDirectory from 3.4 help?)

> -do you indexing during your load test? (because of index rebuilt)

nope

> -do you replicate your index?

nope

>
> Regards
> Vadim

Re: strange performance issue with many shards on one server

Frederik Kraus
I just had a look at the thread dump. Here are three examples:


'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

and

'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
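
Both `pool-31-thread-*` threads above are BLOCKED on the same `ConnectionPool` monitor. To see how many threads are queued on each lock without reading a full dump by hand, the JDK's `ThreadMXBean` can be queried from inside the JVM. This is a minimal sketch, assuming it runs inside the JVM under test (how it gets invoked there, e.g. from a debug servlet, is not shown). The class name `BlockedThreadSummary` is illustrative and not part of Solr or Tomcat.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

final class BlockedThreadSummary {
    /** Counts BLOCKED threads per lock name, e.g. "...$ConnectionPool@19dd10d9". */
    static Map<String, Integer> blockedPerLock(ThreadInfo[] infos) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            // Entries can be null if a thread died while the dump was taken.
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadInfo[] infos =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        blockedPerLock(infos).forEach((lock, n) ->
                System.out.println(n + " thread(s) BLOCKED on " + lock));
    }
}
```

If one lock collects most of the blocked threads under load, that lock is a likely candidate for the contention suspected earlier in the thread.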






On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:

>
>
> On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
>
> > Hi Fred,
> >
> > ok, it's a strange behavior with same queries.
> > Another questions:
> > -which solr version?
>
> 3.3 (might the NIOFSDirectory from 3.4 help?)
>
> > -do you indexing during your load test? (because of index rebuilt)
> nope
>
> > -do you replicate your index?
>
> nope
> >
> > Regards
> > Vadim
> >
> >
> >
> > 2011/9/28 Frederik Kraus <[hidden email] (mailto:[hidden email])>
> >
> > > Hi Vladim,
> > >
> > > the thing is, that those exact same queries, that take longer during a load
> > > test, perform just fine when executed at a slower request rate and are also
> > > random, i.e. there is no pattern in bad/slow queries.
> > >
> > > My first thought was some kind of contention and/or connection starvation
> > > for the internal shard communication?
> > >
> > > Fred.
> > >
> > >
> > > Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:
> > >
> > > > Hi Fred,
> > > > analyze the queries which take longer.
> > > > We observe our queries and see the problems with q-time with queries
> > > which
> > > > are complex, with phrase queries or queries which contains numbers or
> > > > special characters.
> > > > if you don't know it:
> > > http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> > > > Regards
> > > > Vadim
> > > >
> > > >
> > > > 2011/9/28 Frederik Kraus <[hidden email] (mailto:[hidden email]) (mailto:
> > > [hidden email] (mailto:[hidden email]))>
> > > >
> > > > >  Hi,
> > > > >
> > > > >
> > > > > I am experiencing a strange issue doing some load tests. Our setup:
> > > > >
> > > > > - 2 server with each 24 cpu cores, 130GB of RAM
> > > > > - 10 shards per server (needed for response times) running in a single
> > > > > tomcat instance
> > > > > - each query queries all 20 shards (distributed search)
> > > > >
> > > > > - each shard holds about 1.5 mio documents (small shards are needed due
> > > to
> > > > > rather complex queries)
> > > > > - all caches are warmed / high cache hit rates (99%) etc.
> > > > >
> > > > >
> > > > > Now for some reason we cannot seem to fully utilize all CPU power (no
> > > disk
> > > > > IO), ie. increasing concurrent users doesn't increase CPU-Load at a
> > > point,
> > > > > decreases throughput and increases the response times of the individual
> > > > > queries.
> > > > >
> > > > > Also 1-2% of the queries take significantly longer: avg somewhere at
> > > 100ms
> > > > > while 1-2% take 1.5s or longer.
> > > > >
> > > > > Any ideas are greatly appreciated :)
> > > > >
> > > > > Fred.

Reply | Threaded
Open this post in threaded view
|

Re: strange performance issue with many shards on one server

Vadim Kisselmann
Hmm, sorry, I don't know...
My ideas:
- Tomcat is causing the problem (for example: maxThreads, number of
connections...)
- JVM options, especially GC
- index locks, possibly an open issue in JIRA

Regards
Vadim





Re: strange performance issue with many shards on one server

Toke Eskildsen
In reply to this post by Frederik Kraus
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> - 10 shards per server (needed for response times) running in a single tomcat instance

Have you tested that sharding actually decreases response times in your
case? I see the idea of trading throughput for lower response times by
sharding, but the added overhead of merging the shard results is
non-trivial.

> - each query queries all 20 shards (distributed search)
>
> - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries)
> - all caches are warmed / high cache hit rates (99%) etc.

> Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries.

It sounds as if there is a hard limit on the number of concurrent users
somewhere. I am no expert in HttpClient, but the blocked threads in your
thread dump seem to indicate that they are waiting for connections to be
established rather than for results to be produced.

I seem to remember that Tomcat has a default limit of 200 concurrent
connections. With 10 shards per search, that is just 200 / (10
shard_connections + 1 incoming_connection) = 18 concurrent searches.
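
The back-of-the-envelope calculation above can be written out as follows. This is only a sketch of the arithmetic: the limit of 200 is the Tomcat default as remembered in this post (it should be checked against the connector settings in your own `server.xml`), and the class and method names are illustrative.

```java
final class ConcurrentSearchEstimate {
    /**
     * Each distributed search occupies one incoming connection plus one
     * connection per shard request inside the same container.
     */
    static int maxConcurrentSearches(int connectionLimit, int shardRequestsPerSearch) {
        if (connectionLimit < 0 || shardRequestsPerSearch < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Integer division: partially served searches do not count.
        return connectionLimit / (shardRequestsPerSearch + 1);
    }

    public static void main(String[] args) {
        // 200 = assumed container limit, 10 = shard connections per search (numbers from the post).
        System.out.println(maxConcurrentSearches(200, 10));
        // Hypothetical variant: all 20 shard requests go through one container.
        System.out.println(maxConcurrentSearches(200, 20));
    }
}
```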

> Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer.

It could be garbage collection, especially since it shows up under high
load, which might produce more old objects and thereby trigger a full GC.


Re: strange performance issue with many shards on one server

kkrugler
In reply to this post by Frederik Kraus
Hi Frederik,

I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context.

In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results.

During load testing, we ran into a few different issues...

1. The load test software itself had an issue with scaling - I'm assuming that's not the case for you, but I've seen it happen more than once.

E.g. there's a limit to max parallel connections in the client being used to talk to Solr.

2. We needed to tune the SolrJ settings for the HttpConnectionManager.

Under heavy load, it was running out of free connections.

Given you've got 20 shards, each request is going to spawn 20 HTTP connections.

I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections.

3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.

There are lots of knobs to twiddle here, for better or worse.
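
Point 2 can be illustrated with a toy model that uses only the JDK. It is not SolrJ or HttpClient code: a `Semaphore` stands in for the HTTP connection pool, each search takes all of its connections at once (a simplification), and the pool sizes and the class name are made-up values. The sketch shows how many fan-out searches can hold their shard connections at the same time before further requests would have to wait.

```java
import java.util.concurrent.Semaphore;

final class ConnectionPoolModel {
    /**
     * Returns how many of the incoming searches can acquire all of their
     * shard connections at the same time from a pool of the given size.
     */
    static int searchesServedWithoutWaiting(int poolSize, int fanOut, int incomingSearches) {
        Semaphore pool = new Semaphore(poolSize);
        int served = 0;
        for (int i = 0; i < incomingSearches; i++) {
            // A search needs one pooled connection per shard; take all or none.
            if (pool.tryAcquire(fanOut)) {
                served++;
            }
        }
        return served;
    }

    public static void main(String[] args) {
        int fanOut = 20; // shards per query, as described in the original post
        for (int poolSize : new int[] {20, 100, 400}) { // hypothetical pool sizes
            System.out.println("pool=" + poolSize + " -> "
                    + searchesServedWithoutWaiting(poolSize, fanOut, 50)
                    + " of 50 concurrent searches proceed immediately");
        }
    }
}
```

In this model, every search beyond the pool's capacity has to queue for a free connection, which would show up as blocked threads rather than as CPU load.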

-- Ken

On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
>
>
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
> at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
> at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
> at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
> at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
> at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
> at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
> at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> and
>
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
>
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
> at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:662)

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr




RE: strange performance issue with many shards on one server

Jaeger, Jay - DOT
In reply to this post by Federico Fissore
That would still show up as the CPU being busy.

-----Original Message-----
From: Federico Fissore [mailto:[hidden email]]
Sent: Wednesday, September 28, 2011 6:12 AM
To: [hidden email]
Subject: Re: strange performance issue with many shards on one server

Frederik Kraus wrote on 28/09/2011 at 12:58:
>   Hi,
>
>
> I am experiencing a strange issue doing some load tests. Our setup:
>

just because I've listened to JUG mates talking about that at the last
meeting, could it be that your CPUs are spending their time getting
things from RAM to CPU cache?

maybe that, say, 10% CPU power is spent on the bus

federico

Re: strange performance issue with many shards on one server

Frederik Kraus
In reply to this post by kkrugler
Hi Ken,

the HttpConnectionManager was actually the first thing I looked at. I bumped the Solr default of 20 up to 50, 100, 400 and 10000 (which should be more or less unlimited ;) ). Unfortunately that didn't really solve anything. I don't know whether the "static" HttpClient is a problem here, as it means the same HttpConnectionManager is used for all shards …

An obvious way of validating this would be to spawn 20 Tomcat (or Jetty) instances, one for each shard and 10 per server. Hopefully there is an easier way ;)

By the way: Ubuntu / GC / etc. are all tuned and shouldn't be a bottleneck here. The GC only spends about 50-100ms in total during a 10min load test, and there is never a full GC.

Just going through a jstack dump again, it looks like the HttpConnectionManager is actually waiting for a lock …

"pool-31-thread-15776" prio=10 tid=0x00007ef544249000 nid=0x50be waiting for monitor entry [0x00007ef4d38fc000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
 - waiting to lock <0x00007f07dd6bfa70> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
 at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
….
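
One way to quantify what such a dump shows is to count BLOCKED threads per lock from inside the JVM. The following is a minimal sketch using the standard java.lang.management API; the class and method names are hypothetical, and the small demo method only exists to produce one blocked thread on purpose.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

/** Counts threads in state BLOCKED, grouped by the monitor they are waiting for. */
final class BlockedThreadCounter {

    static Map<String, Integer> countBlockedByLock() {
        // Snapshot of all live threads; lock ownership details are not requested.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = info.getLockName();
                counts.merge(lock == null ? "unknown" : lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Demo: holds a monitor while a helper thread tries to enter it, then counts. */
    static int blockedWhileLockHeld() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // acquire and release immediately
            }
        });
        waiter.setDaemon(true);
        int total = 0;
        synchronized (lock) {
            waiter.start();
            // The helper cannot get past the monitor while we hold it.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            for (int n : countBlockedByLock().values()) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countBlockedByLock());
        System.out.println(blockedWhileLockHeld());
    }
}
```

If most blocked threads pile up on a single ConnectionPool instance, as in the dump above, that would be consistent with lock contention rather than a shortage of connections, although it does not prove it.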

Fred.  



Re: strange performance issue with many shards on one server

Frederik Kraus
In reply to this post by Toke Eskildsen


On Wednesday, 28 September 2011 at 16:40, Toke Eskildsen wrote:

> On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> > - 10 shards per server (needed for response times) running in a single tomcat instance
>
> Have you tested that sharding actually decreases response times in your
> case? I see the idea in decreasing response times with sharding at the
> cost of decreasing throughput, but the added overhead of merging is
> non-trivial.
Yes, unfortunately. The queries have huge boolean filter queries for ACLs etc., which just take too long to compute in a single thread.

>
> > - each query queries all 20 shards (distributed search)
> >
> > - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries)
> > - all caches are warmed / high cache hit rates (99%) etc.
>
> > Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries.
>
> It sounds as if there's a hard limit on the number of concurrent users
> somewhere. I am no expert in httpclient, but the blocked threads in your
> thread dump seems to indicate that they wait for connections to be
> established rather than for results to be produced.
>
> I seem to remember that tomcat has a default limit on 200 concurrent
> connections and with 10 shards/search, that is just 200 / (10
> shard_connections + 1 incoming_connection) = 18 concurrent searches.
>

I have gradually bumped all of this up to (almost) infinity with no effect ;)
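
For reference, the estimate quoted above can be written out as a small calculation. This is a sketch with hypothetical names; the limit of 200 is the figure recalled in the quoted message, and the model assumes each search occupies one incoming request thread plus one thread per shard request served by the same container.

```java
/** Rough thread budget for distributed searches served by a single container. */
final class TomcatThreadBudget {

    /** Searches that fit if each one needs 1 incoming thread plus 1 per shard request. */
    static int maxConcurrentSearches(int maxThreads, int shardRequestsPerSearch) {
        return maxThreads / (shardRequestsPerSearch + 1); // integer division rounds down
    }

    public static void main(String[] args) {
        // 200 / (10 + 1), as in the quoted estimate.
        System.out.println(maxConcurrentSearches(200, 10)); // 18
    }
}
```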


> > Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer.
>
> Could be garbage collection, especially since it shows under high load
> which might result in more old objects and thereby trigger full gc.
GC is only spending something like 50-100ms in total during a 10min load test.
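
A side note on the slow 1-2%: because a distributed query waits for its slowest shard, rare slow shard responses are amplified by the fan-out. The sketch below works through that arithmetic. It assumes, purely for illustration, that shard responses are slow independently of each other, which may well not hold under lock contention; the class and method names are hypothetical.

```java
/** Relates the fraction of slow shard responses to the fraction of slow queries. */
final class FanOutTail {

    /** P(query slow) = 1 - (1 - p)^n for n independent shard requests. */
    static double slowQueryFraction(double slowShardFraction, int shards) {
        return 1.0 - Math.pow(1.0 - slowShardFraction, shards);
    }

    /** Inverse of the formula above: p = 1 - (1 - q)^(1/n). */
    static double slowShardFraction(double slowQueryFraction, int shards) {
        return 1.0 - Math.pow(1.0 - slowQueryFraction, 1.0 / shards);
    }

    public static void main(String[] args) {
        // 2% slow queries over 20 shards, the upper figure reported in this thread.
        System.out.println(slowShardFraction(0.02, 20));
    }
}
```

Under that independence assumption, 2% slow queries over 20 shards corresponds to only about 0.1% of individual shard requests being slow.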




Re: strange performance issue with many shards on one server

Federico Fissore
In reply to this post by Jaeger, Jay - DOT
Jaeger, Jay - DOT wrote on 28/09/2011 at 18:40:
> That  would still show up as the CPU being busy.
>

I don't know how the program (top, htop, whatever) displays the value,
but when the CPU has a cache miss, that thread definitely sits and waits
for a number of clock cycles.
With 130GB of RAM (per server?), I suspect cache misses are the rule.

Just a suspicion, however; nothing I'd bet on.

RE: strange performance issue with many shards on one server

Jaeger, Jay - DOT
Yes, that thread waits (in the sense that nothing useful gets done), but during that time, from the perspective of the applications and OS, that CPU is busy: it is not "waiting" in such a way that you can dispatch a different process.

The point is that if this were actually the problem, it would show up as higher CPU utilization than the original poster reported.


Re: strange performance issue with many shards on one server

Frederik Kraus
 Yep, I'm not getting more than 50-60% CPU during those load tests.



Re: strange performance issue with many shards on one server

Federico Fissore
Frederik Kraus wrote on 28/09/2011 at 23:16:
>   Yep, I'm not getting more than 50-60% CPU during those load tests.
>

I would try reducing the number of shards. Apart from the memory
discussion, this really seems to me to be a concurrency issue: too many
threads waiting for other threads to complete, too many context switches...

Recently, on a lots-of-cores database server, we INCREASED speed by
REDUCING the number of cores/threads each query was allowed to use
(making sense of our customer's investment).
Maybe you can get a similar effect by reducing the number of pieces your
distributed search has to merge.

My 2 eurocents

federico

Re: strange performance issue with many shards on one server

Lance Norskog-2
Some cache hit problems can be fixed with the Large Pages feature.

http://www.google.com/search?q=large+pages




--
Lance Norskog
[hidden email]

Re: strange performance issue with many shards on one server

kkrugler
In reply to this post by Frederik Kraus
Hi Frederik,

Did you figure out a solution to this problem?

I'm asking because I recently ran into a similar problem, with a similar setup (8 shards on one server).

Occasionally a query will take a very long time. Occasionally I see timeout exceptions with the HTTP requests. E.g.

> 348914 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server localhost failed to respond
> 348915 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request


Restarting Jetty seems to clear up the problem temporarily.

I've been looking at the code in Solr that handles distributed requests - and it's got some interesting smells, so I wouldn't be surprised if there's an issue related to how it's using HttpClient.

Regards,

-- Ken


On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
>
>
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
> at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
> at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
> at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
> at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
> at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
> at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
> at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> and
>
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
>
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
> at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:662)
>
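The dump above shows pool threads BLOCKED on the MultiThreadedHttpConnectionManager$ConnectionPool monitor. To gauge how widespread that is across a full dump, a small parser along these lines can tally BLOCKED threads per lock. This is only a sketch: the class name BlockedThreadCounter and its method are made up for illustration, and the regex assumes thread header lines in exactly the format pasted above (optionally prefixed with ">" quote markers).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or commons-httpclient.
final class BlockedThreadCounter {

    // Matches thread header lines in the dump format quoted above, e.g.
    // 'pool-31-thread-8232' Id=11625, BLOCKED on lock=some.Class@19dd10d9, total cpu time=...
    // Leading "> " mail-quote markers are tolerated.
    private static final Pattern HEADER = Pattern.compile(
            "^(?:>\\s*)*'([^']+)' Id=(\\d+), (\\w+) on lock=([^,]+),");

    // Returns lock identity -> number of threads BLOCKED on it,
    // in the order the locks first appear in the dump.
    static Map<String, Integer> countBlockedByLock(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            // Only BLOCKED threads are counted; WAITING threads are skipped.
            if (m.find() && "BLOCKED".equals(m.group(3))) {
                counts.merge(m.group(4), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

If most of the pool-* threads turn out to be BLOCKED on the same ConnectionPool instance, that would be consistent with contention in the internal shard communication rather than with slow queries as such.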
> On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:
>
>>
>>
>> On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
>>
>>> Hi Fred,
>>>
>>> OK, that is strange behavior for identical queries.
>>> A few more questions:
>>> - Which Solr version?
>>
>> 3.3 (might the NIOFSDirectory from 3.4 help?)
>>
>>> - Are you indexing during your load test? (because of index rebuilds)
>> nope
>>
>>> - Do you replicate your index?
>>
>> nope
>>>
>>> Regards
>>> Vadim
>>>
>>>
>>>
>>> 2011/9/28 Frederik Kraus <[hidden email]>
>>>
>>>> Hi Vadim,
>>>>
>>>> the thing is that the exact same queries that take longer during a load
>>>> test perform just fine when executed at a lower request rate. The slow
>>>> ones are also random, i.e. there is no pattern in the bad/slow queries.
>>>>
>>>> My first thought was some kind of contention and/or connection starvation
>>>> for the internal shard communication?
>>>>
>>>> Fred.
>>>>
>>>>
>>>> On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
>>>>
>>>>> Hi Fred,
>>>>> analyze the queries which take longer.
>>>>> We monitor our queries and see q-time problems with queries that are
>>>>> complex, with phrase queries, or with queries that contain numbers or
>>>>> special characters.
>>>>> In case you don't know it:
>>>>> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
>>>>> Regards
>>>>> Vadim
>>>>>
>>>>>
>>>>> 2011/9/28 Frederik Kraus <[hidden email]>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I am experiencing a strange issue doing some load tests. Our setup:
>>>>>>
>>>>>> - 2 servers, each with 24 CPU cores and 130GB of RAM
>>>>>> - 10 shards per server (needed for response times) running in a single
>>>>>> Tomcat instance
>>>>>> - each query queries all 20 shards (distributed search)
>>>>>>
>>>>>> - each shard holds about 1.5 million documents (small shards are needed
>>>>>> due to rather complex queries)
>>>>>> - all caches are warmed / high cache hit rates (99%) etc.
>>>>>>
>>>>>>
>>>>>> Now for some reason we cannot seem to fully utilize all CPU power (there
>>>>>> is no disk I/O), i.e. beyond a certain point, adding concurrent users no
>>>>>> longer increases CPU load; instead throughput drops and the response
>>>>>> times of individual queries go up.
>>>>>>
>>>>>> Also, 1-2% of the queries take significantly longer: the average is
>>>>>> around 100 ms, while those 1-2% take 1.5 s or longer.
>>>>>>
>>>>>> Any ideas are greatly appreciated :)
>>>>>>
>>>>>> Fred.
>

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr