improving search response time

improving search response time

Muneeb Ali
Hi All,

I need some guidance on improving search response time for our catalog search. We are using Solr 1.4.0 and have a master/slave setup (3 dedicated servers: one master and two slaves). The server specs are as follows:
 
Quad-core 2.5 GHz (1333 MHz)
12 GB RAM
2x 250 GB disks (SATA enterprise HDD)

Our 60 GB index contains 14 million documents.

I have applied some of the configuration tweaks from this list: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed (apart from the hardware suggestions), including reducing the merge factor to 6 and minimizing the number of stored fields.

I would appreciate it if anyone with a similar setup could shed some light on upgrading hardware in our situation, or could suggest any other configuration tweak that is not on the above list.

Thanks,

-Muneeb

Re: improving search response time

Jan Høydahl / Cominvent
Some questions:

a) What operating system?
b) What Java container (Tomcat/Jetty)?
c) What JAVA_OPTIONS? I.e. memory, garbage collection etc.
d) Example queries? I.e. what features, how many facets, sort fields etc
e) How do you load balance queries between the slaves?
f) What is your search latency now, and at what QPS? Also, where do you measure time - on the API or on the end-user page?
g) How often do you replicate?
h) Are you using warm-up-queries?
i) Are you ever optimizing your index?
j) Are you using highlighting? If so, are you using the fast vector highlighter or the regex?
k) What other search components are you using?
l) Are you using a RAID setup for the disks? If so, what kind of RAID, and what stripe size and block size?

Have you benchmarked to see what the bottleneck is, i.e. what is taking the most time? Try to add &debugQuery=true and share the <debug> section with us. It includes timings for each component.

High latency could be caused by a number of different factors, and it is important to first isolate the bottleneck.
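Jan's debugQuery suggestion can be tried with a one-liner; the host, port, and core path below are assumptions based on a default single-core Jetty setup, not details from this thread:

```shell
# Request per-component timings from Solr; the response's <debug> section
# includes a <lst name="timing"> block with prepare/process times per component.
curl 'http://localhost:8983/solr/select?qt=dismax&q=gene+therapy&rows=0&debugQuery=true'
```

rows=0 keeps the response small, but it also skips per-document work such as highlighting, so also run the query with your real rows value when comparing timings.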

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18. aug. 2010, at 14.18, Muneeb Ali wrote:

> [...]


Re: improving search response time

Gora Mohanty-2
In reply to this post by Muneeb Ali
On Wed, 18 Aug 2010 05:18:34 -0700 (PDT)
Muneeb Ali <[hidden email]> wrote:

>
> Hi All,
>
> I need some guidance over improving search response time for our
> catalog search.
[...]
> I would appreciate if anyone with similar background could shed
> some light on upgrading hardware in our situation. Or if any
> other configuration tweak that is not on the above list.
[...]

It would probably help if you could post some benchmarks of what
your current search response times are (from the Solr back-end,
not from any front-end in front of it), and what your desired
response times are. You could use ApacheBench (ab) and/or JMeter
for this.
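As a sketch of such a benchmark, an ApacheBench run against one slave could look like this (the URL, request count, and concurrency are illustrative assumptions, not numbers from this thread):

```shell
# Fire 1000 requests at 50 concurrent connections directly at a Solr slave,
# bypassing any front-end; ab reports throughput and latency percentiles.
ab -n 1000 -c 50 'http://localhost:8983/solr/select?qt=dismax&q=gene+therapy&rows=20&fl=id'
```

Note that ab repeats a single URL, so after the first hit the queryResultCache answers everything; JMeter driven by a list of varied real queries gives a more realistic picture.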

As a data point, while our index size/number of documents is smaller
(~40GB / 3.6 million documents), we are seeing a mean response
time per request of ~120ms for numeric fields, at 50 simulated
simultaneous connections, with a single Solr server having 8 ~2GHz
cores and 12GB RAM. This measure, however, *is* influenced by the
effects of Solr caching. Solr is close to a factor of 10 faster
than our front-end, even though that is pulling almost everything
from Solr. So we are happy on that front :-)

Regards,
Gora

Re: improving search response time

Muneeb Ali
In reply to this post by Jan Høydahl / Cominvent
First, thanks very much for a prompt reply. Here is more info:

===============

a) What operating system?
Debian GNU/Linux 5.0

b) What Java container (Tomcat/Jetty) 
Jetty

c) What JAVA_OPTIONS? I.e. memory, garbage collection etc. 
-Xmx9000m   -DDEBUG   -Djava.awt.headless=true  
-Dorg.mortbay.log.class=org.mortbay.log.StdErrLog  
-Dcom.sun.management.jmxremote.port=3000
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
-javaagent:/usr/local/lib/newrelic/newrelic.jar

d) Example queries? I.e. what features, how many facets, sort fields etc
/select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene therapy

We also get queries with filters examples:

/select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene therapy&fq=meshterm:(gene)&fq=author:(david)

e) How do you load balance queries between the slaves?

proxy based load balance

f) What is your search latency now, and at what QPS? Also, where do you measure time - on the API or on the end-user page?

Average response time: 2600-3000 ms, with average throughput 4-6 rpm (from the 'New Relic RPM' Solr performance monitor)
 
g) How often do you replicate?
Daily (the indexer runs each night), replicating after indexing completes on the master. However, lately we are experiencing problems right after replication and have to restart Jetty (it's most likely that the slaves are running out of memory).

h) Are you using warm-up-queries?
Yes, via the autowarmCount attribute in the cache configuration; these are specified as:

<filterCache class="solr.FastLRUCache"  size="5000" initialSize="1000" autowarmCount="500"/> 
<queryResultCache class="solr.LRUCache" size="10000" initialSize="20000"    autowarmCount="20000"/> 
<documentCache  class="solr.LRUCache"   size="10000"  initialSize="10000" autowarmCount="5000"/>

i) Are you ever optimizing your index?

Yes, daily after indexing. We are not doing dynamic updates to the index, so I guess it doesn't need to be done multiple times.

j) Are you using highlighting? If so, are you using the fast vector highlighter or the regex?

Yes, we are using the default highlight component with the default 'gap' fragmenter (solr.highlight.GapFragmenter), not regex, with fragsize=300.

k) What other search components are you using?
The spellcheck component; we will also be using faceting soon.

l) Are you using a RAID setup for the disks? If so, what kind of RAID, and what stripe size and block size?

Yes, RAID-0:
$> cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sda1[0] sdb1[1]
      449225344 blocks 64k chunks


==============

I haven't benchmarked it as such yet; however, here is the <debug> section from the query results with debugQuery enabled:

<lst name="debug">
<str name="rawquerystring">case study research</str>
<str name="querystring">case study research</str>

<str name="parsedquery">
+(DisjunctionMaxQuery((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case | keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01) DisjunctionMaxQuery((tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study | keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01) DisjunctionMaxQuery((tags:research^1.2 | authors:research^7.5 | title:research^65.5 | matchAll:research | keywords:research^2.5 | meshterm:research^3.2 | abstract1:research^9.5)~0.01)) DisjunctionMaxQuery((tags:"case studi research"~50^1.2 | authors:"case study research"~50^7.5 | title:"case study research"~50^65.5 | matchAll:case study research | keywords:"case studi research"~50^2.5 | meshterm:"case studi research"~50^3.2 | abstract1:"case studi research"~50^9.5)~0.01) FunctionQuery((sum(sdouble(yearScore)))^1.1) FunctionQuery((sum(sdouble(readerScore)))^2.0)
</str>

<str name="parsedquery_toString">
+((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case | keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01 (tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study | keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01 (tags:research^1.2 | authors:research^7.5 | title:research^65.5 | matchAll:research | keywords:research^2.5 | meshterm:research^3.2 | abstract1:research^9.5)~0.01) (tags:"case studi research"~50^1.2 | authors:"case study research"~50^7.5 | title:"case study research"~50^65.5 | matchAll:case study research | keywords:"case studi research"~50^2.5 | meshterm:"case studi research"~50^3.2 | abstract1:"case studi research"~50^9.5)~0.01 (sum(sdouble(yearScore)))^1.1 (sum(sdouble(readerScore)))^2.0
</str>

<lst name="explain">

<str name="7644c450-6d00-11df-a2b2-0026b95e3eb7">

9.473454 = (MATCH) sum of:
  2.247054 = (MATCH) sum of:
    0.7535966 = (MATCH) max plus 0.01 times others of:
      0.7535966 = (MATCH) weight(title:case^65.5 in 6557735), product of:
        0.29090396 = queryWeight(title:case^65.5), product of:
          65.5 = boost
          5.181068 = idf(docFreq=204956, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.590534 = (MATCH) fieldWeight(title:case in 6557735), product of:
          1.0 = tf(termFreq(title:case)=1)
          5.181068 = idf(docFreq=204956, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=6557735)
    0.5454388 = (MATCH) max plus 0.01 times others of:
      0.5454388 = (MATCH) weight(title:study^65.5 in 6557735), product of:
        0.24748746 = queryWeight(title:study^65.5), product of:
          65.5 = boost
          4.4078097 = idf(docFreq=444103, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.2039049 = (MATCH) fieldWeight(title:study in 6557735), product of:
          1.0 = tf(termFreq(title:study)=1)
          4.4078097 = idf(docFreq=444103, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=6557735)
    0.9480188 = (MATCH) max plus 0.01 times others of:
      0.9480188 = (MATCH) weight(title:research^65.5 in 6557735), product of:
        0.32627863 = queryWeight(title:research^65.5), product of:
          65.5 = boost
          5.8110995 = idf(docFreq=109154, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.9055498 = (MATCH) fieldWeight(title:research in 6557735), product of:
          1.0 = tf(termFreq(title:research)=1)
          5.8110995 = idf(docFreq=109154, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=6557735)
  6.6579494 = (MATCH) max plus 0.01 times others of:
    6.6579494 = weight(title:"case study research"~50^65.5 in 6557735), product of:
      0.86467004 = queryWeight(title:"case study research"~50^65.5), product of:
        65.5 = boost
        15.399977 = idf(title: case=204956 study=444103 research=109154)
        8.5721357E-4 = queryNorm
      7.6999884 = fieldWeight(title:"case study research" in 6557735), product of:
        1.0 = tf(phraseFreq=1.0)
        15.399977 = idf(title: case=204956 study=444103 research=109154)
        0.5 = fieldNorm(field=title, doc=6557735)
  0.053200547 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product of:
    56.420166 = sum(sdouble(yearScore)=56.42016783216783)
    1.1 = boost
    8.5721357E-4 = queryNorm
  0.5152504 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product of:
    300.53793 = sum(sdouble(readerScore)=300.5379289983797)
    2.0 = boost
    8.5721357E-4 = queryNorm
</str>

...
...
...

<str name="e3542c60-6d06-11df-afb8-0026b95d30b2">

9.212496 = (MATCH) sum of:
  2.247054 = (MATCH) sum of:
    0.7535966 = (MATCH) max plus 0.01 times others of:
      0.7535966 = (MATCH) weight(title:case^65.5 in 12274669), product of:
        0.29090396 = queryWeight(title:case^65.5), product of:
          65.5 = boost
          5.181068 = idf(docFreq=204956, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.590534 = (MATCH) fieldWeight(title:case in 12274669), product of:
          1.0 = tf(termFreq(title:case)=1)
          5.181068 = idf(docFreq=204956, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=12274669)
    0.5454388 = (MATCH) max plus 0.01 times others of:
      0.5454388 = (MATCH) weight(title:study^65.5 in 12274669), product of:
        0.24748746 = queryWeight(title:study^65.5), product of:
          65.5 = boost
          4.4078097 = idf(docFreq=444103, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.2039049 = (MATCH) fieldWeight(title:study in 12274669), product of:
          1.0 = tf(termFreq(title:study)=1)
          4.4078097 = idf(docFreq=444103, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=12274669)
    0.9480188 = (MATCH) max plus 0.01 times others of:
      0.9480188 = (MATCH) weight(title:research^65.5 in 12274669), product of:
        0.32627863 = queryWeight(title:research^65.5), product of:
          65.5 = boost
          5.8110995 = idf(docFreq=109154, maxDocs=13411507)
          8.5721357E-4 = queryNorm
        2.9055498 = (MATCH) fieldWeight(title:research in 12274669), product of:
          1.0 = tf(termFreq(title:research)=1)
          5.8110995 = idf(docFreq=109154, maxDocs=13411507)
          0.5 = fieldNorm(field=title, doc=12274669)
  6.6579494 = (MATCH) max plus 0.01 times others of:
    6.6579494 = weight(title:"case study research"~50^65.5 in 12274669), product of:
      0.86467004 = queryWeight(title:"case study research"~50^65.5), product of:
        65.5 = boost
        15.399977 = idf(title: case=204956 study=444103 research=109154)
        8.5721357E-4 = queryNorm
      7.6999884 = fieldWeight(title:"case study research" in 12274669), product of:
        1.0 = tf(phraseFreq=1.0)
        15.399977 = idf(title: case=204956 study=444103 research=109154)
        0.5 = fieldNorm(field=title, doc=12274669)
  0.030677302 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product of:
    32.533848 = sum(sdouble(yearScore)=32.533846153846156)
    1.1 = boost
    8.5721357E-4 = queryNorm
  0.27681494 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product of:
    161.46207 = sum(sdouble(readerScore)=161.46207100162033)
    2.0 = boost
    8.5721357E-4 = queryNorm
</str>
</lst>
<str name="QParser">DisMaxQParser</str>
<null name="altquerystring"/>

<arr name="boostfuncs">

<str>sum(readerScore)^2  sum(yearScore)^1.1</str>
</arr>

<lst name="timing">
<double name="time">5468.0</double>

<lst name="prepare">
<double name="time">1.0</double>

<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">1.0</double>
</lst>

<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>

<lst name="process">
<double name="time">5467.0</double>

<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">4734.0</double>
</lst>

<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">231.0</double>
</lst>

<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">501.0</double>
</lst>
</lst>
</lst>
</lst>

=====================

Thanks for your help.
-Muneeb

Re: improving search response time

Shawn Heisey-4
Most of your time is spent executing the query itself, which, in light
of the other information provided, does not surprise me.  With 12GB of RAM
and 9GB dedicated to the Java heap, the RAM available for disk caching
is pretty low, especially if Solr is actually using all 9GB.

Since your index is 60GB, the system is most likely I/O bound.
Leaving memory available for the disk cache is the best way to make
Solr fast.  If you increased to 16GB RAM, you'd probably see some
performance increase.  Going to 32GB would be better, and 64GB would
let your system load nearly the entire index into the disk cache.
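A rough way to check this on a slave is to compare the OS page cache against the on-disk index size (Linux commands; the index path is a placeholder, not taken from this thread):

```shell
# The "buff/cache" (or "cached") figure in free's output is the RAM
# currently holding file data - i.e. the budget for caching index files.
free -m

# Compare against the index size on disk; INDEX_DIR is a placeholder path.
INDEX_DIR=/var/solr/data/index
if [ -d "$INDEX_DIR" ]; then
  du -sh "$INDEX_DIR"
fi
```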

Is matchAll possibly an aggregated field with information copied from
the other fields that you are searching?  If so, especially since you
are using dismax, you'd want to strongly consider dropping it entirely,
which would make your index a lot smaller.  Check your schema for
information that could be trimmed.  You might not need "stored" on some
fields, especially if the original values are available from another
source (like a database, or a central filesystem).  You may not need
advanced features on everything, like termvectors, termpositions, etc.

If you can't make significant changes to server memory or index size,
you might want to consider going distributed.  You'd need more servers.
A few things (More Like This being the one that comes to mind) do not
work with a distributed index.

Can you reduce the Java heap size and still have Solr work correctly?
You probably do not need your internal Solr caches to be so huge, and
shrinking them would greatly reduce your heap needs.  Here are my cache
settings, with the numbers being size, initialSize, then autowarmCount:

filterCache: 256, 256, 0
queryResultCache: 1024, 512, 128
documentCache: 16384, 4096, n/a

I'm using distributed search with six large shards that each take up
nearly 13GB.  The machines (VMs) have 9GB of RAM and the java heap size
is 1280MB.  I'm not using a lot of the advanced features like
highlighting, so I'm not using termvectors.  Right now, we use facets
for data mining, but not in production.  My average query time is about
100 milliseconds, with each shard's average about half that.  
Autowarming usually takes about 10-20 seconds, though sometimes it
balloons to about 45 seconds.  I started out with much larger cache
numbers, but that just made my autowarm times huge.

Based on my experience, I imagine that your system takes several minutes
to autowarm your caches when you do a commit or optimize.  If you are
doing frequent updates, that would be a major drag on performance.

Two of your caches have a larger initialsize than size, with the former
meaning the number of slots allocated immediately and the latter
referring to the maximum size of the cache.  Apparently it's not leading
to any disastrous problems, but you'll want to adjust accordingly.


On 8/18/2010 9:00 AM, Muneeb Ali wrote:

> [...]


Re: improving search response time

Lance Norskog-2
More on this: you should give Solr enough memory to run comfortably,
then stop. Leave as much as you can for the OS to manage its disk
cache. The OS is better at this than Solr is. Also, it does not have
to do garbage collection.
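On a 12GB box with a 60GB index, that split might look something like the following (the exact heap size needs testing against your actual cache and query load; this is only an illustration of leaving most RAM to the OS page cache):

```shell
# ~4GB heap for Solr, leaving ~8GB for the OS to cache index files
java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -jar start.jar
```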

Filter queries are a big help. You should create a set of your basic
filter queries, then compose them as needed. Filters are ANDed together.
Lucene applies them very early in the search process, and they are
effective at cutting down the amount of relevance/ranking calculation.
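A sketch of the pattern (field names borrowed from the example queries in this thread; the values are illustrative): keep only the relevance part in q and put each reusable constraint in its own fq, so each filter is cached independently in the filterCache and intersected with the main query:

```
/select?qt=dismax&q=gene therapy
        &fq=meshterm:gene
        &fq=author:david
```

A repeated fq (e.g. the same meshterm filter across many searches) then costs almost nothing after its first use.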

If you want to be really adventurous, there is an operating-system
feature called 'huge pages'. You'll need IT experience to try
this, and you'll have to do your own research, sorry.
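On Linux a rough sketch of what's involved looks like this (kernel parameters and page sizes vary by system, so verify everything against your own distribution's documentation before touching production):

```shell
# See whether huge pages are supported and how many are reserved
grep Huge /proc/meminfo

# Reserve 2048 x 2MB pages (4GB) -- needs root
sysctl -w vm.nr_hugepages=2048

# Then ask the JVM to back its heap with them
java -XX:+UseLargePages -Xmx4g ...
```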

On Wed, Aug 18, 2010 at 9:27 AM, Shawn Heisey <[hidden email]> wrote:

>  Most of your time is spent doing the query itself, which, in light of the
> other information provided, does not surprise me.  With 12GB of RAM and 9GB
> dedicated to the java heap, the available RAM for disk caching is pretty
> low, especially if Solr is actually using all 9GB.
>
> Since your index is 60GB, the system is most likely I/O bound.  Available
> memory for disk cache is the best way to make Solr fast.  If you increased
> to 16GB RAM, you'd probably see some performance increase.  Going to 32GB
> would be better, and 64GB would let your system load nearly the entire index
> into the disk cache.
>
> Is matchAll possibly an aggregated field with information copied from the
> other fields that you are searching?  If so, especially since you are using
> dismax, you'd want to strongly consider dropping it entirely, which would
> make your index a lot smaller.  Check your schema for information that could
> be trimmed.  You might not need "stored" on some fields, especially if the
> original values are available from another source (like a database, or a
> central filesystem).  You may not need advanced features on everything, like
> termvectors, termpositions, etc.
>
> If you can't make significant changes in server memory or index size, you
> might want to consider going distributed.  You'd need more servers.  A few
> things (More Like This being the one that comes to mind) do not work in a
> distributed index.
>
> Can you reduce the java heap size and still have Solr work correctly?  You
> probably do not need your internal Solr caches to be so huge, and dropping
> them would greatly reduce your heap needs.  Here's my cache settings, with
> the numbers being size, initialsize, then autowarm count.
>
> filterCache: 256, 256, 0
> queryResultCache: 1024, 512, 128
> documentCache: 16384, 4096, n/a
>
> I'm using distributed search with six large shards that each take up nearly
> 13GB.  The machines (VMs) have 9GB of RAM and the java heap size is 1280MB.
>  I'm not using a lot of the advanced features like highlighting, so I'm not
> using termvectors.  Right now, we use facets for data mining, but not in
> production.  My average query time is about 100 milliseconds, with each
> shard's average about half that.  Autowarming usually takes about 10-20
> seconds, though sometimes it balloons to about 45 seconds.  I started out
> with much larger cache numbers, but that just made my autowarm times huge.
>
> Based on my experience, I imagine that your system takes several minutes to
> autowarm your caches when you do a commit or optimize.  If you are doing
> frequent updates, that would be a major drag on performance.
>
> Two of your caches have a larger initialsize than size, with the former
> meaning the number of slots allocated immediately and the latter referring
> to the maximum size of the cache.  Apparently it's not leading to any
> disastrous problems, but you'll want to adjust accordingly.
>
>
> On 8/18/2010 9:00 AM, Muneeb Ali wrote:
>>
>> First, thanks very much for a prompt reply. Here is more info:
>>
>> ===============
>>
>> a) What operating system?
>> Debian GNU/Linux 5.0
>>
>> b) What Java container (Tomcat/Jetty)
>> Jetty
>>
>> c) What JAVA_OPTIONS? I.e. memory, garbage collection etc.
>> -Xmx9000m   -DDEBUG   -Djava.awt.headless=true
>> -Dorg.mortbay.log.class=org.mortbay.log.StdErrLog
>> -Dcom.sun.management.jmxremote.port=3000
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.ssl=false
>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>> -javaagent:/usr/local/lib/newrelic/newrelic.jar
>>
>> d) Example queries? I.e. what features, how many facets, sort fields etc
>>
>> /select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene
>> therapy
>>
>> We also get queries with filters examples:
>>
>>
>> /select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene
>> therapy&fq=meshterm:(gene)&fq=author:(david)
>>
>> e) How do you load balance queries between the slaves?
>>
>> Proxy-based load balancing
>>
>> f) What is your search latency now and @ what QPS? Also, where do you
>> measure time - on the API or on the end-user page?
>>
>> Average response time: 2600 - 3000 ms  with average throughput: 4-6 rpm
>> (from 'new relic RPM' solr performance monitor)
>>
>> g) How often do you replicate?
>> Daily (the indexer runs each night) and replication happens after indexing
>> completes on the master. However, lately we have been experiencing problems
>> right after replication and have to restart Jetty (it's most likely that the
>> slaves are running out of memory).
>>
>> h) Are you using warm-up-queries?
>> Yes, using autoWarmCount variable in cache configuration/ these are
>> specified as:
>>
>> <filterCache class="solr.FastLRUCache"  size="5000" initialSize="1000"
>> autowarmCount="500"/>
>> <queryResultCache class="solr.LRUCache" size="10000" initialSize="20000"
>> autowarmCount="20000"/>
>> <documentCache  class="solr.LRUCache"   size="10000"  initialSize="10000"
>> autowarmCount="5000"/>
>>
>> i) Are you ever optimizing your index?
>>
>> Yes, daily after indexing. We are not doing dynamic updates to the index,
>> so I guess it doesn't need to be done multiple times.
>>
>> j) Are you using highlighting? If so, are you using the fast vector
>> highlighter or the regex?
>>
>> Yes, we are using the default highlight component with the default
>> fragmenter called 'gap' (solr.highlight.GapFragmenter), not regex, with
>> fragsize=300.
>>
>> k) What other search components are you using?
>> The spellcheck component; we will also be using faceting soon.
>>
>> i) Are you using RAID setup for the disks? If so, what kind of RAID, what
>> stripe-size and block size?
>>
>> Yes, RAID-0:
>> $>  cat /proc/mdstat
>> Personalities : [raid0]
>> md0 : active raid0 sda1[0] sdb1[1]
>>       449225344 blocks 64k chunks
>>
>>
>> ==============
>>
>> I haven't benchmarked it as such yet; however, here is the debugQuery
>> section from the query results:
>>
>> <lst name="debug">
>> <str name="rawquerystring">case study research</str>
>> <str name="querystring">case study research</str>
>> <str name="parsedquery">
>> +(DisjunctionMaxQuery((tags:case^1.2 | authors:case^7.5 | title:case^65.5
>> |
>> matchAll:case | keywords:case^2.5 | meshterm:case^3.2 |
>> abstract1:case^9.5)~0.01) DisjunctionMaxQuery((tags:studi^1.2 |
>> authors:study^7.5 | title:study^65.5 | matchAll:study | keywords:studi^2.5
>> |
>> meshterm:studi^3.2 | abstract1:studi^9.5)~0.01)
>> DisjunctionMaxQuery((tags:research^1.2 | authors:research^7.5 |
>> title:research^65.5 | matchAll:research | keywords:research^2.5 |
>> meshterm:research^3.2 | abstract1:research^9.5)~0.01))
>> DisjunctionMaxQuery((tags:"case studi research"~50^1.2 | authors:"case
>> study
>> research"~50^7.5 | title:"case study research"~50^65.5 | matchAll:case
>> study
>> research | keywords:"case studi research"~50^2.5 | meshterm:"case studi
>> research"~50^3.2 | abstract1:"case studi research"~50^9.5)~0.01)
>> FunctionQuery((sum(sdouble(yearScore)))^1.1)
>> FunctionQuery((sum(sdouble(readerScore)))^2.0)
>> </str>
>> <str name="parsedquery_toString">
>> +((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case |
>> keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01
>> (tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study |
>> keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01
>> (tags:research^1.2 | authors:research^7.5 | title:research^65.5 |
>> matchAll:research | keywords:research^2.5 | meshterm:research^3.2 |
>> abstract1:research^9.5)~0.01) (tags:"case studi research"~50^1.2 |
>> authors:"case study research"~50^7.5 | title:"case study research"~50^65.5
>> |
>> matchAll:case study research | keywords:"case studi research"~50^2.5 |
>> meshterm:"case studi research"~50^3.2 | abstract1:"case studi
>> research"~50^9.5)~0.01 (sum(sdouble(yearScore)))^1.1
>> (sum(sdouble(readerScore)))^2.0
>> </str>
>> <lst name="explain">
>> <str name="7644c450-6d00-11df-a2b2-0026b95e3eb7">
>>
>> 9.473454 = (MATCH) sum of:
>>   2.247054 = (MATCH) sum of:
>>     0.7535966 = (MATCH) max plus 0.01 times others of:
>>       0.7535966 = (MATCH) weight(title:case^65.5 in 6557735), product of:
>>         0.29090396 = queryWeight(title:case^65.5), product of:
>>           65.5 = boost
>>           5.181068 = idf(docFreq=204956, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.590534 = (MATCH) fieldWeight(title:case in 6557735), product of:
>>           1.0 = tf(termFreq(title:case)=1)
>>           5.181068 = idf(docFreq=204956, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=6557735)
>>     0.5454388 = (MATCH) max plus 0.01 times others of:
>>       0.5454388 = (MATCH) weight(title:study^65.5 in 6557735), product of:
>>         0.24748746 = queryWeight(title:study^65.5), product of:
>>           65.5 = boost
>>           4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.2039049 = (MATCH) fieldWeight(title:study in 6557735), product
>> of:
>>           1.0 = tf(termFreq(title:study)=1)
>>           4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=6557735)
>>     0.9480188 = (MATCH) max plus 0.01 times others of:
>>       0.9480188 = (MATCH) weight(title:research^65.5 in 6557735), product
>> of:
>>         0.32627863 = queryWeight(title:research^65.5), product of:
>>           65.5 = boost
>>           5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.9055498 = (MATCH) fieldWeight(title:research in 6557735),
>> product
>> of:
>>           1.0 = tf(termFreq(title:research)=1)
>>           5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=6557735)
>>   6.6579494 = (MATCH) max plus 0.01 times others of:
>>     6.6579494 = weight(title:"case study research"~50^65.5 in 6557735),
>> product of:
>>       0.86467004 = queryWeight(title:"case study research"~50^65.5),
>> product
>> of:
>>         65.5 = boost
>>         15.399977 = idf(title: case=204956 study=444103 research=109154)
>>         8.5721357E-4 = queryNorm
>>       7.6999884 = fieldWeight(title:"case study research" in 6557735),
>> product of:
>>         1.0 = tf(phraseFreq=1.0)
>>         15.399977 = idf(title: case=204956 study=444103 research=109154)
>>         0.5 = fieldNorm(field=title, doc=6557735)
>>   0.053200547 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product
>> of:
>>     56.420166 = sum(sdouble(yearScore)=56.42016783216783)
>>     1.1 = boost
>>     8.5721357E-4 = queryNorm
>>   0.5152504 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product
>> of:
>>     300.53793 = sum(sdouble(readerScore)=300.5379289983797)
>>     2.0 = boost
>>     8.5721357E-4 = queryNorm
>> </str>
>> ...
>> ...
>> ...
>> <str name="e3542c60-6d06-11df-afb8-0026b95d30b2">
>>
>> 9.212496 = (MATCH) sum of:
>>   2.247054 = (MATCH) sum of:
>>     0.7535966 = (MATCH) max plus 0.01 times others of:
>>       0.7535966 = (MATCH) weight(title:case^65.5 in 12274669), product of:
>>         0.29090396 = queryWeight(title:case^65.5), product of:
>>           65.5 = boost
>>           5.181068 = idf(docFreq=204956, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.590534 = (MATCH) fieldWeight(title:case in 12274669), product
>> of:
>>           1.0 = tf(termFreq(title:case)=1)
>>           5.181068 = idf(docFreq=204956, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=12274669)
>>     0.5454388 = (MATCH) max plus 0.01 times others of:
>>       0.5454388 = (MATCH) weight(title:study^65.5 in 12274669), product
>> of:
>>         0.24748746 = queryWeight(title:study^65.5), product of:
>>           65.5 = boost
>>           4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.2039049 = (MATCH) fieldWeight(title:study in 12274669), product
>> of:
>>           1.0 = tf(termFreq(title:study)=1)
>>           4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=12274669)
>>     0.9480188 = (MATCH) max plus 0.01 times others of:
>>       0.9480188 = (MATCH) weight(title:research^65.5 in 12274669), product
>> of:
>>         0.32627863 = queryWeight(title:research^65.5), product of:
>>           65.5 = boost
>>           5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>>           8.5721357E-4 = queryNorm
>>         2.9055498 = (MATCH) fieldWeight(title:research in 12274669),
>> product
>> of:
>>           1.0 = tf(termFreq(title:research)=1)
>>           5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>>           0.5 = fieldNorm(field=title, doc=12274669)
>>   6.6579494 = (MATCH) max plus 0.01 times others of:
>>     6.6579494 = weight(title:"case study research"~50^65.5 in 12274669),
>> product of:
>>       0.86467004 = queryWeight(title:"case study research"~50^65.5),
>> product
>> of:
>>         65.5 = boost
>>         15.399977 = idf(title: case=204956 study=444103 research=109154)
>>         8.5721357E-4 = queryNorm
>>       7.6999884 = fieldWeight(title:"case study research" in 12274669),
>> product of:
>>         1.0 = tf(phraseFreq=1.0)
>>         15.399977 = idf(title: case=204956 study=444103 research=109154)
>>         0.5 = fieldNorm(field=title, doc=12274669)
>>   0.030677302 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product
>> of:
>>     32.533848 = sum(sdouble(yearScore)=32.533846153846156)
>>     1.1 = boost
>>     8.5721357E-4 = queryNorm
>>   0.27681494 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product
>> of:
>>     161.46207 = sum(sdouble(readerScore)=161.46207100162033)
>>     2.0 = boost
>>     8.5721357E-4 = queryNorm
>> </str>
>> </lst>
>> <str name="QParser">DisMaxQParser</str>
>> <null name="altquerystring"/>
>> <arr name="boostfuncs">
>> <str>
>>
>>          sum(readerScore)^2  sum(yearScore)^1.1
>>
>>
>> </str>
>> </arr>
>> <lst name="timing">
>> <double name="time">5468.0</double>
>> <lst name="prepare">
>> <double name="time">1.0</double>
>> <lst name="org.apache.solr.handler.component.QueryComponent">
>> <double name="time">1.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.FacetComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.HighlightComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.StatsComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.DebugComponent">
>> <double name="time">0.0</double>
>> </lst>
>> </lst>
>> <lst name="process">
>> <double name="time">5467.0</double>
>> <lst name="org.apache.solr.handler.component.QueryComponent">
>> <double name="time">4734.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.FacetComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.HighlightComponent">
>> <double name="time">231.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.StatsComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
>> <double name="time">0.0</double>
>> </lst>
>> <lst name="org.apache.solr.handler.component.DebugComponent">
>> <double name="time">501.0</double>
>> </lst>
>> </lst>
>> </lst>
>> </lst>
>
>



--
Lance Norskog
[hidden email]

Re: improving search response time

Muneeb Ali
Thanks for your input, guys. I will surely try these suggestions, in particular reducing the heap size in JAVA_OPTIONS and adjusting the cache sizes, to see if that makes a difference.
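For reference, an adjusted cache section in solrconfig.xml along the lines Shawn described might look like this (the numbers are starting points to tune for our load, not recommendations, and initialSize should never exceed size):

```xml
<filterCache class="solr.FastLRUCache" size="256" initialSize="256"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="1024" initialSize="512"
                  autowarmCount="128"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="4096"/>
```

Smaller autowarm counts should also keep post-commit warming short.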

I am also considering upgrading the RAM on the slave nodes, and am also looking into moving from enterprise SATA HDDs to SSD flash/DRAM storage... Is anyone using SSDs for a Solr deployment?

Which would be the better route: more memory, or flash-based SSDs?

Thanks,
-Muneeb


Re: improving search response time

Jan Høydahl / Cominvent
It is crucial to MEASURE your system to confirm your bottleneck.
I agree that you are very likely to be disk I/O bound with so little
memory left for the OS, a large index, and many terms in each query.

Have your IT guys do some monitoring on your disks and log it while
under load. Then you should easily be able to see whether disk I/O
is peaking while the CPU is healthy.
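For example, with the sysstat package's iostat (watch %util and await on the volume holding the index while queries are running):

```shell
# Extended per-device stats in 5-second samples; sustained high %util
# and await on the index disks, combined with idle CPU, points at
# disk I/O as the bottleneck.
iostat -x 5
```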

You should also look into whether you can shorten your queries:

+((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case |
keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01
(tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study |
keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01
(tags:research^1.2 | authors:research^7.5 | title:research^65.5 |
matchAll:research | keywords:research^2.5 | meshterm:research^3.2 |
abstract1:research^9.5)~0.01) (tags:"case studi research"~50^1.2 |
authors:"case study research"~50^7.5 | title:"case study research"~50^65.5 |
matchAll:case study research | keywords:"case studi research"~50^2.5 |
meshterm:"case studi research"~50^3.2 | abstract1:"case studi
research"~50^9.5)~0.01 (sum(sdouble(yearScore)))^1.1
(sum(sdouble(readerScore)))^2.0

Do you need "pf" at all? Can you merge similarly weighted fields
with copyField into a new one, reducing the number of fields to look up
from 7 to perhaps 5?
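A sketch of that idea in schema.xml, with a made-up combined field name (whether these particular fields can really share one analyzer and one weight is for you to judge):

```xml
<field name="minorText" type="text" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="tags" dest="minorText"/>
<copyField source="keywords" dest="minorText"/>
<copyField source="meshterm" dest="minorText"/>
```

The dismax qf could then reference minorText^2.5 in place of three separate low-boost fields.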

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 19. aug. 2010, at 16.58, Muneeb Ali wrote:

>
> Thanks for your input guys. I will surely try these suggestions, in
> particular, reducing heap size JAVA_OPTION and adjusting cache sizes to see
> if that makes a difference.
>
> I am also considering upgrading RAM for slave nodes, and also looking into
> moving from SATA enterprise HDD to SSD flash/DRAM storage... Is anyone using
> SSDs for solr application?
>
> What would be a better route to take? more memory or flash based SSD hard
> drive?
>
> Thanks,
> -Muneeb
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/improving-search-response-time-tp1204491p1226372.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: improving search response time

Anurag
In reply to this post by Lance Norskog-2
I am using the spellchecker in the query part, and now my search time has
increased: initially it was about 1000 ms, now it is about 3000 ms. My index
is 9GB.
My query: http://localhost:8983/solr/spellCheckCompRH/?q="+search+"&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on

How can I improve the search time?
My setup:
1) Fedora 11 as the OS
2) Solr runs on Jetty
3) The front (search) page is on Tomcat 6
4) Index size is 9GB
5) RAM is 1GB

Kumar Anurag

Re: improving search response time

Shawn Heisey-4
On 12/21/2010 3:02 AM, Anurag wrote:

> I am using spellchecker in the query part. Now my search time has become
> more. say initiallly it was 1000ms now its 3000ms.I have data index of size
> 9GB.
> My query http://localhost:8983/solr/spellCheckCompRH/?q=
> http://localhost:8983/solr/spellCheckCompRH/?q="+search+"&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on
>
> How can i improve the search time.
> i have
> 1) Fedora 11 as OS
> 2) Solr run on Jetty Server
> 3) Front page (search page) is on Tomcat 6
> 4)Index size is 9GB
> 5)RAM is 1GB

Install more memory.  8GB would be a good place to be, more would let
you fit your entire index into RAM for incredible speed.  Once you get
above 4GB RAM, it's best if you run a 64-bit OS and Java, which requires
64-bit processors.  If your index is growing, you might want to have
more memory than that.

Shawn


Re: improving search response time

Anurag
Thanks a lot!
You mean I have to increase the resources.
1. Can distributed search improve the speed?
2. I have read in some threads that the spellchecker takes time. Is the
spellchecker one of the culprits for the longer response time?
On Tue, Dec 21, 2010 at 10:20 PM, Shawn Heisey-4 [via Lucene] <[hidden email]> wrote:
On 12/21/2010 3:02 AM, Anurag wrote:

> I am using spellchecker in the query part. Now my search time has become
> more. say initiallly it was 1000ms now its 3000ms.I have data index of size
> 9GB.
> My query http://localhost:8983/solr/spellCheckCompRH/?q=

> http://localhost:8983/solr/spellCheckCompRH/?q="+search+"&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on
>
> How can i improve the search time.
> i have
> 1) Fedora 11 as OS
> 2) Solr run on Jetty Server
> 3) Front page (search page) is on Tomcat 6
> 4)Index size is 9GB
> 5)RAM is 1GB

Install more memory.  8GB would be a good place to be, more would let
you fit your entire index into RAM for incredible speed.  Once you get
above 4GB RAM, it's best if you run a 64-bit OS and Java, which requires
64-bit processors.  If your index is growing, you might want to have
more memory than that.

Shawn







--
Kumar Anurag