improve performance after commit

Kaan Erdener
hello,

I'm looking for some tips / suggestions for reducing Solr query time
after I've POSTed a commit request. My Lucene index contains around
2,000,000 documents, and I have a job that periodically removes
arbitrary documents from Lucene and replaces them with fresh copies
from a database. Whenever that cycle occurs, I send a commit to Solr
to expose the updates. The problem is that immediately after the
commit, a Solr query that previously took 5-20ms now takes 20-25
seconds. Ouch.

I know that commit can be expensive, although I don't know by how
much, or what I might do to mitigate the expense. I haven't found much
documentation on this topic. I've also tried different cache settings
(basically using high values for cache and auto-warm sizes), but that
doesn't seem to make much of a difference.

I'll keep investigating on my own, but if anyone has any suggestions  
or additional info, I would greatly appreciate it.

thanks,
Kaan


Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace
Managed Hosting. Any dissemination, distribution or copying of the enclosed
material is prohibited. If you receive this transmission in error, please
notify us immediately by e-mail at [hidden email], and delete the
original message. Your cooperation is appreciated.

Re: improve performance after commit

Yonik Seeley-2
On 3/6/07, Kaan Erdener <[hidden email]> wrote:
> I'm looking for some tips / suggestions around reducing the query
> time for Solr after I've post'ed a commit request. My Lucene index
> contains around 2,000,000 documents, and I have a job that
> periodically removes arbitrary documents from Lucene and replaces
> them with fresh copies from a database. Whenever that cycle occurs, I
> send a commit to Solr to expose the updates. The problem is that
> immediately after the commit, a Solr query that previously took
> 5-20ms now takes 20-25 seconds. Ouch.

If this is a normal query (no faceting), then most likely the time is spent
populating a Lucene FieldCache entry used for sorting results.
Put a static warming entry in solrconfig.xml that queries for a small
number of documents and sorts that query by all the fields you
commonly sort by.

-Yonik
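To see why the first sorted query after a commit is slow, and why a sorted warming query helps, here is a toy Java model. It is not Solr or Lucene code: `WarmingModel`, `Searcher`, `sortValues`, and `warm` are hypothetical names, and "loading" just generates placeholder values instead of reading an index. It assumes, as described above, that sort values are cached per searcher and filled lazily on first use, so the new searcher opened by a commit starts with an empty cache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; not Solr or Lucene code. */
public class WarmingModel {

    /** Hypothetical stand-in for the searcher opened after a commit. */
    static class Searcher {
        private final int maxDoc;
        private final Map<String, String[]> sortCache = new HashMap<>();
        int loads = 0; // how many fields had to be loaded for every document

        Searcher(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        /** Returns per-document sort values, loading them on the first use of a field. */
        String[] sortValues(String field) {
            return sortCache.computeIfAbsent(field, f -> {
                loads++;
                String[] values = new String[maxDoc];
                for (int doc = 0; doc < maxDoc; doc++) {
                    values[doc] = f + "-" + doc; // placeholder for reading the index
                }
                return values;
            });
        }
    }

    /** Touches every commonly sorted field, as a sorted warming query would. */
    static void warm(Searcher searcher, List<String> sortFields) {
        for (String field : sortFields) {
            searcher.sortValues(field);
        }
    }

    public static void main(String[] args) {
        Searcher searcher = new Searcher(1_000); // a new searcher right after a commit
        warm(searcher, List.of("date", "subject"));
        System.out.println("loads after warming: " + searcher.loads);    // 2
        searcher.sortValues("subject"); // a user query sorting by subject
        System.out.println("loads after user query: " + searcher.loads); // still 2
    }
}
```

Warming does not make the load cheaper; it only moves the cost to before the new searcher serves user queries (assuming the old searcher keeps serving requests while the new one warms). That is why the warming query has to sort by every field users commonly sort by.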

Re: improve performance after commit

Kaan Erdener

On Mar 6, 2007, at 1:55 PM, Yonik Seeley wrote:

> On 3/6/07, Kaan Erdener <[hidden email]> wrote:
>> I'm looking for some tips / suggestions around reducing the query
>> time for Solr after I've post'ed a commit request. My Lucene index
>> contains around 2,000,000 documents, and I have a job that
>> periodically removes arbitrary documents from Lucene and replaces
>> them with fresh copies from a database. Whenever that cycle occurs, I
>> send a commit to Solr to expose the updates. The problem is that
>> immediately after the commit, a Solr query that previously took
>> 5-20ms now takes 20-25 seconds. Ouch.
>
> If this is a normal query (no faceting) then most likely the time is spent
> populating a lucene FieldCache entry used for sorting results.
> Put a static warming entry in solrconfig.xml that queries for a small
> number of documents and sorts that query by all the fields you
> commonly sort by.
>
> -Yonik

I'm not exactly sure this is what you meant, but I did some more  
research and it looks close. I added the following to my solrconfig.xml:

     <listener event="newSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <lst>
           <str name="q">allMessageContent:test</str>
           <str name="start">0</str>
           <str name="rows">10</str>
         </lst>
       </arr>
     </listener>

and also:

     <listener event="firstSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <lst>
           <str name="q">allMessageContent:trying</str>
           <str name="start">0</str>
           <str name="rows">10</str>
         </lst>
       </arr>
     </listener>

From what I can see in the logs, both are invoked after the commit.
However, query times after a commit are still slow (around 20
seconds). I'm guessing I didn't set up the warming correctly? I had
some sorting parameters in there, but the syntax was wrong and
produced errors on startup, so I took them out for now.

Mar 6, 2007 4:51:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@4cd580 main from Searcher@7be8c2 main
         documentCache{lookups=10,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@4cd580 main
         documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@4cd580 main
Mar 6, 2007 4:51:52 PM org.apache.solr.core.SolrCore execute
INFO: rows=10&start=0&q=allMessageContent:trying 0 410
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher




Re: improve performance after commit

Yonik Seeley-2
On 3/6/07, Kaan Erdener <[hidden email]> wrote:
> From what I can see in the logs, these are both invoked after the
> commit. However, the query times after a commit are still slow
> (around 20 seconds).

Your warming script didn't do any sorts.
Why don't you also show the part of the log with the slow query...
that would make it much easier for people to help.

-Yonik

Re: improve performance after commit

Kaan Erdener

On Mar 6, 2007, at 9:50 PM, Yonik Seeley wrote:

> On 3/6/07, Kaan Erdener <[hidden email]> wrote:
>> From what I can see in the logs, these are both invoked after the
>> commit. However, the query times after a commit are still slow
>> (around 20 seconds).
>
> Your warming script didn't do any sorts.
> Why don't you also show the part of the log with the slow query...
> that would make it much easier for people to help.
>
> -Yonik

Right, I had some initially, but Solr threw exceptions. I put them  
back in just now. Here's an example trying to warm using a sort on  
field name "subject". I tried a query of
"allMessageContent:trying;subject+asc" as well as  
"allMessageContent:trying;subject" (without "+asc") - either way  
throws an exception. Both are shown below. The generated exception  
isn't clear (not to me, anyway), and I didn't find any examples of  
this elsewhere for reference. What's the correct way to specify a  
sort and direction when setting up a listener?

thanks,
Kaan

<lst> <str name="q">allMessageContent:test;subject+asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>

Mar 6, 2007 11:47:27 PM org.apache.solr.core.SolrCore execute
INFO: rows=50&start=0&q=allMessageContent:test;subject%2Basc 0 4
Mar 6, 2007 11:47:27 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 6, 2007 11:47:27 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
         at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:189)
         at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:115)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
         at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
         at org.apache.solr.core.SolrCore$3.call(SolrCore.java:451)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
         at java.util.concurrent.FutureTask.run(FutureTask.java:123)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
         at java.lang.Thread.run(Thread.java:595)

<lst> <str name="q">allMessageContent:test;subject</str> <str name="start">0</str> <str name="rows">50</str> </lst>

Mar 6, 2007 11:51:57 PM org.apache.solr.core.SolrCore execute
INFO: rows=50&start=0&q=allMessageContent:test;subject 0 2
Mar 6, 2007 11:51:57 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 6, 2007 11:51:57 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
         at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:189)
         at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:115)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
         at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
         at org.apache.solr.core.SolrCore$3.call(SolrCore.java:451)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
         at java.util.concurrent.FutureTask.run(FutureTask.java:123)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
         at java.lang.Thread.run(Thread.java:595)
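Both failures plausibly share one cause: the `q=<query>;<sort>` syntax used here expects each sort clause to be a field name, whitespace, then a direction. With a literal "+" (or no direction at all) the clause is a single token, so code that reads the second token finds nothing, which is consistent with the `ArrayIndexOutOfBoundsException: 1` above. The sketch below is a simplified, hypothetical parser written only to illustrate that; it is not Solr's `QueryParsing.parseSort`, and `SortSpecSketch` and `SortClause` are made-up names. It reports a readable error instead of indexing past the end of the array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified illustration only; NOT Solr's QueryParsing.parseSort. */
public class SortSpecSketch {

    /** One parsed sort clause, e.g. "subject asc". */
    static final class SortClause {
        final String field;
        final boolean ascending;

        SortClause(String field, boolean ascending) {
            this.field = field;
            this.ascending = ascending;
        }
    }

    /** Parses the sort part of "query;field dir[, field dir...]"; empty list if there is no ';'. */
    static List<SortClause> parseSort(String q) {
        List<SortClause> clauses = new ArrayList<>();
        int semi = q.indexOf(';');
        if (semi < 0) {
            return clauses; // no sort requested
        }
        for (String part : q.substring(semi + 1).split(",")) {
            String clause = part.trim();
            String[] tokens = clause.split("\\s+");
            // "subject+asc" and "subject" both yield a single token here; reading
            // tokens[1] without this check would throw ArrayIndexOutOfBoundsException: 1.
            if (tokens.length != 2) {
                throw new IllegalArgumentException(
                        "sort clause must be '<field> asc|desc', got '" + clause + "'");
            }
            String dir = tokens[1].toLowerCase(Locale.ROOT);
            if (!dir.equals("asc") && !dir.equals("desc")) {
                throw new IllegalArgumentException("unknown sort direction '" + tokens[1] + "'");
            }
            clauses.add(new SortClause(tokens[0], dir.equals("asc")));
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] queries = {
            "allMessageContent:test;subject asc",
            "allMessageContent:test;subject+asc",
            "allMessageContent:test;subject"
        };
        for (String q : queries) {
            try {
                System.out.println(q + " -> sort by " + parseSort(q).get(0).field);
            } catch (IllegalArgumentException e) {
                System.out.println(q + " -> " + e.getMessage());
            }
        }
    }
}
```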




Re: improve performance after commit

Ryan McKinley
>
> <str name="q">allMessageContent:test;subject+asc</str>
>

There should be a space between "subject" and "asc".

Try: http://host/select?q=allMessageContent:test;subject%20asc

In a URL, "+" is supposed to be decoded to a space, but here it looks like it is staying a literal "+".
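The "+" behavior can be checked with the JDK's form-encoding helpers. In this small sketch (the class and method names are illustrative, and it assumes Java 10+ for the `Charset` overloads), "+" and "%20" both decode to a space in a URL query string, while a literal "+" that gets re-encoded becomes "%2B", which matches the `subject%2Basc` in the earlier log. Values written directly in solrconfig.xml are taken literally rather than URL-decoded, so there the "+" survives as-is.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusDecoding {

    /** Decodes a form-encoded query-string value, roughly as a servlet container does for a URL. */
    static String decodeParam(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    /** Form-encodes a value for use in a URL query string. */
    static String encodeParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a URL, '+' and "%20" both mean a space.
        System.out.println(decodeParam("subject+asc"));   // subject asc
        System.out.println(decodeParam("subject%20asc")); // subject asc
        // A literal '+' (e.g. read verbatim from an XML file) is encoded as %2B.
        System.out.println(encodeParam("subject+asc"));   // subject%2Basc
    }
}
```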

Re: improve performance after commit

Chris Hostetter-3
In reply to this post by Kaan Erdener
: back in just now. Here's an example trying to warm using a sort on
: field name "subject". I tried query of
: "allMessageContent:trying;subject+asc" as well as
: "allMessageContent:trying;subject" (without "+asc") - either way

When expressing params in XML (either as init params for a request
handler, or in a QuerySenderListener), the params don't need to be
URL-escaped ... they just need to be XML-escaped. Try something like:


     <listener event="newSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <lst>
             <str name="q">allMessageContent:test; subject asc</str>
             <str name="start">0</str>
             <str name="rows">10</str>
         </lst>
       </arr>
     </listener>

-Hoss
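To make the "XML-escaped, not URL-escaped" distinction concrete, here is a hypothetical Java helper that writes one warming `<lst>` entry from raw parameter values. It applies only XML escaping, so spaces and "+" pass through unchanged while characters such as "&" and "<" become entities. `WarmingEntry`, `xmlEscape`, and `lst` are illustrative names, and the `a && b` value is a made-up example of a query that does need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WarmingEntry {

    /** Escapes the XML special characters; nothing is URL-encoded. */
    static String xmlEscape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds one <lst> warming entry from raw (unescaped) parameter values. */
    static String lst(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("<lst>");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append("<str name=\"").append(xmlEscape(e.getKey())).append("\">")
              .append(xmlEscape(e.getValue())).append("</str>");
        }
        return sb.append("</lst>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "allMessageContent:test; subject asc");
        params.put("start", "0");
        params.put("rows", "10");
        System.out.println(lst(params));
        System.out.println(xmlEscape("a && b")); // a &amp;&amp; b
    }
}
```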


Re: improve performance after commit

Kaan Erdener
On Mar 7, 2007, at 11:34 AM, Chris Hostetter wrote:

> : back in just now. Here's an example trying to warm using a sort on
> : field name "subject". I tried query of
> : "allMessageContent:trying;subject+asc" as well as
> : "allMessageContent:trying;subject" (without "+asc") - either way
>
> when expressing params in XML (either as init params for a request
> handler, or in a QuerySenderListener the params don't need to be URL
> escaped ... they just need to be XML escaped, try something like...
>
>
>      <listener event="newSearcher" class="solr.QuerySenderListener">
>        <arr name="queries">
>          <lst>
>              <str name="q">allMessageContent:test; subject asc</str>
>              <str name="start">0</str>
>              <str name="rows">10</str>
>          </lst>
>        </arr>
>      </listener>
>
> -Hoss

Thanks to you and Ryan for that suggestion, that was indeed the  
problem. Using a warming query of "allMessageContent:trying;subject  
asc" (without my hand-escaped whitespace) worked great.

In the end, this is what I've got in my solrconfig.xml, and overall
query performance is now consistently fast, even after POSTing a
commit.

<listener event="newSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <lst> <str name="q">text:trying;date asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;refId asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;subject asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;name asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
   </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <lst> <str name="q">text:trying;date asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;refId asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;subject asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
     <lst> <str name="q">text:trying;name asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
   </arr>
</listener>

Thanks again,
Kaan
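The final config repeats the same four warming queries under both `newSearcher` and `firstSearcher`. One way to keep the two lists from drifting apart is to generate them from a single list of sort fields. The sketch below is a hypothetical generator, not a Solr tool; `WarmingConfigGen`, `listener`, and `esc` are illustrative names, and it reuses the `text:trying` query, `rows` of 50, and the `;field asc` sort syntax from the config above.

```java
import java.util.List;

public class WarmingConfigGen {

    /** Minimal XML escaping for element text and double-quoted attributes ('&' first). */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Emits one QuerySenderListener with a sorted warming query per field. */
    static String listener(String event, String query, List<String> sortFields, int rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<listener event=\"").append(esc(event))
          .append("\" class=\"solr.QuerySenderListener\">\n");
        sb.append("  <arr name=\"queries\">\n");
        for (String field : sortFields) {
            // "query;field asc" is the sort syntax used in this thread
            sb.append("    <lst> <str name=\"q\">").append(esc(query + ";" + field + " asc"))
              .append("</str> <str name=\"start\">0</str> <str name=\"rows\">")
              .append(rows).append("</str> </lst>\n");
        }
        sb.append("  </arr>\n</listener>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = List.of("date", "refId", "subject", "name");
        for (String event : List.of("newSearcher", "firstSearcher")) {
            System.out.print(listener(event, "text:trying", fields, 50));
        }
    }
}
```

Pasting the printed output into solrconfig.xml gives the same warming queries as above for both events, so adding a new sort field means editing one list.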

