Improvising solr queries

Improvising solr queries

Dipti Khullar
Hi

We have tried various configuration settings to improve the performance of
the site, which relies heavily on Solr, but throughput remains at about
4-5 reqs/sec. We also ran some performance tests on Solr 1.4 and saw only a
very small improvement. We are currently using Solr 1.3.

So our last resort is to improve the queries. We are using SolrJ
(CommonsHttpSolrServer).

We are trying to tune the Solr queries used in our project. The following
sample query takes about 6 secs to execute under normal traffic. At peak
hours this often increases to 10-15 secs.

(sitename:XYZ OR sitename:"All Sites") AND (localeid:1237400589415) AND
((assettype:Gallery)) AND (rbcategory:"ABC XYZ") AND (startdate:[* TO
2009-12-07T23:59:00Z] AND enddate:[2009-12-07T00:00:00Z TO
*])&rows=9&start=63&sort=date desc&facet=true&facet.field=assettype&facet.mincount=1

We have several similar, much more complex queries supporting all the
major landing pages of our application.

Can anyone identify any major flaws or issues in the sample query?

Thanks
Dipti

Re: Improvising solr queries

Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 6:39 PM, dipti khullar <[hidden email]>wrote:

> We have tried out various configurations settings to improvise the
> performance of the site which is majorly using Solr but still the response
> time remains about 4-5 reqs/sec. We also did some performance tests on Solr
> 1.4 but still there is a very minute improvement in performance. Currently
> we are using Solr 1.3.
>

That is too slow.

We need more information on your setup before we can help. What kind of
hardware are you using? Which OS/JVM? How much memory have you allocated to
the JVM?

What does your solrconfig look like? How many documents are there in your
index? What is the size of index on disk? What are the field types of the
fields you are searching on? Do you do highlighting on large fields? Can you
paste the cache section on the statistics page of your Solr dashboard
(preferably, just after a peak load)? How frequently is your index changed
(i.e. how frequently do you commit)?

I'd recommend an upgrade to Solr 1.4 anyway since it has major performance
improvements.


>
> So our last resort remains, improvising the queries. We are using SolrJ -
> CommonsHttpSolrServer
>
>
Actually that is one of the first things that you should look at.


> We guys are trying to tune up Solr Queries being used in our project.
> Following sample query takes about 6 secs to execute under normal traffic.
> At peak hours this often increases to 10-15 secs.
>
> sitename:XYZ OR sitename:"All Sites") AND (localeid:1237400589415) AND
> ((assettype:Gallery))  AND (rbcategory:"ABC XYZ" ) AND (startdate:[* TO
> 2009-12-07T23:59:00Z] AND enddate:[2009-12-07T00:00:00Z TO
> *])&rows=9&start=63&sort=date
> desc&facet=true&facet.field=assettype&facet.mincount=1
>
> Similar to this query we have several much complex queries supporting all
> major landing pages of our application.
>
> Just want to confirm that whether anyone can identify any major flaws or
> issues in the sample query?
>
>
Most of those AND conditions can be separate filter queries. Filter queries
can be cached separately and can therefore be re-used. See
http://wiki.apache.org/solr/FilterQueryGuidance
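
To make the suggestion concrete, here is a minimal sketch of how the sample query's AND clauses might be split into separate fq parameters. The helper name build_params is hypothetical, and using q=*:* as the main query is an assumption (it presumes the main query carries no relevance scoring, since results are sorted by date anyway). Faceting parameters are left out for brevity.

```python
from urllib.parse import urlencode


def build_params(site, locale_id, asset_type, category, day):
    """Build request parameters with each restriction as its own fq.

    Hypothetical helper: each clause becomes a separate filter query so
    that Solr can cache and re-use it independently of the others.
    """
    filters = [
        f'sitename:{site} OR sitename:"All Sites"',
        f"localeid:{locale_id}",
        f"assettype:{asset_type}",
        f'rbcategory:"{category}"',
        f"startdate:[* TO {day}T23:59:00Z]",
        f"enddate:[{day}T00:00:00Z TO *]",
    ]
    # Assumption: match-all main query, every restriction moved into fq.
    params = [("q", "*:*")]
    params += [("fq", clause) for clause in filters]
    params += [("rows", 9), ("start", 63), ("sort", "date desc")]
    return params


params = build_params("XYZ", 1237400589415, "Gallery", "ABC XYZ", "2009-12-07")
query_string = urlencode(params)
print(query_string)
```

The filters that stay the same across many requests (site, locale, and the date ranges when rounded to the day) are the ones most likely to benefit, because a changed assettype or category then only costs one new filter rather than a whole new query.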

--
Regards,
Shalin Shekhar Mangar.

Re: Improvising solr queries

Dipti Khullar
Thanks Shalin.

Following are the relevant details:

There are 2 search servers in a virtualized VMware environment. Each has 2 instances of Solr running on separate ports in Tomcat.
Server 1: hosts 1 master (application 1), 1 slave (application 1)
Server 2: hosts 1 master (application 2), 1 slave (application 1)

Both servers have 4 CPUs and 4 GB RAM.

Master
- 4GB RAM
- 1GB JVM Heap memory is allocated to Solr
Slave1/Slave2:
- 4GB RAM
- 2GB JVM Heap memory is allocated to Solr

Solr Details:
apache-solr Version: 1.3.0
Lucene - 2.4-dev

- autocommit: 50 docs and 5 minutes
- optimize runs on the master every 7 minutes
- using postOptimize, we execute snapshooter on the master
- snappuller/snapinstaller runs on the 2 slaves every 10 minutes

Master and Slave1 (solr1) are on a single box and Slave2 (solr2) is on a different box. We use HAProxy to load balance query requests between the 2 slaves. The master is only used for indexing.

The problem is reported on the slaves for application 1. The SolrJ client, which queries the slave Solr over HTTP, times out (the timeout value is 10 sec), although in the Solr Tomcat access log all requests have a 200 response. At the same time there is high CPU usage and load average.
While requests are timing out, the load average of the server goes extremely high (10-20).
The issue is resolved as soon as we optimize the slave index. The Solr admin shows only 4 requests/sec being handled, with a 400 ms response time.

I am attaching solrconfig.xml for both master and slaves.

Thanks
Dipti




solrconfig_master.xml (40K) Download Attachment
solrconfig_slave1.xml (40K) Download Attachment
solrconfig_slave2.xml (40K) Download Attachment

Re: Improvising solr queries

Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 7:25 PM, dipti khullar <[hidden email]>wrote:

> Thanks Shalin.
>
> Following are the relevant details:
>
> There are 2 search servers in a virtualized VMware environment. Each has  2
> instances of Solr running on separates ports in tomcat.
> Server 1: hosts 1 master(application 1), 1 slave (application 1)
> Server 2: hosta 1 master (application 2), 1 slave (application 1)
>
>
Have you tried a non-virtualized environment? Virtual instances are not that
great for high I/O throughput environments.


> Both servers have 4 CPUs and 4 GB RAM.
>
> Master
> - 4GB RAM
> - 1GB JVM Heap memory is allocated to Solr
> Slave1/Slave2:
> - 4GB RAM
> - 2GB JVM Heap memory is allocated to Solr
>
> Solr Details:
> apache-solr Version: 1.3.0
> Lucene - 2.4-dev
>
> - autocommit: 50 docs and 5 minutes
> - optimize runs on master in every 7 minutes
> - using postOptimize , we execute snapshooter on master
> - snappuller/snapinstaller on 2 slaves runs after every 10 minutes
>
>
You are committing every 5 minutes and optimizing every 7 minutes. Can you
try committing less often?


> Master and Slave1 (solr1)are on single box and Slave2(solr2) on different
> box. We use HAProxy to load balance query requests between
> 2 slaves. Master is only used for indexing.
>
> Solrj client which is used to query slave solr,gets timedout and there is
> high CPU usage/load avg.T he problem is reported on slaves for application
> 1. The SolrJ client which queries Solr over HTTP times out (10 sec is the
> timeout value) though in the Solr tomcat access log we find all requests
> have 200 response.
> During the tme, requests timeout the load avg. of the server goes extremely
> high (10-20).
> The issue gets resolved as soon as we optimize the slave index. In the solr
> admin, it shows only 4 requests/sec is handled with 400 ms response time.
>
> I am attaching solrconfig.xml for both master and slaves.
>
>
There is no autowarming on slaves which is probably OK if you are committing
so often. But do you really need to index new documents so often?

--
Regards,
Shalin Shekhar Mangar.

Re: Improvising solr queries

Tom Hill-7
In reply to this post by Dipti Khullar
Hi -

Something doesn't make sense to me here:

On Mon, Jan 4, 2010 at 5:55 AM, dipti khullar <[hidden email]>wrote:

> - optimize runs on master in every 7 minutes
> - using postOptimize , we execute snapshooter on master
> - snappuller/snapinstaller on 2 slaves runs after every 10 minutes
>
>
Why would you optimize every 7 minutes, and update the slaves every ten?
After 70 minutes you'll be doing both at the same time.

How about optimizing every ten minutes, at :00,:10, :20, :30, :40, :50 and
then pulling every ten minutes at :01, :11, :21, :31, :41, :51 (assuming
your optimize completes in one minute).
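
The arithmetic behind this can be checked with a short sketch. The function names are made up for illustration, and the one-minute offset is the same assumption as above (that the optimize finishes within a minute).

```python
from math import gcd


def minutes_until_overlap(period_a, period_b):
    """Least common multiple: when two jobs started together next coincide."""
    return period_a * period_b // gcd(period_a, period_b)


def schedule(period, offset, horizon=60):
    """Minutes past the hour at which a job with this period and offset runs."""
    return list(range(offset, horizon, period))


# Optimize every 7 minutes and pull every 10: they collide every 70 minutes.
print(minutes_until_overlap(7, 10))

# Proposed schedule: both every 10 minutes, pull offset by one minute.
optimize_times = schedule(10, 0)
pull_times = schedule(10, 1)
print(optimize_times, pull_times)
```

With the offset schedule the two sets of start times never intersect, so a pull always begins after an optimize has had its minute to finish.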

Or did I misunderstand something?


> The issue gets resolved as soon as we optimize the slave index. In the solr
> admin, it shows only 4 requests/sec is handled with 400 ms response time.
>

From your earlier description, it seems like you should only be distributing
an optimized index, so optimizing the slave should be a no-op. Check to see
what files you have on the slave after snappulling.

Tom

Re: Improvising solr queries

Ian Holsman (Lists)
In reply to this post by Shalin Shekhar Mangar
On 1/5/10 12:46 AM, Shalin Shekhar Mangar wrote:

>> sitename:XYZ OR sitename:"All Sites") AND (localeid:1237400589415) AND
>> ((assettype:Gallery))  AND (rbcategory:"ABC XYZ" ) AND (startdate:[* TO
>> 2009-12-07T23:59:00Z] AND enddate:[2009-12-07T00:00:00Z TO
>> *])&rows=9&start=63&sort=date
>> desc&facet=true&facet.field=assettype&facet.mincount=1
>>
>> Similar to this query we have several much complex queries supporting all
>> major landing pages of our application.
>>
>> Just want to confirm that whether anyone can identify any major flaws or
>> issues in the sample query?
>>
I'm not the expert Shalin is, but I seem to remember that sorting by date was
pretty rough on CPU. (This could have been resolved since I last looked
at it.)

The other thing I'd question is the facet. It looks like you're only
retrieving a single assettype (Gallery), so you will only get a single
facet value back. If that's the case, wouldn't the number of rows found
(which is part of the response) give you the same answer?

> Most of those AND conditions can be separate filter queries. Filter queries
> can be cached separately and can therefore be re-used. See
> http://wiki.apache.org/solr/FilterQueryGuidance
>
>    


Re: Improvising solr queries

Dipti Khullar
Hey Ian

This assettype is variable. It can have around 6 values at a time.
But it is true that we mostly facet on just one field, assettype.

Any idea whether date range queries are expensive? Also, if Shalin can
comment on "sorting by date was pretty rough on CPU", I can start
analyzing the queries that sort by date.

Will look into suggestions/queries by Tom and Shalin and then post the
findings.

Thanks
Dipti



Re: Improvising solr queries

Shalin Shekhar Mangar
On Tue, Jan 5, 2010 at 11:16 AM, dipti khullar <[hidden email]>wrote:

>
> This assettype is variable. It can have around 6 values at a time.
> But this is true that we apply facet mostly on just one field - assettype.
>
>
Ian has a good point. You are faceting on assettype and you are also
filtering on it so you will get only one facet value "Gallery" with a count
equal to numFound.
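
A toy model of this, assuming assettype is a single-valued field, is sketched below. The documents and the search function are invented for illustration and are not how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical documents, each with a single-valued assettype.
docs = [
    {"id": 1, "assettype": "Gallery"},
    {"id": 2, "assettype": "Article"},
    {"id": 3, "assettype": "Gallery"},
    {"id": 4, "assettype": "Video"},
]


def search(docs, assettype=None):
    """Return (numFound, facet counts on assettype) for an optional filter."""
    matched = [d for d in docs if assettype is None or d["assettype"] == assettype]
    facets = Counter(d["assettype"] for d in matched)
    return len(matched), dict(facets)


# Filtering and faceting on the same field: one bucket, equal to numFound.
num_found, facets = search(docs, assettype="Gallery")
print(num_found, facets)

# Faceting without the filter is what yields a useful breakdown.
print(search(docs))
```

So in the filtered request the facet adds work without adding information; it only becomes informative when the assettype restriction is not part of the query.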


> Any idea if the use of date range queries is expensive? Also if Shalin can
> put in some comments on
> "sorting by date was pretty rough on CPU", I can start analyzing sort by
> date specific queries.
>
>
This is a range search and not a sort. I don't know if range search on dates
is especially costly compared to a range search on any other type. But I do
know that trie fields in Solr 1.4 are much faster for range searches at the
cost of more tokens in the index.

With a date field, instead of using NOW, you should always try to round it
down to the coarsest interval you can use. So if it is possible to use
NOW/DAY instead of NOW, you should do that. The problem with querying on NOW
is that it is always unique and therefore the query can never be cached
(actually, it is cached but can never be hit). If you use NOW/DAY, the query
can be cached for a day.
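
The effect on caching can be illustrated with a small sketch that mimics what the two forms resolve to. The function start_clause and the set standing in for the cache are hypothetical; in a real request you would simply write NOW/DAY in the query and let Solr do the rounding.

```python
from datetime import datetime, timezone

SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"


def start_clause(now, round_to_day=False):
    """Build the range clause as it looks once the timestamp is resolved."""
    if round_to_day:
        # Emulates rounding down to the start of the day.
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return "startdate:[* TO " + now.strftime(SOLR_DATE) + "]"


# Two requests arriving one second apart.
first = datetime(2009, 12, 7, 14, 30, 24, tzinfo=timezone.utc)
second = datetime(2009, 12, 7, 14, 30, 25, tzinfo=timezone.utc)

seen = set()  # stands in for a cache keyed by the resolved clause
for moment in (first, second):
    for rounded in (False, True):
        clause = start_clause(moment, rounded)
        print("hit " if clause in seen else "miss", clause)
        seen.add(clause)
```

The unrounded clause differs between the two requests, so the second one can never find the first one's entry; the rounded clause is identical for every request made on the same day.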

--
Regards,
Shalin Shekhar Mangar.

Re: Improvising solr queries

Dipti Khullar
Hi

Sorry for getting back late on the thread, but we have been focusing on
the configuration of master and slave to address the performance issues.

We have observed the following trend on the production slaves:
Every 10 minutes the response time increases considerably. In between,
all the queries are served from cache.
It seems that every 10 minutes the snapshot installation and the commit
that follows take time, which results in slow response times.

The following logs were taken for one complete cycle of the master/slave
sync process:

2010/01/21 14:28:02 started by solr
2010/01/21 14:28:02 command: /opt/solr/solr_master/solr/solr/bin/snapshooter
2010/01/21 14:28:02 taking snapshot /opt/solr/solr_master/solr/data/snapshot.20100121142802
2010/01/21 14:28:02 ended (elapsed time: 0 sec)
2010/01/21 14:28:01 started by solr
2010/01/21 14:28:01 command: /opt/solr/solr_master/solr/solr/bin/optimize
2010/01/21 14:28:02 ended (elapsed time: 1 sec)
2010/01/21 14:30:02 started by solr
2010/01/21 14:30:02 command: /opt/solr/solr_slave/solr/solr/bin/snappuller
2010/01/21 14:30:06 pulling snapshot snapshot.20100121142802
2010/01/21 14:30:14 ended (elapsed time: 12 sec)
2010/01/21 14:30:14 started by solr
2010/01/21 14:30:14 command: /opt/solr/solr_slave/solr/solr/bin/snapinstaller
2010/01/21 14:30:15 installing snapshot /opt/solr/solr_slave/solr/data/snapshot.20100121142802
2010/01/21 14:30:16 notifing Solr to open a new Searcher
2010/01/21 14:30:17 ended (elapsed time: 3 sec)
2010/01/21 14:30:17 started by solr
2010/01/21 14:30:17 command: /opt/solr/solr_slave/solr/solr/bin/commit
2010/01/21 14:30:17 ended (elapsed time: 0 sec)

Response Time at 14:30:24 on:
Slave 1 - 243
Slave 2 - 111266

Are we missing some configuration? Or perhaps the frequency at which the
scripts run needs to be changed?
Any pointers will be helpful!

Thanks
Dipti



Re: Improvising solr queries

Erick Erickson
What this looks like (and I've only glanced) is that your
index updates are causing a new searcher to
be opened, and the first few queries after
the reopen will be slow.

Have you tried warmup queries after the reopen?

FWIW
Erick


Re: Improvising solr queries

Otis Gospodnetic-2
In reply to this post by Dipti Khullar
Dipti,

If I'm reading that correctly, you are optimizing the index on the master before replicating it?
There is no need to do that if you are constantly updating your index and replicating it every 10 minutes.
Don't optimize, and you'll replicate a smaller portion of the index, and thus you won't bust the OS cache on the slave as much.
Then upgrade to Solr 1.4 and you'll see further benefits from faster searcher warmup times.
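
A rough sketch of why optimizing inflates what gets replicated, under the assumption that replication only needs to copy files the slave does not already have. The segment file names and sizes below are entirely hypothetical.

```python
def mb_to_transfer(slave_files, master_files):
    """Sum the sizes of master files the slave lacks or holds at another size."""
    return sum(
        size
        for name, size in master_files.items()
        if slave_files.get(name) != size
    )


# Hypothetical index files on the slave (name -> size in MB).
slave = {"_a.cfs": 900, "_b.cfs": 80}

# Master after a plain commit: the old segments plus one small new one.
after_commit = {"_a.cfs": 900, "_b.cfs": 80, "_c.cfs": 5}

# Master after an optimize: everything rewritten into a single new segment.
after_optimize = {"_d.cfs": 985}

print(mb_to_transfer(slave, after_commit))
print(mb_to_transfer(slave, after_optimize))
```

In this toy case the commit-only snapshot costs 5 MB of new data on the slave, while the optimized one replaces all 985 MB, which is the kind of churn that evicts the previously cached index files from the OS cache.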

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch





Re: Improvising solr queries

Dipti Khullar
Hi

Erick, thanks for your reply.
I am not sure what exactly you mean by warmup queries. But if it's related to
the settings we are using in solrconfig.xml, the following is our
configuration for query caching:

<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="0"/>

Also, we are using the snapinstaller script on the slaves, which eventually
calls the commit script. I was wondering whether we need to change the
simple commit command to

<commit waitFlush="false" waitSearcher="false"/>

Otis, we ran a performance test on our local environments for Solr 1.4,
but there was no considerable performance improvement. Hence, we have
dropped the idea of upgrading to Solr 1.4 for now.
Regarding optimization, we initially were not using optimize at all, but
then at peak hours the load on the slaves increased considerably. Hence, we
configured the optimize script to keep the system running.
But we can try this in our local environment and then analyze the results.

Thanks
Dipti



Re: Improvising solr queries

Erick Erickson
Take a look at the Wiki, here's a bit to start...

http://lucene.apache.org/solr/features.html

The short form is that when an index is first opened,
there are various caches that are initialized. The
first few queries that run against a new searcher
are slowed down by filling up these caches. Warmup
queries can be fired that'll pre-populate these caches
in the background. You have to configure this, and
only *after* the warmup queries have run does
SOLR switch over to the newly-opened searchers.

I suspect that what you're seeing is that the first few
queries after you update your index are paying this
penalty....
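
A minimal sketch of what such a warmup configuration might look like in
solrconfig.xml, assuming the stock solr.QuerySenderListener and reusing
part of the sample query from earlier in this thread. The choice of query,
sort, and rows is only illustrative; the real list should come from the
queries behind your own landing pages.

```xml
<!-- Goes inside the <query> section of solrconfig.xml on each slave. -->
<!-- newSearcher fires whenever a new searcher is opened while an old one
     is still serving, for example after snapinstaller triggers a commit. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Illustrative warmup query based on the sample in this thread.
         Sorting on date also loads that field into the field cache. -->
    <lst>
      <str name="q">(sitename:XYZ OR sitename:"All Sites") AND (localeid:1237400589415) AND (assettype:Gallery)</str>
      <str name="sort">date desc</str>
      <str name="start">0</str>
      <str name="rows">9</str>
    </lst>
  </arr>
</listener>
```

The old searcher keeps answering requests while these queries run against
the new one, so the cache-filling cost moves out of user-facing requests
and into the background.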

HTH
Erick

Re: Improvising solr queries

Dipti Khullar
Thanks Erick

Correctly said!!
Initially we had different settings for queryResultCache, which did
serve queries from the cache:

<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="256"/>

But we changed the settings some days back to see if there were any
issues/improvements.
I believe we need to switch back to similar settings after some
analysis.

Also, removing <optimize> showed good results on our local environment; I
think we will deploy the same on production.

Thanks guys for your help. We will keep posting further queries and
findings on the issue.

Dipti

Re: Improvising solr queries

Dipti Khullar
Hi

I am back again with further queries.

Just to check whether caching helps in rectifying our problem, we did a
simple test:

We restarted a Solr slave and immediately executed one of the heavy queries
to test the query response time. It was again high, about 700 ms. No caching
comes into the picture at this point, and the response time is still too
high!

(sitename:ABC OR sitename:"All Sites") AND (localeid:1237404875471) AND NOT
photocid:0 AND (assettype:Event) AND (startdate:[* TO 2009-12-07T23:59:00Z]
AND enddate:[2009-12-07T00:00:00Z TO *])

Does this imply that even if some queries are served from cache, the
response time at the first hit will always be high, and that perhaps when
many such queries hit the Solr slaves they hang, so the server at times
throws read timeouts?
Any suggestions?

Thanks
Dipti

Re: Improvising solr queries

Lance Norskog-2
The <listener> firstSearcher and newSearcher events describe queries
to run when Solr starts and when it receives an updated index. When you
start Solr, the firstSearcher queries are run immediately.

You should put queries there that you want to warm up for your
searches. For example, the date range query. Also, date ranges will be
faster if you limit the granularity of date fields.

http://wiki.apache.org/solr/IndexingDates
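
As a rough sketch of the above, a firstSearcher listener might look like
this in solrconfig.xml. The query is adapted from the Event query quoted in
this thread, with the fixed timestamps swapped for day-rounded date math
(NOW/DAY) as Shalin suggested earlier. The bounds only approximate the
original ones, and whether day granularity is acceptable for startdate and
enddate is an assumption you would need to check against your requirements.

```xml
<!-- Inside the <query> section of solrconfig.xml. -->
<!-- firstSearcher fires once at startup, when there is no old searcher
     (and so no old cache entries) to autowarm from. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- Day-rounded bounds: the range resolves to the same values all
           day, so later requests with the same bounds can hit the caches. -->
      <str name="q">(assettype:Event) AND startdate:[* TO NOW/DAY+1DAY] AND enddate:[NOW/DAY TO *]</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- With false, requests that arrive before the first searcher has finished
     warming wait for it instead of being served by a cold searcher. -->
<useColdSearcher>false</useColdSearcher>
```

Warming only helps if the live queries use the same rounded bounds; a
request that sends a per-minute timestamp will still miss the cache.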

--
Lance Norskog
[hidden email]