Date filter query

ku3ia
Hi all!

Please advise me:
1) q=test&fq=date:[NOW-30DAY+TO+NOW]
2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
3) q=test+AND+date:[NOW-30DAY+TO+NOW]
4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]

where date is defined as:
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<field name="date" type="tdate" indexed="true" stored="true"/>

Which of these queries will have the lower QTime on Solr 3.5? Thanks!

Re: Date filter query

Em
Hi,

1) and 2) should have equal performance, given that several searches are
performed with the same fq-param.
Since the filters are cached, 1) and 2) perform better.
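
To make the two forms concrete: variants 1/2 and 3/4 differ only in where the date range is placed in the request. The following is a minimal sketch in Python, not something from the original posts; the helper names are hypothetical, and only the field name "date" and the term "test" come from the question.

```python
from urllib.parse import urlencode

# Explicit range taken from variants 2) and 4) of the question.
DATE_RANGE = "date:[2012-01-23T00:00:00Z TO 2012-02-21T23:59:59Z]"


def as_filter_query(term, date_range):
    """Variants 1/2: the range is sent as fq, separate from the main query."""
    return urlencode({"q": term, "fq": date_range})


def as_main_query(term, date_range):
    """Variants 3/4: the range is ANDed into q and evaluated with it."""
    return urlencode({"q": f"{term} AND {date_range}"})


print(as_filter_query("test", DATE_RANGE))
print(as_main_query("test", DATE_RANGE))
```

Only the fq form gives Solr a separate clause whose matching document set it can keep in the filter cache and reuse across different q values.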

Kind regards,
Em

In reply to ku3ia's message of 21.02.2012 19:06.

Re: Date filter query

ku3ia
Hi Em, thanks for your response. But it seems I have a problem.
I wrote a curl-based script that sends queries with a fixed delay between them, using a dictionary of words that are known to match. I ran the script with a 500 ms delay for 60 seconds. Take a look at the catalina logs:

INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=1735
INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=9794

INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=13885
INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=33995

Note that not every query in the second test is slower, for example:
INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=18645

INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=7877

But on average I get:
*** Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] ***
Queries processed: 110
Queries cancelled: 4
Max QTime is: 22728 ms
Avg QTime is: 6681.31 ms
Min QTime is:  ms

*** Date:[NOW-30DAY+TO+NOW] ***
Queries processed: 20
Queries cancelled: 94
Max QTime is: 45203 ms
Avg QTime is: 39195.2 ms
Min QTime is:  ms
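
A summary of this kind can be computed directly from the log lines. The following is a minimal sketch in Python, not the script used for the numbers above; it assumes only the `QTime=<ms>` suffix visible in the log excerpts, the function name is hypothetical, and the sample lines are abbreviated.

```python
import re
from statistics import mean

# Matches the trailing "QTime=<milliseconds>" field of a Solr request log line.
QTIME_RE = re.compile(r"QTime=(\d+)")


def summarize_qtimes(log_lines):
    """Return count, min, max and average QTime (ms) found in the log lines."""
    qtimes = [int(m.group(1)) for line in log_lines if (m := QTIME_RE.search(line))]
    if not qtimes:
        # No samples: report explicit None values instead of empty fields.
        return {"count": 0, "min": None, "max": None, "avg": None}
    return {
        "count": len(qtimes),
        "min": min(qtimes),
        "max": max(qtimes),
        "avg": mean(qtimes),
    }


# Abbreviated versions of two log lines from the first test run.
sample = [
    'INFO: [] webapp=/solr path=/select params={q="baby"&rows=500} status=0 QTime=1735',
    'INFO: [] webapp=/solr path=/select params={q="girl"&rows=500} status=0 QTime=9794',
]
print(summarize_qtimes(sample))
```

Returning explicit `None` values for an empty input makes a missing minimum visible rather than leaving the field blank.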

I repeated this test several times and the results were about the same. Is it true that
[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] is faster than [NOW-30DAY+TO+NOW]?

Re: Date filter query

Em
Hi,

your QTimes are rather slow!
First: I am really surprised that the difference between explicit
date values and the more friendly date keywords is that large.
Did you restart the server between the two tests?

Second: Could you show us your solrconfig so we can check that your caches
are configured well?

How many documents are in that test index?

I suggest adjusting the precisionStep setting of your TrieDateField.

Furthermore: Consider whether you really need 500 rows
per request.

Kind regards,
Em


In reply to ku3ia's message of 21.02.2012 21:49.

Re: Date filter query

ku3ia
Hi,

> First: I am really surprised that the difference between explicit
> Date-Values and the more friendly date-keywords is that large.
Maybe it is because I use shards. I have 11 shards with ~310M docs in total.

> Did you make a server restart between both tests?
I ran the tests one after another, restarted my Tomcats, and then ran the second test first and vice versa.

> Second: Could you show us your solrconfig to make sure that your caches
> are configured well?
I'm using the solrconfig from the solr/example directory. The only difference is that I commented out unused components. The filter, document and query result caches are at their defaults. But they are at their defaults for both tests; can that affect the results?

> Furthermore: Take into consideration, whether you really need 500 rows
> per request.
Yes, I need 500 rows.

Thanks

Re: Date filter query

Em
Hi,

> But they [the cache configurations] are default for both tests, can it
> affect on results?
Yes, they affect both results. Try increasing the sizes of
queryResultCache and documentCache from 512 to 1024 (provided that you
have the two distinct queries "baby" and "girl"). In general they should be
sized to fit the number of documents and results you expect, so that
the chances of a cache hit are good.

> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
11 shards on the same machine? That could decrease performance due
to disk I/O.

Did you try my advice of adjusting the precisionStep of your
TrieDateField and reindexing your documents afterwards?

Kind regards,
Em


In reply to ku3ia's message of 21.02.2012 22:52.

Re: Date filter query

Erick Erickson
Be a little careful here. Any "fq" that references NOW will probably
NOT be effectively cached. Think of the fq cache as a map, with
the key being the fq clause and the value being the set of
documents that match that value.

So something like NOW gives
2012-01-23T00:00:00Z
but issuing that a second later gives:
2012-01-23T00:00:01Z

so the keys don't match: they're considered
different fq clauses, and the calculations are
done all over again.
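
The map analogy can be made concrete with a toy model. The following is a minimal sketch in Python, not Solr code; it assumes that NOW is turned into a concrete timestamp before the clause is used as a cache key, and the class and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def resolve_last_30_days(now):
    """Concrete form of date:[NOW-30DAY TO NOW] at the instant `now`."""
    lower = now - timedelta(days=30)
    return f"date:[{lower.strftime(FMT)} TO {now.strftime(FMT)}]"


class FilterCache:
    """Toy stand-in for the fq cache: resolved clause -> matching doc set."""

    def __init__(self):
        self.entries = {}
        self.misses = 0

    def get(self, clause):
        if clause not in self.entries:
            self.misses += 1                    # the doc set must be recomputed
            self.entries[clause] = frozenset()  # placeholder for the real doc set
        return self.entries[clause]


start = datetime(2012, 2, 21, 19, 6, 0, tzinfo=timezone.utc)

now_cache = FilterCache()
for i in range(3):  # three requests, one second apart
    now_cache.get(resolve_last_30_days(start + timedelta(seconds=i)))

fixed_cache = FilterCache()
for _ in range(3):  # three requests with the same explicit range
    fixed_cache.get("date:[2012-01-23T00:00:00Z TO 2012-02-21T23:59:59Z]")

print(now_cache.misses, fixed_cache.misses)
```

In this model every NOW-based request produces a new key and therefore a miss, while the explicit range misses once and is then served from the cache.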

Using the rounding for date math will help here,
something like NOW/DAY+1DAY to get midnight tonight
will give you something that's re-used, similarly for
NOW/DAY-30DAY etc.
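
To see why rounding helps, the rounded bounds can be computed for two instants one second apart. The following is a minimal sketch in Python, not Solr's date math parser; it assumes that /DAY truncates to midnight UTC, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def start_of_day(now):
    """Counterpart of NOW/DAY: truncate the instant to midnight (UTC assumed)."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def rounded_filter(now):
    """Concrete form of date:[NOW/DAY-30DAY TO NOW/DAY+1DAY] at `now`."""
    lower = start_of_day(now) - timedelta(days=30)
    upper = start_of_day(now) + timedelta(days=1)
    return f"date:[{lower.strftime(FMT)} TO {upper.strftime(FMT)}]"


first = datetime(2012, 2, 21, 19, 6, 0, tzinfo=timezone.utc)
second = first + timedelta(seconds=1)

print(rounded_filter(first))
print(rounded_filter(first) == rounded_filter(second))
```

Both instants resolve to the same clause, so under these assumptions the clause stays identical for the rest of the day and can be reused as a cache key.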

All that said, your query times are pretty long. I doubt
that your fq clause is really the culprit here. You need
to find out what the bottleneck is. Consider using
jconsole to see what your machine is spending its
time on. Examine your cache statistics to see
if you're getting good usage from your cache. You
haven't detailed what you're measuring. If this is just
a half-dozen queries after starting Solr, you may get
much better performance if you autowarm.

You may have too little memory allocated. You may be
swapping to disk a lot. You may.....

What have you tried and what have the results been?

In short, these times are very suspect and you haven't
really provided much info to go on.

Best
Erick

In reply to Em's message of Tue, Feb 21, 2012 at 5:25 PM.

Re: Date filter query

Em
Erick,

damn!

The NOW of now isn't the same NOW a second later. So obvious. How
could I overlook it?

Kind regards,
Em

In reply to Erick Erickson's message of 22.02.2012 00:17.

Re: Date filter query

Erick Erickson
> How could I overlook it?

Easy, the same way I did for a year and more <G>....

Best
Erick

In reply to Em's message of Tue, Feb 21, 2012 at 6:50 PM.

Re: Date filter query

ku3ia
Hi, all
Thanks for your responses.

I tried
[NOW/DAY-30DAY+TO+NOW/DAY-1DAY-1SECOND]
and it seems to work fine for me.
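
For reference, the concrete bounds of this range can be worked out by hand. The following is a minimal sketch in Python, not Solr's own parser; it assumes the date math is applied left to right and that /DAY truncates to midnight UTC. The function name and the sample instant are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def final_filter_bounds(now):
    """Bounds of [NOW/DAY-30DAY TO NOW/DAY-1DAY-1SECOND] at the instant `now`."""
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)  # NOW/DAY
    lower = day - timedelta(days=30)                              # -30DAY
    upper = day - timedelta(days=1) - timedelta(seconds=1)        # -1DAY-1SECOND
    return lower.strftime(FMT), upper.strftime(FMT)


# Hypothetical request time on the day after the tests described earlier.
now = datetime(2012, 2, 22, 10, 15, 30, tzinfo=timezone.utc)
print(final_filter_bounds(now))
```

Under these assumptions the upper bound is the last second of the day before yesterday, so it is worth checking that this is the intended window; NOW/DAY-1SECOND would end at the last second of yesterday instead.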

Thanks a lot!