filter query speed

Michael Thessel
Hello UG,

I've got a problem with filtered queries. I have an index with about 8
million documents. I save a timestamp (not the time of indexing) for
each document as an integer field. Querying the index is pretty fast.
But when I filter on the timestamp the queries are extremely slow, even
if the unfiltered search is already cached.

schema.xml:
...
<field name="dateline" type="integer" indexed="true" stored="false"/>
...

INFO: /select/ rows=25&start=0&q=((title:(test)+AND+is_starter:true)^8
+OR+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 5

INFO: /select/ rows=25&start=0&fq=dateline:[0+TO
+1181237598]+&q=((title:(test)+AND+is_starter:true)^8+OR
+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 79495

I currently run version:
Solr Specification Version: 1.1.2007.05.24.08.06.21
Solr Implementation Version: nightly - yonik - 2007-05-24 08:06:21
Lucene Specification Version: 2007-05-20_00-04-53
Lucene Implementation Version: build 2007-05-20
Tomcat: 6.0.10


Cheers,

Michael




--
Michael Thessel <[hidden email]>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806

Re: filter query speed

Yonik Seeley-2
On 6/7/07, Michael Thessel <[hidden email]> wrote:
> I've got a problem with filtered queries. I have an index with about 8
> million documents. I save a timestamp (not the time of indexing) for
> each document as an integer field. Querying the index is pretty fast.
> But when I filter on the timestamp the queries are extremely slow, even
> if the unfiltered search is already cached.

Filters are cached independently of queries, but cached queries
consist of the sort *and* any applied filters.

> INFO: /select/ rows=25&start=0&q=((title:(test)+AND+is_starter:true)^8
> +OR+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
> +score,postid&qt=standard&stylesheet=&version=2.1 0 5
>
> INFO: /select/ rows=25&start=0&fq=dateline:[0+TO
> +1181237598]+&q=((title:(test)+AND+is_starter:true)^8+OR
> +pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
> +score,postid&qt=standard&stylesheet=&version=2.1 0 79495

I suspect that the endpoint of your dateline filter changes often,
hence caching is doing no good.  Is the endpoint (1181237598) derived
from the current time?
If so, there are some things you can do:
1) make it faster to generate a new filter by limiting the number of
terms in the dateline field (during indexing, always round it to the
nearest day)
2) allow Solr to reuse previously generated filters more often by
rounding the dateline endpoint during query time.

You most likely want to do #2, and probably #1 (depending on how often
you commit new changes to the index).

-Yonik
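
A minimal sketch of the two rounding suggestions above, in Python. The helper names are hypothetical, the day granularity is only one possible choice, and rounding the query endpoint up (rather than to the nearest day) is an assumption made so that documents dated earlier on the current day still match.

```python
import time

SECONDS_PER_DAY = 86400  # assumed granularity: one UTC day


def round_down_to_day(ts):
    """Index-time rounding (#1): fewer distinct terms in the dateline field."""
    return (ts // SECONDS_PER_DAY) * SECONDS_PER_DAY


def round_up_to_day(ts):
    """Query-time rounding (#2): a stable endpoint for the whole day."""
    return -(-ts // SECONDS_PER_DAY) * SECONDS_PER_DAY


def dateline_filter(now=None):
    """Build the fq value with a rounded upper bound (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    return "dateline:[0 TO %d]" % round_up_to_day(now)
```

With this, every query issued on the same UTC day sends an identical fq string, so the previously generated filter can be reused instead of being rebuilt per request.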

Re: filter query speed

Michael Thessel
Hey Yonik,

Thanks a lot for your quick reply.

> I suspect that the endpoint to your dateline filter changes often,
> hence caching is doing no good.  Is the endpoint (1181237598) derived
> from the current time?
Yes, it is.

> If so, there are some things you can do:
> 1) make it faster to generate a new filter by limiting the number of
> terms in the dateline field (during indexing, always round it to the
> nearest day)
> 2) allow solr to reuse previously generated filters more often by
> rounding the dateline endpoint during query time.
>
> You most likely want to do #2, and probably #1 (depending on how often
> you commit new changes to the index).

I will give both of them a try.

Is there a general speed problem with range searches in Solr? It seems a bit strange to me that a query for a term takes 5 ms, while adding a filter to the same result set takes 80 s.

Cheers,

Michael



Re: filter query speed

Yonik Seeley-2
On 6/7/07, Michael Thessel <[hidden email]> wrote:
> Is there a general speed problem with range searches in Solr? It seems a bit strange to me that a query for a term takes 5 ms, while adding a filter to the same result set takes 80 s.

It's completely dependent on the number of terms in the range.
The unit of indexing in Lucene is the term, so finding docs for a
single term is fast.
A range can cover many terms, though.

The algorithm is simply:
for every term in the range: collect the docs for that term

-Yonik
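
The loop described above can be illustrated with a toy inverted index in Python. This is only a model of the idea, not Lucene's actual code; the dict-based index and the function name are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def range_filter(index, low, high):
    """Collect doc ids for every term in [low, high].

    `index` is a toy inverted index: {term value: [doc ids]}.
    Returns the matching doc ids and the number of terms visited.
    """
    terms = sorted(index)            # terms in order, as in a term dictionary
    lo = bisect_left(terms, low)     # first term >= low
    hi = bisect_right(terms, high)   # one past the last term <= high
    docs = set()
    for term in terms[lo:hi]:        # the cost grows with the terms in range
        docs.update(index[term])
    return docs, hi - lo


# Four distinct timestamps, five documents.
index = {100: [1], 200: [2, 3], 300: [4], 400: [5]}
docs, terms_visited = range_filter(index, 150, 300)
```

Here two terms fall inside the range, so the loop runs twice. With second-resolution timestamps over millions of documents the number of distinct terms in a wide range can be very large, which is why reducing the term count at index time makes generating a new filter cheaper.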