Speeding up RangeQueries?

Speeding up RangeQueries?

Niels Ott
Hi all,

I'm working on my prototype system and it turns out that RangeQueries
are quite slow. In a first test I have about 80,000 documents in my
index and I combine two range queries with a normal text query using a
BooleanQuery.

In the long run I will need to enhance my index at indexing time so that
the range queries can be replaced by simple keywords.
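
A minimal sketch of that indexing-time idea, assuming the ranges of interest are known in advance and can be expressed as fixed-width buckets. The class name, the "len" prefix and the bucket width are hypothetical and do not come from this thread; this is plain Java, not Lucene API.

```java
/** Toy helper: turns a numeric value into a bucket keyword that can be indexed as a plain term. */
final class BucketKeywordSketch {

    /** Returns the keyword of the fixed-width bucket that contains the given value. */
    static String bucketKeyword(String prefix, int value, int width) {
        if (value < 0 || width <= 0) {
            throw new IllegalArgumentException("value must be >= 0 and width must be > 0");
        }
        // Integer division rounds down to the start of the bucket.
        int start = (value / width) * width;
        return prefix + "_" + start + "_" + (start + width - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketKeyword("len", 37, 10)); // prints len_30_39
    }
}
```

A query for the bucket 30..39 then becomes a lookup of the single keyword len_30_39. Ranges that do not line up with the bucket boundaries would still need a real range query.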

For now, I'm interested in a way to speed up range queries. Does
the performance of a range query depend on the length of the contents of the
field in question?

Best,

    Niels

--
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Speeding up RangeQueries?

Yonik Seeley
On Sat, Mar 14, 2009 at 8:38 AM, Niels Ott <[hidden email]> wrote:
> For now, I'm interested in a way to speed up range queries. Does the
> performance of a range query depend on the length of the contents of the field
> in question?

Usually the biggest factor is the number of terms in the range.  The
second biggest is the number of documents each of those terms points to
(i.e. the number of documents containing the term).
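
To illustrate those two factors, here is a toy model of a term dictionary, assuming a sorted map from each distinct term to the ids of the documents containing it. The class and method names and the sample date terms are made up for illustration; this is plain Java and not how Lucene stores or reads its index.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy term dictionary: counts the work a plain range scan would do. */
final class RangeCostSketch {

    /** Returns {terms visited, postings read} for an inclusive term range. */
    static int[] cost(NavigableMap<String, List<Integer>> index, String lower, String upper) {
        int terms = 0;
        int postings = 0;
        // Every indexed term between the bounds is visited once...
        for (Map.Entry<String, List<Integer>> e : index.subMap(lower, true, upper, true).entrySet()) {
            terms++;
            // ...and the full document list of each visited term is read.
            postings += e.getValue().size();
        }
        return new int[] {terms, postings};
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> index = new TreeMap<>();
        index.put("20090101", List.of(1, 2));
        index.put("20090215", List.of(3));
        index.put("20090314", List.of(4, 5, 6));
        index.put("20090401", List.of(7));

        int[] c = cost(index, "20090201", "20090331");
        System.out.println(c[0] + " terms, " + c[1] + " postings"); // prints 2 terms, 4 postings
    }
}
```

In this model the length of each field value plays no role; only the number of distinct terms inside the bounds and the size of their document lists add to the cost.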

For single-valued numeric or date fields, TrieRangeQuery in Lucene
trunk will speed up range queries.

-Yonik
http://www.lucidimagination.com

Re: Speeding up RangeQueries?

Paul Elschot
In reply to this post by Niels Ott
On Saturday 14 March 2009 13:38:16 Niels Ott wrote:
> Hi all,
>
> I'm working on my prototype system and it turns out that RangeQueries
> are quite slow. In a first test I have about 80,000 documents in my
> index and I combine two range queries with a normal text query using a
> BooleanQuery.
>
> In the long run I will need to enhance my index at indexing time so that
> the range queries can be replaced by simple keywords.

Perhaps that is avoidable; see the reference below.

> For now, I'm interested in a way to speed up range queries. Does
> the performance of a range query depend on the length of the contents of the
> field in question?

Performance normally depends mostly on the number of terms indexed within
the queried range. To limit the number of terms used during a range search,
have a look here for more info on the new TrieRangeQuery:
http://wiki.apache.org/lucene-java/SearchNumericalFields
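
The general idea of limiting the number of terms can be sketched as follows, assuming every value is also indexed under coarser prefixes of itself, so that a large part of a range is matched by a few coarse terms. This is a simplified illustration in plain Java with hypothetical names. It is not the TrieRangeQuery code, which uses its own term encoding and handles cases (such as negative values) that this sketch rejects.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy prefix cover: describes a numeric range with few terms of mixed precision. */
final class TrieCoverSketch {

    /**
     * Covers the inclusive range [lo, hi] of non-negative values.
     * Each result is {shift, prefix} and stands for all values v with (v >> shift) == prefix.
     * The step parameter is the number of bits dropped per precision level.
     */
    static List<long[]> cover(long lo, long hi, int step) {
        if (lo < 0 || hi < lo || step < 1 || step > 32) {
            throw new IllegalArgumentException("need 0 <= lo <= hi and 1 <= step <= 32");
        }
        List<long[]> terms = new ArrayList<>();
        long mask = (1L << step) - 1;
        int shift = 0;
        while (lo <= hi) {
            // Low edge: emit prefixes until lo starts a full block of the next level.
            while (lo <= hi && (lo & mask) != 0) {
                terms.add(new long[] {shift, lo++});
            }
            // High edge: emit prefixes until hi ends a full block of the next level.
            while (lo <= hi && (hi & mask) != mask) {
                terms.add(new long[] {shift, hi--});
            }
            if (lo > hi) {
                break; // the edges already covered everything
            }
            // The remainder consists of whole blocks, so describe it with coarser prefixes.
            lo >>= step;
            hi >>= step;
            shift += step;
        }
        return terms;
    }

    public static void main(String[] args) {
        // The 18 values 3..20 with 2 bits per level are covered by 6 terms.
        System.out.println(cover(3, 20, 2).size()); // prints 6
    }
}
```

The price of this scheme is a larger index, because each value has to be indexed once per precision level for the coarse terms to exist.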

Regards,
Paul Elschot

Re: Speeding up RangeQueries?

Niels Ott
Hi Paul,

Paul Elschot wrote:
> Performance normally depends mostly on the number of terms indexed within
> the queried range. To limit the number of terms used during a range search,
> have a look here for more info on the new TrieRangeQuery:
> http://wiki.apache.org/lucene-java/SearchNumericalFields

This looks very promising.

As far as I understand this is only available from the unreleased
development version, right? How safe is this version for use?

Is it possible to use only the org.apache.lucene.search.trie package
from there together with the old and stable Lucene?

Best

    Niels

--
Niels Ott
Computational Linguist (B.A.)
http://www.drni.de/niels/

Re: Speeding up RangeQueries?

Yonik Seeley
On Sat, Mar 14, 2009 at 11:37 AM, Niels Ott <[hidden email]> wrote:
> As far as I understand this is only available from the unreleased
> development version, right? How safe is this version for use?
>
> Is it possible to use only the org.apache.lucene.search.trie package from
> there together with the old and stable Lucene?

It's unreleased, so the API could end up changing a little, but it's
already very well tested and should be independent of the rest of
Lucene (so yes, you should be able to just grab the trie package and
use it with the latest official Lucene release).


-Yonik
http://www.lucidimagination.com
