hybrid query (lucene + db)

Stephane Nicoll
Hi there,

We're using Lucene with Hibernate Search and so far we're very happy
with Lucene's performance and usability. However, we have one
specific use case that prevents us from using Lucene alone: spatial
queries. I sent a mail to this list a while back about the
problem, and we have since started investigating several solutions.

When the user selects a geographic area and some keywords we do the following:

* Perform a search on the Lucene index for the keywords, with a
projection that returns only the primary key of each element, sorted by
primary key
* Perform a search on the database with the other criteria and a
projection that returns only the primary key of each element
* Iterate over both lists to find N matching IDs, optionally with paging
(the matches from X to X + N, where X is the first result of the page)
* Run a query on the database to return the actual objects (select a
from MyClass a where a.id IN (the list of matching IDs) ). We limit the
page to 1000 results.
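
The third step above can be sketched as a single merge pass over the two ID lists. This is only an illustration, not our actual code: it assumes both lists are sorted ascending by primary key (above, only the Lucene result is stated to be sorted), that the IDs are longs, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: intersects two ascending ID arrays and returns one page.
final class SortedIdIntersection {

    /**
     * Returns up to pageSize IDs that occur in both arrays,
     * skipping the first `offset` matches (X in the description above).
     * Both arrays are assumed to be sorted ascending without duplicates.
     */
    static List<Long> page(long[] luceneIds, long[] dbIds, int offset, int pageSize) {
        List<Long> result = new ArrayList<>();
        int i = 0;       // cursor in the Lucene ID list
        int j = 0;       // cursor in the database ID list
        int matched = 0; // number of common IDs seen so far

        // Stop as soon as the page is full, so later IDs are never compared.
        while (i < luceneIds.length && j < dbIds.length && result.size() < pageSize) {
            if (luceneIds[i] < dbIds[j]) {
                i++;
            } else if (luceneIds[i] > dbIds[j]) {
                j++;
            } else {
                if (matched >= offset) {
                    result.add(luceneIds[i]);
                }
                matched++;
                i++;
                j++;
            }
        }
        return result;
    }
}
```

For example, with Lucene IDs {1, 3, 5, 7, 9, 11} and database IDs {3, 4, 5, 9, 11, 12}, the common IDs are 3, 5, 9 and 11, so `SortedIdIntersection.page(luceneIds, dbIds, 1, 2)` returns [5, 9]. The pass is linear in the lengths of the two lists, but both lists still have to be held in memory, which is the cost we are worried about.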

We have been looking for a way to optimize the queries and to avoid
consuming too much memory, given that we must support paging.

With a single user, a search by keywords takes 30 msec to complete and a
search by box takes 45 msec. With both (keywords + spatial area) it
takes 300 msec.

With 10 concurrent users, a search by keywords takes 150 msec/user, but
with both it takes 3 sec/user!!!

I ran the profiler on this scenario and found that *all*
threads are waiting on org.apache.lucene.index.SegmentReader. I then
configured Hibernate Search to use a separate index reader per thread.
The deadlocks disappeared but it's still very slow (2.8 sec).

Some questions:

* Does anyone know where the deadlocks on SegmentReader are coming from?
* Is sorting on the primary keys a bad idea with regard to performance
and memory usage?
* Does anyone have an idea for performing this kind of hybrid query in an
efficient way?

I am using Lucene 2.3.1 and Hibernate Search 3.0.1. I already asked for
support on the Hibernate Search forum but did not get any answer so


"Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge