Performance question

Performance question

Scott Smith-2
I was reading a book on SQL query tuning.  The gist of it was that the
way to get the best performance (fastest execution) out of a SQL select
statement was to "create" execution plans where the most selective term
in the "where" clause is used first, the next most selective term is
used next, etc.  Those of you who write SQL statements know that's
easier said than done.

 

However, that got me thinking about the Lucene queries I run.  I often
have a Boolean query that's made up of a number of sub-queries.  Suppose I
have a BooleanQuery Q which is made up of sub-queries Q1, Q2, and Q3.
Q1 is more selective (gets fewer hits if applied all by itself) than Q2
or Q3.  Q2 is more selective than Q3 (but less selective than Q1).

 

Does it matter in what order I add the sub-queries to the BooleanQuery Q?
That is, is the search faster (or slower) if I do:

 

            Q.add(Q1, BooleanClause.Occur.MUST);
            Q.add(Q2, BooleanClause.Occur.MUST);
            Q.add(Q3, BooleanClause.Occur.MUST);

 

As opposed to:

 

            Q.add(Q3, BooleanClause.Occur.MUST);
            Q.add(Q2, BooleanClause.Occur.MUST);
            Q.add(Q1, BooleanClause.Occur.MUST);

 

Or does it matter at all?

 

There are cases where I know that, 99% of the time, certain portions
of my queries are likely to be more selective, and I could control the
order in which they get added to the BooleanQuery.  Those of you who know
Lucene internals, is there anything worth doing here?

 

Scott

 

Re: Performance question

Doron Cohen
> Does it matter what order I add the sub-queries to the BooleanQuery Q.
> That is, is the execution speed for the search faster (slower) if I do:
>             Q.add(Q1, BooleanClause.Occur.MUST);
>             Q.add(Q2, BooleanClause.Occur.MUST);
>             Q.add(Q3, BooleanClause.Occur.MUST);
> As opposed to:
>             Q.add(Q3, BooleanClause.Occur.MUST);
>             Q.add(Q2, BooleanClause.Occur.MUST);
>             Q.add(Q1, BooleanClause.Occur.MUST);
> Or does it matter at all?
>
> There are cases where I know that, for 99% of the time, certain portions
> of my queries are likely to be more selective and I could affect the
> order they get added to the BooleanQuery.  Those of you who know lucene
> internals, is there anything worth doing here?
> Scott

I think the order in which the query is composed does not matter, because
Lucene already takes care of this: the order in which the
<list-of-doc-ids-of-a-term> lists are traversed by ConjunctionScorer (used
for a boolean AND query) lets the more selective term lead the query
execution. In fact the effect seems to be stronger than that. Assume an
index with 3000 docs, and a query with 3 terms:
   +t1 +t2 +t3
Also assume that t1 is more selective (rarer) in docs [d0,d1000),
t2 is more selective in [d1000,d2000), and t3 is more selective in
[d2000,d3000). The way Lucene traverses these doc-id lists would let t1
lead the computation in the first range [d0,d1000), t2 lead it in the
second range [d1000,d2000), and t3 lead it in the third range.
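
To make the traversal concrete, here is a minimal sketch of a leapfrog-style
intersection over sorted doc-id lists. It is only an illustration of the idea
described above, not Lucene's actual ConjunctionScorer code: the names
ConjunctionSketch, Cursor, intersect and NO_MORE_DOCS are made up for this
example, and plain int arrays stand in for the per-term doc-id lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified AND over sorted doc-id lists.
final class ConjunctionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical stand-in for a per-term doc-id iterator.
    static final class Cursor {
        private final int[] docs; // sorted ascending, no duplicates
        private int pos = 0;

        Cursor(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        // Moves to the first doc id >= target and returns it,
        // or NO_MORE_DOCS when the list is exhausted.
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // Returns the doc ids present in every list. The clauses are visited
    // round-robin; whichever cursor jumps furthest sets the next candidate,
    // so the locally rarest term ends up leading, whatever the clause order.
    static List<Integer> intersect(int[]... postings) {
        List<Integer> hits = new ArrayList<>();
        int n = postings.length;
        if (n == 0) {
            return hits;
        }
        Cursor[] cursors = new Cursor[n];
        for (int k = 0; k < n; k++) {
            cursors[k] = new Cursor(postings[k]);
        }
        int candidate = cursors[0].advance(0);
        int agreed = 1;  // cursors currently positioned on the candidate
        int i = 1 % n;   // next cursor to check
        while (candidate != NO_MORE_DOCS) {
            if (agreed == n) {
                // Every clause matched: record the hit and move on.
                hits.add(candidate);
                candidate = cursors[i].advance(candidate + 1);
                agreed = 1;
            } else {
                int doc = cursors[i].advance(candidate);
                if (doc == candidate) {
                    agreed++;
                } else {
                    candidate = doc; // this cursor now leads
                    agreed = 1;
                }
            }
            i = (i + 1) % n;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] q1 = {3, 7, 20};                                // most selective
        int[] q2 = {1, 3, 5, 7, 9, 20, 25};
        int[] q3 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 21};    // least selective
        System.out.println(intersect(q1, q2, q3));
        System.out.println(intersect(q3, q2, q1));
    }
}
```

Both calls in main return the same hits. In this sketch the clause order only
affects the first few advance calls; after that the candidates are set by
whichever list is sparsest in the region being scanned.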

Regards,
Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: Performance question

Scott Smith-2
Interesting, and thanks for the answer.  I guess I won't write code to
control the order in which clauses get added--one less thing to do :-)


-----Original Message-----
From: Doron Cohen [mailto:[hidden email]]
Sent: Thursday, July 20, 2006 6:47 PM
To: [hidden email]
Subject: Re: Performance question
