Correct of Query.combine() bugs with new MultiSearcher

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Correct of Query.combine() bugs with new MultiSearcher

Chuck Williams
As noted in the patch description I just submitted, it should be a
complete, correct, robust (relative to possible user Query
implementations) and reasonably optimal solution for Query.combine().  
It also simplifies the previous methods, deleting all overrides of
Query.combine() and Query.mergeBooleanQueries().  The current
implementation fails to account for queries that rewrite into different
primitive types on different sub-searchers and fails to account for the
fact that the rewritten query type of the first sub-searcher is nothing
special.  The current solution looks at all rewritten subsearcher
queries as a whole and computes the (reasonably) best single query to
distribute.  This patch is slightly better than what I sent via email
last night:
  1.  It's a patch that can be applied in the usual way
  2.  It handles the missing optimization cases I noted in last night's
email
  3.  It fixes potential bugs that would not arise with Lucene's query
types but could arise with user-written queries (e.g., user queries that
rewrite differently in arbitrary ways for the different sub-serarchers).

Doug and Wolf, please review the patch.  All tests pass.

Thanks,

Chuck


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Correct of Query.combine() bugs with new MultiSearcher

Erik Hatcher
I've confirmed Chuck's patch does fix the Highlighter test.  I'm set to
commit it once it gets the thumbs-up from Doug.

        Erik

On Apr 26, 2005, at 4:58 PM, Chuck Williams wrote:

> As noted in the patch description I just submitted, it should be a
> complete, correct, robust (relative to possible user Query
> implementations) and reasonably optimal solution for Query.combine().  
> It also simplifies the previous methods, deleting all overrides of
> Query.combine() and Query.mergeBooleanQueries().  The current
> implementation fails to account for queries that rewrite into
> different primitive types on different sub-searchers and fails to
> account for the fact that the rewritten query type of the first
> sub-searcher is nothing special.  The current solution looks at all
> rewritten subsearcher queries as a whole and computes the (reasonably)
> best single query to distribute.  This patch is slightly better than
> what I sent via email last night:
>  1.  It's a patch that can be applied in the usual way
>  2.  It handles the missing optimization cases I noted in last night's
> email
>  3.  It fixes potential bugs that would not arise with Lucene's query
> types but could arise with user-written queries (e.g., user queries
> that rewrite differently in arbitrary ways for the different
> sub-serarchers).
>
> Doug and Wolf, please review the patch.  All tests pass.
>
> Thanks,
>
> Chuck
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Correct of Query.combine() bugs with new MultiSearcher

Chuck Williams
Thanks Erik.  If you don't here more, I'm sure this fixes a whole class
of problems and is better than the previous situation.  I'm also
confident that it will do the right thing for all the query types built
into Lucene.  My remaining uncertainty concerns whether user query types
might somehow cause a problem, and in that regard the pre-patch
implementation might throw an exception when the patch will try to do
something that should work (this occurs when the new query types do
non-trivial rewrites, sometimes rewriting into built-in query types).  
Such user query types that need to work with MultiSearcher might need to
provide their own combine() method or a patch to this one.  That was
true before as well.

My message however below however is pretty poorly written!  
Clarifications to make it intelligible follow:

Erik Hatcher wrote:

> I've confirmed Chuck's patch does fix the Highlighter test.  I'm set
> to commit it once it gets the thumbs-up from Doug.
>
>     Erik
>
> On Apr 26, 2005, at 4:58 PM, Chuck Williams wrote:
>
>> As noted in the patch description I just submitted, it should be a
>> complete, correct, robust (relative to possible user Query
>> implementations) and reasonably optimal solution for
>> Query.combine().  It also simplifies the previous methods, deleting
>> all overrides of Query.combine() and Query.mergeBooleanQueries().  
>> The current implementation fails to account for queries that rewrite
>> into different primitive types on different
>
"current implentation" = before this patch

>> sub-searchers and fails to account for the fact that the rewritten
>> query type of the first sub-searcher is nothing special.  The current
>> solution
>
"current solution" = this patch

>> looks at all rewritten subsearcher queries as a whole and computes
>> the (reasonably) best single query to distribute.  This patch is
>> slightly better than what I sent via email last night:
>>  1.  It's a patch that can be applied in the usual way
>>  2.  It handles the missing optimization cases I noted in last
>> night's email
>>  3.  It fixes potential bugs that would not arise with Lucene's query
>> types but could arise with user-written queries (e.g., user queries
>> that rewrite differently in arbitrary ways for the different
>> sub-serarchers).
>>
>> Doug and Wolf, please review the patch.  All tests pass.
>>
>> Thanks,
>>
>> Chuck
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


--
*Chuck Williams*
All Things Local
Founder and CEO
V: (415)464-1889
C: (415)846-9018
[hidden email] <mailto:[hidden email]>
AIM: hawimanawiz

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Correct of Query.combine() bugs with new MultiSearcher

Wolf Siberski
Chuck Williams wrote:
> Thanks Erik.  If you don't here more, I'm sure this fixes a whole class
> of problems and is better than the previous situation.  I'm also
> confident that it will do the right thing for all the query types built
> into Lucene.  
To me Chucks patch also looks very good.

In parallel to his activity I had rewritten TestMultiSearcherRanking and
added more sample queries to get a better insight into the issue. These
tests also check if rewriting yields the same query for MultiSearcher as
for the single IndexSearcher case. Chucks modifications work fine for these
new tests too, when a small optimization is added: if there is only one
sub-query, then return that instead of wrapping it into a BooleanQuery.
An extended/modified patch is attached to the Bugzilla issue.

More test queries, especially 'weird' cases, would be great.
I'm not very familiar with all query subclasses and their features,
so if someone could add interesting test queries or point me to suitable
queries in other test classes, that would be very much appreciated.

--Wolf


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...