Results ranking on filtered multi-field query

Mike Baranczak-3
I'm building a search engine that searches multiple document fields by
default. Given a query string like "Bruce Lee", I would expect the
results list to first show the documents containing both "Bruce" and
"Lee", and then the documents which only contain one of those names.
Most of the time, this is indeed what happens, but I've noticed that in
certain circumstances, Lucene doesn't rank the results in the expected
order. Specifically, it happens when I enter a query containing
multiple words, searching multiple fields, AND try to put the results
of that through a filter.

Code example is below. Is this a bug, or am I doing something wrong?

Thanks in advance. I can provide more information, if needed.

-MB

------------------------

    String[] fields = new String[] {"title", "description", "body"};
    IndexSearcher index = new IndexSearcher(INDEX_DIR);
    Analyzer analyzer = new StandardAnalyzer();
    String queryStr = "Bruce Lee";

    // OK
    System.out.println("\n\n - test run 0:");
    Query q0 = QueryParser.parse(queryStr, "title", analyzer);
    printResults(index.search(q0));

    // OK
    System.out.println("\n\n - test run 1:");
    Query q1 = MultiFieldQueryParser.parse(queryStr, fields, analyzer);
    printResults(index.search(q1));

    // WRONG!
    System.out.println("\n\n - test run 2:");
    Query q2 = MultiFieldQueryParser.parse(queryStr, fields, analyzer);
    Filter filt0 = new QueryFilter(new TermQuery(new Term("category", "movies")));
    Query q2f = new FilteredQuery(q2, filt0);
    printResults(index.search(q2f));

    // OK
    System.out.println("\n\n - test run 3:");
    Query q3 = QueryParser.parse(queryStr, "description", analyzer);
    Query q3f = new FilteredQuery(q3, filt0);
    printResults(index.search(q3f));

    // WRONG!
    System.out.println("\n\n - test run 4:");
    printResults(index.search(q1, filt0));


--------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Results ranking on filtered multi-field query

Chuck Williams
Mike Baranczak wrote:

> I'm building a search engine that searches multiple document fields by
> default. Given a query string like "Bruce Lee", I would expect the
> results list to first show the documents containing both "Bruce" and
> "Lee", and then the documents which only contain one of those names.
> Most of the time, this is indeed what happens, but I've noticed that
> in certain circumstances, Lucene doesn't rank the results in the
> expected order. Specifically, it happens when I enter a query
> containing multiple words, searching multiple fields, AND try to put
> the results of that through a filter.
>
> Code example is below. Is this a bug, or am I doing something wrong?

I suspect the existence of the filter is only relevant to the extent
that you are ranking a different set of results. MultiFieldQueryParser
does not in general provide the ranking you are looking for, especially
for default OR queries. I found this to be a problem as well and created
alternative classes, DistributedMultiFieldQueryParser and
MaxDisjunctionQuery, which are available here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32674

You might check these out and see if they provide the ranking you are
looking for (I think they will). They were written for 1.4.3; they
should work in the trunk (1.9) but will get deprecation warnings. I'm
using them now in 1.9 and have newer clean versions which I'd be happy
to post if anybody else is using them (although the versions I use still
require java 1.5).

Chuck
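
[Editor's sketch, not part of the original message] The ranking difference Chuck describes can be illustrated with a toy model that does not use Lucene at all. It assumes, as a simplification, that a multi-field OR query effectively adds up every (field, term) match, while a max-disjunction approach takes the best field per term and then adds across terms. The class name FieldScoringSketch, its methods, and the 0/1 match score are all hypothetical; real Lucene scoring also involves tf-idf, length norms and coord factors, which are ignored here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldScoringSketch {

    // Hypothetical helper: builds a document from field/value pairs.
    static Map<String, String> doc(String... fieldValuePairs) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldValuePairs.length; i += 2) {
            d.put(fieldValuePairs[i], fieldValuePairs[i + 1]);
        }
        return d;
    }

    // Toy match score: 1.0 if the field contains the term as a whole word, else 0.0.
    static double match(Map<String, String> doc, String field, String term) {
        String text = doc.getOrDefault(field, "").toLowerCase();
        return Arrays.asList(text.split("\\s+")).contains(term.toLowerCase()) ? 1.0 : 0.0;
    }

    // Adds up every (field, term) match, so one term repeated in many
    // fields can outscore two different terms in a single field.
    static double sumOverFields(Map<String, String> doc, List<String> fields, List<String> terms) {
        double score = 0.0;
        for (String field : fields) {
            for (String term : terms) {
                score += match(doc, field, term);
            }
        }
        return score;
    }

    // For each term, keeps only the best-matching field, then adds across
    // terms, so matching more distinct terms always scores higher.
    static double maxPerTerm(Map<String, String> doc, List<String> fields, List<String> terms) {
        double score = 0.0;
        for (String term : terms) {
            double best = 0.0;
            for (String field : fields) {
                best = Math.max(best, match(doc, field, term));
            }
            score += best;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("title", "description", "body");
        List<String> terms = Arrays.asList("bruce", "lee");

        // Contains both query terms, but only in one field.
        Map<String, String> both = doc("title", "Bruce Lee");
        // Contains only one query term, but in every field.
        Map<String, String> oneTerm = doc(
                "title", "Bruce Springsteen",
                "description", "Bruce live",
                "body", "Bruce on tour");

        System.out.println("sum over fields: both=" + sumOverFields(both, fields, terms)
                + " oneTerm=" + sumOverFields(oneTerm, fields, terms));
        System.out.println("max per term:    both=" + maxPerTerm(both, fields, terms)
                + " oneTerm=" + maxPerTerm(oneTerm, fields, terms));
    }
}
```

In this toy model the document with only "Bruce" ranks first under the summing scheme and second under the max-per-term scheme, which is the ordering Mike expected. Whether a given Lucene version behaves like the first scheme depends on how its parser expands the query, so treat this only as an intuition aid.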

Re: Results ranking on filtered multi-field query

Mike Baranczak-3

On May 1, 2005, at 7:34 PM, Chuck Williams wrote:

> Mike Baranczak wrote:
>
>> I'm building a search engine that searches multiple document fields
>> by default. Given a query string like "Bruce Lee", I would expect the
>> results list to first show the documents containing both "Bruce" and
>> "Lee", and then the documents which only contain one of those names.
>> Most of the time, this is indeed what happens, but I've noticed that
>> in certain circumstances, Lucene doesn't rank the results in the
>> expected order. Specifically, it happens when I enter a query
>> containing multiple words, searching multiple fields, AND try to put
>> the results of that through a filter.
>>
>> Code example is below. Is this a bug, or am I doing something wrong?
>
> I suspect the existence of the filter is only relevant to the extent
> that you are ranking a different set of results. MultiFieldQueryParser
> does not in general provide the ranking you are looking for,
> especially for default OR queries. I found this to be a problem as
> well and created alternative classes, DistributedMultiFieldQueryParser
> and MaxDisjunctionQuery, which are available here:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=32674
>
> You might check these out and see if they provide the ranking you are
> looking for (I think they will). They were written for 1.4.3; they
> should work in the trunk (1.9) but will get deprecation warnings. I'm
> using them now in 1.9 and have newer clean versions which I'd be happy
> to post if anybody else is using them (although the versions I use
> still require java 1.5).
>
> Chuck


I substituted DistributedMultiFieldQueryParser for
MultiFieldQueryParser, leaving everything else the same, and now it
works. Thanks for your help.

Any chance that these classes, or something like them, will be included
in the main distribution at some point?

-MB

Re: Results ranking on filtered multi-field query

Chuck Williams
Mike Baranczak wrote:

> I substituted DistributedMultiFieldQueryParser for
> MultiFieldQueryParser, leaving everything else the same, and now it
> works. Thanks for your help.
>
> Any chance that these classes, or something like them, will be
> included in the main distribution at some point?

There was a process to do an empirical evaluation of the current Lucene
query parsing and default score computation formulas (DefaultSimilarity)
vs. mine (and any others who wanted to participate). Personally I think
the point was proven, but the effort seemed to lose steam. I'm only in a
position to lobby.

Glad it worked for you,

Chuck

Re: Results ranking on filtered multi-field query

Doug Cutting
In reply to this post by Chuck Williams
Chuck Williams wrote:
> I found this to be a problem as well and created
> alternative classes, DistributedMultiFieldQueryParser and
> MaxDisjunctionQuery, which are available here:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=32674
>
> You might check these out and see if they provide the ranking you are
> looking for (I think they will). They were written for 1.4.3; they
> should work in the trunk (1.9) but will get deprecation warnings.

Other changes have been made to the trunk which should improve these
results with the standard MultiFieldQueryParser.  So please also try the
SVN trunk to see if these changes already resolve your problem.

Thanks,

Doug
