DisMax, multi fields, and phrase fields

Nick Hall
In my application, I have documents like:

DOCUMENT 1:
part_num: ABC123 Spark Plug
application: 2008 Toyota Corolla
application: 2007 Honda Civic

DOCUMENT 2:
part_num: FGH234 Spark Plug
application: 2007 Toyota Corolla
application: 2008 Honda Civic

The "application" field is set up to be a multi-valued field, and I am using
the DisMax request handler.

My goal is to be able to have the user search for something like:

2008 Toyota Corolla Spark Plug

and have it match Document 1 in this case. This currently works by using
DisMax and having it search both the part_num and application fields.
However, this search also finds Document 2, because the terms "2008",
"Toyota", and "Corolla" all appear somewhere in its application values, even
though they do not belong together in this case.
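
To make the problem concrete, here is a rough sketch in plain Python (not Solr code). It is a toy model that assumes lowercase, whitespace-split tokens; the helper names `all_terms_match` and `phrase_in_one_value` are made up for illustration. It shows that both documents contain every query term, while only Document 1 has "2008 Toyota Corolla" inside a single application value.

```python
# Toy model of the two documents from the post; this is not how Solr stores them.
DOCS = {
    "doc1": {
        "part_num": ["ABC123 Spark Plug"],
        "application": ["2008 Toyota Corolla", "2007 Honda Civic"],
    },
    "doc2": {
        "part_num": ["FGH234 Spark Plug"],
        "application": ["2007 Toyota Corolla", "2008 Honda Civic"],
    },
}


def tokens(values):
    """Lowercase, whitespace-split tokens from every value of a field."""
    return [tok.lower() for value in values for tok in value.split()]


def all_terms_match(doc, query):
    """True if each query term occurs somewhere in part_num or application."""
    bag = set(tokens(doc["part_num"])) | set(tokens(doc["application"]))
    return all(term.lower() in bag for term in query.split())


def phrase_in_one_value(doc, field, phrase):
    """True if the phrase occurs contiguously inside a single field value."""
    want = phrase.lower().split()
    for value in doc[field]:
        have = value.lower().split()
        for start in range(len(have) - len(want) + 1):
            if have[start:start + len(want)] == want:
                return True
    return False


if __name__ == "__main__":
    for name, doc in DOCS.items():
        print(name,
              all_terms_match(doc, "2008 Toyota Corolla Spark Plug"),
              phrase_in_one_value(doc, "application", "2008 Toyota Corolla"))
```

Term matching alone cannot tell the two documents apart here; only the per-value phrase check does.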

I understand that it may be hard to eliminate Document 2 from the search
results because the search has to be allowed to be a little fuzzy, but if I
check the scores of the documents, Document 1 is just barely ahead of
Document 2 in its score. I would like to figure out a way to get Document 1
to score higher in this case, since part of the query matches the phrase in
its application exactly.

I've been playing around with the phrase fields (pf) and phrase slop (ps)
parameters to try to get it to realize that "2008 Toyota Corolla" is a
phrase, in this example, and weight it higher for Document 1, but I haven't
been able to get Solr to identify this as a phrase. I've been looking at the
debug query and it will identify it as a phrase if the user only types in
something like:

2008 Toyota Corolla

but as soon as the Spark Plug terms are added, it looks like Solr is trying
to make the entire search expression into one long phrase.
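
A small sketch of why the single long phrase cannot help, again in plain Python rather than Solr. It assumes simple lowercase, whitespace tokenization, and the function names are hypothetical. A phrase clause is evaluated against one field at a time, so it can only match a field that contains every term of the phrase, and neither field of Document 1 does.

```python
# Toy copy of Document 1 from the post.
DOC1 = {
    "part_num": ["ABC123 Spark Plug"],
    "application": ["2008 Toyota Corolla", "2007 Honda Civic"],
}


def whole_query_phrase(query, field):
    """Render the one phrase clause built from the entire user query."""
    return '{}:"{}"'.format(field, query.lower())


def field_has_all_terms(doc, field, query):
    """A phrase can only match a field that holds every term of the phrase."""
    present = {tok.lower() for value in doc[field] for tok in value.split()}
    return all(term.lower() in present for term in query.split())


if __name__ == "__main__":
    full = "2008 Toyota Corolla Spark Plug"
    for field in ("part_num", "application"):
        print(whole_query_phrase(full, field),
              field_has_all_terms(DOC1, field, full))
    # The shorter query fits entirely inside the application field.
    print(field_has_all_terms(DOC1, "application", "2008 Toyota Corolla"))
```

Under these assumptions no phrase slop value changes the outcome, because the missing terms live in a different field.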

Does anyone have a recommendation of how this can be done, so it can break
the search expression down and automatically make a phrase out of part of
it? Or, should I approach this whole problem from a different angle? Thanks.

Re: DisMax, multi fields, and phrase fields

Jan Høydahl / Cominvent
Hi,

Check out the new eDisMax handler and the new pf2 parameter. It is also available as patch SOLR-1553.
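
As a rough illustration of the idea behind pf2: as I understand it, each pair of adjacent query words is tried as its own small phrase against the listed fields, rather than the whole query as one phrase. The sketch below is plain Python, not the parser's implementation. The boost value `^5`, the field list, and the helper names are assumptions for illustration, and the matching is a toy model with whitespace tokens.

```python
from urllib.parse import urlencode


def word_bigrams(query):
    """Adjacent word pairs of the query, e.g. 'a b c' -> ['a b', 'b c']."""
    words = query.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def has_phrase(value, phrase):
    """True if the phrase occurs as whole, contiguous words inside the value."""
    padded = " " + " ".join(value.lower().split()) + " "
    return " " + phrase + " " in padded


def count_bigram_hits(values, bigrams):
    """Number of bigrams found inside at least one single field value."""
    return sum(1 for b in bigrams if any(has_phrase(v, b) for v in values))


DOC1_APPLICATION = ["2008 Toyota Corolla", "2007 Honda Civic"]
DOC2_APPLICATION = ["2007 Toyota Corolla", "2008 Honda Civic"]

QUERY = "2008 Toyota Corolla Spark Plug"

# Request parameters one might send; the boost and field list are assumptions.
PARAMS = {
    "defType": "edismax",
    "q": QUERY,
    "qf": "part_num application",
    "pf2": "application^5",
    "debugQuery": "true",
}

if __name__ == "__main__":
    bigrams = word_bigrams(QUERY)
    print(bigrams)
    print("doc1 hits:", count_bigram_hits(DOC1_APPLICATION, bigrams))
    print("doc2 hits:", count_bigram_hits(DOC2_APPLICATION, bigrams))
    print(urlencode(PARAMS))
```

In this toy model Document 1 matches two bigrams ("2008 toyota" and "toyota corolla") in its application field and Document 2 only one, which is the kind of gap a pf2 boost could widen.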
Another option to avoid a match for doc2 is to add application-specific logic in your frontend that detects car brands and years and rewrites the query into a phrase or a filter.
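
A minimal sketch of what such frontend logic could look like, in Python. Everything here is an assumption for illustration: the `KNOWN_MAKES` list, the year pattern, the `rewrite_query` name, and the rule that the model is the single word following the make. A real catalogue would need multi-word models and a proper lookup table.

```python
import re

# Hypothetical lookup list; a real frontend would load this from its catalogue.
KNOWN_MAKES = {"toyota", "honda"}

# Four-digit years starting with 19 or 20.
YEAR = re.compile(r"^(19|20)\d{2}$")


def rewrite_query(user_query, field="application"):
    """Split 'YEAR MAKE MODEL' out of the query as a quoted phrase.

    Returns (remaining_query, phrase_clause). phrase_clause is None when no
    year/make/model triple is detected, and the query is returned unchanged.
    """
    words = user_query.split()
    for i in range(len(words) - 2):
        year, make, model = words[i:i + 3]
        if YEAR.match(year) and make.lower() in KNOWN_MAKES:
            phrase = '{}:"{} {} {}"'.format(field, year, make, model)
            rest = " ".join(words[:i] + words[i + 3:])
            return rest, phrase
    return user_query, None


if __name__ == "__main__":
    print(rewrite_query("2008 Toyota Corolla Spark Plug"))
    print(rewrite_query("Spark Plug"))
```

The returned phrase clause could then be sent as a filter query (fq) so that doc2 is excluded, or added as a boosted clause if the match should stay fuzzy; the remaining words go into q as before.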

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1 July 2010, at 19:43, Nick Hall wrote:

> [original message, quoted in full above]