Fuzzy searching documents over multiple fields using Solr


britske
Not sure if this has ever come up (or has perhaps even been implemented without my knowing), but I'm interested in doing fuzzy search over multiple fields using Solr.

What I mean is the ability to return documents based on some 'distance calculation', without documents having to match the query 100%.

Use case: a user is searching for a tv with a couple of filters selected. No tv matches all filters. How do I come up with a bunch of suggestions that match the selected filters as closely as possible? The hard part is determining what 'closely' means in this context.

This relates to (approximate) nearest neighbor search, k-d trees, etc. Has anyone ever tried to do something similar? Are there any plugins? Or are there reasons Solr/Lucene would or wouldn't be the correct system to build on?

Thanks

Re: Fuzzy searching documents over multiple fields using Solr

Jack Krupansky-2
A simple "OR" boolean query will rank documents that match more clauses
higher. You can also selectively boost individual OR terms to control
importance. And do an "AND" for the required terms, like "tv".
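
As a rough sketch of that idea (the field names, values and boosts below are hypothetical, not taken from any real schema), a small Python helper can assemble such a query string. It assumes the default operator is OR, so clauses without a "+" prefix are optional and only add to the score:

```python
# Sketch only: "category", "size", "resolution" and "brand" are
# hypothetical field names; replace them with fields from your schema.

def build_query(required, optional):
    """Build a Lucene/Solr query string.

    required: list of (field, value) pairs that every hit must match ('+').
    optional: list of (field, value, boost) triples; each one that matches
              raises the score, weighted by its ^boost.
    """
    parts = [f"+{field}:{value}" for field, value in required]
    for field, value, boost in optional:
        clause = f"{field}:{value}"
        if boost != 1:
            clause += f"^{boost}"  # weight this clause more (or less) than the others
        parts.append(clause)
    return " ".join(parts)


q = build_query(
    required=[("category", "tv")],
    optional=[("size", "42", 2), ("resolution", "1080p", 1), ("brand", "sony", 1.5)],
)
print(q)
```

A tv that matches only some of the optional clauses is still returned, just with a lower score than one that matches all of them.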

-- Jack Krupansky
-----Original Message-----
From: britske
Sent: Thursday, May 09, 2013 11:21 AM
To: [hidden email]
Subject: Fuzzy searching documents over multiple fields using Solr

--
View this message in context:
http://lucene.472066.n3.nabble.com/Fuzzy-searching-documents-over-multiple-fields-using-Solr-tp4061867.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fuzzy searching documents over multiple fields using Solr

britske
I didn't mention it, but I'd like individual fields to contribute to the
overall score on a continuum instead of 1 (match) and 0 (no match), which
would lead to more fine-grained scoring.

A contrived example: all other things being equal, a 40-inch tv should score
higher than a 38-inch tv when searching for a 42-inch tv.
This would be based on some distance modeling on the 'size' field (e.g.
score(42,40) = 0.6 and score(42,38) = 0.4).
Other qualitative fields may be modeled in the same way (e.g. restaurants
with a field 'price' with values 'budget', 'mid-range', 'expensive', ...).

Any way to incorporate this?
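
To make the kind of scoring meant here concrete, below is a minimal Python sketch, independent of Solr. The linear decay, the `scale` parameter and the ordering of the price levels are all assumptions chosen for illustration; it does not try to reproduce the contrived 0.6 / 0.4 numbers above.

```python
# Sketch only: one possible way to model per-field closeness.

def numeric_closeness(query, value, scale):
    """Linear decay: 1.0 for an exact match, 0.0 once the values are
    at least `scale` apart (scale is an assumed tuning knob)."""
    return max(0.0, 1.0 - abs(query - value) / scale)


# Assumed ordering of the qualitative values, cheapest first.
PRICE_LEVELS = ["budget", "mid-range", "expensive"]


def ordinal_closeness(query, value, levels=PRICE_LEVELS):
    """Treat ordered categories as positions on a line and score by the
    number of steps between them, normalised to the range 0..1."""
    steps = abs(levels.index(query) - levels.index(value))
    return 1.0 - steps / (len(levels) - 1)


# Searching for a 42-inch tv: the 40-inch one ends up closer than the 38-inch one.
print(numeric_closeness(42, 40, scale=10))
print(numeric_closeness(42, 38, scale=10))
print(ordinal_closeness("budget", "mid-range"))
```

The overall score would then be some weighted combination of these per-field values.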




Re: Fuzzy searching documents over multiple fields using Solr

Jack Krupansky-2
You can use function queries to boost documents as well. Sorry, but it can
get messy to figure out.

See:
http://wiki.apache.org/solr/FunctionQuery

See also the edismax "bf" parameter:
http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29
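
For illustration, here is a hedged Python sketch of what such a boost function might look like for the tv example. It only builds the request parameters; the "size" and "name" fields are hypothetical, and the choice of recip() with these constants is one option among many rather than a recommended formula. Solr's recip(x,m,a,b) computes a/(m*x+b), so with x = abs(size - 42) the boost is largest for an exact match and falls off as the size moves away from 42.

```python
from urllib.parse import urlencode

# Sketch only: "size" and "name" are hypothetical fields in the schema.


def size_boost(field, target):
    """Solr function-query string that decays with the distance between
    the field value and the target: recip(|field - target|, 1, 1, 1)."""
    return f"recip(abs(sub({field},{target})),1,1,1)"


params = {
    "defType": "edismax",          # use the extended dismax query parser
    "q": "tv",
    "qf": "name",                  # hypothetical field to search the text in
    "bf": size_boost("size", 42),  # additive boost function
}
print(urlencode(params))


def recip(x, m, a, b):
    """Local re-implementation of Solr's recip(), to preview the boost values."""
    return a / (m * x + b)


# A 40-inch tv gets a larger boost than a 38-inch one when the target is 42.
print(recip(abs(40 - 42), 1, 1, 1), recip(abs(38 - 42), 1, 1, 1))
```

Because "bf" is additive, how much this boost matters depends on the scale of the text relevance score, so the constants usually need tuning against real data.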

-- Jack Krupansky

-----Original Message-----
From: Geert-Jan Brits
Sent: Thursday, May 09, 2013 12:32 PM
To: [hidden email]
Subject: Re: Fuzzy searching documents over multiple fields using Solr
