How to do a fuzzy query

How to do a fuzzy query

Jack L
Hello solr-user,

I have some documents, each of which has a number of tags. I'd like
a query that returns "similar" documents, meaning those that share the
largest number of tags with a given document. For example, if I have a
doc that has 4 tags, I'd like to return docs that also have these
4 tags. If that doesn't yield enough records, say 10, I'd like to add
docs that share 3 of the tags. And if this is still not enough, those
sharing 2 tags... Is there a way to construct a Solr query
to do this?

--
Best regards,
Jack

Re: How to do a fuzzy query

Erik Hatcher

On Jun 23, 2007, at 11:24 PM, Jack L wrote:
> I have some documents, each has a number of tags. I'd like to
> have a query to return "similar" documents which share largest
> number of tags with a given document. For example, if I have
> doc that has 4 tags, and I'd like to return docs that also
> have these 4 tags. And if this doesn't make up a number of records,
> say, 10 records, I'd like to have some more docs that share 3
> of the tags. And if this is still not enough, those sharing 2
> tags... I wonder if there is a way to construct a solr query
> to do this?

The MoreLikeThis handler may get you close enough to what you're after:

        <http://wiki.apache.org/solr/MoreLikeThis>

It's in the trunk codebase, so try out a recent nightly build.

        Erik


Re: How to do a fuzzy query

Chris Hostetter-3
In reply to this post by Jack L
: I have some documents, each has a number of tags. I'd like to
: have a query to return "similar" documents which share largest
: number of tags with a given document. For example, if I have
: doc that has 4 tags, and I'd like to return docs that also
: have these 4 tags. And if this doesn't make up a number of records,
: say, 10 records, I'd like to have some more docs that share 3

If by "tags" you mean tags in the web folksonomy sense, then assuming:
  1) your tag field doesn't contain any duplicate tags per doc
  2) you set omitNorms="true" on your tags field

...a generic search on all of the tag names you are interested in should
be almost exactly what you asked for. The one difference is that
Lucene by default weights terms that are infrequent in your index more
than terms that are frequent, so doc A matching on 3 tags might score
higher than doc B matching on 4 tags if the tags A matches on are really
rare tags that not a lot of docs match on but the tags B matches on are
really REALLY common.

...not exactly what you asked about, but probably something that you'll
appreciate having once you see it in action.




-Hoss
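
As a rough illustration of the suggestion above, here is a minimal Python sketch. It builds an OR query over a set of tags and then uses a simplified approximation of Lucene's classic TF-IDF scoring (tf fixed at 1, norms omitted, query normalization ignored) to show how rare tags can outweigh a higher match count. The field name `tags`, the tag names, and the document frequencies are all hypothetical, and real Solr scores will differ in detail.

```python
import math


def build_tag_query(tags, field="tags"):
    """OR together one quoted clause per tag (field name is an assumption)."""
    clauses = []
    for tag in tags:
        # Escape backslashes and double quotes so the tag is a safe phrase.
        escaped = tag.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('{}:"{}"'.format(field, escaped))
    return " OR ".join(clauses)


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def approx_score(matched, query_tags, doc_freq, num_docs):
    """Simplified score: coord factor times the sum of squared idf values."""
    coord = len(matched) / len(query_tags)
    return coord * sum(classic_idf(doc_freq[t], num_docs) ** 2 for t in matched)


# Hypothetical index of 1,000,000 docs: three rare tags and four common ones.
NUM_DOCS = 1_000_000
DOC_FREQ = {
    "rare1": 10, "rare2": 10, "rare3": 10,
    "common1": 500_000, "common2": 500_000,
    "common3": 500_000, "common4": 500_000,
}
QUERY_TAGS = list(DOC_FREQ)

score_a = approx_score(["rare1", "rare2", "rare3"], QUERY_TAGS, DOC_FREQ, NUM_DOCS)
score_b = approx_score(
    ["common1", "common2", "common3", "common4"], QUERY_TAGS, DOC_FREQ, NUM_DOCS
)

print(build_tag_query(["solr", "lucene"]))
print("doc A (3 rare tags) outranks doc B (4 common tags):", score_a > score_b)
```

Note that the effect only shows up when the query contains more tags than either document matches. If doc B matched every tag in the query, it would also match the rare ones and could not be outscored this way.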

Re[2]: How to do a fuzzy query

Jack L
In reply to this post by Erik Hatcher
Hello Erik,

MoreLikeThis is interesting. So in order to use it through the
MoreLikeThisHandler, should I use the unique field in
the "q" param to uniquely identify the "this" document? Or does
it also support a more common query and work as "More Like These",
just like using this feature through the StandardRequestHandler?

--
Best regards,
Jack

Sunday, June 24, 2007, 3:04:26 AM, you wrote:


> On Jun 23, 2007, at 11:24 PM, Jack L wrote:
>> I have some documents, each has a number of tags. I'd like to
>> have a query to return "similar" documents which share largest
>> number of tags with a given document. For example, if I have
>> doc that has 4 tags, and I'd like to return docs that also
>> have these 4 tags. And if this doesn't make up a number of records,
>> say, 10 records, I'd like to have some more docs that share 3
>> of the tags. And if this is still not enough, those sharing 2
>> tags... I wonder if there is a way to construct a solr query
>> to do this?

> The MoreLikeThis handler may get you close enough to what you're after:

> <http://wiki.apache.org/solr/MoreLikeThis>

> It's in the trunk codebase, so try out a recent nightly build.

> Erik


Re: Re[2]: How to do a fuzzy query

Erik Hatcher

On Jun 25, 2007, at 3:43 PM, Jack L wrote:
> MoreLikeThis is interesting. So in order to use it through the
> MoreLikeThisHandler, I should use the unique field in
> the "q" param to uniquely identify the "this" document? Or, does
> it also support a more common query and works as "More Like These"
> just like using this feature through the StandardRequestHandler?


Jack - I think all of your questions are answered on the wiki page  
with the examples.  The "q" parameter is a full Solr query parser  
expression, so you can pick one or more documents with it.

        Erik
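
To make the answer above concrete, here is a minimal Python sketch that builds a request URL for the MoreLikeThis handler. The `mlt.fl`, `mlt.mintf`, and `mlt.mindf` parameters are documented MoreLikeThis options. The `/mlt` path, the base URL, the `tags` field, and the `id:doc42` query are assumptions for illustration, since the handler path depends on how it is registered in solrconfig.xml.

```python
from urllib.parse import urlencode


def mlt_url(base_url, query, tag_field="tags", rows=10):
    """Build a MoreLikeThis handler URL (the /mlt path is an assumption)."""
    params = {
        "q": query,            # any Solr query; selects the source document(s)
        "mlt.fl": tag_field,   # field(s) to use for similarity
        "mlt.mintf": 1,        # tags occur once per doc, so allow tf of 1
        "mlt.mindf": 1,        # do not ignore tags that are rare in the index
        "rows": rows,
    }
    return base_url.rstrip("/") + "/mlt?" + urlencode(params)


# Hypothetical usage: find documents similar to the one whose unique key is doc42.
url = mlt_url("http://localhost:8983/solr", "id:doc42")
print(url)
```

Because "q" accepts any query expression, passing something broader than a unique-key lookup is how one would pick more than one source document.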

