Tags and Folksonomies

Tags and Folksonomies

nchandra
Suppose I have content that has a title and a description. Users can tag
content and search it by tag, title and description. Tag matches should
carry the most weight.

Any input on how indexing and retrieval would work in Solr, given that there
is content plus tags? Has anyone implemented search based on collaborative
tagging?

Thanks,
Nishant

Re: Tags and Folksonomies

Richard Noble
Hi

I have not actually done this yet, but I will need to do something similar.
We will also be using user tagging and ratings to influence relevancy for
the searches.

I take it that you want something like this: if a document has been tagged
8 times with the tag "tagvalue" but only 4 times with the tag "othervalue",
then you want to boost the tag "tagvalue" higher?

The route I plan to go down would be to store the tag value count against
the document, and use a (possibly custom) function to boost accordingly.
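
As a rough illustration of the count-based boost idea, here is a minimal
Python sketch. It is not Solr code: the function names and the log-dampened
formula are assumptions chosen for illustration, and any monotonic function
of the stored count could be used instead.

```python
import math


def tag_boost(tag_count: int) -> float:
    """Hypothetical boost: 1.0 with no tags, growing slowly with the count."""
    return 1.0 + math.log1p(tag_count)


def boosted_score(base_score: float, tag_counts: dict, query_tag: str) -> float:
    """Multiply a base relevancy score by the boost for the queried tag."""
    return base_score * tag_boost(tag_counts.get(query_tag, 0))


# Per-document tag counts, stored alongside the document (assumed layout).
doc_tags = {"tagvalue": 8, "othervalue": 4}

print(boosted_score(1.0, doc_tags, "tagvalue"))
print(boosted_score(1.0, doc_tags, "othervalue"))
```

The logarithm keeps a tag applied thousands of times from swamping the text
relevancy score entirely.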

Just a theory at this point, and I am sure that there may be better ways.

Hope it helps

Richard


On Fri, Mar 23, 2012 at 5:44 PM, Nishant Chandra <[hidden email]> wrote:

> [...]



--
*nix has users, Mac has fans, Windows has victims.

Re: Tags and Folksonomies

Chris Hostetter-3
In reply to this post by nchandra

: Suppose I have content which has title and description. Users can tag content
: and search content based on tag, title and description. Tag has more
: weightage.
:
: Any inputs on how indexing and retrieval will work given there is content
: and tags using Solr? Has anyone implemented search based on collaborative
: tagging?

The simple approach would be to have your 3 fields, and search them with
weighted boosting -- giving more importance to the tag field.
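
A minimal sketch of what such a weighted query could look like, written as
Python that only builds the request parameters. The field names (tags, title,
description) and the weights are assumptions; defType, qf and wt are standard
Solr request parameters, and the sketch assumes the (e)dismax query parser is
available in your Solr version.

```python
from urllib.parse import urlencode


def build_search_params(user_query: str) -> dict:
    """Build Solr request parameters that weight the tag field highest."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Assumed field names and weights: tags count most, then title.
        "qf": "tags^5 title^2 description",
        "wt": "json",
    }


params = build_search_params("boat")
print("/select?" + urlencode(params))
```

The weights are tuning knobs, not recommendations; they would need to be
adjusted against real queries.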

Where things get more complicated is when you want docA to score
higher than docB for the query "boat" because 100 users have tagged docA
with "boat", but only 5 users have tagged docB with "boat".

The canonical way to deal with this would be using payloads to boost the
weight of a term -- the DelimitedPayloadTokenFilterFactory can help with
this at index time, but off the top of my head I don't think any of the
existing Solr QParsers will build the necessary PayloadTermQuery, so you
might have to roll your own -- there are a few Jira issues with patches
that you might be able to re-use or get inspired by...

https://issues.apache.org/jira/browse/SOLR-1485
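
To make the index-time half of this concrete, here is a hedged Python sketch
that turns per-tag counts into the "term|payload" text that
DelimitedPayloadTokenFilterFactory parses. It assumes the tags field is
whitespace-tokenized, uses the filter's default "|" delimiter and is
configured with the float encoder. The function name is hypothetical, and
the query-time side (scoring on the payload) is not covered.

```python
def encode_tag_payloads(tag_counts: dict, delimiter: str = "|") -> str:
    """Render {tag: count} as whitespace-separated 'tag|count' tokens."""
    parts = []
    for tag, count in sorted(tag_counts.items()):
        # A tag containing the delimiter or whitespace would be mis-parsed.
        if delimiter in tag or any(ch.isspace() for ch in tag):
            raise ValueError(f"tag cannot be encoded safely: {tag!r}")
        parts.append(f"{tag}{delimiter}{float(count)}")
    return " ".join(parts)


# One token per distinct tag, however many users applied it.
print(encode_tag_payloads({"boat": 100, "house": 5}))  # boat|100.0 house|5.0
```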




-Hoss

Re: Tags and Folksonomies

Ravish Bhagdev
Hi Hoss,

I am not sure why you suggest payloads for ranking documents with more
frequent tags above those with fewer tags.  Won't the term frequency part of
the relevancy score ensure this by default?  If you make tags a 'lowercase'
field (with full value tokenisation), shouldn't the frequency of tags in the
multivalued field improve the score for docA in the scenario below?

Payloads, I thought, would be more useful when you want some tags in a
record to be weighted more than others?  Or maybe I have missed some point.

Thanks,
Rav

On Tue, Apr 3, 2012 at 1:02 AM, Chris Hostetter <[hidden email]> wrote:

> [...]

Re: Tags and Folksonomies

Chris Hostetter-3

: I am not sure why you suggest Payload for ranking documents with more
: frequent tags above those with fewer tags.  Wont the term frequency part of
: relevancy score ensure this by default?  If you make tags a 'lowercase'

Sorry, yes ... absolutely - if you use omitNorms=false on the tags
field, and add these two docs...

  { "id": "doc1", "tags": ["house", "house", "house", "boat"] }
  { "id": "doc2", "tags": ["house", "boat", "car", "vegas"] }

...then doc1 will score higher on a query for "tags:house".
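
The effect can be checked by hand with a small Python sketch. It assumes
Lucene's classic TF-IDF similarity, where the term-frequency factor is the
square root of the raw term count. The helper name is hypothetical. Both
example docs have four tag values and the query term is the same, so the
length norm and idf factors do not separate them; only tf does.

```python
import math
from collections import Counter


def classic_tf(term: str, tokens: list) -> float:
    """tf factor under classic TF-IDF similarity: sqrt(raw frequency)."""
    return math.sqrt(Counter(tokens)[term])


doc1_tags = ["house", "house", "house", "boat"]
doc2_tags = ["house", "boat", "car", "vegas"]

print(classic_tf("house", doc1_tags))  # sqrt(3), about 1.73
print(classic_tf("house", doc2_tags))  # 1.0
```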

My suggestion to use payloads was because sending the same value many, many
times (i.e. if 100,000 users apply the tag "house", you would need to index
that doc with the word "house" repeated 100,000 times) can be prohibitive.


-Hoss

Re: Tags and Folksonomies

Ravish Bhagdev
OK, yes, that's true.  Although I'd expect term vectors (if you have them
enabled) to just increment the term count when a tag is re-applied,
increasing a boost stored as a payload with each tag, every time an existing
tag is re-applied, may be a more sensible approach if this is the case.
You'll still have to rewrite the whole record for this, though, as it's not
possible to 'update' a specific field value in Solr for efficiency reasons.
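
The read-modify-rewrite cycle this implies can be sketched in Python as
follows. The document layout and the tag_counts field name are assumptions
for illustration; the point is only that the complete document, not just the
changed count, is what gets sent back for re-indexing.

```python
import copy


def retag(document: dict, tag: str) -> dict:
    """Return a full copy of the document with one tag count incremented."""
    updated = copy.deepcopy(document)
    counts = updated.setdefault("tag_counts", {})
    counts[tag] = counts.get(tag, 0) + 1
    return updated  # the whole record is re-submitted, not a field delta


stored = {"id": "doc1", "title": "Lake cabin", "tag_counts": {"boat": 100}}
to_reindex = retag(stored, "boat")
print(to_reindex)
```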

Rav

On Tue, Apr 3, 2012 at 4:50 PM, Chris Hostetter <[hidden email]> wrote:

> [...]