Is Taste appropriate for...

Otis Gospodnetic-2
Hi,

Taste is appropriate for scenarios where there are users and where these users have item preferences.  But is it appropriate for scenarios where there are no users in the game?  For example, instead of looking at item preferences of authenticated users, could you look at, say, Amazon shopping carts to figure out which books people buy together to arrive at the "People who bought Lucene in Action also bought Managing Gigabytes" recommendation?  In other words, could you simply look at item-item correlation without paying any attention to users?

Based on http://lucene.apache.org/mahout/taste.html#Item-based+Recommender I'd think this is possible, but I looked at some of the classes mentioned in the example there, and they all have references to User objects.  Does that mean item-item recommendations like in my example above are not possible with Taste?

I do see GenericItemSimilarity.ItemItemCorrelation, but even there I see references to the DataModel class, which references the User class.  Perhaps for item-item recommendations one can simply not use the ctors with the DataModel argument?  Is that the idea?

Also, is the idea that something (e.g. my app) calculates "correlatedness" (that float) of 2 Items and feeds that to GenericItemSimilarity.ItemItemCorrelation's ctor?  If so, what exactly does GenericItemSimilarity.ItemItemCorrelation do?  Doesn't it then simply serve the purpose of finding top N most correlated items for any given item?  Actually, I see only this:

  public double itemCorrelation(Item item1, Item item2)

So it's not really a structure that gives top N items for a given Item.  Is there a way to get Taste to do that?  If my app has to calculate how related 2 items are, then is there a need for Taste in this purely item-item scenario?

Thanks,
Otis
P.S. A few notes:
The doc on the site mentions ItemCorrelation, but there is no such class.  There may be other missing classes, I didn't check closely.
I saw this in the PearsonCorrelationSimilarity:

 * <p><code>sumXY / sqrt(sumX2 * sumY2)</code></p>
 *
 * <p>where <code>size</code> is the number of {@link Item}s in the {@link DataModel}.</p>

Is that really "size"?  Or should it be sumSomething?

Re: Is Taste appropriate for...

Sean Owen
Yes, the whole framework is written in terms of "users preferring
items", but you can make your "user" and "item" anything you like. I
think naming it that way matches the conventional use case for
collaborative filtering, so it helps people understand what is going
on better than if I had called them "things expressing preferences"
and "things that are preferred" or something.

So sure, a cart could be a "user". You could also make your users
both "users" and "items" to create some kind of friend recommender. It
is up to you and how you implement the User and Item interfaces in
your DataModel. So yes, all you describe is possible.
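
To make the cart-as-"user" idea concrete, here is a minimal sketch in
plain Java. It deliberately uses no Taste classes: CartCooccurrence,
cartsByItem and similarity are hypothetical names, the third book title
is made up, and cosine similarity over binary "is in this cart" vectors
is only one possible metric, not necessarily what Taste computes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Taste: item-item similarity from carts. */
final class CartCooccurrence {

    /** Maps each item to the IDs of the carts containing it; the cart plays the "user" role. */
    static Map<String, Set<Integer>> cartsByItem(List<Set<String>> carts) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int cartId = 0; cartId < carts.size(); cartId++) {
            for (String item : carts.get(cartId)) {
                index.computeIfAbsent(item, k -> new HashSet<>()).add(cartId);
            }
        }
        return index;
    }

    /** Cosine similarity of two items over binary "is in this cart" vectors. */
    static double similarity(Map<String, Set<Integer>> index, String a, String b) {
        Set<Integer> cartsA = index.getOrDefault(a, Set.of());
        Set<Integer> cartsB = index.getOrDefault(b, Set.of());
        if (cartsA.isEmpty() || cartsB.isEmpty()) {
            return 0.0; // assumption: an item seen in no cart is similar to nothing
        }
        long shared = cartsA.stream().filter(cartsB::contains).count();
        return shared / Math.sqrt((double) cartsA.size() * cartsB.size());
    }

    public static void main(String[] args) {
        List<Set<String>> carts = List.of(
                Set.of("Lucene in Action", "Managing Gigabytes"),
                Set.of("Lucene in Action", "Managing Gigabytes", "A Third Book"),
                Set.of("A Third Book"));
        Map<String, Set<Integer>> index = cartsByItem(carts);
        // The first pair shares both of its carts, the second pair only one.
        System.out.println(similarity(index, "Lucene in Action", "Managing Gigabytes")); // 1.0
        System.out.println(similarity(index, "Lucene in Action", "A Third Book"));       // 0.5
    }
}
```

No authenticated user appears anywhere in this sketch; the cart ID is
the only "user" identity the computation needs.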

ItemSimilarity (until recently it was called ItemCorrelation but I
renamed it) is an interface,
org.apache.mahout.cf.taste.similarity.ItemSimilarity IIRC. Yes, you
can implement this to provide some notion of item similarity. There
is, for example, an implementation based on the Pearson correlation
that gives you item similarity under that metric.
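
As a rough illustration of what a Pearson-based item similarity
computes, here is a self-contained sketch. It is not the Taste
implementation: PearsonItemSimilaritySketch and the nested-map
preference layout are assumptions made for the example. It takes the
users (or carts) that expressed a preference for both items, centers
each item's values on their mean, and returns
sumXY / sqrt(sumX2 * sumY2) over those centered values.

```java
import java.util.Map;

/** Hypothetical sketch, not a Taste class: Pearson correlation between two items. */
final class PearsonItemSimilaritySketch {

    /**
     * prefs maps user -> (item -> preference value).
     * Returns NaN when no user rated both items or when either item has zero variance.
     */
    static double similarity(Map<String, Map<String, Double>> prefs, String item1, String item2) {
        // First pass: means over the users who rated both items.
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (Map<String, Double> userPrefs : prefs.values()) {
            Double x = userPrefs.get(item1);
            Double y = userPrefs.get(item2);
            if (x != null && y != null) {
                n++;
                sumX += x;
                sumY += y;
            }
        }
        if (n == 0) {
            return Double.NaN;
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Second pass: sums of products and squares of the mean-centered values.
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (Map<String, Double> userPrefs : prefs.values()) {
            Double x = userPrefs.get(item1);
            Double y = userPrefs.get(item2);
            if (x != null && y != null) {
                double dx = x - meanX;
                double dy = y - meanY;
                sumXY += dx * dy;
                sumX2 += dx * dx;
                sumY2 += dy * dy;
            }
        }
        double denominator = Math.sqrt(sumX2 * sumY2);
        return denominator == 0.0 ? Double.NaN : sumXY / denominator;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> prefs = Map.of(
                "u1", Map.of("A", 1.0, "B", 2.0),
                "u2", Map.of("A", 2.0, "B", 4.0),
                "u3", Map.of("A", 3.0, "B", 6.0));
        // B is always exactly twice A, so the two items are perfectly correlated.
        System.out.println(similarity(prefs, "A", "B")); // 1.0
    }
}
```

Returning NaN for the undefined cases is a design choice of this
sketch; a real implementation might prefer 0.0 or skip the pair.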

GenericItemSimilarity (GenericItemCorrelation before the rename) just
takes a fixed list of item-item
similarities, a hard-coded list. Maybe this is useful if you want to
feed in some precomputed set of similarities. I can say more if you
like about why you might particularly want to do this for items, in an
item-based recommender; it is not 100% symmetric with a user-based
recommender in practice.
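
For the "top N most similar items" question, a hard-coded similarity
list can be queried along these lines. This is a sketch under stated
assumptions, not the Taste class: FixedItemSimilarityTable, put,
similarity and mostSimilar are hypothetical names, similarities are
treated as symmetric, and an unknown pair is given 0.0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not a Taste class: precomputed item-item similarities. */
final class FixedItemSimilarityTable {

    private final Map<String, Map<String, Double>> table = new HashMap<>();

    /** Stores one precomputed similarity, in both directions (assumed symmetric). */
    void put(String itemA, String itemB, double value) {
        table.computeIfAbsent(itemA, k -> new HashMap<>()).put(itemB, value);
        table.computeIfAbsent(itemB, k -> new HashMap<>()).put(itemA, value);
    }

    /** Pairwise lookup; pairs that were never stored get 0.0 (an assumption). */
    double similarity(String itemA, String itemB) {
        Map<String, Double> row = table.get(itemA);
        if (row == null) {
            return 0.0;
        }
        Double value = row.get(itemB);
        return value == null ? 0.0 : value;
    }

    /** Returns up to n items, most similar first; ties come back in no particular order. */
    List<String> mostSimilar(String item, int n) {
        Map<String, Double> row = table.get(item);
        if (row == null) {
            return List.of();
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(row.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        FixedItemSimilarityTable similarities = new FixedItemSimilarityTable();
        similarities.put("A", "B", 0.9);
        similarities.put("A", "C", 0.4);
        similarities.put("B", "C", 0.7);
        System.out.println(similarities.mostSimilar("A", 2)); // [B, C]
    }
}
```

Sorting one row per query is fine for a small table; with many items
you would more likely keep each row pre-sorted or capped at N entries.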

To your last point -- yeah, pure typo. I will fix that.


On Tue, Sep 9, 2008 at 10:57 PM, Otis Gospodnetic
<[hidden email]> wrote:

> [original message quoted in full above]