Distributed User-based Collaborative Filtering

Kris Jack

I recently implemented a distributed user-based collaborative filtering algorithm. I've tested it experimentally and found that, for generating recommendations from Mendeley's data set, it is better suited than the item-based implementation (http://www.slideshare.net/KrisJack/mahout-becomes-a-researcher-large-scale-recommendations-at-mendeley). This is mostly because Mendeley's data set contains far more items than users.
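As a rough illustration of why the ratio of items to users can matter (offered as an aside, not as the experimental result in the slides above): a pairwise similarity matrix has one potential entry per pair of entities, so its size grows quadratically with the number of entities. The counts below are hypothetical and are not Mendeley's figures; the class name `SimilarityMatrixSize` is likewise made up for this sketch.

```java
public class SimilarityMatrixSize {

    // Number of distinct unordered pairs among n entities: n(n - 1) / 2.
    // This is the number of potential off-diagonal entries in a symmetric
    // similarity matrix (user-user or item-item).
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long users = 1_000;    // hypothetical user count
        long items = 100_000;  // hypothetical item count, far more than users
        System.out.println("user-user pairs: " + pairs(users));
        System.out.println("item-item pairs: " + pairs(items));
    }
}
```

This prints `user-user pairs: 499500` and `item-item pairs: 4999950000`: with 100 times as many items as users, the item-item matrix has roughly 10,000 times as many potential entries, although real preference data is sparse and most pairs will never co-occur.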

I'd like to contribute this code to the Mahout project. This will be my first patch for Mahout, so I'm following the instructions at https://cwiki.apache.org/MAHOUT/how-to-contribute.html

In brief, so far I've taken the code for the existing org.apache.mahout.cf.taste.hadoop.item.RecommenderJob and created a new org.apache.mahout.cf.taste.hadoop.user.RecommenderJob.  With help from Sean Owen, I followed a similar approach to the item-based implementation, but multiplied a user-user matrix with a user-item vector rather than an item-item matrix with an item-user vector.  The result of the multiplication then needs to be transposed in order to output recommendations by user id.
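To make the shape of the computation concrete, here is a small in-memory Java sketch. It is not the Hadoop job and does not use Mahout's API; it only mirrors the idea of multiplying a user-user similarity matrix by user preferences and then transposing the result so that scores are grouped by user. It assumes a dense preference matrix (users × items, with 0.0 meaning "no preference") and a precomputed user-user similarity matrix. The class and method names (`UserBasedSketch`, `scoreByUser`, `topN`) are hypothetical, the scores are unnormalised weighted sums, and each user's similarity to themself is ignored.

```java
import java.util.*;

public class UserBasedSketch {

    /**
     * Multiplies the user-user similarity matrix by each item's column of
     * user preferences. That intermediate result is keyed by item (one score
     * per user), so it is transposed to give, for each user, a score per item.
     */
    static double[][] scoreByUser(double[][] userSim, double[][] prefs) {
        int numUsers = prefs.length;
        int numItems = prefs[0].length;
        // itemScores[i][u]: predicted score of item i for user u (item-keyed)
        double[][] itemScores = new double[numItems][numUsers];
        for (int i = 0; i < numItems; i++) {
            for (int u = 0; u < numUsers; u++) {
                double sum = 0.0;
                for (int v = 0; v < numUsers; v++) {
                    if (v != u) { // skip the user's similarity to themself
                        sum += userSim[u][v] * prefs[v][i];
                    }
                }
                itemScores[i][u] = sum;
            }
        }
        // Transpose so that rows are users again: userScores[u][i]
        double[][] userScores = new double[numUsers][numItems];
        for (int i = 0; i < numItems; i++) {
            for (int u = 0; u < numUsers; u++) {
                userScores[u][i] = itemScores[i][u];
            }
        }
        return userScores;
    }

    /** Returns up to n items the user has not rated and that scored above 0, best first. */
    static List<Integer> topN(double[] scores, double[] userPrefs, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (userPrefs[i] == 0.0 && scores[i] > 0.0) {
                candidates.add(i);
            }
        }
        candidates.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        // 3 users x 4 items; 0.0 means "no preference"
        double[][] prefs = {
            {1, 1, 0, 0},
            {1, 0, 1, 0},
            {0, 0, 0, 1},
        };
        // Hypothetical precomputed, symmetric user-user similarities
        double[][] userSim = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.0},
            {0.0, 0.0, 1.0},
        };
        double[][] scores = scoreByUser(userSim, prefs);
        for (int u = 0; u < prefs.length; u++) {
            System.out.println("user " + u + " -> " + topN(scores[u], prefs[u], 2));
        }
    }
}
```

Running `main` prints `user 0 -> [2]`, `user 1 -> [1]` and `user 2 -> []`: users 0 and 1 are similar, so each is recommended the unseen item the other rated, while user 2 has no similar neighbours and receives nothing. In the distributed setting the same multiply-then-transpose shape would be spread over map and reduce steps rather than nested loops.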

Rather than changing the item-based code, I've created new classes for the user-based version, most of which are modified versions of the originals. It would be much tidier to merge these together where possible and to parametrise them. However, I didn't want to change the item-based code without consulting you all first.

Would be great to get some feedback.