[jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Tim Allison (Jira)

Ankur commented on MAHOUT-4:

Hi Isabel,
The algorithm can certainly be ported to a Map-Reduce setting on Hadoop. In fact, it has already been map-reduced, as described in the Google News personalization paper (please see the Javadoc for details).

I wrote the non-distributed version of the algorithm to help myself understand, visualize, and see the EM algorithm in action, starting with a very small dataset. The iterative logic and the small dataset make it particularly easy to see how the probabilities of users and items belonging to a cluster converge for users who share a large number of common items.
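
To make the iteration concrete, here is a minimal sketch of that kind of non-distributed EM loop. It is not the code attached to this issue. It assumes a PLSI-style model in which p(item | user) = sum over clusters z of p(z | user) * p(item | z), and every name in it (plsi_em, clicks, num_clusters, and so on) is hypothetical and chosen only for illustration.

```python
import math
import random


def plsi_em(clicks, num_clusters=2, iterations=30, seed=0):
    """Toy EM for p(s|u) = sum_z p(z|u) * p(s|z) over (user, item) clicks.

    Illustrative only: an assumed PLSI-style model, not the MAHOUT-4 patch.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in clicks})
    items = sorted({s for _, s in clicks})
    clusters = range(num_clusters)

    def random_dist(n):
        # Strictly positive random weights, normalized to sum to 1.
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z_given_u = {u: random_dist(num_clusters) for u in users}
    p_s_given_z = [dict(zip(items, random_dist(len(items)))) for _ in clusters]
    log_likelihoods = []

    for _ in range(iterations):
        # E-step: for each click, compute the posterior q(z | u, s) and
        # accumulate it as soft counts for the M-step.
        new_zu = {u: [0.0] * num_clusters for u in users}
        new_sz = [dict.fromkeys(items, 0.0) for _ in clusters]
        log_likelihood = 0.0
        for u, s in clicks:
            scores = [p_z_given_u[u][z] * p_s_given_z[z][s] for z in clusters]
            total = sum(scores)
            log_likelihood += math.log(total)
            for z in clusters:
                q = scores[z] / total
                new_zu[u][z] += q
                new_sz[z][s] += q
        log_likelihoods.append(log_likelihood)

        # M-step: renormalize the soft counts into probabilities.
        for u in users:
            mass = sum(new_zu[u])
            p_z_given_u[u] = [v / mass for v in new_zu[u]]
        for z in clusters:
            mass = sum(new_sz[z].values()) or 1.0  # guard an emptied cluster
            p_s_given_z[z] = {s: v / mass for s, v in new_sz[z].items()}

    return p_z_given_u, p_s_given_z, log_likelihoods


if __name__ == "__main__":
    # Made-up data: two pairs of users with disjoint click histories.
    clicks = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"),
              ("u3", "c"), ("u3", "d"), ("u4", "c"), ("u4", "d")]
    p_zu, _, history = plsi_em(clicks)
    for user, dist in p_zu.items():
        print(user, [round(p, 3) for p in dist])
    print("log-likelihood:", round(history[0], 3), "->", round(history[-1], 3))
```

On a dataset this small, printing p(z | user) after each iteration is an easy way to watch the membership probabilities of users with overlapping clicks move together. The returned log-likelihood history should never decrease, which is a handy sanity check on the update rules.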

I also have a fair idea of how to map-reduce it. Once the prototype is accepted, along with suggestions for features/changes that would be desirable in the map-reduce implementation, it shouldn't take me long to contribute the distributed version.

> Create a simple prototype implementing Expectation Maximization - EM that demonstrates the algorithm functionality given a set of (user, click-url) data.
> The prototype should be functionally complete and should serve as a basis for the Map-Reduce version of the EM algorithm.

