[jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576885#action_12576885 ]

Isabel Drost commented on MAHOUT-4:
-----------------------------------

Your plan of first trying to understand the non-distributed version and then map-reducing the algorithm sounds great :) Some comments from my point of view:

You might want to choose more descriptive variable names than u, s and z, and give the mapping to the names used in the paper in a comment. That should make it easier for readers of your code to distinguish users, stories and clusters (z).

I think you might want to inline the initialize() method. For me personally, this makes it easier to follow what the constructors do. As for the default constructor, you could simply delegate to PLSI_engine(u, s, z), passing the default values.
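
To illustrate the two suggestions above (descriptive names with the paper's symbols in comments, and a default constructor that delegates), here is a minimal sketch. The class name PlsiEngine, the field names, the default sizes and the uniform starting values are all assumptions made for illustration; none of them are taken from the attached patch.

```java
// Hypothetical sketch: names, defaults and the initialization scheme are
// assumptions for illustration, not the contents of Mahout_EM.patch.
class PlsiEngine {
    // Illustrative default sizes only.
    static final int DEFAULT_USERS = 100;
    static final int DEFAULT_STORIES = 1000;
    static final int DEFAULT_CLUSTERS = 10;

    final int numUsers;    // "u" in the paper
    final int numStories;  // "s" in the paper
    final int numClusters; // "z" in the paper

    final double[][] pClusterGivenUser;  // p(z|u), indexed [user][cluster]
    final double[][] pStoryGivenCluster; // p(s|z), indexed [cluster][story]

    // The default constructor only supplies default values and delegates.
    PlsiEngine() {
        this(DEFAULT_USERS, DEFAULT_STORIES, DEFAULT_CLUSTERS);
    }

    // All initialization lives here, so there is no separate initialize() to follow.
    PlsiEngine(int numUsers, int numStories, int numClusters) {
        this.numUsers = numUsers;
        this.numStories = numStories;
        this.numClusters = numClusters;
        // Uniform rows keep the sketch short; a real EM run would normally
        // start from randomized values so that the clusters can diverge.
        this.pClusterGivenUser = uniformRows(numUsers, numClusters);
        this.pStoryGivenCluster = uniformRows(numClusters, numStories);
    }

    // Builds a rows x cols matrix in which every row is a uniform distribution.
    private static double[][] uniformRows(int rows, int cols) {
        double[][] matrix = new double[rows][cols];
        for (double[] row : matrix) {
            java.util.Arrays.fill(row, 1.0 / cols);
        }
        return matrix;
    }
}
```

With this layout, new PlsiEngine() and new PlsiEngine(100, 1000, 10) go through exactly the same code path.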

Concerning the method that calculates P_z_u_s: how many clusters do you expect? It seems this computation could become numerically unstable for very large numbers of clusters.
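
One common way to guard against this kind of instability is to normalize in log space (the "log-sum-exp" trick). The sketch below assumes the posterior has the usual PLSI form p(z|u,s) = p(z|u) p(s|z) / sum over z' of p(z'|u) p(s|z'), and that the caller passes in log p(z|u) + log p(s|z) for each cluster. The class and method names are hypothetical, and this is not necessarily how the patch computes P_z_u_s.

```java
// Hypothetical helper: a sketch of log-space normalization, not code from the patch.
class PosteriorNormalizer {

    /**
     * Takes logJoint[z] = log p(z|u) + log p(s|z) for every cluster z and
     * returns the normalized posterior p(z|u,s).
     */
    static double[] normalize(double[] logJoint) {
        double max = Double.NEGATIVE_INFINITY;
        for (double value : logJoint) {
            max = Math.max(max, value);
        }

        double[] posterior = new double[logJoint.length];
        if (max == Double.NEGATIVE_INFINITY) {
            // Every cluster has zero probability; falling back to a uniform
            // distribution is an assumption made for this sketch.
            java.util.Arrays.fill(posterior, 1.0 / logJoint.length);
            return posterior;
        }

        // Subtracting the maximum makes the largest term exp(0) = 1, so the
        // sum is at least 1 and the division below cannot be 0/0.
        double sum = 0.0;
        for (int z = 0; z < logJoint.length; z++) {
            posterior[z] = Math.exp(logJoint[z] - max);
            sum += posterior[z];
        }
        for (int z = 0; z < posterior.length; z++) {
            posterior[z] /= sum;
        }
        return posterior;
    }
}
```

Multiplying the raw probabilities directly can underflow to zero for every cluster, which turns the normalization into 0/0; working with differences of logarithms avoids that.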

It would be nice if you could provide some unit tests to prove that your code is working correctly.

I know EM as a rather general principle; your implementation seems focussed quite closely on the setup of the Google News clustering solution. I was wondering whether it would be possible to generalize the implementation a little while still supporting the news personalization use case. Maybe others would like to reuse a general EM framework, but not the exact formulas you used. I don't know whether that is possible, or whether it can be done in a way that is easy to read.
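
As a rough idea of what such a generalization might look like, the sketch below separates the iteration loop from the model-specific formulas. The interface EmModel, the class EmDriver and all method names are hypothetical; this is one possible shape, not an existing Mahout API and not a claim about how the patch should be restructured.

```java
// Hypothetical sketch of a model-agnostic EM loop; all names are assumptions.
interface EmModel<S> {
    /** E-step: compute expected sufficient statistics under the current parameters. */
    S expectation();

    /** M-step: re-estimate the parameters from the statistics of the E-step. */
    void maximization(S statistics);

    /** Log-likelihood of the data under the current parameters. */
    double logLikelihood();
}

class EmDriver {
    /**
     * Alternates E- and M-steps until the log-likelihood changes by less than
     * the tolerance or maxIterations is reached. Returns the number of
     * iterations that were run.
     */
    static <S> int run(EmModel<S> model, int maxIterations, double tolerance) {
        double previous = model.logLikelihood();
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            model.maximization(model.expectation());
            double current = model.logLikelihood();
            if (Math.abs(current - previous) < tolerance) {
                return iteration;
            }
            previous = current;
        }
        return maxIterations;
    }
}
```

Under this split, the PLSI formulas from the paper would live in one EmModel implementation, and other models could reuse the driver unchanged. Whether the result stays easy to read depends on how much state the model has to expose.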

> Simple prototype for Expectation Maximization (EM)
> --------------------------------------------------
>
>                 Key: MAHOUT-4
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ankur
>         Attachments: Mahout_EM.patch
>
>
> Create a simple prototype implementing Expectation Maximization - EM that demonstrates the algorithm functionality given a set of (user, click-url) data.
> The prototype should be functionally complete and should serve as a basis for the Map-Reduce version of the EM algorithm.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.