[jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Shalin Shekhar Mangar (Jira)

    [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576890#action_12576890 ]

Ankur commented on MAHOUT-4:
----------------------------

Thanks for your comment. A few replies below:
> Maybe you might ..
I will make these changes in the next patch update.

> ... - how many cluster numbers do you expect ...?
Typically I would expect a user-to-cluster ratio of about 1000:1, so 1 million users would yield about 1000 clusters.

The main method provides a sample user-story matrix that can be modified for experiments. If required, I can also write a small unit test that produces a randomly generated user-story matrix, though I am not sure it would help much.
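A randomly generated matrix is most useful for a test if clusters are planted in it, so the test can check whether EM recovers them. A minimal sketch under that assumption: the function `random_user_story_matrix` and its parameters (`p_in`, `p_out`, `seed`) are hypothetical and not part of Mahout_EM.patch. Each user is assigned a cluster and clicks a story with probability `p_in` if the story belongs to that cluster's block, and with probability `p_out` otherwise.

```python
import random


def random_user_story_matrix(num_users, num_stories, num_clusters,
                             p_in=0.6, p_out=0.05, seed=42):
    """Hypothetical helper: a 0/1 user-story click matrix with planted clusters.

    Stories are split into num_clusters contiguous blocks. Each user gets a
    random cluster label and clicks stories in its own block with probability
    p_in and all other stories with probability p_out.
    Returns (matrix, labels).
    """
    if not 0 < num_clusters <= num_stories:
        raise ValueError("need 1 <= num_clusters <= num_stories")
    rng = random.Random(seed)  # fixed seed keeps tests reproducible
    block = num_stories // num_clusters  # stories per cluster block
    matrix, labels = [], []
    for _ in range(num_users):
        c = rng.randrange(num_clusters)  # planted cluster for this user
        labels.append(c)
        row = []
        for s in range(num_stories):
            # Leftover stories past the last full block go to the last cluster.
            owner = min(s // block, num_clusters - 1)
            p = p_in if owner == c else p_out
            row.append(1 if rng.random() < p else 0)
        matrix.append(row)
    return matrix, labels
```

For example, `random_user_story_matrix(1000, 200, 10)` returns a 1000 x 200 click matrix together with the planted labels, so a test can compare them with the clusters EM finds (up to a relabeling of cluster ids).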

> I know EM as ...
I like the idea of a general EM framework. I will try to restructure the code so that it reflects EM more generically, as suggested.
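One way to make the code reflect EM generically is to separate a model-independent driver from the model-specific E and M steps. The Python sketch below assumes that design; `EMModel`, `run_em` and the example `BernoulliMixture` (users as 0/1 click rows, each cluster with its own per-story click probability) are hypothetical names, not the patch's API.

```python
import math
import random
from abc import ABC, abstractmethod


class EMModel(ABC):
    """Hypothetical plug-in interface for a generic EM driver."""

    @abstractmethod
    def e_step(self, data):
        """Return expected statistics under the current parameters."""

    @abstractmethod
    def m_step(self, data, stats):
        """Re-estimate the parameters from the expected statistics."""

    @abstractmethod
    def log_likelihood(self, data):
        """Observed-data log-likelihood under the current parameters."""


def run_em(model, data, max_iter=200, tol=1e-8):
    """Model-independent driver: alternate E and M steps until the
    log-likelihood improves by less than tol (or max_iter is reached)."""
    prev = model.log_likelihood(data)
    for _ in range(max_iter):
        model.m_step(data, model.e_step(data))
        cur = model.log_likelihood(data)
        if cur - prev < tol:
            break
        prev = cur
    return model


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


class BernoulliMixture(EMModel):
    """Hypothetical example plug-in: 0/1 user rows drawn from k clusters,
    each cluster having its own click probability for every story."""

    EPS = 1e-6  # keeps probabilities inside (0, 1) so the logs stay finite

    def __init__(self, k, num_stories, seed=0):
        rng = random.Random(seed)
        self.weights = [1.0 / k] * k
        self.theta = [[rng.uniform(0.25, 0.75) for _ in range(num_stories)]
                      for _ in range(k)]

    def _log_joint(self, row):
        # log P(c) + log P(row | c) for every cluster c
        return [math.log(w) + sum(math.log(t if x else 1.0 - t)
                                  for x, t in zip(row, th))
                for w, th in zip(self.weights, self.theta)]

    def e_step(self, data):
        # Responsibilities P(c | row), one list per user row.
        out = []
        for row in data:
            lj = self._log_joint(row)
            z = logsumexp(lj)
            out.append([math.exp(v - z) for v in lj])
        return out

    def m_step(self, data, resp):
        n = len(data)
        for c in range(len(self.weights)):
            nc = sum(r[c] for r in resp)  # expected number of users in c
            self.weights[c] = max(nc / n, self.EPS)
            for j in range(len(self.theta[c])):
                hits = sum(r[c] * row[j] for r, row in zip(resp, data))
                p = hits / nc if nc > 0 else 0.5
                self.theta[c][j] = min(max(p, self.EPS), 1.0 - self.EPS)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

    def log_likelihood(self, data):
        return sum(logsumexp(self._log_joint(row)) for row in data)
```

A different model, for example a pLSI-style aspect model, would only need its own `e_step`, `m_step` and `log_likelihood`; `run_em` stays unchanged.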



> Simple prototype for Expectation Maximization (EM)
> --------------------------------------------------
>
>                 Key: MAHOUT-4
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ankur
>         Attachments: Mahout_EM.patch
>
>
> Create a simple prototype implementing Expectation Maximization (EM) that demonstrates the algorithm's functionality on a set of (user, click-url) data.
> The prototype should be functionally complete and should serve as a basis for the Map-Reduce version of the EM algorithm.
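A minimal sketch of the kind of computation such a prototype performs, assuming a pLSI-style aspect model P(url | user) = Σ_z P(z | user) P(url | z) and click data given as (user, url, count) triples. Both are assumptions; the patch itself is not reproduced in this thread. The E-step is written as a "map" over independent triples and the M-step as a "reduce" that sums partial counts by key, to suggest how the work could be split for a Map-Reduce version. `plsi_em` and `plsi_log_likelihood` are hypothetical names.

```python
import math
import random
from collections import defaultdict


def plsi_em(clicks, num_topics, iterations=50, seed=0):
    """Illustrative pLSI-style EM over (user, url, count) triples.

    Model: P(url | user) = sum_z P(z | user) * P(url | z). Hypothetical
    sketch; counts must be positive. Returns (P(z | user), P(url | z)).
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in clicks})
    urls = sorted({s for _, s, _ in clicks})
    topics = range(num_topics)

    def random_distribution(keys):
        # Random strictly positive weights, normalized to sum to 1.
        w = {k: rng.random() + 0.1 for k in keys}
        total = sum(w.values())
        return {k: v / total for k, v in w.items()}

    p_z_u = {u: random_distribution(topics) for u in users}  # P(z | user)
    p_s_z = {z: random_distribution(urls) for z in topics}   # P(url | z)

    for _ in range(iterations):
        # "Map": each triple emits its expected count for every topic.
        emitted = []
        for u, s, n in clicks:
            joint = [p_z_u[u][z] * p_s_z[z][s] for z in topics]
            norm = sum(joint)
            for z in topics:
                q = n * joint[z] / norm  # n(u, url) * P(z | u, url)
                emitted.append((("user", u, z), q))
                emitted.append((("url", z, s), q))

        # "Reduce": sum the partial counts that share a key.
        sums = defaultdict(float)
        for key, value in emitted:
            sums[key] += value

        # M-step: renormalize the summed counts into distributions.
        for u in users:
            t = sum(sums[("user", u, z)] for z in topics)
            p_z_u[u] = {z: sums[("user", u, z)] / t for z in topics}
        for z in topics:
            t = sum(sums[("url", z, s)] for s in urls)
            if t > 0:
                p_s_z[z] = {s: sums[("url", z, s)] / t for s in urls}

    return p_z_u, p_s_z


def plsi_log_likelihood(clicks, p_z_u, p_s_z):
    """Sum over triples of n * log P(url | user) under the given model."""
    return sum(n * math.log(sum(p_z_u[u][z] * p_s_z[z][s] for z in p_s_z))
               for u, s, n in clicks)
```

In an actual Map-Reduce job the framework would group the emitted pairs by key; here a `defaultdict` stands in for that shuffle, and everything stays in memory.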

Re: [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Ted Dunning-3

Ankur,

You might like to take a quick look at the following two papers, which
provide strong extensions to pLSI:

www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
cosco.hiit.fi/Articles/buntineBohinj.pdf

The Buntine/Jakulin paper especially provides a relatively simple algorithm
that has significant advantages over simple pLSI and which would be quite
amenable to parallelization in the style of your EM work.


On 3/10/08 1:01 AM, "Ankur (JIRA)" <[hidden email]> wrote:

>
>     [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576890#action_12576890 ]
>
> Ankur commented on MAHOUT-4:
> ----------------------------
>
> Thanks for your comment. A few of my replies below:-
>> Maybe you might ..
> Will make these changes in the next patch update.
>
>> ... - how many cluster numbers do you expect ...?
> Well typically I would expect a user:cluster ratio of 1000:1. So for 1 million
> users, 1000 clusters would be created.
>
> In main method, a sample user-story matrix is provided which can be changed to
> experiment. However if required I can write a small unit test case to produce
> randomly generated user-story matrix but am not sure if that will help
> better.
>
>> I know EM as ...
> I like the idea of general EM framework. Will definitely try to change the
> code so that it reflects EM more generically as suggested.
>
>
>
>> Simple prototype for Expectation Maximization (EM)
>> --------------------------------------------------
>>
>>                 Key: MAHOUT-4
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Ankur
>>         Attachments: Mahout_EM.patch
>>
>>
>> Create a simple prototype implementing Expectation Maximization - EM that
>> demonstrates the algorithm functionality given a set of (user, click-url)
>> data.
>> The prototype should be functionally complete and should serve as a basis for
>> the Map-Reduce version of the EM algorithm.

Re: [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Jason Rennie-2
On Mon, Mar 10, 2008 at 11:31 AM, Ted Dunning <[hidden email]> wrote:

> www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
> cosco.hiit.fi/Articles/buntineBohinj.pdf


Ah... music to my ears.  Ted, have I met you at a NIPS conference?

Jason

Re: [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Isabel Drost-3
In reply to this post by Ted Dunning-3
On Monday 10 March 2008, Ted Dunning wrote:
> The Buntine/Jakulin paper especially provides a relatively simple algorithm
> that has significant advantages over simple pLSI and which would be quite
> amenable to parallelization in the style of your EM work.

During the past week I had a chance to take a quick look at the
Buntine/Jakulin paper myself; it looks really interesting for us. I
especially like its analysis of the related algorithms, which focuses on
their applicability to text data.

Thanks for the pointers,
Isabel


--
The truth you speak has no past and no future.  It is, and that's all it needs
to be.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>

Re: [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Isabel Drost-3
In reply to this post by Shalin Shekhar Mangar (Jira)

Hello Ankur,

On Monday 10 March 2008, Ankur (JIRA) wrote:
> Ankur commented on MAHOUT-4:
> ----------------------------
> Thanks for your comment. A few of my replies below:-

Did you get a chance to work on the patch since your last comment? Do
you have any further questions? Or is real life simply eating up your time?

Isabel

--
Weinberg's First Law: Progress is only made on alternate Fridays.

Re: [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)

Ted Dunning-3
In reply to this post by Isabel Drost-3

Excellent.


On 3/28/08 11:24 AM, "Isabel Drost" <[hidden email]> wrote:

> On Monday 10 March 2008, Ted Dunning wrote:
>> The Buntine/Jakulin paper especially provides a relatively simple algorithm
>> that has significant advantages over simple pLSI and which would be quite
>> amenable to parallelization in the style of your EM work.
>
> During the past week I had the chance to have a quick look at the
> Buntine/Jakulin Paper myself - it looks really interesting for us. I
> especially like the analysis of the related algorithms focussed on their
> applicability to text data.
>
> Thanks for the pointers,
> Isabel
>