[jira] Created: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id


JIRA jira@apache.org
org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id
-----------------------------------------------------------------------------------------

                 Key: MAHOUT-340
                 URL: https://issues.apache.org/jira/browse/MAHOUT-340
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.3
            Reporter: Hui Wen Han
             Fix For: 0.4


I have preference data that uses long values for user_id and item_id,
but the Hadoop cooccurence algorithm cannot handle it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847268#action_12847268 ]

Sean Owen commented on MAHOUT-340:
----------------------------------

This is exactly the input format that the entire library uses. What problem are you having? You would need to provide a lot more detail.

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849671#action_12849671 ]

Hui Wen Han commented on MAHOUT-340:
------------------------------------

I used the long type for user_id and item_id,
and the ItemBigramGenerator job cannot parse that input format when it runs.


[jira] Resolved: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-340.
------------------------------

    Resolution: Duplicate

Try the implementation in org.apache.mahout.cf.taste.hadoop.item instead, which reads longs and matches the rest of the framework a little more. The two implementations are being merged into the .item implementation.

I'm marking this as a sort of 'duplicate' of that task, to merge the implementations, since I don't think this implementation will otherwise be updated (?)

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849676#action_12849676 ]

Hui Wen Han commented on MAHOUT-340:
------------------------------------

Thanks for your quick response. I have written a new implementation modeled on cooccurence that supports long IDs for our project.

Which one do you think performs better, cooccurence or item?

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849679#action_12849679 ]

Sean Owen commented on MAHOUT-340:
----------------------------------

That is a good question -- the two implement the same algorithm, but 'cooccurence' tries to distribute the multiplication of the matrix by the user vector, while 'item' does not. It's not yet clear which is better. You could adapt either one's approach to completing this multiplication.

The 'item' implementation handles long IDs as inputs. To do this, you need to create a long <-> int mapping between the original long IDs and the dimensions in the vector or matrix they map to, which must be ints. It collects this information and reverses the transformation later. For this reason, if you need long IDs, you may find it more natural to adapt 'item' since it handles this issue.
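
To make the long <-> int mapping idea concrete, here is a minimal in-memory sketch. It is an illustration only, not the code used by the 'item' implementation: the class name LongIntIdMapping and its methods are hypothetical, and a Hadoop job would have to persist the mapping between steps rather than hold it in a HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Mahout class): a two-way long <-> int ID dictionary. */
final class LongIntIdMapping {
    // Forward direction: original long ID -> dense int index usable as a vector dimension.
    private final Map<Long, Integer> idToIndex = new HashMap<>();
    // Reverse direction: int index -> original long ID.
    private final List<Long> indexToId = new ArrayList<>();

    /** Returns the int index for a long ID, assigning the next free index on first sight. */
    int toIndex(long id) {
        Integer existing = idToIndex.get(id);
        if (existing != null) {
            return existing;
        }
        int index = indexToId.size();
        idToIndex.put(id, index);
        indexToId.add(id);
        return index;
    }

    /** Reverses the transformation: recovers the original long ID from a vector index. */
    long toId(int index) {
        return indexToId.get(index);
    }

    /** Number of distinct IDs seen so far. */
    int size() {
        return indexToId.size();
    }
}
```

With this sketch, every distinct long ID gets its own int dimension, and toId() lets the final recommendations be written out with the original IDs.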

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849683#action_12849683 ]

Hui Wen Han commented on MAHOUT-340:
------------------------------------

I just replaced all the related int types with long. It works fine and the performance is very good.

I have another question:
the original data has 21,545 users in total and about 640,000 items,
but the job generates only 153,942 recommendations, for 2,694 users.
Many users get no recommendations at all.

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849684#action_12849684 ]

Sean Owen commented on MAHOUT-340:
----------------------------------

I'm not sure what your final implementation looks like, but be careful about moving to longs from ints. It's not a one-line change. If you're just parsing longs, then casting them down to ints to use as dimensions in a vector or matrix, it won't work correctly at all. You'll be truncating long IDs to ints, and then trying to interpret them as long IDs later, but they won't be valid IDs.

Is that what you did? If so, I could imagine recommendations being all wrong.

If you have long IDs, you will need the steps you see in the 'item' implementation. In particular you need the step that generates and saves the long <-> int mappings.
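
The truncation problem can be shown with plain Java narrowing casts. This is a small sketch under stated assumptions: the class TruncationDemo and its methods are hypothetical names for illustration and do not come from either Mahout implementation.

```java
/** Hypothetical demo (not Mahout code) of why casting long IDs down to int is unsafe. */
final class TruncationDemo {
    /** The unsafe shortcut: a narrowing cast keeps only the low 32 bits of the ID. */
    static int truncate(long id) {
        return (int) id;
    }

    /** True if two different long IDs end up on the same int dimension. */
    static boolean collides(long a, long b) {
        return a != b && truncate(a) == truncate(b);
    }

    /** True if the ID survives the trip long -> int -> long unchanged. */
    static boolean roundTrips(long id) {
        return (long) truncate(id) == id;
    }
}
```

For example, the IDs 1 and 4294967297 (2^32 + 1) share the same low 32 bits, so collides(1L, 4294967297L) is true and roundTrips(4294967297L) is false: the two IDs land on the same vector dimension, and the larger one cannot be recovered afterwards. IDs that already fit in an int are unaffected, which is why a simple cast can appear to work on small test data.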

[jira] Commented: (MAHOUT-340) org.apache.mahout.cf.taste.hadoop.cooccurence can not support long as user_id and item_id

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849689#action_12849689 ]

Hui Wen Han commented on MAHOUT-340:
------------------------------------

Thanks for your advice. I see what you mean, and you are right.
I replaced all the related int types with long.
I will compare the results from 'item' against those from my modified version.

