Re : FYI Cloud Computing Resources

deneche abdelhakim
I came across the following competition

http://www.netflixprize.com/index


It's about recommender systems, so I think it would be a good fit for Taste. The training dataset consists of more than 100M ratings.


----- Original Message ----
From: Josh Myer <[hidden email]>
To: [hidden email]
Sent: Wednesday, 30 July 2008, 18:19:25
Subject: Re: FYI Cloud Computing Resources

On Wed, Jul 30, 2008 at 11:26:29AM -0400, Grant Ingersoll wrote:
> http://research.yahoo.com/node/2328
>
> It _MAY_ (stressed, emphasized, etc.) be possible for Mahouters (or  
> are we just Mahouts?) to get some access to these resources.  One big  
> question is where can we get some fairly large data sets (large, but  
> not super large, I think, but am not sure)
>
> If you have ideas, etc. please let us know.
>

It's worth plugging (theinfo), http://theinfo.org/.  It's a project to
collect references to datasets, and may help here.  Unfortunately, it
seems to be laggy at the moment.  I'll poke Aaron about that =)

HtH,
--
Josh Myer
[hidden email]





Re: Re : FYI Cloud Computing Resources

Sean Owen
Yeah, it's almost over, unfortunately. :) I tried this a while ago with
a slope-one recommender, and was only just able to match Netflix's
current performance. I published some support code for people who
wanted to play with it, but later removed it from Mahout's copy as
legacy code.
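
For readers who have not met slope-one, here is a minimal sketch of the
weighted variant in plain Python. It is not the support code mentioned
above and it does not use Taste's API; the function names and the toy
ratings are made up for illustration. The idea is to learn the average
rating difference between each pair of items, then predict a user's
rating for an item from their other ratings plus those differences,
weighted by how many users co-rated each pair.

```python
from collections import defaultdict


def slope_one_fit(ratings):
    """Compute average pairwise item differences.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (diffs, counts) where diffs[i][j] is the mean of
    (rating of i - rating of j) over users who rated both, and
    counts[i][j] is how many such users there were.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    # Turn the accumulated sums into means.
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]
    return diffs, counts


def slope_one_predict(user_ratings, item, diffs, counts):
    """Weighted slope-one prediction; None if no co-rated item exists."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        if j == item:
            continue
        c = counts.get(item, {}).get(j, 0)
        if c:
            numerator += (diffs[item][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None


# Hypothetical toy data: three users, three items.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
diffs, counts = slope_one_fit(ratings)
print(slope_one_predict(ratings["carol"], "A", diffs, counts))
```

The fit step here is quadratic in the number of items each user rated,
so on something the size of the Netflix data it would need a more
careful, probably distributed, implementation.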

I didn't really have time to investigate more. Some of the insights
that have come out of the competition are pretty great. For
example: one person took advantage of a sort of "memory effect" in
ratings... people tend to over-rate movies at some times and
under-rate them at others. So if you correct for this -- a 5-star
rating in a run of 5-star ratings may not be as meaningful as a 5-star
rating in the middle of several 2-star ratings -- you get much better
performance.

This nugget of knowledge may be specific to Netflix, not sure. But it
was interesting.
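
One way to picture that kind of correction, as a rough sketch only: take
a user's ratings in chronological order and measure each one against the
average of the ratings given just before and after it. This is an
assumption about how such an adjustment could look, not the method the
competitor actually used; the function name and the window size are
hypothetical.

```python
def context_adjusted(ratings, window=2):
    """Offset each rating by the mean of its neighbours in time.

    ratings: one user's ratings in chronological order.
    window: how many ratings on each side count as neighbours
            (an arbitrary choice for this sketch).
    Returns a list of (rating - local baseline) values.
    """
    adjusted = []
    for idx, r in enumerate(ratings):
        lo = max(0, idx - window)
        hi = min(len(ratings), idx + window + 1)
        neighbours = ratings[lo:idx] + ratings[idx + 1:hi]
        # With no neighbours there is nothing to compare against.
        baseline = sum(neighbours) / len(neighbours) if neighbours else r
        adjusted.append(r - baseline)
    return adjusted


# A 5 in a run of 5s carries no extra signal; a 5 among 2s stands out.
print(context_adjusted([5, 5, 5, 5, 5]))
print(context_adjusted([2, 2, 5, 2, 2]))
```

The adjusted values could then be fed to a recommender in place of the
raw ratings, though whether that helps outside the Netflix data is an
open question.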


Re: Re : FYI Cloud Computing Resources

Grant Ingersoll-2

On Sep 3, 2008, at 4:34 AM, Sean Owen wrote:

> Yeah it's almost over unfortunately. :) I tried this a while ago with
> a slope-one recommender, and was only about able to match Netflix's
> current performance. I published some support code for people who
> wanted to play with it but removed it from Mahout's copy as legacy
> code.

Hmm, it's probably useful to keep the code around, even if it's just
used as a sample of how to do things with Taste. I imagine the Netflix
data will live on for quite some time.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ