Taste speed

Taste speed

Otis Gospodnetic-2
Hello,

I've been using Taste for a while, but it's not scaling well, and I suspect I'm doing something wrong.
When I say "not scaling well", this is what I mean:
* I have 1 week's worth of data (user,item datapoints)
* I don't have item preferences, so I'm using the boolean model
* I have caching in front of Taste, so the rate of requests that Taste needs to handle is only 150-300 reqs/minute/server
* The server is an 8-core 2.5GHz 32-bit machine with 32 GB of RAM
* I use 2GB heap (-server -Xms2000M -Xmx2000M -XX:+AggressiveHeap -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled) and Java 1.5 (upgrade scheduled for Spring)

** The bottom line is that with all of the above, I have to filter out less popular items and less active users in order to be able to return recommendations in a reasonable amount of time (e.g. 100-200 ms at the 150-300 reqs/min rate).  In the end, after this filtering, I end up with, say, 30K users and 50K items, and that's what I use to build the DataModel.  If I remove filtering and let more data in, the performance goes down the drain.

My feeling is that 30K users and 50K items make for an awfully small data set, and that Taste, especially at only 150-300 reqs/min on an 8-core server, should be much faster.  I have a feeling I'm doing something wrong and that Taste is really capable of handling more data, faster.  Here is the code I use to construct the recommender:

    idMigrator = LocalMemoryIDMigrator.getInstance();
    model = MyDataModel.getInstance("itemType");

    // ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    similarity = new TanimotoCoefficientSimilarity(model);
    similarity = new CachingUserSimilarity(similarity, model);

    // hoodSize is 50, minSimilarity is 0.1, samplingRate is 1.0
    hood = new NearestNUserNeighborhood(hoodSize, minSimilarity, similarity, model, samplingRate);

    recommender = new GenericUserBasedRecommender(model, hood, similarity);
    recommender = new CachingRecommender(recommender);

What do you think of the above numbers?

Thanks,
Otis

Re: Taste speed

Sean Owen
Yes, that's quite small.  As a reference, I'm currently writing up a
case study on a data set with 130K users and 160K items and
recommendation time is from 10ms to 200ms, depending on the algorithm.
Your use case seems to require 200-400ms per recommendation -- on a
1-core machine.

I'd recommend the -d64 flag if your system is 64-bit, but that's marginal.

I think the big slowdown is probably the translation from strings to
longs and back.

What's your DataModel like? Getting good performance in the case where
you need to do translation can be tricky, and I suspect this is the
issue.
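The translation Sean describes can be done once at load time rather than on the request path. Here is a minimal, self-contained sketch of the idea behind Taste's ID migration (hash the string ID to a long, keep an in-memory reverse map); the class name and details are illustrative, not the actual Mahout source:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of string<->long ID translation: hash each string ID
// to a long once, and keep a reverse map so longs can be turned back into
// strings without doing any conversion inside the similarity loop.
public final class IdMigratorSketch {
  private final Map<Long, String> longToString = new HashMap<>();

  public synchronized long toLongID(String stringID) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] hash = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
      long result = 0L;
      for (int i = 0; i < 8; i++) {        // fold the first 8 hash bytes into a long
        result = (result << 8) | (hash[i] & 0xFF);
      }
      longToString.put(result, stringID);  // remember the reverse mapping
      return result;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public synchronized String toStringID(long longID) {
    return longToString.get(longID);
  }
}
```

The point is that the hash is deterministic, so the long IDs are stable across runs, and the reverse lookup is a plain map get rather than a per-request string conversion.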

On Tue, Nov 24, 2009 at 7:10 PM, Otis Gospodnetic
<[hidden email]> wrote:

> Hello,
>
> I've been using Taste for a while, but it's not scaling well, and I suspect I'm doing something wrong.
> When I say "not scaling well", this is what I mean:
> * I have 1 week's worth of data (user,item datapoints)
> * I don't have item preferences, so I'm using the boolean model
> * I have caching in front of Taste, so the rate of requests that Taste needs to handle is only 150-300 reqs/minute/server
> * The server is an 8-core 2.5GHz 32-bit machine with 32 GB of RAM
> * I use 2GB heap (-server -Xms2000M -Xmx2000M -XX:+AggressiveHeap -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled) and Java 1.5 (upgrade scheduled for Spring)
>
> ** The bottom line is that with all of the above, I have to filter out less popular items and less active users in order to be able to return recommendations in a reasonable amount of time (e.g. 100-200 ms at the 150-300 reqs/min rate).  In the end, after this filtering, I end up with, say, 30K users and 50K items, and that's what I use to build the DataModel.  If I remove filtering and let more data in, the performance goes down the drain.
>
> My feeling is 30K users and 50K items makes for an awfully small data set and that Taste, esp. at only
> 150-300 reqs/min on an 8-core server should be much faster.  I have a feeling I'm doing something wrong and that Taste is really capable of handling more data, faster.  Here is the code I use to construct the recommender:
>
>    idMigrator = LocalMemoryIDMigrator.getInstance();
>    model = MyDataModel.getInstance("itemType");
>
>    // ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
>    similarity = new TanimotoCoefficientSimilarity(model);
>    similarity = new CachingUserSimilarity(similarity, model);
>
>    // hood size is 50, minSimilarity is 0.1, samplingRate is 1.0
>    hood = new NearestNUserNeighborhood(hoodSize, minSimilarity,similarity, model, samplingRate);
>
>    recommender = new GenericUserBasedRecommender(model, hood, similarity);
>    recommender = new CachingRecommender(recommender);
>
> What do you think of the above numbers?
>
> Thanks,
> Otis
>

Re: Taste speed

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
A correction to the user and item counts:
Users: 25K
Items: 2K

I am less worried about increasing the number of potential items to recommend.
I am more interested in getting more users into Taste, so that a larger percentage of my users can get recommendations.
For example, to filter out users I require a certain level of activity in terms of the number of items previously consumed.
With that threshold at 15, I get about 25K users (the figure above) -- so 25K users consumed 15 or more items.
With 10, I get about 50K users who consumed 10 or more items.
With 5, I get about 200K users who consumed 5 or more items (presumably even just 5 items would produce good-enough recommendations).

I know I could lower the sampling rate and get more users in, but that feels like cheating and will lower the quality of recommendations.  I have a feeling even with the sampling rate of 1.0 I should be able to get more users into Taste and still have Taste give me recommendations in 100-200ms with only 150-300 reqs/minute.
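For reference, the activity threshold described above amounts to counting (user,item) events per user before building the DataModel. A minimal sketch, with hypothetical names and a plain string-pair event list standing in for the real data (this is not Taste API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the pre-filtering step: count events per user and keep only
// users who consumed at least minItems items. The event representation is
// a placeholder: event[0] = user ID, event[1] = item ID.
public final class ActivityFilter {

  public static Map<String, Integer> countByUser(List<String[]> events) {
    Map<String, Integer> counts = new HashMap<>();
    for (String[] event : events) {
      counts.merge(event[0], 1, Integer::sum);
    }
    return counts;
  }

  public static List<String> activeUsers(List<String[]> events, int minItems) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Integer> e : countByUser(events).entrySet()) {
      if (e.getValue() >= minItems) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```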


Otis




Re: Taste speed

Otis Gospodnetic-2
In reply to this post by Sean Owen
Hi,

> Yes, that's quite small.  As a reference. I'm currently writing up a
> case study on a data set with 130K users and 160K items and
> recommendation time is from 10ms to 200ms, depending on the algorithm.
> Your use case seems to require 200-400ms per recommendation -- on a
> 1-core machine.

Yeah.  That sounds really off to me.

> I'd recommender the -d64 flag if your system is 64-bit, but that's marginal.

Can't do it on my servers. :(

> I think the big slowdown is probably the translation from strings to
> longs and back.
>
> What's your DataModel like? getting good performance in the case where
> you need to do translation can be tricky and I suspect this is the
> issue.

My data looks like this (user,item), if that's what you're asking:

111.111.111.111-1629385632.30042258,DHHDE59E0Q920007715
222.222.222.222-1251641952.30039838,KDJDE5AJ31I20003422
333.333.333.333-1193732240.30032560,AKNDKDKJDJD320079784

I believe the string->long conversion is basically the same as what you committed a few months back.

Otis



Re: Taste speed

Ted Dunning
As another data point, at Veoh we analyzed 7 months of data at 100-250
million events per day.  This involved a few tens of millions of users and a
few million items.  We down-sampled the most common items so that no item
had more than 1000 users.

Offline analysis took 10 hours on about 10-20 cores (this was Hadoop 15 and
then 16).  Recommendations were done by various technologies at different
times, but we typically had to produce 100-800 recommendations per second.
Initially, we used a combined Solr instance for all search, navigation, site
structuring and recommendation, but it quickly became clear that it was
better to specialize.  Later we used a specially designed static web site to
serve the item vectors and combined them on light-weight servers or even in
the browser.  That let us have good cacheability and enormous scalability.

The actual algorithm weighted item vectors according to the IDF of the
item in the history, but otherwise just used a fixed score vs. rank in the
item vectors themselves.  This worked as well as any fancier solution we
tried.  Item vectors were taken directly from the cooccurrence matrix, sparsified
using LLR.
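The LLR sparsification Ted mentions scores each item pair by a 2x2 contingency table of user counts. A minimal sketch of Dunning's log-likelihood ratio (G²) as it is usually formulated, e.g. in Mahout's LogLikelihood class; treat the details as illustrative:

```java
// Log-likelihood ratio (G^2) for a 2x2 cooccurrence contingency table:
// k11 = users who saw both items, k12/k21 = users who saw only one,
// k22 = users who saw neither. Large values mean the cooccurrence is
// unlikely to be chance, so the pair survives sparsification.
public final class LlrSketch {

  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized Shannon entropy of a set of counts.
  private static double entropy(long... counts) {
    long sum = 0;
    double elements = 0.0;
    for (long c : counts) {
      elements += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - elements;
  }

  public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    if (rowEntropy + columnEntropy < matrixEntropy) {
      return 0.0; // guard against tiny negative values from rounding
    }
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }
}
```

For independent items (e.g. all four cells equal) the score is 0; a perfectly dependent 2x2 table like (1, 0, 0, 1) scores 4·ln 2.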



--
Ted Dunning, CTO
DeepDyve

Re: Taste speed

Grant Ingersoll-2
In reply to this post by Otis Gospodnetic-2
Have you done any profiling?  It would be interesting to know where the bottlenecks are on your dataset.

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search


Re: Taste speed

Otis Gospodnetic-2
I did, some 6+ months ago (pre the all-IDs-are-longs changes).  I remember seeing the most time spent in TanimotoCoefficientSimilarity and thinking, "damn, this is all just set intersection and basic math operations -- how do I speed that up?"
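For the record, the computation that dominated that profile is small: the Tanimoto (Jaccard) coefficient of two users' item sets is just |A ∩ B| / |A ∪ B|. A minimal sketch over plain sets (not the Mahout implementation, which works on primitive-long collections):

```java
import java.util.Set;

// Tanimoto (Jaccard) coefficient over two users' item sets:
// |A intersect B| / |A union B|. Cheap per evaluation; it dominates
// profiles because it runs once per candidate neighbor per request.
public final class TanimotoSketch {
  public static double tanimoto(Set<Long> a, Set<Long> b) {
    int intersection = 0;
    for (long item : a) {
      if (b.contains(item)) {
        intersection++;
      }
    }
    int union = a.size() + b.size() - intersection;
    return union == 0 ? 0.0 : (double) intersection / union;
  }
}
```

The cost at serving time comes from the number of pairs evaluated, not from any single evaluation, which is why neighborhood size and sampling rate matter so much.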

Otis





Re: Taste speed

Sean Owen
In reply to this post by Grant Ingersoll-2
Agreed, and I am almost sure it is in the string-to-long conversion.
In particular, this could be nasty in your DataModel.

I did an implementation this summer where conversion was needed, and
the data was in a database. For this to work, you really have to
translate longs back to strings before hitting the database, for
example, so that queries can use the natively-indexed string IDs.
There are many ways to work it, and only a few that perform.

Reply privately if the details are a little confidential, I think I
can provide more insights. I don't think this is the best that can be
done even with translation.


Re: Taste speed

Sean Owen
In reply to this post by Otis Gospodnetic-2
That ought to be much faster since those operations are in terms of
long primitives. I think your bottleneck has moved, and agree that
it's most useful to re-run your checks there.

On Tue, Nov 24, 2009 at 8:33 PM, Otis Gospodnetic
<[hidden email]> wrote:
> I did, some 6+ months ago (pre all-IDs-are-longs changes).  I remember seeing the most time spent in TanimotoCoefficientSimilarity and thinking "damn, this is all just set intersection and basic math operations - how do I speed that up?".

Re: Taste speed

Otis Gospodnetic-2
In reply to this post by Sean Owen
Hi,

> Yes, that's quite small.  As a reference. I'm currently writing up a
> case study on a data set with 130K users and 160K items and
> recommendation time is from 10ms to 200ms, depending on the algorithm.

At what load/concurrency, on what type of hardware, with how large a heap, and with what similarity and friends?

Thanks,
Otis



Re: Taste speed

Sean Owen
This is on a 2-core machine, running 2 threads of load. It's a MacBook
Pro, a dual-core 3 GHz Intel Core Duo I think (64-bit). I allow it a 2GB
heap, though usage is about 1GB or so depending on the algorithm. I'm
trying a lot of variants, but, for example, a simple user-based
recommender with Euclidean distance similarity and a nearest-2
neighborhood recommends in about 100ms per user.

I strongly suspect the difference is this translation. You have a much
smaller set, simpler algorithms, and beefier hardware.
