Taste on Hbase?

Taste on Hbase?

Otis Gospodnetic-2
Hi,

I was looking at some HBase material earlier and started wondering whether Taste would benefit from using HBase as its data store instead of an RDBMS.  Would it?  I now see that there are notes about DB/MySQL performance at the bottom of this section: http://lucene.apache.org/mahout/taste.html#Runtime+Performance

Here is what I think is an easy-to-understand explanation of some of the HBase vs. RDBMS differences:

  http://markmail.org/message/fz6jhlph6bdvsrio

I'm wondering what people more familiar with HBase and Taste think about Taste using HBase as its data store.  Would it be possible?  Would it make anything better?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Taste on Hbase?

Sean Owen
I admit I don't know much about HBase, but if I am right that it is
roughly like BigTable, then yeah it would be a better choice -- in
theory at least. The library just needs a very simple table, and very
fast access to it, almost entirely reads, few writes, no transactions.
I'll put it on the to-do list to build an implementation on HBase.


Re: Taste on Hbase?

Cosmin Lehene
You should know, though, that you can only retrieve data from HBase by row ID (the equivalent of a primary key in a database). You can't do SELECT ... WHERE statements. This is because HBase is indexed only by the row ID, so you need a separate indexing system such as Lucene or Solr to retrieve data in a flexible manner.
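
To make the limitation concrete, here is a minimal sketch in plain Java that models a table as a sorted map from row key to value. It is not the HBase client API: the class name `RowKeyStoreSketch`, the key format, and the stored ratings are all hypothetical. The point is only that a lookup by row key is direct, while anything resembling a WHERE clause on a value has to visit every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table indexed only by row key.
// This is NOT the HBase API; it only illustrates the access pattern.
class RowKeyStoreSketch {
    private final TreeMap<String, Double> rows = new TreeMap<>();

    void put(String rowKey, double rating) {
        rows.put(rowKey, rating);
    }

    // Direct lookup by row key: the only "indexed" access path.
    Double get(String rowKey) {
        return rows.get(rowKey);
    }

    // Emulates "SELECT key WHERE rating >= min". There is no index on the
    // value, so every row has to be examined.
    List<String> keysWithRatingAtLeast(double min) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : rows.entrySet()) {
            if (e.getValue() >= min) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RowKeyStoreSketch store = new RowKeyStoreSketch();
        store.put("user1", 4.0);
        store.put("user2", 2.5);
        System.out.println(store.get("user2"));               // keyed lookup
        System.out.println(store.keysWithRatingAtLeast(3.0)); // full scan
    }
}
```

A separate index (the role Lucene or Solr would play) amounts to maintaining a second structure keyed by the value you want to search on.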

For a good understanding of how HBase differs from an RDBMS, there's a nice article here: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

Cosmin




Re: Taste on Hbase?

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
Sean, are those reads random or sequential?  I'd think they'd be sequential during batch computation of recommendations, but I'm not sure.

Here are some numbers: http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation#0_2_0

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Taste on Hbase?

Sean Owen
Nah, pretty much random -- well, most of the queries are like "show me
all ratings for item ID x" or "... from user ID x" though some are a
bit more complex. I think you can get 90% of what's needed from two
HBase tables, one keyed by user and the other by item, though you end
up duplicating a lot of data. Perhaps there are answers to that, and
to the other sorts of queries that are needed. It could be that it's
just not a fit but seems like there might be some way to use it
effectively for this purpose.
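
A minimal sketch of that two-table idea, in plain Java with in-memory maps standing in for the two tables. This is not HBase or Taste code; the class `TwoTableRatingsSketch` and its method names are hypothetical. It shows both the two common queries and the duplication: every rating is written twice.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "two tables" layout: every rating is stored
// twice, once under its user ID and once under its item ID.
class TwoTableRatingsSketch {
    private final Map<Long, Map<Long, Double>> byUser = new HashMap<>();
    private final Map<Long, Map<Long, Double>> byItem = new HashMap<>();

    // One logical write becomes two physical writes (the duplicated data).
    void addRating(long userId, long itemId, double rating) {
        byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, rating);
        byItem.computeIfAbsent(itemId, k -> new HashMap<>()).put(userId, rating);
    }

    // "Show me all ratings from user ID x": a single keyed lookup.
    Map<Long, Double> ratingsFromUser(long userId) {
        return byUser.getOrDefault(userId, Collections.emptyMap());
    }

    // "Show me all ratings for item ID x": a single keyed lookup.
    Map<Long, Double> ratingsForItem(long itemId) {
        return byItem.getOrDefault(itemId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        TwoTableRatingsSketch ratings = new TwoTableRatingsSketch();
        ratings.addRating(1L, 10L, 4.0);
        ratings.addRating(2L, 10L, 3.0);
        System.out.println(ratings.ratingsForItem(10L));
        System.out.println(ratings.ratingsFromUser(1L));
    }
}
```

Both queries become single-key reads, at the cost of doubling storage and having to keep the two copies consistent on every write.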


Re: Taste on Hbase?

Sean Owen
I looked at this some more and I am not sure HBase will work out... it
doesn't seem to support anything like a query. It really is just a
distributed sorted map, which is a bit less than BigTable. There is not
even a "how many rows are in the table" method, it seems.


Re: Taste on Hbase?

Edward J. Yoon-3
AFAIK, since Hadoop doesn't provide a file-append function, the current
HBase has a problem with data loss when it crashes.

BTW, we are also thinking about CF (collaborative filtering), for example:
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

If you have any more advanced ideas, please let me know.

Regards, Edward


--
Best regards, Edward J. Yoon
[hidden email]
http://blog.udanax.org

Re: Taste on Hbase?

Karl Wettin

On 1 Sep 2008, at 08:38, Edward J. Yoon wrote:

> AFAIK, since Hadoop doesn't provide a file-append function, the current
> HBase has a problem with data loss when it crashes.

Actually, HDFS has handled append since not too long ago. There is not
much support for it yet, though.


      karl

Re: Taste on Hbase?

Sean Owen
In reply to this post by Edward J. Yoon-3
Yes, this is a sketch of basic user-based collaborative filtering,
using a cosine-measure correlation as the similarity metric? (I think it
needs to divide by the magnitudes of the two vectors?)
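
For reference, a minimal plain-Java sketch of the cosine measure with that normalization included. It is not taken from the Hama wiki page or from Mahout; the class name `CosineSimilaritySketch` is hypothetical, and representing each user as a dense array of ratings over the same items is an assumption made to keep the example short.

```java
// Hypothetical sketch: cosine similarity between two users' rating vectors.
class CosineSimilaritySketch {

    // Returns dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros.
    // Dividing by the two magnitudes is what keeps the result in [-1, 1]
    // regardless of how many items, or how large the ratings, a user has.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double dot = 0.0;
        double sumSquaresA = 0.0;
        double sumSquaresB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSquaresA += a[i] * a[i];
            sumSquaresB += b[i] * b[i];
        }
        if (sumSquaresA == 0.0 || sumSquaresB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
    }

    public static void main(String[] args) {
        double[] userA = {5.0, 3.0, 0.0};
        double[] userB = {4.0, 0.0, 2.0};
        System.out.println(cosine(userA, userB));
    }
}
```

Without the division, the raw dot product grows with the number and size of ratings, so heavy raters would look similar to everyone.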

The analog in Mahout would be
org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender,
and org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity

I agree that one could parallelize computation of the user-user
similarity. Indeed I think any scalable recommender is going to have
to do a lot of intense precomputation, via something like Hadoop, and
then relatively little at runtime.
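
A rough single-machine sketch of that split between precomputation and runtime, in plain Java. Parallel streams stand in for Hadoop here purely for illustration; the class `SimilarityPrecomputeSketch`, the "i:j" key format, and the dense rating arrays are all hypothetical and not part of Mahout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical sketch: compute all user-user similarities up front in
// parallel, then answer runtime requests with a cheap table lookup.
class SimilarityPrecomputeSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, sumSquaresA = 0.0, sumSquaresB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSquaresA += a[i] * a[i];
            sumSquaresB += b[i] * b[i];
        }
        if (sumSquaresA == 0.0 || sumSquaresB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
    }

    // Offline step: each user index i is handled independently, so the
    // outer loop can run in parallel. Pairs are stored once, with i < j.
    static Map<String, Double> precompute(double[][] ratings) {
        Map<String, Double> table = new ConcurrentHashMap<>();
        IntStream.range(0, ratings.length).parallel().forEach(i -> {
            for (int j = i + 1; j < ratings.length; j++) {
                table.put(i + ":" + j, cosine(ratings[i], ratings[j]));
            }
        });
        return table;
    }

    // Runtime step: no vector arithmetic, just a lookup.
    static double lookup(Map<String, Double> table, int a, int b) {
        if (a == b) {
            return 1.0;
        }
        return table.get(Math.min(a, b) + ":" + Math.max(a, b));
    }

    public static void main(String[] args) {
        double[][] ratings = {{5, 3, 0}, {4, 0, 2}, {1, 1, 5}};
        Map<String, Double> table = precompute(ratings);
        System.out.println(lookup(table, 2, 0));
    }
}
```

The table has n(n-1)/2 entries for n users, which is why a real deployment would likely keep only the nearest neighbors per user rather than every pair.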

On Mon, Sep 1, 2008 at 7:38 AM, Edward J. Yoon <[hidden email]> wrote:
> BTW, we are also thinking about CF (collaborative filtering), for example:
> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>
> If you have any more advanced ideas, please let me know.