

Hama contrib package for the mahout

Key: MAHOUT-16
URL: https://issues.apache.org/jira/browse/MAHOUT-16
Project: Mahout
Issue Type: New Feature
Environment: All environments
Reporter: Edward Yoon
*Introduction*
Hama will develop a high-performance and large-scale parallel matrix computational package based on Hadoop Map/Reduce. It will be useful for massively large-scale Numerical Analysis and Data Mining, which need the intensive computational power of matrix operations such as inversion, e.g. linear regression, PCA, and SVM. It will also be useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphic rendering, and many more.
The Hama approach proposes the use of the 3-dimensional Row and Column (Qualifier), Time space and multi-dimensional Column-families of Hbase (a BigTable clone), which is able to store large sparse matrices of various types (e.g. Triangular Matrix, 3D Matrix, etc.). Its auto-partitioned sparsity sub-structure will be efficiently managed and serviced by Hbase. Row and Column operations can be done in linear time, where several algorithms, such as structured Gaussian elimination or iterative methods, run in O(number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.
So, it has a strong relationship with the Mahout project, and it would be great if "hama" could become a contrib project of Mahout.
*Current Status*
In its current state, 'hama' is buggy and needs filling out, but a generalized matrix interface and basic linear algebra operations were implemented within a large prototype system. In the future, we will need new parallel algorithms based on Map/Reduce for the performance of heavy decompositions and factorizations. It also needs tools to compose an arbitrary matrix only with certain data filtered from the Hbase array structure.
It would be great if we could collaborate with the Mahout members.
*Members*
The initial set of committers includes folks from the Hadoop & Hbase communities, and we have master's (or Ph.D.) degrees in mathematics and computer science.
- Edward Yoon (edward AT udanax DOT org)
- Chanwit Kaewkasi (chanwit AT gmail DOT com)
- Min Cha (minslovey AT gmail DOT com)
At least Min Cha and I will be involved full-time with this work.
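As an editorial sketch of the row-operation cost model described above — plain Java standing in here for a Hadoop Map/Reduce job over Hbase-stored cells; the (row, column, value) triple layout and method names are illustrative, not Hama's actual API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SparseRowOps {
    /**
     * Map/Reduce-style row sum over a sparse matrix stored as
     * (row, column, value) triples, mimicking per-cell storage in Hbase.
     * Map phase: emit (row, value) for each non-zero cell.
     * Reduce phase: sum the emitted values grouped by row key.
     * Total work is O(nnz) over the non-zero entries.
     */
    public static Map<Integer, Double> rowSums(List<double[]> triples) {
        Map<Integer, Double> sums = new HashMap<>();
        for (double[] t : triples) {              // t = {row, col, value}
            sums.merge((int) t[0], t[2], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // A 2x4 sparse matrix with three non-zero cells.
        List<double[]> m = Arrays.asList(
                new double[]{0, 1, 2.0},
                new double[]{0, 3, 3.0},
                new double[]{1, 0, 5.0});
        System.out.println(rowSums(m)); // row 0 -> 5.0, row 1 -> 5.0
    }
}
```

With the triples partitioned across k mappers, each mapper touches only its share of the non-zero cells, which is where the O(nnz / number of mappers) figure above comes from.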

This message is automatically generated by JIRA.

You can reply to this email to add a comment to the issue online.


[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Yoon updated MAHOUT-16:
------------------------------
Attachment: hama.tar.gz
This is the current code of hama.
I would appreciate any advice you could give me.
> Hama contrib package for the mahout
> 
>
> Key: MAHOUT-16
> URL: https://issues.apache.org/jira/browse/MAHOUT-16
> Project: Mahout
> Issue Type: New Feature
> Environment: All environments
> Reporter: Edward Yoon
> Attachments: hama.tar.gz
>
>



On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
> At least, I and Min Cha will be involved fulltime with this work.
That sounds nice - so you are doing your Ph.D. thesis in this area, or is your
employer interested in the project?
Isabel

MS-DOS must die!
Web: <http://www.isabeldrost.de>
IM: <xmpp://[hidden email]>


I received a master's degree in mathematical informatics. Min Cha
received a Ph.D. degree in parallel architectures. Chanwit Kaewkasi is
a Ph.D. candidate in the computer science department at The University
of Manchester.
Min Cha and I are software engineers in service statistics and data
mining at NHN Corp. ( http://en.wikipedia.org/wiki/NHN )
We will be involved full-time with this work.
Thanks,
Edward.
On 3/12/08, Isabel Drost < [hidden email]> wrote:
> On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
> > At least, I and Min Cha will be involved fulltime with this work.
>
>
> > That sounds nice - so you are doing your Ph.D. thesis in this area, or is your
> employer interested in the project?
>
> Isabel

B. Regards,
Edward yoon @ NHN, corp.


I've downloaded the Hama package and integrated it into my mahout source
tree. The code could use some more comments and a pile of unit tests are
needed to demonstrate its correctness. Other than that, the implementation
seems complementary with MAHOUT-6 and intended for manipulating very large
matrices that are stored in Hbase. The two interfaces (Matrix and
MatrixInterface) are pretty similar and could be coalesced easily to provide
a single abstraction for in-memory and Hbase-backed representations.
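A minimal sketch of the coalesced abstraction suggested here (the method names are hypothetical — the thread names the two interfaces but not their members). An Hbase-backed implementation would expose the same interface, translating get/set into table reads and writes:

```java
// Hypothetical unified matrix abstraction; names are illustrative,
// not the actual Mahout or Hama API.
interface Matrix {
    int numRows();
    int numCols();
    double get(int row, int col);
    void set(int row, int col, double value);
}

/** In-memory implementation; an Hbase-backed class would share the interface. */
class DenseMatrix implements Matrix {
    private final double[][] cells;

    DenseMatrix(int rows, int cols) {
        cells = new double[rows][cols];
    }

    public int numRows() { return cells.length; }
    public int numCols() { return cells[0].length; }
    public double get(int row, int col) { return cells[row][col]; }
    public void set(int row, int col, double value) { cells[row][col] = value; }
}

public class MatrixDemo {
    public static void main(String[] args) {
        // Client code sees only the Matrix interface, so swapping in an
        // Hbase-backed implementation would not change callers.
        Matrix m = new DenseMatrix(2, 2);
        m.set(0, 1, 3.5);
        System.out.println(m.get(0, 1)); // prints 3.5
    }
}
```

The point of the single abstraction is that algorithms written against `Matrix` need not know whether the cells live in memory or in an Hbase table.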
If both were in trunk, I'd be willing to work on those tasks.
+1
Jeff
> Original Message
> From: Edward Yoon (JIRA) [mailto: [hidden email]]
> Sent: Tuesday, March 11, 2008 11:41 PM
> To: [hidden email]
> Subject: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout
>
>
> [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Edward Yoon updated MAHOUT-16:
> ------------------------------
>
> Attachment: hama.tar.gz
>
> This is a current code of hama.
> I would appreciate any advice you could give me.
>


Jeff Eastman skrev:
> I've downloaded the Hama package and integrated it into my mahout source
I will download and take a look at it too, this week(end?).
> matrices that are stored in Hbase. The two interfaces (Matrix and
> MatrixInterface) are pretty similar and could be coalesced easily to provide
> a single abstraction for inmemory and Hbased representations.
>
> If both were in trunk, I'd be willing to work on those tasks.
>
> +1
I'm not sure we want to put it in the trunk before they are compatible.
karl


Fair enough, I understand we don't want to publish works-in-progress.
Perhaps we could alternatively put them into a feature branch. Working
outside of SVN with multiple authors is problematic.
Jeff
> Original Message
> From: Karl Wettin [mailto: [hidden email]]
> Sent: Wednesday, March 12, 2008 11:28 AM
> To: [hidden email]
> Subject: Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the
> mahout
>
> Jeff Eastman skrev:
> > I've downloaded the Hama package and integrated it into my mahout source
>
> I will download and take a look at it too, this week(end?).
>
> > matrices that are stored in Hbase. The two interfaces (Matrix and
> > MatrixInterface) are pretty similar and could be coalesced easily to
> provide
> > a single abstraction for inmemory and Hbased representations.
> >
> > If both were in trunk, I'd be willing to work on those tasks.
> >
> > +1
>
> I'm not sure we want to put it in the trunk before they are compatible.
>
>
> karl


On Mar 12, 2008, at 2:31 PM, Jeff Eastman wrote:
> Fair enough, I understand we don't want to publish worksinprogress.
> Perhaps we could alternatively put them into a feature branch. Working
> outside of SVN with multiple authors is problematic.
That doesn't really work either, since we can't give permissions that
way, so you would just be creating patches against the
branch. Uploading and applying patches really is the way to go.


To avoid the situation from the start, I've always tried to make a
comment in the issue as soon as I think of something I plan to work
on, and usually when I start to work on it. That way at least people
know what to stay out of.
But honestly, I was never a part of any issue involving that many people
or that was this active. :)
Is it perhaps possible to diff two patches?
karl
Jeff Eastman skrev:
> Fair enough, I understand we don't want to publish worksinprogress.
> Perhaps we could alternatively put them into a feature branch. Working
> outside of SVN with multiple authors is problematic.
>
> Jeff


Gee, I think SVN can grant permissions on a branch basis. This must be an
Apache commit policy. Creating and integrating patches against a branch is
much preferable to diff-diffing. If we had a feature branch, any one of the
committers could commit the various branch patches, presumably without as
much due diligence as required for a trunk commit.
Jeff
> Original Message
> From: Grant Ingersoll [mailto: [hidden email]]
> Sent: Wednesday, March 12, 2008 11:43 AM
> To: [hidden email]
> Subject: Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the
> mahout
>
>
> On Mar 12, 2008, at 2:31 PM, Jeff Eastman wrote:
>
> > Fair enough, I understand we don't want to publish worksinprogress.
> > Perhaps we could alternatively put them into a feature branch. Working
> > outside of SVN with multiple authors is problematic.
>
> That doesn't really work either, since we can't give permissions that
> way, either, so you would just be creating patches against the
> branch. Uploading and applying patches really is the way to go.


On Mar 12, 2008, at 3:12 PM, Jeff Eastman wrote:
> Gee, I think SVN can grant permissions on a branch basis. This must be an
> Apache commit policy. Creating and integrating patches against a branch is
> much preferable to diff-diffing. If we had a feature branch, any one of the
> committers could commit the various branch patches, presumably without as
> much due diligence as required for a trunk commit.
>
Yeah, SVN can do it, but everyone would need to have committer status.
Even w/ the branch, you still have the same problem of managing all
the patches against the branch, assuming there are others working on it
who are non-committers.


The benefits of SVN's excellent diff and merge tools are significant,
especially when working with the large initial patch files this effort would
require. Once most of the code is in SVN, the individual patches become
smaller and much more targeted, and merging them is much more manageable.
Jeff
> Original Message
> From: Grant Ingersoll [mailto: [hidden email]]
> Sent: Wednesday, March 12, 2008 12:25 PM
> To: [hidden email]
> Subject: Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the
> mahout


> Fair enough, I understand we don't want to publish worksinprogress.
> Perhaps we could alternatively put them into a feature branch. Working
> outside of SVN with multiple authors is problematic.
Dear Jeff, I agree with you and think it's a good approach.
I also hope we can share new ideas and learn from each other's experiences here.
Thanks,
Edward.
On 3/13/08, Jeff Eastman < [hidden email]> wrote:
> Fair enough, I understand we don't want to publish worksinprogress.
> Perhaps we could alternatively put them into a feature branch. Working
> outside of SVN with multiple authors is problematic.

B. Regards,
Edward yoon @ NHN, corp.


[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Yoon updated MAHOUT-16:
------------------------------
Description:
*Introduction*
Hama will develop a high-performance and large-scale parallel matrix computational package based on Hadoop Map/Reduce. It will be useful for massively large-scale Numerical Analysis and Data Mining, which need the intensive computational power of matrix operations such as inversion, e.g. linear regression, PCA, and SVM. It will also be useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphic rendering, and many more.
The Hama approach proposes the use of the 3-dimensional Row and Column (Qualifier), Time space and multi-dimensional Column-families of Hbase (a BigTable clone), which is able to store large sparse matrices of various types (e.g. Triangular Matrix, 3D Matrix, etc.). Its auto-partitioned sparsity sub-structure will be efficiently managed and serviced by Hbase. Row and Column operations can be done in linear time, where several algorithms, such as structured Gaussian elimination or iterative methods, run in O(number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.
So, it has a strong relationship with the Mahout project, and it would be great if "hama" could become a contrib project of Mahout.
*Current Status*
In its current state, 'hama' is buggy and needs filling out, but a generalized matrix interface and basic linear algebra operations were implemented within a large prototype system. In the future, we will need new parallel algorithms based on Map/Reduce for the performance of heavy decompositions and factorizations. It also needs tools to compose an arbitrary matrix only with certain data filtered from the Hbase array structure.
It would be great if we could collaborate with the Mahout members.
*Members*
The initial set of committers includes folks from the Hadoop & Hbase communities, and we have master's (or Ph.D.) degrees in mathematics and computer science.
- Edward Yoon (edward AT udanax DOT org)
- Chanwit Kaewkasi (chanwit AT gmail DOT com)
- Min Cha (minslovey AT gmail DOT com)
- Antonio Suh (bluesvm AT gmail DOT com)
At least Min Cha and I will be involved full-time with this work.



[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Yoon updated MAHOUT-16:
------------------------------
Antonio Suh has joined this project. He is my fellow worker.



Sorry, Min Cha received a bachelor's degree in computer science.
And Antonio (NHN) has joined hama. He is my fellow worker, too.
Thanks,
Edward.
On 3/12/08, edward yoon < [hidden email]> wrote:
> I received a master's degree in mathematical informatics. Min Cha
> received a Ph.D. degree in parallel architectures. Chanwit Kaewkasi is
> a Ph.D. candidate in the computer science department at The University
> of Manchester.
>
> Min Cha and I are software engineers in service statistics and data
> mining at NHN, corp. ( http://en.wikipedia.org/wiki/NHN)
>
> We will be involved full-time with this work.
>
> Thanks,
> Edward.
>
> On 3/12/08, Isabel Drost < [hidden email]> wrote:
> > On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
> > > At least Min Cha and I will be involved full-time with this work.
> >
> >
> > That sounds nice. So, are you doing your Ph.D. thesis in this area, or is
> > your employer interested in the project?
> >
> > Isabel
> >
> >
> >
>
>
> 
> B. Regards,
> Edward Yoon @ NHN, corp.
>

B. Regards,
Edward Yoon @ NHN, corp.


I don't think you can make a branch of Apache's SVN available to non-committers
(even if fine-grained access control is possible to set up, you still need to
log on to commit to the branch).
Working with JIRA patches is a pain, I agree, but it seems like the only
sensible way to go. Another option is to set up an SVN somewhere else (Google
Code), but this redirects attention away from the project. Yet another is to
have a local SVN and make local merges with the other trunk, plus a branch for
one's needs.
D.
Jeff Eastman wrote:
> The benefits of SVN's excellent diff and merge tools are significant,
> especially when working with the large initial patch files this effort would
> require. Once most of the code is in SVN, the individual patches become
> smaller and much more targeted, and merging them is much more manageable.
>
> Jeff
>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto: [hidden email]]
>> Sent: Wednesday, March 12, 2008 12:25 PM
>> To: [hidden email]
>> Subject: Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the
>> mahout
>>
>>
>> On Mar 12, 2008, at 3:12 PM, Jeff Eastman wrote:
>>
>>> Gee, I think SVN can grant permissions on a branch basis. This must
>>> be an Apache commit policy. Creating and integrating patches against
>>> a branch is much preferable to diff-diffing. If we had a feature
>>> branch, any one of the committers could commit the various branch
>>> patches, presumably without as much due diligence as required for a
>>> trunk commit.
>>>
>> Yeah, SVN can do it, but everyone would need to have committer status.
>> Even with the branch, you still have the same problem of managing all
>> the patches against the branch, assuming there are others working on
>> it who are non-committers.
>
>
>


In reply to this post by JIRA jira@apache.org
[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579118#action_12579118 ]
Karl Wettin commented on MAHOUT-16:

If I understand everything correctly, Hama mainly differs from the Mahout matrix in its native ability to distribute some of the computation made on it?
I want to see at least a few test cases demonstrating the functionality before considering committing. Benchmarks that make it shine compared to the Mahout matrix are of course a bonus.
You say it is a bit buggy; can you please be more specific?
There are many empty classes that only by their name suggest what they are meant to be used for. Is there some correspondence regarding these that can be pasted in as Javadocs?
> Hama contrib package for the mahout
> -----------------------------------
>
> Key: MAHOUT-16
> URL: https://issues.apache.org/jira/browse/MAHOUT-16
> Project: Mahout
> Issue Type: New Feature
> Environment: All environments
> Reporter: Edward Yoon
> Attachments: hama.tar.gz
>



In reply to this post by JIRA jira@apache.org
[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579172#action_12579172 ]
Edward Yoon commented on MAHOUT-16:

Karl Wettin:
Thanks for your review.
{quote}If I understand everything correct, Hama mainly differs to the Mahout matrix by its native ability to distribute some of the computation made on it?{quote}
Hmm, yes. Additionally, Hama can easily adopt many different matrix partitioning strategies for some computations with HBase.
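As one hedged illustration of what such partitioning strategies might look like (the helpers below are our own sketch, not Hama's API), the same set of nonzero triples can be split into contiguous row blocks or scattered round-robin by column:

```python
# A hypothetical illustration (helper names are ours, not Hama's): two
# different partitioning strategies applied to the same sparse matrix,
# stored as (row, col, value) nonzero triples.
def partition_by_row_block(entries, num_rows, num_parts):
    # Contiguous blocks of rows go to the same partition.
    rows_per_part = -(-num_rows // num_parts)  # ceiling division
    parts = [[] for _ in range(num_parts)]
    for row, col, value in entries:
        parts[row // rows_per_part].append((row, col, value))
    return parts

def partition_by_column(entries, num_parts):
    # Columns are scattered round-robin across partitions.
    parts = [[] for _ in range(num_parts)]
    for row, col, value in entries:
        parts[col % num_parts].append((row, col, value))
    return parts

# A 4x3 sparse matrix with 4 nonzeros, split two ways.
m = [(0, 0, 1.0), (1, 1, 3.0), (2, 0, 4.0), (3, 2, 5.0)]
assert partition_by_row_block(m, num_rows=4, num_parts=2) == \
    [[(0, 0, 1.0), (1, 1, 3.0)], [(2, 0, 4.0), (3, 2, 5.0)]]
assert partition_by_column(m, num_parts=2)[1] == [(1, 1, 3.0)]
```

Row-block partitioning favors row operations; column scattering favors balancing column-heavy work. Which strategy wins depends on the computation, which is the point of being able to choose.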
{quote} I want to see at least a few test cases demonstrating the functionality before considering committing. Benchmarks that makes it shine when compared to the Mahout matrix is of course a bonus.{quote}
I plan on working on the comments and tests this week. And we'll try to benchmark it.
{quote}You say it is a bit buggy, can you please be more specific?{quote}
That is mere inference, because I haven't checked the results exactly yet. (So we very much need tests.)
Some issues need HBase improvements (e.g. HBASE-491).
{quote}There are many empty classes that only by their name suggest what they are thought to be used for. Is there some correspondence regarding these that can be pasted in as javadocs?{quote}
I'll do it this week, too.


In reply to this post by JIRA jira@apache.org
[ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579185#action_12579185 ]
Edward Yoon commented on MAHOUT-16:

I'm not sure whether Hama will become a contrib of Mahout. If the proposal goes through the Mahout PMC, I would ask for contrib committer privileges on the Mahout project to enable me to manage our project, our members, and our issues. I would appreciate any advice you could give me.

