[jira] Created: (MAHOUT-16) Hama contrib package for the mahout


JIRA jira@apache.org
Hama contrib package for the mahout
-----------------------------------

                 Key: MAHOUT-16
                 URL: https://issues.apache.org/jira/browse/MAHOUT-16
             Project: Mahout
          Issue Type: New Feature
         Environment: All environment
            Reporter: Edward Yoon


*Introduction*

Hama will develop a high-performance, large-scale parallel matrix computation package based on Hadoop Map/Reduce. It will be useful for massively large-scale numerical analysis and data mining, which need the intensive computational power of matrix inversion, e.g. linear regression, PCA, and SVM. It will also be useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphics rendering, and many more.

The Hama approach proposes the use of 3-dimensional row and column (qualifier) coordinates plus a time dimension, together with the multi-dimensional column families of Hbase (a BigTable clone), which can store large, sparse matrices of various types (e.g. triangular matrices, 3D matrices, etc.). The auto-partitioned sparsity sub-structure will be efficiently managed and served by Hbase. Row and column operations can be done in linear time, and several algorithms, such as structured Gaussian elimination or iterative methods, run in O(number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.
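
As a rough illustration of the storage idea above, here is a minimal, hypothetical sketch (not Hama's actual code; all class and method names are invented) of a sparse matrix kept as row key -> (column qualifier -> value), the way an Hbase table maps row keys to column/value cells. A row operation touches only that row's non-zero entries, which is where the O(non-zeros / mappers) figure comes from once rows are spread across mappers:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a sparse matrix stored Hbase-style. */
public class SparseMatrixSketch {

    // row key -> (column qualifier -> cell value); absent cells are zero
    private final Map<Long, Map<Long, Double>> rows = new HashMap<>();

    public void set(long row, long col, double value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    public double get(long row, long col) {
        return rows.getOrDefault(row, Collections.emptyMap())
                   .getOrDefault(col, 0.0);
    }

    /** Scales one row in time proportional to its non-zero entries only. */
    public void scaleRow(long row, double factor) {
        Map<Long, Double> r = rows.get(row);
        if (r != null) {
            r.replaceAll((col, v) -> v * factor);
        }
    }

    public static void main(String[] args) {
        SparseMatrixSketch m = new SparseMatrixSketch();
        m.set(0, 0, 2.0);
        m.set(0, 5, 3.0);   // sparse: only non-zeros are stored
        m.scaleRow(0, 10.0);
        System.out.println(m.get(0, 5)); // prints 30.0
    }
}
```

In the real system the outer map would be an Hbase table partitioned across region servers, so per-row work could be distributed to mappers.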

So Hama has a strong relationship with the Mahout project, and it would be great if Hama could become a contrib project of Mahout.

*Current Status*

In its current state, Hama is buggy and needs filling out, but a generalized matrix interface and basic linear algebra operations have been implemented in a large prototype system. In the future, we will need new parallel algorithms based on Map/Reduce to make heavy decompositions and factorizations perform well. We also need tools to compose an arbitrary matrix from only certain data filtered out of the Hbase array structure.

It would be great if we could collaborate with the Mahout members.

*Members*

The initial set of committers includes folks from the Hadoop and Hbase communities, and we hold master's (or Ph.D.) degrees in mathematics and computer science.

- Edward Yoon (edward AT udanax DOT org)
- Chanwit Kaewkasi (chanwit AT gmail DOT com)
- Min Cha (minslovey AT gmail DOT com)

At least Min Cha and I will be involved full-time with this work.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated MAHOUT-16:
------------------------------

    Attachment: hama.tar.gz

This is the current code of hama.
I would appreciate any advice you could give me.




Re: [jira] Created: (MAHOUT-16) Hama contrib package for the mahout

Isabel Drost-3
In reply to this post by JIRA jira@apache.org
On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
> At least, I and Min Cha will be involved full-time with this work.

That sounds nice - so are you doing your Ph.D. thesis in this area, or is your employer interested in the project?

Isabel


--
MS-DOS must die!
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>


Re: [jira] Created: (MAHOUT-16) Hama contrib package for the mahout

Edward J. Yoon
I received a master's degree in mathematical informatics, and Min Cha received a Ph.D. in parallel architectures. Chanwit Kaewkasi is a Ph.D. candidate in the computer science department at the University of Manchester.

Min Cha and I are software engineers working on service statistics and data mining at NHN Corp. (http://en.wikipedia.org/wiki/NHN).

We will be involved full-time with this work.

Thanks,
Edward.

On 3/12/08, Isabel Drost <[hidden email]> wrote:

> On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
>  > At least, I and Min Cha will be involved full-time with this work.
>
>
> That sounds nice - so you are doing your Ph.D thesis in this area or is your
>  employer interested in the project?
>
>  Isabel


--
B. Regards,
Edward yoon @ NHN, corp.

RE: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Jeff Eastman-2-2
In reply to this post by JIRA jira@apache.org
I've downloaded the Hama package and integrated it into my Mahout source
tree. The code could use some more comments, and a pile of unit tests is
needed to demonstrate its correctness. Other than that, the implementation
seems complementary to MAHOUT-6 and is intended for manipulating very large
matrices that are stored in Hbase. The two interfaces (Matrix and
MatrixInterface) are pretty similar and could easily be coalesced to provide
a single abstraction for in-memory and Hbase-backed representations.
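
As a sketch of that single abstraction (the names Matrix and DenseMemoryMatrix here are illustrative, not the actual MAHOUT-6 or Hama APIs), one interface could front both representations; an Hbase-backed implementation would satisfy the same interface with table Gets and Puts:

```java
/** Illustrative unified matrix abstraction; not the real project API. */
interface Matrix {
    double get(int row, int col);
    void set(int row, int col, double value);
    int numRows();
    int numCols();
}

/** In-memory implementation; an Hbase-backed one would do Gets/Puts instead. */
class DenseMemoryMatrix implements Matrix {
    private final double[][] cells;
    DenseMemoryMatrix(int rows, int cols) { cells = new double[rows][cols]; }
    public double get(int row, int col) { return cells[row][col]; }
    public void set(int row, int col, double value) { cells[row][col] = value; }
    public int numRows() { return cells.length; }
    public int numCols() { return cells[0].length; }
}

public class MatrixAbstractionSketch {
    /** Code written against Matrix works unchanged on either backend. */
    static double trace(Matrix m) {
        double sum = 0.0;
        int n = Math.min(m.numRows(), m.numCols());
        for (int i = 0; i < n; i++) sum += m.get(i, i);
        return sum;
    }

    public static void main(String[] args) {
        Matrix m = new DenseMemoryMatrix(3, 3);
        m.set(0, 0, 1.0);
        m.set(1, 1, 2.0);
        m.set(2, 2, 3.0);
        System.out.println(trace(m)); // prints 6.0
    }
}
```

Algorithms like trace() above would then run unchanged against either backend, which is the point of coalescing the two interfaces.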

If both were in trunk, I'd be willing to work on those tasks.

+1

Jeff


Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Karl Wettin
Jeff Eastman wrote:
> I've downloaded the Hama package and integrated it into my mahout source

I will download and take a look at it too, this week(end?).

> matrices that are stored in Hbase. The two interfaces (Matrix and
> MatrixInterface) are pretty similar and could be coalesced easily to provide
> a single abstraction for in-memory and Hbased representations.
>
> If both were in trunk, I'd be willing to work on those tasks.
>
> +1

I'm not sure we want to put it in the trunk before they are compatible.


     karl

RE: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Jeff Eastman-2-2
Fair enough, I understand we don't want to publish works-in-progress.
Perhaps we could alternatively put them into a feature branch. Working
outside of SVN with multiple authors is problematic.

Jeff


Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Grant Ingersoll-2

On Mar 12, 2008, at 2:31 PM, Jeff Eastman wrote:

> Fair enough, I understand we don't want to publish works-in-progress.
> Perhaps we could alternatively put them into a feature branch. Working
> outside of SVN with multiple authors is problematic.

That doesn't really work either, since we can't grant permissions that way, so you would just be creating patches against the branch. Uploading and applying patches really is the way to go.

Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Karl Wettin
In reply to this post by Jeff Eastman-2-2
To avoid that situation from the start, I've always tried to comment on the
issue as soon as I think of something, saying what I plan to work on, and
usually when I actually start working on it. That way people at least know
where to stay out of each other's way.

But honestly, I've never been part of an issue involving this many people,
or one this active. :)

Is it perhaps possible to diff two patches?

      karl




RE: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Jeff Eastman-2-2
In reply to this post by Grant Ingersoll-2
Gee, I think SVN can grant permissions on a branch basis. This must be an
Apache commit policy. Creating and integrating patches against a branch is
much preferable to diff-diffing. If we had a feature branch, any one of the
committers could commit the various branch patches, presumably without as
much due diligence as required for a trunk commit.

Jeff



Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Grant Ingersoll-2

On Mar 12, 2008, at 3:12 PM, Jeff Eastman wrote:

> Gee, I think SVN can grant permissions on a branch basis. This must be an
> Apache commit policy. Creating and integrating patches against a branch is
> much preferable to diff-diffing. If we had a feature branch, any one of the
> committers could commit the various branch patches, presumably without as
> much due diligence as required for a trunk commit.

Yeah SVN can do it, but everyone would need to have committer status.  
Even w/ the branch, you still have the same problem of managing all  
the patches against the branch assuming there are others working on it  
who are non-committers.


RE: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Jeff Eastman-2-2
The benefits of SVN's excellent diff and merge tools are significant,
especially when working with the large initial patch files this effort would
require. Once most of the code is in SVN, the individual patches become
smaller and much more targeted, and merging them is much more manageable.

Jeff


Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Edward J. Yoon
In reply to this post by Jeff Eastman-2-2
> Fair enough, I understand we don't want to publish works-in-progress.
>  Perhaps we could alternatively put them into a feature branch. Working
>  outside of SVN with multiple authors is problematic.

Dear Jeff, I agree with you and think it's a good approach.
I also hope we can share new ideas and learn from each other's experiences here.

Thanks,
Edward.


--
B. Regards,
Edward yoon @ NHN, corp.


[jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated MAHOUT-16:
------------------------------

    Description:
*Introduction*

Hama will develop a high-performance, large-scale parallel matrix computation package based on Hadoop Map/Reduce. It will be useful for massively large-scale numerical analysis and data mining, which need the intensive computational power of matrix inversion, e.g. linear regression, PCA, and SVM. It will also be useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphics rendering, and many more.

The Hama approach proposes the use of 3-dimensional row and column (qualifier) coordinates plus a time dimension, together with the multi-dimensional column families of Hbase (a BigTable clone), which can store large, sparse matrices of various types (e.g. triangular matrices, 3D matrices, etc.). The auto-partitioned sparsity sub-structure will be efficiently managed and served by Hbase. Row and column operations can be done in linear time, and several algorithms, such as structured Gaussian elimination or iterative methods, run in O(number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.

So Hama has a strong relationship with the Mahout project, and it would be great if Hama could become a contrib project of Mahout.

*Current Status*

In its current state, Hama is buggy and needs filling out, but a generalized matrix interface and basic linear algebra operations have been implemented in a large prototype system. In the future, we will need new parallel algorithms based on Map/Reduce to make heavy decompositions and factorizations perform well. We also need tools to compose an arbitrary matrix from only certain data filtered out of the Hbase array structure.

It would be great if we could collaborate with the Mahout members.

*Members*

The initial set of committers includes folks from the Hadoop and Hbase communities, and we hold master's (or Ph.D.) degrees in mathematics and computer science.

- Edward Yoon (edward AT udanax DOT org)
- Chanwit Kaewkasi (chanwit AT gmail DOT com)
- Min Cha (minslovey AT gmail DOT com)
- Antonio Suh (bluesvm AT gmail DOT com)

At least Min Cha and I will be involved full-time with this work.

  was:
*Introduction*

Hama will develop a high-performance and large-scale parallel matrix computational package based on Hadoop Map/Reduce. It will be useful for a massively large-scale Numerical Analysis and Data Mining, which need the intensive computation power of matrix inversion, e.g. linear regression, PCA, SVM and etc. It will be also useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphic rendering and many more.

Hama approach proposes the use of 3-dimensional Row and Column (Qualifier), Time space and multi-dimensional Columnfamilies of Hbase (BigTable Clone), which is able to store large sparse and various type of matrices (e.g. Triangular Matrix, 3D Matrix, and etc.). its auto-partitioned sparsity sub-structure will be efficiently managed and serviced by Hbase. Row and Column operations can be done in linear-time, where several algorithms, such as structured Gaussian elimination or iterative methods, run in O(the number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.

So, it has a strong relationship with the mahout project, and it would be great if the "hama" can become a contrib project of the mahout.

*Current Status*

In its current state, the 'hama' is buggy and needs filling out, but generalized matrix interface and basic linear algebra operations was implemented within a large prototype system. In the future, We need new parallel algorithms based on Map/Reduce for performance of heavy decompositions and factorizations. It also needs tools to compose an arbitrary matrix only with certain data filtered from hbase array structure.

It would be great if we could collaborate with the Mahout members.

*Members*

The initial set of committers includes folks from the Hadoop and HBase communities, and we hold master's or Ph.D. degrees in mathematics and computer science.

- Edward Yoon (edward AT udanax DOT org)
- Chanwit Kaewkasi (chanwit AT gmail DOT com)
- Min Cha (minslovey AT gmail DOT com)

At a minimum, Min Cha and I will be involved full-time with this work.


Antonio Suh has joined this project. He is a colleague of mine.


> Hama contrib package for the mahout
> -----------------------------------
>
>                 Key: MAHOUT-16
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-16
>             Project: Mahout
>          Issue Type: New Feature
>         Environment: All environment
>            Reporter: Edward Yoon
>         Attachments: hama.tar.gz
>
>
> *Introduction*
> Hama will develop a high-performance and large-scale parallel matrix computational package based on Hadoop Map/Reduce. It will be useful for a massively large-scale Numerical Analysis and Data Mining, which need the intensive computation power of matrix inversion, e.g. linear regression, PCA, SVM and etc. It will be also useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphic rendering and many more.
> Hama approach proposes the use of 3-dimensional Row and Column (Qualifier), Time space and multi-dimensional Columnfamilies of Hbase (BigTable Clone), which is able to store large sparse and various type of matrices (e.g. Triangular Matrix, 3D Matrix, and etc.). its auto-partitioned sparsity sub-structure will be efficiently managed and serviced by Hbase. Row and Column operations can be done in linear-time, where several algorithms, such as structured Gaussian elimination or iterative methods, run in O(the number of non-zero elements in the matrix / number of mappers) time on Hadoop Map/Reduce.
> So, it has a strong relationship with the mahout project, and it would be great if the "hama" can become a contrib project of the mahout.
> *Current Status*
> In its current state, the 'hama' is buggy and needs filling out, but generalized matrix interface and basic linear algebra operations was implemented within a large prototype system. In the future, We need new parallel algorithms based on Map/Reduce for performance of heavy decompositions and factorizations. It also needs tools to compose an arbitrary matrix only with certain data filtered from hbase array structure.
> It would be great if we can collaboration with the mahout members.
> *Members*
> The initial set of committers includes folks from the Hadoop & Hbase communities, and We have a master's (or Ph.D) degrees in the mathematics and computer science.
> - Edward Yoon (edward AT udanax DOT org)
> - Chanwit Kaewkasi (chanwit AT gmail DOT com)
> - Min Cha (minslovey AT gmail DOT com)
> - Antonio Suh (bluesvm AT gmail DOT com)
> At least, I and Min Cha will be involved full-time with this work.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Created: (MAHOUT-16) Hama contrib package for the mahout

Edward J. Yoon
In reply to this post by Edward J. Yoon
Sorry, Min Cha received a bachelor's degree in computer science.
Also, Antonio (NHN) has joined Hama. He is a colleague of mine, too.

Thanks,
Edward.

On 3/12/08, edward yoon <[hidden email]> wrote:

> I received a master's degree in mathematical informatics. Min Cha
> received a Ph.D. degree in parallel architectures. Chanwit Kaewkasi is
> a Ph.D. candidate in the computer science department at The University
> of Manchester.
>
> Min Cha and I are software engineers in service statistics and data
> mining at NHN Corp. (http://en.wikipedia.org/wiki/NHN)
>
> We will be involved full-time with this work.
>
> Thanks,
> Edward.
>
> On 3/12/08, Isabel Drost <[hidden email]> wrote:
> > On Wednesday 12 March 2008, Edward Yoon (JIRA) wrote:
> >  > At least, I and Min Cha will be involved full-time with this work.
> >
> >
> > That sounds nice - so are you doing your Ph.D. thesis in this area, or is your
> >  employer interested in the project?
> >
> >  Isabel
> >
> >
> >
> >  --
> >  MS-DOS must die!
> >   |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
> >   /,`.-'`'    -.  ;-;;,_
> >   |,4-  ) )-,_..;\ (  `'-'
> >  '---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>
> >
> >
>
>
> --
> B. Regards,
> Edward yoon @ NHN, corp.
>


--
B. Regards,
Edward yoon @ NHN, corp.

Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the mahout

Dawid Weiss
In reply to this post by Jeff Eastman-2-2

I don't think you can make a branch of Apache's SVN available to non-committers
(even if fine-grained access control can be set up, you still need to log
on to commit to the branch).

Working with JIRA patches is a pain, I agree, but it seems like the only
sensible way to go. Another option is to set up an SVN repository somewhere
else (Google Code), but this redirects attention away from the project. Yet
another is to have a local SVN repository and make local merges with the
other trunk, plus a branch for one's own needs.
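For reference, the patch workflow above boils down to exchanging unified diffs. A minimal sketch with plain diff/patch (file and patch names here are made up; `svn diff` emits the same unified format against a working copy):

```shell
# contributor: a pristine "trunk" copy and a modified working copy
mkdir -p trunk work
printf 'line1\nline2\n' > trunk/Matrix.java
printf 'line1\nline2 changed\n' > work/Matrix.java

# generate a unified-diff patch file to attach to the JIRA issue
# (diff exits 1 when files differ, so guard it in scripts)
diff -u trunk/Matrix.java work/Matrix.java > MAHOUT-16.patch || true

# committer: apply the attached patch against trunk
patch trunk/Matrix.java < MAHOUT-16.patch
```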

D.


Jeff Eastman wrote:

> The benefits of SVN's excellent diff and merge tools are significant,
> especially when working with the large initial patch files this effort would
> require. Once most of the code is in SVN, the individual patches become
> smaller and much more targeted, and merging them is much more manageable.
>
> Jeff
>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[hidden email]]
>> Sent: Wednesday, March 12, 2008 12:25 PM
>> To: [hidden email]
>> Subject: Re: [jira] Updated: (MAHOUT-16) Hama contrib package for the
>> mahout
>>
>>
>> On Mar 12, 2008, at 3:12 PM, Jeff Eastman wrote:
>>
>>> Gee, I think SVN can grant permissions on a branch basis. This must
>>> be an
>>> Apache commit policy. Creating and integrating patches against a
>>> branch is
>>> much preferable to diff-diffing. If we had a feature branch, any one
>>> of the
>>> committers could commit the various branch patches, presumably
>>> without as
>>> much due diligence as required for a trunk commit.
>>>
>> Yeah SVN can do it, but everyone would need to have committer status.
>> Even w/ the branch, you still have the same problem of managing all
>> the patches against the branch assuming there are others working on it
>> who are non-committers.
>
>
>

[jira] Commented: (MAHOUT-16) Hama contrib package for the mahout

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579118#action_12579118 ]

Karl Wettin commented on MAHOUT-16:
-----------------------------------

If I understand everything correctly, Hama mainly differs from the Mahout matrix in its native ability to distribute some of the computation performed on it?

I want to see at least a few test cases demonstrating the functionality before considering committing. Benchmarks that make it shine compared to the Mahout matrix are of course a bonus.

You say it is a bit buggy; can you please be more specific?

There are many empty classes whose names only suggest what they are intended to be used for. Is there some correspondence regarding these that can be pasted in as Javadocs?


[jira] Commented: (MAHOUT-16) Hama contrib package for the mahout

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579172#action_12579172 ]

Edward Yoon commented on MAHOUT-16:
-----------------------------------

Karl Wettin:

Thanks for your review.

{quote}If I understand everything correct, Hama mainly differs to the Mahout matrix by its native ability to distribute some of the computation made on it?{quote}

Hmm, yes. Additionally, Hama can easily accommodate many different matrix partitioning strategies for certain computations with HBase.
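As one hand-rolled illustration of such a strategy (not Hama code; the function name is made up): a round-robin row partitioning assigns rows evenly across mappers, so per-mapper work is roughly the non-zero count divided by the number of mappers, matching the O(nnz / mappers) figure in the proposal.

```python
# Hypothetical sketch of one partitioning strategy: round-robin
# assignment of matrix rows to mappers. Not Hama's implementation.

def partition_rows(row_indices, num_mappers):
    """Assign each row to a mapper, round-robin over sorted rows."""
    assignment = {}
    for k, i in enumerate(sorted(row_indices)):
        assignment.setdefault(k % num_mappers, []).append(i)
    return assignment

# 8 rows spread over 3 mappers: each mapper gets ~8/3 rows
print(partition_rows(range(8), 3))
```

Other strategies (contiguous row blocks, 2D blocks) would slot into the same shape; the point is only that HBase's row-keyed storage makes row-oriented splits cheap.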

{quote} I want to see at least a few test cases demonstrating the functionality before considering committing. Benchmarks that makes it shine when compared to the Mahout matrix is of course a bonus.{quote}
I plan to work on comments and tests this week. And we'll try to benchmark it.

{quote}You say it is a bit buggy, can you please be more specific?{quote}
That is merely an inference, because I haven't checked the results exactly yet. (So we very much need tests.)
Some issues need HBase improvements (e.g. HBASE-491).
{quote}There are many empty classes that only by their name suggest what they are thought to be used for. Is there some correspondence regarding these that can be pasted in as javadocs?{quote}
I'll do it this week, too.




[jira] Commented: (MAHOUT-16) Hama contrib package for the mahout

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579185#action_12579185 ]

Edward Yoon commented on MAHOUT-16:
-----------------------------------

I'm not sure whether Hama will become a contrib of Mahout. If the proposal goes through the Mahout PMC, I would ask for contrib committer privileges on the Mahout project to enable me to manage our project, our members, and our issues. I would appreciate any advice you could give me.

