[jira] Commented: (MAHOUT-6) Need a matrix implementation


Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576371#action_12576371 ]

Jason Rennie commented on MAHOUT-6:
-----------------------------------

Re: Jeff

Sounds good.  I think I might actually have some time to do this.  One thing I didn't see when looking through the last patch was basic matrix/vector operations.  I'll go ahead and include a dot-product method to exhibit how it'd work and do some speed comparisons vs. a HashMap impl.

Yeah, would definitely be good to get this stuff in trunk, if only to make it easier to read/access! :)

Re: Ted

Didn't realize HashMaps were so fast.  Will be good to revisit the testing I did earlier.  Agreed on the CRS benefits.

One way I get around the sorted constraint while constructing a sparse vector is a SparseVectorBuilder class.  It basically has two methods: void add(int _idx, double _val), and SparseVector build().  Avoids having to keep state within the SparseVector.
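
A minimal sketch of the builder idea described above, assuming a SparseVector that stores parallel arrays of strictly increasing indices and their values. The class layout, the TreeMap used for buffering, the overwrite-on-duplicate rule, and the size() and dot() methods are all illustrative assumptions; none of this is taken from the MAHOUT-6 patches.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical immutable sparse vector: parallel arrays, indices strictly increasing.
final class SparseVector {
    private final int[] indices;
    private final double[] values;

    SparseVector(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    // Number of stored (explicitly added) entries.
    int size() {
        return indices.length;
    }

    // Dot product computed by walking both sorted index arrays in step.
    double dot(SparseVector other) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < indices.length && j < other.indices.length) {
            if (indices[i] == other.indices[j]) {
                sum += values[i] * other.values[j];
                i++;
                j++;
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }
}

// Hypothetical builder: accepts entries in any order and sorts them once in build(),
// so the SparseVector itself never has to track "sorted or not" state.
final class SparseVectorBuilder {
    private final TreeMap<Integer, Double> entries = new TreeMap<>();

    // Assumption: adding the same index twice overwrites the earlier value.
    void add(int idx, double val) {
        entries.put(idx, val);
    }

    SparseVector build() {
        int[] indices = new int[entries.size()];
        double[] values = new double[entries.size()];
        int k = 0;
        // TreeMap iterates in ascending key order, which yields sorted indices.
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            indices[k] = e.getKey();
            values[k] = e.getValue();
            k++;
        }
        return new SparseVector(indices, values);
    }
}
```

With this shape, callers can call add() in whatever order is convenient and only pay the ordering cost while building; the resulting vector is read-only, so the merge-style dot() can rely on sorted indices.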


> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different reducers
> g) a reasonable set of matrix operations should be supported, these should eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

RE: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Jeff Eastman-2
Vector has both dot() and cross() products. Are you looking at the
latest .diff?

I can easily move the vector package stuff back into matrix. It used to
be there and I moved it into its own package just to "organize" it
better. You are correct, however, that having them in the same package
would allow protected methods to be shared between the implementations.

Moving all the matrix and vector operations into
org.apache.mahout.matrix is also worth discussing. I can see its
importance is, perhaps, more than just as a utility.

Jeff



-----Original Message-----
From: Jason Rennie (JIRA) [mailto:[hidden email]]
Sent: Friday, March 07, 2008 1:00 PM
To: [hidden email]
Subject: [jira] Commented: (MAHOUT-6) Need a matrix implementation



Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Ted Dunning-3
In reply to this post by Nick Burch (Jira)

Sounds good.  I still like the idea of allowing updates, though, for people
with less discipline.


On 3/7/08 12:59 PM, "Jason Rennie (JIRA)" <[hidden email]> wrote:

> One way I get around the sorted constraint while constructing a sparse vector
> is a SparseVectorBuilder class.

Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Jason Rennie-2
In reply to this post by Jeff Eastman-2
On Fri, Mar 7, 2008 at 7:39 PM, Jeff Eastman <[hidden email]> wrote:

> Vector has both dot() and cross() products. Are you looking at the
> latest .diff?


My bad, I was looking in the wrong place...

Jason