[jira] Commented: (MAHOUT-6) Need a matrix implementation

[jira] Commented: (MAHOUT-6) Need a matrix implementation

Jan Høydahl (Jira)

    [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579261#action_12579261 ]

Grant Ingersoll commented on MAHOUT-6:
--------------------------------------

Does it make sense to be able to assign labels to the rows and columns and maybe even have it accessible as a map?  For instance, I think I could use these for the bayesian classifier implementation I am working on and it would make sense to be able to label the features and the labels.  Naturally, I can store the information elsewhere as well, but didn't know whether it made sense to keep the info w/ the matrix.

> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>            Assignee: Grant Ingersoll
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff, MAHOUT-6l.patch
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different reducers
> g) a reasonable set of matrix operations should be supported, these should eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations
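
For readers unfamiliar with the "generalized level 2 and 3 BLAS primitives" in item g, here is a minimal sketch of what alpha A B + beta C and A u + beta v compute. It is an illustration only: the class name BlasSketch and the use of plain double[][] arrays are assumptions, not the API from the attached patches, and the level 2 routine takes an extra alpha factor (as BLAS gemv does), so alpha = 1 gives the form listed in the issue.

```java
// Hypothetical helper, not taken from any MAHOUT-6 patch: dense arrays stand
// in for whatever matrix type the real implementation provides.
final class BlasSketch {
    private BlasSketch() {}

    /** Returns alpha * A * B + beta * C as a new array (level 3, GEMM-style). */
    static double[][] gemm(double alpha, double[][] a, double[][] b,
                           double beta, double[][] c) {
        int rows = a.length;
        int inner = b.length;
        int cols = b[0].length;
        if (a[0].length != inner || c.length != rows || c[0].length != cols) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[][] result = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < inner; k++) {
                    sum += a[i][k] * b[k][j];
                }
                result[i][j] = alpha * sum + beta * c[i][j];
            }
        }
        return result;
    }

    /** Returns alpha * A * u + beta * v as a new array (level 2, GEMV-style). */
    static double[] gemv(double alpha, double[][] a, double[] u,
                         double beta, double[] v) {
        int rows = a.length;
        int cols = u.length;
        if (a[0].length != cols || v.length != rows) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] result = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = 0; k < cols; k++) {
                sum += a[i][k] * u[k];
            }
            result[i] = alpha * sum + beta * v[i];
        }
        return result;
    }
}
```

The plain products A B and A v from the first bullet of item g fall out as the special cases alpha = 1, beta = 0.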


Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Ted Dunning-3

I have been batting that question back and forth in my own head recently.

It IS absolutely a huge help to have labels.  R has the data.frame to do
this and it helps enormously.  I have done it in some applications and it
saved endless hassle.

On the other hand, there is a real danger that the label functionality would
get sucked into a single implementation.  Labels really are an orthogonal
concern that are (should be) independent of how the matrix is implemented.

So should there really be something like a LabeledMatrix wrapper that
provides this labeling service to any matrix?
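
A rough sketch of how such a wrapper might look, to make the idea concrete. Everything here except the name LabeledMatrix is hypothetical: SimpleMatrix is a stand-in interface (the real matrix API in the patches may look quite different) and ArrayMatrix exists only so the example runs. The point is that the labels live in the wrapper and the wrapped matrix never knows about them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interface, assumed for illustration only.
interface SimpleMatrix {
    int rows();
    int columns();
    double get(int row, int column);
    void set(int row, int column, double value);
}

// Trivial dense implementation so the sketch is runnable.
final class ArrayMatrix implements SimpleMatrix {
    private final double[][] values;
    private final int columns;

    ArrayMatrix(int rows, int columns) {
        this.values = new double[rows][columns];
        this.columns = columns;
    }

    public int rows() { return values.length; }
    public int columns() { return columns; }
    public double get(int row, int column) { return values[row][column]; }
    public void set(int row, int column, double value) { values[row][column] = value; }
}

// Adds row and column labels to any SimpleMatrix without touching its storage.
final class LabeledMatrix implements SimpleMatrix {
    private final SimpleMatrix delegate;
    private final Map<String, Integer> rowLabels = new HashMap<>();
    private final Map<String, Integer> columnLabels = new HashMap<>();

    LabeledMatrix(SimpleMatrix delegate) { this.delegate = delegate; }

    void labelRow(String label, int row) {
        if (row < 0 || row >= delegate.rows()) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        rowLabels.put(label, row);
    }

    void labelColumn(String label, int column) {
        if (column < 0 || column >= delegate.columns()) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        columnLabels.put(label, column);
    }

    // Label-based access translates labels to indices, then delegates.
    double get(String rowLabel, String columnLabel) {
        return delegate.get(index(rowLabels, rowLabel), index(columnLabels, columnLabel));
    }

    void set(String rowLabel, String columnLabel, double value) {
        delegate.set(index(rowLabels, rowLabel), index(columnLabels, columnLabel), value);
    }

    // Index-based access passes straight through to the wrapped matrix.
    public int rows() { return delegate.rows(); }
    public int columns() { return delegate.columns(); }
    public double get(int row, int column) { return delegate.get(row, column); }
    public void set(int row, int column, double value) { delegate.set(row, column, value); }

    private static int index(Map<String, Integer> labels, String label) {
        Integer i = labels.get(label);
        if (i == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return i;
    }
}
```

Because LabeledMatrix implements the same interface it wraps, code that only needs indices can take either one, and a sparse implementation would get labels for free.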


On 3/16/08 2:23 PM, "Grant Ingersoll (JIRA)" <[hidden email]> wrote:

>
>     [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579261#action_12579261 ]
>
> Grant Ingersoll commented on MAHOUT-6:
> --------------------------------------
>
> Does it make sense to be able to assign labels to the rows and columns and
> maybe even have it accessible as a map?  For instance, I think I could use
> these for the bayesian classifier implementation I am working on and it would
> make sense to be able to label the features and the labels.  Naturally, I can
> store the information elsewhere as well, but didn't know whether it made sense
> to keep the info w/ the matrix.

Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Jason Rennie-2
Labels are certainly valuable (esp. for text) and if they are somehow built
into the matrix lib, it will make the user's life easier.  I share similar
concerns w/ Ted and think his idea for a LabelWrapper class is a great idea.

Jason

On Sun, Mar 16, 2008 at 5:28 PM, Ted Dunning <[hidden email]> wrote:

>
> I have been batting that question back and forth in my own head recently.
>
> It IS absolutely a huge help to have labels.  R has the data.frame to do
> this and it helps enormously.  I have done it in some applications and it
> saved endless hassle.
>
> On the other hand, there is a real danger that the label functionality would
> get sucked into a single implementation.  Labels really are an orthogonal
> concern that are (should be) independent of how the matrix is implemented.
>
> So should there really be something like a LabeledMatrix wrapper that
> provides this labeling service to any matrix?


--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Grant Ingersoll-2
Yeah, +1 on the wrapper idea.

On Mar 17, 2008, at 11:35 AM, Jason Rennie wrote:

> Labels are certainly valuable (esp. for text) and if they are somehow built
> into the matrix lib, it will make the user's life easier.  I share similar
> concerns w/ Ted and think his idea for a LabelWrapper class is a great idea.
>
> Jason

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





Re: [jira] Commented: (MAHOUT-6) Need a matrix implementation

Isabel Drost-3
On Monday 17 March 2008, Grant Ingersoll wrote:
> Yeah, +1 on the wrapper idea.

+1 on the wrapper as well. Especially as there might be matrix computations
that really don't have labels attached to the matrix.

Isabel

--
Once a word has been allowed to escape, it cannot be recalled. -- Quintus
Horatius Flaccus (Horace)
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>
