Fast Feather Track

Fast Feather Track

Isabel Drost-3

Hello,

my proposal to present our project at the Fast Feather session at ApacheCon
EU was accepted.

I am currently preparing the slides for my talk. I would like to
include one slide on the project members who were crazy enough to start all
this half a year ago. It would be nice if I could add a small picture of each
of you, so there is a face beside each name ;)

Please find the initial slides at the following URL:
http://www.isabel-drost.de/mahout_fast_feather.odp

If you have any comments on what is missing or should be done differently, I
would be happy to get any feedback, criticism, ... :)

Isabel


--
It'll be a nice world if they ever get it finished.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>

Re: Fast Feather Track

Ted Dunning-3

See here for a picture of me: http://www.veoh.com/users/ted


On 3/30/08 1:29 PM, "Isabel Drost" <[hidden email]> wrote:

>
> Hello,
>
> my proposal to present our project at the Fast Feather session at ApacheCon
> EU was accepted.
>
> I am currently preparing the slides for my talk. I would like to
> include one slide on the project members who were crazy enough to start all
> this half a year ago. It would be nice if I could add a small picture of each
> of you, so there is a face beside each name ;)
>
> Please find the initial slides at the following URL:
> http://www.isabel-drost.de/mahout_fast_feather.odp
>
> If you have any comments on what is missing or should be done differently, I
> would be happy to get any feedback, criticism, ... :)
>
> Isabel
>

Re: Fast Feather Track

Isabel Drost-3
In reply to this post by Isabel Drost-3
I have added a PDF version for those who do not have OpenOffice:

http://www.isabel-drost.de/mahout_fast_feather.pdf

This evening, I will add the missing content of the "Problem setting" slide
and refactor the "Who we are" slide with your pictures and the missing names.

Isabel

--
Most people want either less corruption or more of a chance to participate in
it.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>

Re: Fast Feather Track

Karl Wettin
Isabel Drost wrote:
 > I have added a pdf version for those that do not have oo:
 >
 > http://www.isabel-drost.de/mahout_fast_feather.pdf
 >
 > This evening, I will add the missing content of the "Problem setting"

I think it is worth listing all the algorithms people have submitted as
GSoC proposals. It is an amazingly large group of people when you
consider how long the project has been around.

I also think you should add an introduction slide to ML so that people who
do not yet know they can benefit from it will understand. Perhaps that
is the same thing as the "Problem setting"? I'll rant on though.

You already mention the many relationships with Lucene and that text
mining will probably be something big. How about listing some examples,
starting with the various pseudo-ML stuff already existing in various
Lucene trunks, and perhaps how the new algorithms could improve or add
features to structured and unstructured data already available in their
applications.

Nutch has an n-gram based language identifier. Lucene has a "more like
this" feature. Carrot clusters search results. LingPipe does a whole lot
of things with text that I think many would like to see in Mahout.


One important thing is that people might not be aware that they store
structured minable data. There are a lot of faceted classifications,
tags, ratings and whatnot that are not used to their full potential.

There is more minable data to be extracted everywhere and it can often
be used as feedback to improve itself. (Did you ever make music on a
modular synthesizer?)

A photo site could extract social networks by using facial biometrics to
find out who is who in pictures. This social network can then be used to
improve the quality of the biometric classifier.

The site could further expand the social network by looking at who
writes comments on whose pictures. Trust between users could be evaluated
and used to prune which ratings to extract from the text comments on
pictures. Those ratings could be fed to collaborative filtering, used by
users to find interesting new photographers and by the site to show ads
that the user is more likely to be interested in.

And so on.


     karl
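
A rough sketch of the last step in the photo-site example above, added here as an illustration only: ratings are pruned by a per-user trust score and the survivors are fed to a very small user-based collaborative filter. Everything in it is an assumption made for the example (the function names, the 0.5 trust threshold, cosine similarity as the user-similarity measure, the toy data); none of it comes from the thread or from Mahout.

```python
from collections import defaultdict
from math import sqrt


def prune_by_trust(ratings, trust, min_trust=0.5):
    """Keep only ratings from users whose trust score reaches min_trust.

    ratings: list of (user, item, rating) tuples (hypothetical layout).
    trust:   dict user -> trust score in [0, 1] (hypothetical).
    """
    return [(u, i, r) for (u, i, r) in ratings if trust.get(u, 0.0) >= min_trust]


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    by_user = defaultdict(dict)
    for u, item, r in ratings:
        by_user[u][item] = r
    mine = by_user.get(user, {})
    scores = defaultdict(float)
    for other, theirs in by_user.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue
        for item, r in theirs.items():
            if item not in mine:
                scores[item] += sim * r
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy data: "spam" is a low-trust account pushing photographer p9.
ratings = [
    ("ann", "p1", 5), ("ann", "p2", 4),
    ("bob", "p1", 5), ("bob", "p3", 4),
    ("spam", "p1", 5), ("spam", "p9", 5),
]
trust = {"ann": 0.9, "bob": 0.8, "spam": 0.1}
trusted = prune_by_trust(ratings, trust)
```

With the toy data, `recommend(trusted, "ann")` only considers bob's ratings, while `recommend(ratings, "ann")` would also be influenced by the low-trust account. How the trust scores themselves would be learned from the comment network is left open here.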

Re: Fast Feather Track

Lukáš Vlček
Hi,

Nice presentation! I regret I can't attend...

As a side note - try to remember how people react to the log draft and what
they say about it. This information could help me shape it into the final
version.

Regards,
Lukas

--
http://blog.lukas-vlcek.com/

Re: Fast Feather Track

Isabel Drost-3
In reply to this post by Karl Wettin
On Monday 31 March 2008, Karl Wettin wrote:
> I think it is worth listing all the algorithms people have submitted as
> GSoC proposals. It is an amazingly large group of people when you
> consider how long the project has been around.

+1 Thanks for the comment - added them. It looks really impressive now -
unfortunately I guess the list was outdated the moment I wrote it down ;)


> I also think you should add an introduction slide to ML so that people who
> do not yet know they can benefit from it will understand. Perhaps that
> is the same thing as the "Problem setting"? I'll rant on though.

+1 Thanks for the rant. It should be the same as "Problem setting". Waking
up this morning, I think the essential part about learning models from data
is still missing - despite the many application examples. I will add that this
afternoon.


> Nutch has an n-gram based language identifier. Lucene has a "more like
> this" feature. Carrot clusters search results. LingPipe does a whole lot
> of things with text that I think many would like to see in Mahout.

Any other examples? I will add these to the next version. (Did not have that
mail when I made the corresponding slide.)


> One important thing is that people might not be aware that they store
> structured minable data. There are a lot of faceted classifications,
> tags, ratings and whatnot that are not used to their full potential.

I tried to give a few examples on the Problem Setting slide. Maybe this slide
can move further back into some "We need you/what can you do with Mahout"
context, and in its place under Problem Setting I would put a slide on
learning models from data. Thanks for the examples you gave.


Isabel


--
If you wait long enough, it will go away... after having done its damage. If it
was bad, it will be back.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>

Re: Fast Feather Track

Isabel Drost-3
In reply to this post by Lukáš Vlček
On Monday 31 March 2008, Lukas Vlcek wrote:
> As a side note - try to remember how people react to the log draft and what
> they say about it. This information could help me shape it into the final
> version.

Sure! :)

Isabel

--
A foolish consistency is the hobgoblin of little minds. -- Ralph Waldo
Emerson
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>

Re: Fast Feather Track

Karl Wettin
In reply to this post by Isabel Drost-3
Isabel Drost wrote:
> On Monday 31 March 2008, Karl Wettin wrote:
>> Nutch has an n-gram based language identifier. Lucene has a "more like
>> this" feature. Carrot clusters search results. LingPipe does a whole lot
>> of things with text that I think many would like to see in Mahout.
>
> Any other examples? I will add these to the next version. (Did not have that
> mail when I made the corresponding slide.)

Some "did you mean" features must count as machine learning. It is a nice
example where no data is needed other than users correcting their own typos,
accepting/declining suggestions and inspecting results. (Reinforcement
learning)


       karl
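
To make the "did you mean" idea above a bit more concrete, here is a minimal sketch of a suggester that learns only from user behaviour: queries that users retype, and suggestions they accept or decline. It is a plain counting scheme rather than a full reinforcement learning algorithm, and the class name, method names and the `min_score` threshold are all hypothetical choices for this illustration.

```python
from collections import Counter, defaultdict


class DidYouMean:
    """Toy 'did you mean' suggester driven purely by user feedback."""

    def __init__(self, min_score=2):
        # typed query -> Counter of candidate corrections and their scores
        self.scores = defaultdict(Counter)
        # a suggestion is shown only once its score reaches this value
        self.min_score = min_score

    def observe_correction(self, typed, retyped):
        """A user typed a query and then retyped it differently."""
        if typed != retyped:
            self.scores[typed][retyped] += 1

    def feedback(self, typed, suggestion, accepted):
        """Reward an accepted suggestion, penalise a declined one."""
        self.scores[typed][suggestion] += 1 if accepted else -1

    def suggest(self, typed):
        """Return the best-scoring correction, or None if nothing qualifies."""
        candidates = self.scores.get(typed)
        if not candidates:
            return None
        # Highest score wins; ties are broken by the candidate string.
        best, score = max(candidates.items(), key=lambda kv: (kv[1], kv[0]))
        return best if score >= self.min_score else None


dym = DidYouMean()
dym.observe_correction("mahuot", "mahout")
dym.observe_correction("mahuot", "mahout")
first = dym.suggest("mahuot")      # two corrections seen, threshold reached
dym.feedback("mahuot", "mahout", accepted=False)
second = dym.suggest("mahuot")     # one decline drops the score below threshold
```

A real system would also need to handle queries it has never seen (for example via edit distance against an index), which this sketch deliberately leaves out.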