Thoughts on GSOC

Thoughts on GSOC

Grant Ingersoll-2
Just thought I would mention a couple of things about GSOC.

First, thanks to all for the interest.  It is great to see ML alive  
and well in universities.  We have several good applications already.  
Since the ASF is only allotted a certain number of picks, we go through
an internal ranking process to decide which projects get them (they  
try to spread them out over many projects).  So, while we have 4  
willing mentors, it doesn't necessarily mean we will get that many  
students.

However, to go along with that I would encourage all students to go  
through their proposals and make sure they can fill in details about  
their plans as much as possible, as well as their bios, etc.  Also,  
please don't be shy about discussing your plans here.  One of the  
requirements of doing this project is going to be to interact with  
your mentor and the community.

Finally, I would certainly like to encourage those who don't get  
selected to stick around and contribute.  I am sure others in the  
community can vouch for this: being an active contributor/committer to  
a project like this is a real edge when it comes to getting a job  
doing what you want to do. Think about being in a job interview for an
ML company and saying, yeah: I contributed algorithm X to Mahout which
was used by Y on a 100-node cluster, or I contributed these 10
patches to Mahout plus I'm an active discussion participant, go look  
it up.  It gives potential employers an incredible track record to  
review and shows you know how to get along with others in a "work"  
environment.

At any rate, enough of the pep talk.  Good luck to you all, I look  
forward to evaluating the ideas!

Cheers,
Grant

Re: Thoughts on GSOC

Khalil Honsali
Interesting post!

I agree with the idea of contributing even if not for GSoC :)

K. Honsali

On 29/03/2008, Grant Ingersoll <[hidden email]> wrote:

> Finally, I would certainly like to encourage those who don't get
> selected to stick around and contribute.

Re: Thoughts on GSOC

sarp
In reply to this post by Grant Ingersoll-2
Summary:
I'm very excited about the Mahout project, and my goal is to become one of its committers. To start with, I'm willing to implement one of the algorithms that won't be covered by GSoC students. What I'm looking for is a mentor who would be willing to supervise me like a GSoC student, even though I won't be one, and help me formulate these algorithms in the Map/Reduce framework, as well as provide feedback on the design and implementation issues I will face during this process.

About Me:
I'm a Fulbright scholar from Turkey, studying at Georgia Institute of Technology for an M.S. degree in CS with a focus on machine learning. I successfully participated in GSoC last year [1], working with the OpenMRS organization on the "Patient Matching and Record Linkage" project [2]. We used the EM algorithm to estimate the parameters of our statistical model, in order to predict whether two records belong to the same patient. I will be doing an internship during the summer, so I thought it would be fair to give others who have more time on their hands a chance to experience GSoC; that's why I won't be applying this time.

My Background:
I've taken courses on Statistical Modeling, Data Mining [3], Machine Learning [4] and Computational Data Analysis [5], so I have the necessary background to implement most of the algorithms mentioned in the original paper [6]. Most of my experience is in unsupervised learning: I've implemented k-means and hierarchical clustering before, and I am currently implementing k-medoids and constrained k-means algorithms [7] in C++ for FASTlib (A library of Fundamental Algorithmic and Statistical Tools) [8]. My motivation for participating in this project is to enhance my understanding of ML algorithms by implementing them, learn about the Map/Reduce framework, and possibly make use of Mahout in my MS research project.

Links:
[1] http://googlesummerofcode.blogspot.com/2007/12/friday-fulbright-and-rhodes.html
[2] http://code.google.com/soc/2007/openmrs/appinfo.html?csaid=E680200FD32E82D6
[3] http://www2.isye.gatech.edu/~shan/ISyE7406/ISYE7406.html
[4] http://www-static.cc.gatech.edu/fac/Charles.Isbell/classes/2008/cs7641_spring/syllabus.html
[5] http://www.cc.gatech.edu/~agray/spr08.html
[6] http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf
[7] http://www.litech.org/~wkiri/Papers/wagstaff-kmeans-01.pdf
[8] http://www.cc.gatech.edu/~agray/fastlib.pdf


Re: Thoughts on GSOC

Isabel Drost-3
In reply to this post by Grant Ingersoll-2
On Saturday 29 March 2008, Grant Ingersoll wrote:
> Finally, I would certainly like to encourage those who don't get
> selected to stick around and contribute.

+1 from me. In addition to what Grant already said, it is a great experience
to see your code end up in an Apache project.

Isabel


--
When one burns one's bridges, what a very nice fire it makes. -- Dylan Thomas
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[hidden email]>


Re: Thoughts on GSOC

Grant Ingersoll-2
In reply to this post by sarp
Doesn't sound like you need a mentor :-)  I'd just start by picking
something that interests you and is useful to you, then work on it
and submit a patch.  Consider the community to be the mentor.  Just
feel free to ask questions and put up patches.  Patches don't have to
be perfect; they just need to be followed up on.

I've done some record linkage, so I will be interested to check out  
your approach w/ EM.

-Grant


On Mar 30, 2008, at 1:21 AM, sarp wrote:

>
> Summary:
> I'm very excited about Mahout project and my goal is to become one of the
> committers for this project. To start with, I'm willing to implement one of
> the algorithms that won't be covered by GSoC students. What I'm looking for
> is a mentor who would be willing to supervise me like a GSoC student, even
> though I won't be one, and help me in formulating these algorithms in the
> Map/Reduce framework, as well as provide feedback about
> design/implementation issues I will face during this process.

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com