Student contributions

fmccown
Greetings.  I'm teaching a class on search engine development this
semester, and I am considering having my students use Nutch in their
projects (I'm new to Nutch myself).  I'd like them to get some
experience with an open source project and make a significant
contribution.  Are there any implementation tasks you guys think would
be appropriate for a small group of undergrad, upperclass CS students?
I'm looking for ideas for improving Nutch that they could accomplish
in a few weeks' time.

Thanks,

--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
http://www.harding.edu/fmccown/

Re: Student contributions

jian chen
I am not a Nutch developer, but I have some thoughts.

Looking at your university's site, I see you are using Google Custom Search for
the university site search. A good project would be to switch it over to
Nutch ;-)
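Powering a university site search with Nutch is mostly a single-site crawl, and the first thing such a crawl needs is a rule that keeps the crawler on the university's own hosts. Nutch applies rules like this through its URL filter plugins, whose contract is roughly "return the URL to keep it, null to drop it". Below is a minimal, Nutch-independent sketch of that rule in plain Java, assuming the site is everything under harding.edu. The class name SiteScopeFilter is hypothetical and this is not Nutch's actual plugin code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Nutch code: keeps a crawl on one site by following
// the "return the URL to keep it, null to drop it" convention of Nutch's URL filters.
class SiteScopeFilter {
    private final String domain; // e.g. "harding.edu" (an assumption for illustration)

    SiteScopeFilter(String domain) {
        this.domain = domain.toLowerCase(Locale.ROOT);
    }

    /** Returns the URL if it is http(s) on the domain or one of its subdomains, else null. */
    String filter(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null; // relative or host-less URLs (e.g. mailto:) cannot be scoped
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return null; // skip ftp: and other non-web schemes
            }
            host = host.toLowerCase(Locale.ROOT);
            // Match the domain itself or a true subdomain, so "evilharding.edu" is rejected.
            boolean inScope = host.equals(domain) || host.endsWith("." + domain);
            return inScope ? url : null;
        } catch (URISyntaxException e) {
            return null; // malformed URLs are dropped rather than crawled
        }
    }

    public static void main(String[] args) {
        SiteScopeFilter filter = new SiteScopeFilter("harding.edu");
        List<String> candidates = List.of(
                "http://www.harding.edu/fmccown/",
                "http://www.example.com/",
                "ftp://www.harding.edu/file.txt",
                "http://evilharding.edu/");
        for (String url : candidates) {
            System.out.println(url + " -> " + (filter.filter(url) != null ? "keep" : "drop"));
        }
    }
}
```

Matching on the parsed host rather than a raw string prefix is the design choice that matters here: it keeps subdomains such as www.harding.edu while rejecting look-alike hosts.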

Jian
www.JiansNet.com
Quality site search at affordable price

On Jan 2, 2008 2:44 PM, Frank McCown <[hidden email]> wrote:


Re: Student contributions

chrismattmann
In reply to this post by fmccown
Hi Frank,

Thanks for your interest in using Nutch!

The best way to see what's on the horizon, and needed in Nutch, is to check
out our JIRA issue tracking system, at:

http://issues.apache.org/jira/browse/NUTCH

At present, there are 39 open issues (bug fixes, new features, and
improvements to existing features) scheduled for the upcoming 1.0.0
release, and 222 open issues across all versions of Nutch (including
prior releases).

To help you digest the wealth of information that's there (and trust me,
there's plenty), I would offer a few of my own suggestions for class
projects:

(Difficulty: High) 1. Decouple Nutch's crawl infrastructure and turn it
into its own extension point. The current crawl infrastructure is tightly
coupled to a few monolithic classes, such as Fetcher (or its big brother,
Fetcher2), and to Hadoop as the underlying job/crawl execution platform.
There have been several requests on the list to make the crawler its own
component: lightweight, configurable, etc. I think taking a stab at this
decoupling would be an ambitious two-week student project.

(Difficulty: Medium) 2. Analyze the Nutch code base and propose
architectural improvements. Currently, the Nutch code base is a behemoth of
plugins/extension points, configuration properties, and the like. It would
be nice to have a fresh look at its architecture from an outsider's
perspective. The students would suggest places to cut and places to add,
cleaner interfaces, and appropriate underlying middleware substrates. For
example, is Hadoop the only logical choice? What about other enterprise
solutions such as web services, EJB, or JMS?

(Difficulty: Medium) 3. Use Spring as the underlying configuration framework
for Nutch, and overhaul Nutch's home-grown configuration infrastructure.
Spring is an open source framework centered on configuration and
instantiation (dependency injection) middleware: it lets developers focus on
the domain objects and handles the rest. The students would first study
Spring, then Nutch, and then build a prototype showing how Spring could be
used to configure Nutch.
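To make suggestions 1 and 3 a bit more concrete, here is a hypothetical sketch of what a crawler extension point might look like, with the implementation chosen from configuration instead of being hard-wired. Every name in it (CrawlEngine, CrawlEngineFactory, and the crawl.engine and crawl.max.pages properties) is invented for illustration and is not part of Nutch; the engines are stubs that do no real fetching. The hand-written factory is the kind of wiring a container such as Spring could take over:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Function;

// Entry point first so the single-file launcher (java CrawlDemo.java) finds main.
class CrawlDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("crawl.engine", "local"); // hypothetical property names
        conf.setProperty("crawl.max.pages", "2");
        CrawlEngine engine = CrawlEngineFactory.fromConfig(conf);
        System.out.println(engine.getClass().getSimpleName()
                + " visited " + engine.crawl(List.of("http://a/", "http://b/", "http://c/")));
    }
}

/** Hypothetical extension point: the only crawler type the rest of the system depends on. */
interface CrawlEngine {
    /** Crawls starting from the seeds and returns the URLs it visited. */
    List<String> crawl(List<String> seeds);
}

/** Stand-in for a lightweight, single-process crawler with no Hadoop dependency. */
class LocalCrawlEngine implements CrawlEngine {
    private final int maxPages;

    LocalCrawlEngine(int maxPages) {
        this.maxPages = maxPages;
    }

    @Override
    public List<String> crawl(List<String> seeds) {
        // A real engine would fetch, parse, and follow links; this stub only
        // records which seeds it would visit, capped at maxPages.
        return new ArrayList<>(seeds.subList(0, Math.min(maxPages, seeds.size())));
    }
}

/** Stand-in for a backend that would submit the crawl as cluster jobs. */
class DistributedCrawlEngine implements CrawlEngine {
    @Override
    public List<String> crawl(List<String> seeds) {
        return new ArrayList<>(seeds); // placeholder for handing work to a cluster
    }
}

/** Hand-written wiring that a container such as Spring could replace. */
final class CrawlEngineFactory {
    private static final Map<String, Function<Properties, CrawlEngine>> ENGINES = Map.of(
            "local", p -> new LocalCrawlEngine(
                    Integer.parseInt(p.getProperty("crawl.max.pages", "100"))),
            "distributed", p -> new DistributedCrawlEngine());

    private CrawlEngineFactory() {}

    static CrawlEngine fromConfig(Properties conf) {
        String name = conf.getProperty("crawl.engine", "local");
        Function<Properties, CrawlEngine> ctor = ENGINES.get(name);
        if (ctor == null) {
            throw new IllegalArgumentException("Unknown crawl.engine: " + name);
        }
        return ctor.apply(conf);
    }
}
```

With a split like this, swapping crawlers would mean changing one property rather than editing Fetcher, and replacing CrawlEngineFactory with container-managed bean definitions would be a natural prototype for suggestion 3.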

There are plenty of others, but these are just a few ideas off the top of my
head that should help get the juices flowing.

Also, FYI, Dr. Ellis Horowitz has taught a course on search engines for a few
semesters at the University of Southern California (USC). Here is a pointer
to the course page, where you can find some other Nutch project suggestions:

http://www-scf.usc.edu/~csci572/

Good luck!

Cheers,
 Chris


______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Student contributions

fmccown
Thank you, Chris and Jian, for your ideas.  I also appreciate the link
to the search engine class.

Frank


On 1/2/08, Chris Mattmann <[hidden email]> wrote:



--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
http://www.harding.edu/fmccown/