[DISCUSS] Nutch as a top level project (TLP)?

Andrzej Białecki-2
Hi devs,

The ASF Board indicated recently that so-called "umbrella" projects,
i.e. projects that host many significant sub-projects, should examine
their structure towards simplification, such as merging or splitting out
sub-projects.

The Lucene TLP is such a project. Recently the Lucene PMC accepted the
merge of the Solr and Lucene core projects. The Mahout project will most
likely split off into its own TLP soon, which leaves Nutch as a sort of
odd duck ;)

Moving Nutch to its own TLP has some advantages, mostly an easier
decision process - voting on new committers and new releases would then
involve only those who participate directly in Nutch development, i.e.
the Nutch community.

Also, from the coding point of view, Nutch is not so intrinsically tied
to Lucene development that the two would require careful coordination -
we just use Lucene as one of many dependencies, and in fact we aim to
cleanly separate the Nutch search API from the Lucene-based API.
I can easily imagine Nutch completely dropping the low-level
Lucene-based components and moving to a more general search fabric (e.g.
SolrCloud).

Being its own TLP could also give Nutch more exposure and help to
crystallize our mission.

There are some disadvantages to such a split, too: we would need to
spend some more effort on various administrative tasks, and maintain a
separate web site (under Apache, but not under Lucene), and probably
some other tasks that I'm not yet aware of. This would also mean that
Nutch would have to stand on its own merit, which, considering the small
number of active committers, may be challenging.

Let's discuss this, and after we collect some pros and cons I'm going to
call for a vote.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: [DISCUSS] Nutch as a top level project (TLP)?

Mattmann, Chris A (3010)
Hey Andrzej,

I’d be +1 for Nutch being a TLP. I don’t think it’ll change much (other than to provide more visibility/etc., and to allow more focused decision making by the folks in the Nutch community). The infrastructure moves required for TLP status are moving the mailing lists, moving JIRA, moving SVN, and moving the website (with a bit of redesign/etc.). That shouldn’t be that hard, and the infra team can probably help (at least with the first 3 parts, if we file issues for them).

I’d volunteer to help with things like list moderation, or whatever else I can do to help.

The important things to decide would be:

  • Who’s on the PMC (my suggestion, similar to Tika, make existing Nutch committers PMC members)
  • Who’s the VP (my +1 for you)

Cheers,
Chris



In reply to the post by Andrzej Bialecki of 3/19/10 12:51 PM

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@...
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: [DISCUSS] Nutch as a top level project (TLP)?

Otis Gospodnetic-2-2
Personally, I don't see the advantage of Nutch going for a TLP. It's not like new committers are having a hard time getting in today; it's not like they are being proposed and rejected. I also don't feel like Nutch lacks exposure/visibility -- lots of people know about it. It's just that very few people need the massively scalable, web-wide crawling machinery that Nutch provides.

Otis 
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/


In reply to the post by Mattmann, Chris A (388J), sent Sat, March 20, 2010 7:30:54 PM

Re: [DISCUSS] Nutch as a top level project (TLP)?

Grant Ingersoll-2
On Mar 20, 2010, at 11:25 PM, Otis Gospodnetic wrote:

> Personally, I don't see the advantage of Nutch going for a TLP.

I think it comes down to focus.  Nutch really isn't all that dependent on Lucene and there is almost no overlap with the other communities in terms of the people developing it.  Both of these things are signs from the Board that a TLP is warranted. 



> It's just that very few people need a massively scalable web-wide crawling machinery that Nutch provides.

Correct, but that is also orthogonal to the issue. 

I don't care too much either way, but the Board definitely views non-overlapping subprojects as less desirable. Granted, Nutch has been a subproject for a long time, so maybe it is not such a big deal, except that, AIUI, Nutch's primary focus these days is on crawling, which suggests that, from an organizational standpoint, it is not as closely related to search as being under the Lucene PMC implies.

At any rate, just my two cents from my view on the PMC and as a very occasional user of Nutch.

-Grant

Re: [DISCUSS] Nutch as a top level project (TLP)?

Sami Siren-3
In reply to this post by Andrzej Białecki-2
My opinion is neutral on this matter. I don't see any technical benefit
in becoming a top-level project; exposure-wise, I think the impact is
probably negative. So for me the reason would be strictly political.

But the fact is that Nutch is pretty independent from Lucene/Solr and
there is not much overlap with dev communities.

--
  Sami Siren