Fwd: Reviving Nutch 0.7

Zaheed Haque
---------- Forwarded message ----------
From: Zaheed Haque <[hidden email]>
Date: Jan 22, 2007 10:13 AM
Subject: Re: Reviving Nutch 0.7
To: [hidden email]


On 1/22/07, Otis Gospodnetic <[hidden email]> wrote:
> Hi,
>
> I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally.
>
> Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today.  However, I think there is still a need for something much simpler, something like what Nutch 0.7 used to be.  Fairly regular nutch-user inquiries confirm this.  Nutch has too few developers to maintain and further develop both of these concepts, and the main Nutch developers need the more powerful version - 0.8 and beyond.  So, what is going to happen to 0.7?  Maintenance mode?
>
> I feel that there is enough need for 0.7-style Nutch that it might be worth at least considering and discussing the possibility of somehow branching that version into a parallel project that's not just in a maintenance mode, but has its own group of developers (not me, no time :( ) that pushes it forward.
>
> Thoughts?

I agree with you that there is a need for 0.7-style Nutch. I wouldn't
say reviving but more "dissecting and re-directing" :-). Here you go
--- my focus here is 0.7 style, i.e. mid-size, enterprise needs.

Solr could use a good crawler because it has everything else (AFAIK).
Technically this is probably not "plug and pray" :-), and I am not sure
the Solr community wants a crawler, but it could benefit from such an
add-on/snap-on crawler. Furthermore, I am sure some of the 0.7 plugins
could be refactored to fit into Solr.

I will forward the mail to the Solr community to see if there is any interest.

Cheers

Re: Fwd: Reviving Nutch 0.7

thorsten
On Mon, 2007-01-22 at 10:13 +0100, Zaheed Haque wrote:

> ---------- Forwarded message ----------
> From: Zaheed Haque <[hidden email]>
> Date: Jan 22, 2007 10:13 AM
> Subject: Re: Reviving Nutch 0.7
> To: [hidden email]
>
>
> On 1/22/07, Otis Gospodnetic <[hidden email]> wrote:
> > Hi,
> >
> > I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally.
> >
> > Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today.  However, I think there is still a need for something much simpler, something like what Nutch 0.7 used to be.  Fairly regular nutch-user inquiries confirm this.  Nutch has too few developers to maintain and further develop both of these concepts, and the main Nutch developers need the more powerful version - 0.8 and beyond.  So, what is going to happen to 0.7?  Maintenance mode?
> >
> > I feel that there is enough need for 0.7-style Nutch that it might be worth at least considering and discussing the possibility of somehow branching that version into a parallel project that's not just in a maintenance mode, but has its own group of developers (not me, no time :( ) that pushes it forward.
> >
> > Thoughts?
>

I do not really want to comment on the 0.7 part of this discussion.

> I agree with you that there is a need for 0.7-style Nutch. I wouldn't
> say reviving but more "dissecting and re-directing" :-). Here you go
> --- my focus here is 0.7 style, i.e. mid-size, enterprise needs.
>
> Solr could use a good crawler because it has everything else (AFAIK).
> Technically this is probably not "plug and pray" :-), and I am not sure
> the Solr community wants a crawler, but it could benefit from such an
> add-on/snap-on crawler.

I used the Forrest/Cocoon CLI as a crawler in a Forrest plugin I wrote. I
will need to look into the Nutch crawler code to see whether we could
reuse it. I am not sure how tightly it is coupled to the db, but I guess
pretty tightly.

> Furthermore, I am sure some of the 0.7 plugins
> could be refactored to fit into Solr.

The thing about introducing all these plugins into Solr is that we may
pretty soon end up in the situation the original thread describes. We may
blow up the one simple, well-defined problem that we want to solve with
too many plugins and components.

I like to have Solr tools that do some well-defined process, like
updating the Solr server with crawled content, but as I said, they are
IMO tools, not really part of the Solr core.

In the end, if you want an enhanced search experience via Solr with all
the filter goodies, then you need to add more fields than the ones from,
e.g., the standard Nutch XHTML parser.

Certain documents allow fine filtering based on additional information
these documents may provide (year, type, organization, author, etc.). It
is easy to write a single component to update a certain doc type or set
of information against Solr, but IMO that should not be the focus of
main Solr development.

I think that should go into a tools/ dir.
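
To illustrate how small such a single-purpose tool can be, here is a
rough sketch (not code from Nutch or Solr) that turns one crawled
document into a Solr XML <add> update message. The function name
build_add_message and the field names (id, title, year, type,
organization, author) are assumptions; real field names depend on the
Solr schema in use, and posting the message to the server is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build a Solr XML <add> message for one document.

    `fields` maps a field name to a single value or to a list of values
    (for multi-valued fields). The names are hypothetical and must match
    whatever the target Solr schema defines.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        # Normalise single values to a one-element list.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Example document with the extra metadata mentioned above (made-up values).
message = build_add_message({
    "id": "doc-1",
    "title": "Annual report",
    "year": 2006,
    "type": "report",
    "organization": "Example Org",
    "author": ["First Author", "Second Author"],
})
```

A tool like this only needs to know the field names for its own document
type, which is why it fits better in a tools/ dir than in the core.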

>
> I will forward the mail to the Solr community to see if there is any interest.

Thanks, Zaheed. This fits well into the "Update Plugins" thread.

salu2

>
> Cheers
--
thorsten

"Together we stand, divided we fall!"
Hey you (Pink Floyd)