How to setup Nutch on existing Hadoop

How to setup Nutch on existing Hadoop

lonely Feb
Hello~ I have just started deploying Nutch on my distributed machines, and an
existing Hadoop system is already deployed on them. I wonder how to set up
Nutch there without changing the Hadoop settings. Please give me some
advice. Thx~

Re: How to setup Nutch on existing Hadoop

Sonal Goyal
You can use the Nutch job file, which runs on an existing Hadoop cluster
like any other Hadoop job jar. You will have to call the inject, generate,
etc. jobs yourself. Have a look at bin/nutch and you should be able to
figure out the job classes.
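
A minimal sketch of that sequence, written in Python only to keep the commands explicit. It assumes the class names that bin/nutch maps its subcommands to in Nutch 1.x releases, so check them against your own bin/nutch. The helpers `nutch_cmd` and `crawl_plan` are hypothetical, and the job file name, the `urls` seed directory, the `crawl/...` paths and the segment name are placeholders. The script only prints the commands; it does not run them.

```python
import shlex

# Subcommand -> job class, as mapped by bin/nutch in Nutch 1.x releases.
# Assumption: verify against your own bin/nutch, the mapping varies by version.
JOB_CLASSES = {
    "inject": "org.apache.nutch.crawl.Injector",
    "generate": "org.apache.nutch.crawl.Generator",
    "fetch": "org.apache.nutch.fetcher.Fetcher",
    "parse": "org.apache.nutch.parse.ParseSegment",
    "updatedb": "org.apache.nutch.crawl.CrawlDb",
}


def nutch_cmd(job_file, step, *args):
    """Build the argv that submits one Nutch step as an ordinary Hadoop job."""
    return ["hadoop", "jar", job_file, JOB_CLASSES[step], *args]


def crawl_plan(job_file, seeds, crawldb, segments, segment):
    """Return the commands for one inject/generate/fetch/parse/updatedb round."""
    return [
        nutch_cmd(job_file, "inject", crawldb, seeds),
        nutch_cmd(job_file, "generate", crawldb, segments),
        # generate creates a new timestamped directory under `segments`;
        # `segment` stands for that path, which you look up after the step.
        nutch_cmd(job_file, "fetch", segment),
        # parse may be unnecessary if the fetcher is configured to parse.
        nutch_cmd(job_file, "parse", segment),
        nutch_cmd(job_file, "updatedb", crawldb, segment),
    ]


if __name__ == "__main__":
    # All paths below are placeholders (HDFS paths relative to the user's home).
    for cmd in crawl_plan("nutch-version.job", "urls", "crawl/crawldb",
                          "crawl/segments", "crawl/segments/SEGMENT_NAME"):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Because each step is a plain `hadoop jar` invocation, nothing in the cluster's own Hadoop configuration has to change; the job file travels with the job submission.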

Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal


On Fri, Sep 10, 2010 at 9:07 AM, lonely Feb <[hidden email]> wrote:

> Hello~ I have just started deploying Nutch on my distributed machines, and an
> existing Hadoop system is already deployed on them. I wonder how to set up
> Nutch there without changing the Hadoop settings. Please give me some
> advice. Thx~
>

Re: How to setup Nutch on existing Hadoop

lonely Feb
Thanks for your advice. Can you describe the whole process?
Do I need to generate a new jar with all the jars in nutch/lib? And should
Nutch be put on the single master only, or on the master and all the slaves?

Re: How to setup Nutch on existing Hadoop

Sonal Goyal
No, just use the prebuilt nutch-version.job, which is part of the Nutch
release. It can be submitted from the JobTracker node like any other Hadoop
job.
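
To see why no extra jar needs to be assembled, you can list what the job file already bundles. This is a rough sketch assuming the job file is a zip-format archive that follows the usual Hadoop job-jar layout, with dependency jars under `lib/`; the helper `bundled_jars` is hypothetical, and the path you pass in is your own.

```python
import sys
import zipfile


def bundled_jars(job_path):
    """Return the jar files packed under lib/ inside a Hadoop job archive."""
    with zipfile.ZipFile(job_path) as job:
        return sorted(
            name for name in job.namelist()
            if name.startswith("lib/") and name.endswith(".jar")
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_job_jars.py /path/to/nutch-version.job
    for name in bundled_jars(sys.argv[1]):
        print(name)
```

If the listing shows the libraries from nutch/lib, the job file is self-contained and only the machine that submits the job needs a copy of it.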

Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal


On Fri, Sep 10, 2010 at 10:56 AM, lonely Feb <[hidden email]> wrote:

> Thanks for your advice. Can you describe the whole process?
> Do I need to generate a new jar with all the jars in nutch/lib? And should
> Nutch be put on the single master only, or on the master and all the slaves?
>

RE: How to setup Nutch on existing Hadoop

Brian Tingle

There is also an Ant target that builds the job file if you are building from source... it took me weeks to figure that out...


Re: How to setup Nutch on existing Hadoop

lonely Feb
Thank you so much~

Re: How to setup Nutch on existing Hadoop

lonely Feb
I have successfully used nutch-version.job to crawl on Hadoop. Thanks for
your help~
But now a new problem has come up:
How can I set up Tomcat for web searching? How do I edit
WEB-INF/classes/nutch-site.xml?
Do I need to put Tomcat on each node for distributed searching? If so, how
do I set search.dir (which should be an HDFS path)?
Or do I need to copyToLocal to a single node so that I can use a local path
for search.dir?