How can I get Nutch working on an already-running Hadoop cluster without a lot of compiling and configuration?

Alex Luya (Tue 7/20/2010 6:09 PM)
Hello,
    According to this tutorial, http://wiki.apache.org/nutch/NutchHadoopTutorial,
Hadoop is already shipped with Nutch and the user can just use it. But I already
have a Hadoop cluster running. How can I get Nutch working on this running
cluster without a lot of compiling and configuration work?
(I am using Hadoop 0.20.2 and I want to use Nutch 1.1; few tutorials and
instructions are available on the web.)

Brian Tingle (Wed, Jul 21, 2010 at 3:55 AM)
I wasted a lot of time trying to figure this out before I realized there is an ant target for it: run 'ant job' and you get a file 'nutch.job' that you can move to the Hadoop cluster. Then you can run something like 'hadoop jar nutch.job path.to.nutch.Class' followed by that class's arguments.
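
A minimal sketch of the second step, assuming the standard 'hadoop jar' subcommand is what is meant and that the job file has already been built with 'ant job'. The function name run_nutch_class and the job file path are made up for illustration; the class name org.apache.nutch.crawl.Crawl is taken from the stack trace later in this thread.

```shell
#!/bin/sh
# Sketch only: submit a class from a Nutch job file to the running cluster.
# run_nutch_class is a hypothetical wrapper, not part of Nutch or Hadoop.
run_nutch_class() {
    job_file=$1   # path to the file produced by 'ant job'
    shift         # remaining arguments: main class, then its own arguments
    if [ ! -f "$job_file" ]; then
        echo "job file not found: $job_file (run 'ant job' first)" >&2
        return 1
    fi
    # 'hadoop jar <file> <main class> [args...]' runs the class on the cluster
    hadoop jar "$job_file" "$@"
}
```

It could be called as: run_nutch_class path/to/nutch.job org.apache.nutch.crawl.Crawl crawl/url -dir crawl -depth 3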


CatOs Mandros (Wednesday, July 21, 2010 01:54:06 pm)
I just soft-linked all the relevant configuration files in the Nutch
installation to the Hadoop ones, and now I can use the nutch script
transparently.

Alex Luya (Wed, Jul 21, 2010 at 3:37 PM)
CatOs Mandros:
       If I do that, I must restart the Hadoop cluster. Which command should I
run: $HADOOP_HOME/bin/start-all.sh or $NUTCH_HOME/bin/start-all.sh?

CatOs Mandros (Thursday, July 22, 2010 03:04:04 pm)
You should run $HADOOP_HOME/bin/start-all.sh.

Specifically, the files I have soft-linked are:
$NUTCH_HOME/conf/core-site.xml  -> $HADOOP_HOME/conf/core-site.xml
$NUTCH_HOME/conf/hdfs-site.xml  -> $HADOOP_HOME/conf/hdfs-site.xml
$NUTCH_HOME/conf/mapred-site.xml  -> $HADOOP_HOME/conf/mapred-site.xml
$NUTCH_HOME/conf/masters  -> $HADOOP_HOME/conf/masters
$NUTCH_HOME/conf/slaves  -> $HADOOP_HOME/conf/slaves

I suppose not all of these files are necessary, but it's working for me :)
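
The linking described above could be scripted roughly as follows. This is only a sketch: the function name link_hadoop_conf is invented for illustration, and renaming an existing regular file to a .orig backup is an assumption, not something described in this thread.

```shell
#!/bin/sh
# Sketch only: make the Nutch conf entries symlinks to the Hadoop ones.
# link_hadoop_conf is a hypothetical helper, not part of Nutch or Hadoop.
link_hadoop_conf() {
    nutch_home=$1
    hadoop_home=$2
    for f in core-site.xml hdfs-site.xml mapred-site.xml masters slaves; do
        src="$hadoop_home/conf/$f"
        dst="$nutch_home/conf/$f"
        if [ ! -e "$src" ]; then
            echo "skipping $f: not found in $hadoop_home/conf" >&2
            continue
        fi
        # Assumption: keep any existing regular file as a .orig backup
        if [ -e "$dst" ] && [ ! -L "$dst" ]; then
            mv "$dst" "$dst.orig"
        fi
        # -s makes a symbolic link, -f replaces an existing link
        ln -sf "$src" "$dst"
    done
}
```

It would be called as: link_hadoop_conf "$NUTCH_HOME" "$HADOOP_HOME"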

Alex Luya
Hello,
      My other question is how to use it. When I try to run this:


nutch crawl crawl/url -dir crawl -depth 3


I get this error:
--------------------------------------------------------------------------------
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/usr/local/hadoop/nutch-1.1/conf/crawl/url
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
--------------------------------------------------------------------------------


Obviously it is using the local file system by default. I think I must change
something in nutch-site.xml or change this command, but how? I tried googling
but found no solution; very little material about Nutch is available. There is
a tutorial on the wiki, but it is long and buggy.
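
One way to check the local-file-system suspicion is to look at what the Nutch conf directory says about the default filesystem. The sketch below assumes the Hadoop 0.20 property name fs.default.name in core-site.xml; the function name check_default_fs is invented for illustration, and the sed-based parsing is a rough approximation rather than a real XML parser.

```shell
#!/bin/sh
# Sketch only: report which default filesystem a Nutch install would pick up.
# check_default_fs is a hypothetical helper, not a Nutch or Hadoop command.
check_default_fs() {
    conf_file="$1/conf/core-site.xml"
    if [ ! -f "$conf_file" ]; then
        echo "no core-site.xml in $1/conf, so paths resolve to the local file system"
        return 1
    fi
    # Flatten the file to one line, then pull the <value> that directly
    # follows <name>fs.default.name</name> (assumed Hadoop 0.20 property name).
    fs=$(tr -d '\n' < "$conf_file" |
        sed -n 's|.*<name>fs\.default\.name</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p')
    case "$fs" in
        hdfs://*)
            echo "default filesystem: $fs" ;;
        *)
            echo "fs.default.name is not an hdfs:// URI, so paths resolve to the local file system"
            return 1 ;;
    esac
}
```

It would be called as: check_default_fs "$NUTCH_HOME". If an hdfs:// URI is reported, the seed directory also has to exist in HDFS; something like 'hadoop fs -put <local seed dir> crawl/url' would copy it there before rerunning the crawl.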

