java.io.IOException: No input directories specified in

Peter Swoboda
Hi,

I've changed from Nutch 0.7 to 0.8 and done the following steps:
created a urls.txt in a directory named seeds

bin/hadoop dfs -put seeds seeds

060317 121440 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060317 121441 No FS indicated, using default:local

bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
but in crawl.log:
060419 124302 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060419 124302 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
    at
org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
    at
org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
    at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060419 124302 Running job: job_e7cpf1
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

Any ideas?

Re: java.io.IOException: No input directories specified in

Zaheed Haque
bin/hadoop dfs -ls

Can you see your "seeds" directory?

bin/hadoop dfs -ls seeds

Can you see your text file with URLs?

Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
strongly recommend you take the long route of

inject, generate, fetch, updatedb, invertlinks, index, dedup and
merge. You can try the above commands just by typing
bin/nutch inject
etc.
If you just try the inject command without any parameters, it will tell you
how to use it.
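
(For reference, the long route looks roughly like the sketch below. This follows the 0.8 tutorial layout, so treat the crawl/* paths and exact arguments as placeholders and check each command's own usage output.)

bin/nutch inject crawl/crawldb seeds
bin/nutch generate crawl/crawldb crawl/segments
s1=`ls -d crawl/segments/* | tail -1`    # newest segment just generated
bin/nutch fetch $s1
bin/nutch updatedb crawl/crawldb $s1
bin/nutch invertlinks crawl/linkdb crawl/segments/*
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
bin/nutch dedup crawl/indexes
bin/nutch merge crawl/index crawl/indexes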

Hope this helps.

Re: java.io.IOException: No input directories specified in

Peter Swoboda



> --- Original message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Fri, 21 Apr 2006 09:48:38 +0200
>
> bin/hadoop dfs -ls
>
> Can you see your "seeds" directory?
>

bash-3.00$ bin/hadoop dfs -put seeds seeds
060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060421 122421 No FS indicated, using default:local

bash-3.00$ bin/hadoop dfs -ls
060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060421 122426 No FS indicated, using default:local
Found 0 items
bash-3.00$

As you can see, I can't.
What's going wrong?



Re: java.io.IOException: No input directories specified in

Zaheed Haque
Do you have a file called "hadoop-site.xml" under your conf directory?
The content of the file is like the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>

or is it missing? If it's missing, please create a file named
hadoop-site.xml under the conf directory and then try hadoop dfs -ls
again. You should see something, like a listing from your local file
system.
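
(If you want to create it from the shell, something like this would do; the heredoc is just one convenient way to write the file.)

cat > conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

</configuration>
EOF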

On 4/21/06, Peter Swoboda <[hidden email]> wrote:

> bash-3.00$ bin/hadoop dfs -put seeds seeds
> 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml

I think hadoop-site.xml is missing, because we should be seeing a
message like this here...

060421 131014 parsing
file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml


Re: java.io.IOException: No input directories specified in

Zaheed Haque
Also, I have noticed that you are using hadoop-0.1; there was a bug in
0.1, so you should be using 0.1.1. Under your lib directory you should
have the following file:

hadoop-0.1.1.jar

If that's not the case, please download the latest nightly build.
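
(A quick check, assuming you run it from the nutch-nightly directory:)

ls lib/hadoop-*.jar
# should print lib/hadoop-0.1.1.jar, not lib/hadoop-0.1-dev.jar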

Cheers




Re: java.io.IOException: No input directories specified in

Peter Swoboda
OK, changed to the latest nightly build.
hadoop-0.1.1.jar is there,
hadoop-site.xml also.
Now trying:

bash-3.00$ bin/hadoop dfs -put seeds seeds

060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125155 No FS indicated, using default:local

and

bash-3.00$ bin/hadoop dfs -ls

060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
060421 125217 No FS indicated, using default:local
Found 16 items
/home/stud/jung/Desktop/nutch-nightly/docs      <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
/home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
/home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
/home/stud/jung/Desktop/nutch-nightly/build.xml 21433
/home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
/home/stud/jung/Desktop/nutch-nightly/conf      <dir>
/home/stud/jung/Desktop/nutch-nightly/default.properties        3043
/home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
/home/stud/jung/Desktop/nutch-nightly/lib       <dir>
/home/stud/jung/Desktop/nutch-nightly/bin       <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
/home/stud/jung/Desktop/nutch-nightly/src       <dir>
/home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
/home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
/home/stud/jung/Desktop/nutch-nightly/README.txt        403

also:

bash-3.00$ bin/hadoop dfs -ls seeds

060421 133004 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 133004 No FS indicated, using default:local
Found 2 items
/home/../nutch-nightly/seeds/urls.txt~   0
/home/../nutch-nightly/seeds/urls.txt    26
bash-3.00$

but:

bin/nutch crawl seeds -dir crawled -depht 2

060421 131722 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131723 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131723 crawl started in: crawled
060421 131723 rootUrlDir = 2
060421 131723 threads = 10
060421 131723 depth = 5
060421 131724 Injector: starting
060421 131724 Injector: crawlDb: crawled/crawldb
060421 131724 Injector: urlDir: 2
060421 131724 Injector: Converting injected urls to crawl db entries.
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131724 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131725 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060421 131727 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060421 131727 job_6jn7j8
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
        at
org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
        at
org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
        at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
060421 131728 Running job: job_6jn7j8
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
bash-3.00$

Can anyone help?






Re: java.io.IOException: No input directories specified in

Zaheed Haque
Is your hadoop-site.xml empty, i.e. it doesn't contain any
configuration, correct? So what you need to do is add your
configuration there. I suggest you copy hadoop-0.1.1.jar to
another directory for inspection (copy, not move) and unzip it; you
will see the hadoop-default.xml file there. Use that as a template to
edit your hadoop-site.xml under conf. Once you have edited it, you
should start your 'namenode' and 'datanode'. I am guessing you are
using Nutch in a distributed way, because you don't need Hadoop if you
are just running one machine in local mode!

Anyway, you need to do the following to start the datanode and namenode:

bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start datanode

Then you need to start the jobtracker and tasktracker before you start crawling:

bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemon.sh start tasktracker

Then you can run your bin/hadoop dfs -put seeds seeds.
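
(For a single machine talking to those daemons, a minimal hadoop-site.xml sketch could look like the following. The host names and ports are placeholders; take the real property names and their defaults from the hadoop-default.xml you unpacked.)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <!-- point clients at the namenode instead of the local filesystem;
       host:port is a placeholder, match it to where your namenode runs -->
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <!-- point job submission at the jobtracker; also a placeholder -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>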


Re: java.io.IOException: No input directories specified in

Peter Swoboda
Thanks for the help.
You're right, hadoop-site.xml is empty.
I will try that on Monday.
Thanks again.



Re: java.io.IOException: No input directories specified in

Peter Swoboda
Got the latest nutch-nightly build,
including hadoop-0.1.1.jar.
Copied the content of hadoop-default.xml into hadoop-site.xml,
started namenode, datanode, jobtracker, tasktracker,
and ran
bin/hadoop dfs -put seeds seeds

result:

bash-3.00$ bin/hadoop-daemon.sh start namenode
starting namenode, logging to bin/../logs/hadoop-jung-namenode-gillespie.log

bash-3.00$ bin/hadoop-daemon.sh start datanode
starting datanode, logging to bin/../logs/hadoop-jung-datanode-gillespie.log

bash-3.00$ bin/hadoop-daemon.sh start jobtracker
starting jobtracker, logging to
bin/../logs/hadoop-jung-jobtracker-gillespie.log

bash-3.00$ bin/hadoop-daemon.sh start tasktracker
starting tasktracker, logging to
bin/../logs/hadoop-jung-tasktracker-gillespie.log

bash-3.00$ bin/hadoop dfs -put seeds seeds
060424 121512 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121513 No FS indicated, using default:local

bash-3.00$ bin/hadoop dfs -ls
060424 121543 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121544 No FS indicated, using default:local
Found 18 items
/home/../nutch-nightly/docs      <dir>
/home/../nutch-nightly/nutch-nightly.war 15541036
/home/../nutch-nightly/webapps   <dir>
/home/../nutch-nightly/CHANGES.txt       17709
/home/../nutch-nightly/build.xml 21433
/home/../nutch-nightly/LICENSE.txt       615
/home/../nutch-nightly/test.log  3447
/home/../nutch-nightly/conf      <dir>
/home/../nutch-nightly/default.properties        3043
/home/../nutch-nightly/plugins   <dir>
/home/../nutch-nightly/lib       <dir>
/home/../nutch-nightly/bin       <dir>
/home/../nutch-nightly/logs      <dir>
/home/../nutch-nightly/nutch-nightly.jar 408375
/home/../nutch-nightly/src       <dir>
/home/../nutch-nightly/nutch-nightly.job 18537096
/home/../nutch-nightly/seeds     <dir>
/home/../nutch-nightly/README.txt        403

bash-3.00$ bin/hadoop dfs -ls seeds
060424 121603 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121603 No FS indicated, using default:local
Found 2 items
/home/../nutch-nightly/seeds/urls.txt~   0
/home/../nutch-nightly/seeds/urls.txt    26

so far so good, but:

bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
060424 121613 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060424 121613 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121614 crawl started in: crawled
060424 121614 rootUrlDir = 2
060424 121614 threads = 10
060424 121614 depth = 5
Exception in thread "main" java.io.IOException: No valid local directories
in property: mapred.local.dir
        at
org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
        at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
bash-3.00$

I really don't know what to do.
In hadoop-site.xml there is:
..
<property>
  <name>mapred.local.dir</name>
  <value>/tmp/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a space- or comma- separated list of
  directories on different devices in order to spread disk i/o.
  </description>
</property>
..
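
(One guess, since the error says there are no valid local directories: check whether that path actually exists and is writable before re-running; I'm not sure this version creates it for you.)

mkdir -p /tmp/hadoop/mapred/local
ls -ld /tmp/hadoop/mapred/local    # should exist and be writable by the user running the job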




_______________________________________
Is your hadoop-site.xml empty, I mean it doesn't consisit any
configuration correct? So what you need to do is add your
configuration there. I suggest you copy the hadoop-0.1.1.jar to
another directory for inspection, copy not move. unzip the
hadoop-0.1.1.jar file you will see hadoop-default.xml file there. use
that as a template to edit your hadoop-site.xml under conf. Once you
have edited it then you should start your 'namenode' and 'datanode'. I
am guessing you are using nutch in a distributed way. cos you don't
need to use hadoop if you are just running in one machine local mode!!

Anyway you need to do the following to start the datanode and namenode

bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start datanode

then you need to start jobtracker and tasktracker before you start crawling
bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemon.sh start tasktracker

then you start your bin/hadoop dfs -put seeds seeds

Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Peter Swoboda
I forgot to have a look at the log files:
namenode:
060424 121444 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
        at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)


datanode
060424 121448 10 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data

jobtracker
060424 121455 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060424 121455 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121456 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
        at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)


tasktracker
060424 121502 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
        at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
        at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)


What can be the problem?
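
All four traces point the same way, assuming hadoop-site.xml really is a
verbatim copy of hadoop-default.xml: fs.default.name and mapred.job.tracker
still carry the default value "local", which the NameNode and JobTracker
cannot parse as a host:port pair, and dfs.data.dir still points at
/tmp/hadoop/dfs/data, which does not yet exist as a directory. A sketch of
a fix for a single machine - the localhost ports are placeholders, and
/tmp/hadoop/dfs/name assumes the default dfs.name.dir - is to set
fs.default.name to e.g. localhost:9000 and mapred.job.tracker to e.g.
localhost:9001 in conf/hadoop-site.xml, then:

  # create the storage directories the daemons expect, then restart them
  mkdir -p /tmp/hadoop/dfs/name /tmp/hadoop/dfs/data /tmp/hadoop/mapred/local
  bin/hadoop-daemon.sh start namenode
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start jobtracker
  bin/hadoop-daemon.sh start tasktracker

Alternatively, keep hadoop-site.xml empty and start no daemons at all; with
everything left at "local", Nutch runs in a single process against the
local file system.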

Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Zaheed Haque
On 4/24/06, Peter Swoboda <[hidden email]> wrote:

> I forgot to have a look at the log files:
> namenode:
> 060424 121444 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
>         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
>
>
> datanode
> 060424 121448 10 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
>
> jobtracker
> 060424 121455 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060424 121455 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121456 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
>         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
>
>
> tasktracker
> 060424 121502 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
>         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
>         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
>
>
> What can be the problem?
> > --- Original Message ---
> > From: "Peter Swoboda" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> >
> > Got the latest nutch-nightly built,
> > including hadoop-0.1.1.jar.
> > Copied the content of the hadoop-default.xml into hadoop-site.xml.
> > started namenode, datanode, jobtracker, tasktracker.
> > made
> > bin/hadoop dfs -put seeds seeds
> >
> > result:
> >
> > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > starting namenode, logging to
> > bin/../logs/hadoop-jung-namenode-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > starting datanode, logging to
> > bin/../logs/hadoop-jung-datanode-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > starting jobtracker, logging to
> > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > starting tasktracker, logging to
> > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> >
> > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > 060424 121512 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121513 No FS indicated, using default:local
> >
> > bash-3.00$ bin/hadoop dfs -ls
> > 060424 121543 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121544 No FS indicated, using default:local
> > Found 18 items
> > /home/../nutch-nightly/docs      <dir>
> > /home/../nutch-nightly/nutch-nightly.war 15541036
> > /home/../nutch-nightly/webapps   <dir>
> > /home/../nutch-nightly/CHANGES.txt       17709
> > /home/../nutch-nightly/build.xml 21433
> > /home/../nutch-nightly/LICENSE.txt       615
> > /home/../nutch-nightly/test.log  3447
> > /home/../nutch-nightly/conf      <dir>
> > /home/../nutch-nightly/default.properties        3043
> > /home/../nutch-nightly/plugins   <dir>
> > /home/../nutch-nightly/lib       <dir>
> > /home/../nutch-nightly/bin       <dir>
> > /home/../nutch-nightly/logs      <dir>
> > /home/../nutch-nightly/nutch-nightly.jar 408375
> > /home/../nutch-nightly/src       <dir>
> > /home/../nutch-nightly/nutch-nightly.job 18537096
> > /home/../nutch-nightly/seeds     <dir>
> > /home/../nutch-nightly/README.txt        403
> >
> > bash-3.00$ bin/hadoop dfs -ls seeds
> > 060424 121603 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121603 No FS indicated, using default:local
> > Found 2 items
> > /home/../nutch-nightly/seeds/urls.txt~   0
> > /home/../nutch-nightly/seeds/urls.txt    26
> >
> > so far so good, but:
> >
> > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > 060424 121613 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060424 121613 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121614 crawl started in: crawled
> > 060424 121614 rootUrlDir = 2
> > 060424 121614 threads = 10
> > 060424 121614 depth = 5
> > Exception in thread "main" java.io.IOException: No valid local directories in property: mapred.local.dir
> >         at
> > org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > bash-3.00$
> >
> > I really don't know what to do.
> > in hadoop-site.xml it's:
> > ..
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/tmp/hadoop/mapred/local</value>
> >   <description>The local directory where MapReduce stores intermediate
> >   data files.  May be a space- or comma- separated list of
> >   directories on different devices in order to spread disk i/o.
> >   </description>
> > </property>
> > ..
> >
> >
> >
> >
> > _______________________________________
> > Is your hadoop-site.xml empty, i.e. it doesn't contain any
> > configuration, correct? So what you need to do is add your
> > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > another directory for inspection (copy, not move). Unzip the
> > hadoop-0.1.1.jar file and you will see a hadoop-default.xml file there. Use
> > that as a template to edit your hadoop-site.xml under conf. Once you
> > have edited it, you should start your 'namenode' and 'datanode'. I
> > am guessing you are using Nutch in a distributed way, because you don't
> > need to use Hadoop if you are just running on one machine in local mode!
> >
> > Anyway you need to do the following to start the datanode and namenode
> >
> > bin/hadoop-daemon.sh start namenode
> > bin/hadoop-daemon.sh start datanode
> >
> > then you need to start jobtracker and tasktracker before you start
> > crawling
> > bin/hadoop-daemon.sh start jobtracker
> > bin/hadoop-daemon.sh start tasktracker
> >
> > then you start your bin/hadoop dfs -put seeds seeds
> >
> > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > ok. changed to latest nightly build.
> > > hadoop-0.1.1.jar is existing,
> > > hadoop-site.xml also.
> > > now trying
> > >
> > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > >
> > > 060421 125154 parsing
> > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 125155 parsing
> > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060421 125155 No FS indicated, using default:local
> > >
> > > and
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > >
> > > 060421 125217 parsing
> > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 125217 parsing
> > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060421 125217 No FS indicated, using default:local
> > > Found 16 items
> > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > >
> > > also:
> > >
> > > bash-3.00$ bin/hadoop dfs -ls seeds
> > >
> > > 060421 133004 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 133004 No FS indicated, using default:local
> > > Found 2 items
> > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > /home/../nutch-nightly/seeds/urls.txt    26
> > > bash-3.00$
> > >
> > > but:
> > >
> > > but:
> > >
> > > bin/nutch crawl seeds -dir crawled -depht 2
> > >
> > > 060421 131722 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131723 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131723 crawl started in: crawled
> > > 060421 131723 rootUrlDir = 2
> > > 060421 131723 threads = 10
> > > 060421 131723 depth = 5
> > > 060421 131724 Injector: starting
> > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > 060421 131724 Injector: urlDir: 2
> > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131725 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131727 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131727 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131727 job_6jn7j8
> > > java.io.IOException: No input directories specified in: Configuration:
> > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > hadoop-site.xml
> > >         at
> > > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > :90)
> > >         at
> > > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > :100)
> > >         at
> > > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > 060421 131728 Running job: job_6jn7j8
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > bash-3.00$
> > >
> > > Can anyone help?
> > >
> > >
> > >
> > >
> > >
> > > > --- Original Message ---
> > > > From: "Zaheed Haque" <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > >
> > > > Also I have noticed that you are using hadoop-0.1; there was a bug in
> > > > 0.1, and you should be using 0.1.1. Under your lib catalog you should
> > > > have the following file:
> > > >
> > > > hadoop-0.1.1.jar
> > > >
> > > > If that's the case, please download the latest nightly build.
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > directory?
> > > > > The content of the file is like the following:
> > > > >
> > > > > <?xml version="1.0"?>
> > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > >
> > > > > <!-- Put site-specific property overrides in this file. -->
> > > > >
> > > > > <configuration>
> > > > >
> > > > > </configuration>
> > > > >
> > > > > or is it missing? If it's missing, please create a file under the
> > > > > conf catalog with the name hadoop-site.xml and then try the hadoop
> > > > > dfs -ls again. You should see something, like a listing from your
> > > > > local file system.
> > > > >
> > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > --- Original Message ---
> > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > To: [hidden email]
> > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > >
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > Can you see your "seeds" directory?
> > > > > > >
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > 060421 122421 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > 1-dev.jar!/hadoop-default.xml
> > > > >
> > > > > I think the hadoop-site is missing, because we should be seeing a
> > > > > message like this here...
> > > > >
> > > > > 060421 131014 parsing
> > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > >
> > > > > > 060421 122421 No FS indicated, using default:local
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > >
> > > > > > 060421 122425 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > >
> > > > > > 060421 122426 No FS indicated, using default:local
> > > > > >
> > > > > > Found 0 items
> > > > > >
> > > > > > bash-3.00$
> > > > > >
> > > > > > As you can see, I can't.
> > > > > > What's going wrong?
> > > > > >
> > > > > > > bin/hadoop dfs -ls seeds
> > > > > > >
> > > > > > > Can you see your text file with URLS?
> > > > > > >
> > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index command. I
> > > > > > > strongly recommend you take the long route of
> > > > > > >
> > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > merge.  You can try the above commands just by typing
> > > > > > > bin/nutch inject
> > > > > > > etc..
> > > > > > > If you just try the inject command without any parameters it
> > > > > > > will tell you how to use it..
> > > > > > >
> > > > > > > Hope this helps.
> > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > hi
> > > > > > > >
> > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > done the following steps:
> > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > >
> > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > >
> > > > > > > > 060317 121440 parsing
> > > > > > > >
> > > > > > >
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > >
> > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > but in crawl.log:
> > > > > > > > 060419 124302 parsing
> > > > > > > >
> > > > > > >
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060419 124302 parsing
> > > > > > > >
> > > > > > >
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > 060419 124302 parsing
> > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > 060419 124302 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > java.io.IOException: No input directories specified in:
> > > > Configuration:
> > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > hadoop-site.xml
> > > > > > > >     at
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > :84)
> > > > > > > >     at
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > :94)
> > > > > > > >     at
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > >     at
> > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > >     at
> > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > >
> > > > > > > > Any ideas?
> > > > > > > >
> > > > > > >
> > > > > >

Re: java.io.IOException: No input directories specified in

Zaheed Haque
Try the following in your hadoop-site.xml; please change and adjust it
based on your IP address. The configuration below assumes that you have
one server and that you are using it as both a namenode and a datanode.
Note that this is NOT the point of running Hadoopified Nutch! It is
rather for testing....

--------------------

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- file system properties -->

<property>
  <name>fs.default.name</name>
  <value>127.0.0.1:50000</value>
  <description>The name of the default file system.  Either the
  literal string "local" or a host:port for DFS.</description>
</property>

<property>
  <name>dfs.datanode.port</name>
  <value>50010</value>
  <description>The port number that the dfs datanode server uses as a starting
               point to look for a free port to listen on.
</description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/tmp/hadoop/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table.</description>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/tmp/hadoop/dfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma- or space-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.</description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>How many copies we try to have at all times. The actual
  number of replications is at max the number of datanodes in the
  cluster.</description>
</property>
<!-- map/reduce properties -->

<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.job.tracker.info.port</name>
  <value>50030</value>
  <description>The port that the MapReduce job tracker info webserver runs at.
  </description>
</property>

<property>
  <name>mapred.task.tracker.output.port</name>
  <value>50040</value>
  <description>The port number that the MapReduce task tracker output
server uses as a starting point to look for
a free port to listen on.
  </description>
</property>

<property>
  <name>mapred.task.tracker.report.port</name>
  <value>50050</value>
  <description>The port number that the MapReduce task tracker report
server uses as a starting
               point to look for a free port to listen on.
  </description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/tmp/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a space- or comma- separated list of
  directories on different devices in order to spread disk i/o.
  </description>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/tmp/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.
  </description>
</property>

<property>
  <name>mapred.temp.dir</name>
  <value>/tmp/hadoop/mapred/temp</value>
  <description>A shared directory for temporary files.
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

</configuration>

------
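
One more note before starting the daemons: your datanode log earlier said
"Can't start DataNode in non-directory: /tmp/hadoop/dfs/data", so it is
probably worth creating the local directories up front. A small sketch,
with paths simply matching the values in the configuration above:

mkdir -p /tmp/hadoop/dfs/name /tmp/hadoop/dfs/data
mkdir -p /tmp/hadoop/mapred/local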

Then execute the following commands:
- Initialize the HDFS
bin/hadoop namenode -format
- Start the namenode/datanode
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start datanode
- Let's do some checking...
bin/hadoop dfs -ls

It should return 0 items!! So let's try to add a file to the DFS:

bin/hadoop dfs -put xyz.html xyz.html

Try

bin/hadoop dfs -ls

You should see one item, which is:
Found 1 items
/user/root/xyz.html    21433
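
If -ls instead keeps printing "No FS indicated, using default:local" and
lists your local directories again, the client is not picking up
fs.default.name. A quick sanity check (just a diagnostic sketch; adjust
the path to your installation):

grep -A 1 'fs.default.name' conf/hadoop-site.xml
# should print the <value>127.0.0.1:50000</value> line, not "local"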

bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemon.sh start tasktracker

Now you can start off with inject, generate, etc.
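
For reference, the long route looks roughly like this. Treat it as a
sketch only: the segment name under segments/ is whatever generate
creates (shown here as a made-up timestamp), and running any command
without parameters prints its exact usage for your build.

bin/nutch inject crawldb seeds
bin/nutch generate crawldb segments
bin/hadoop dfs -ls segments
# note the newly created segment, e.g. segments/20060425120000 (hypothetical)
bin/nutch fetch segments/20060425120000
bin/nutch updatedb crawldb segments/20060425120000
bin/nutch invertlinks linkdb segments/20060425120000
bin/nutch index indexes crawldb linkdb segments/20060425120000
bin/nutch dedup indexes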

Hope this time it works for you..

Cheers


On 4/24/06, Zaheed Haque <[hidden email]> wrote:

> On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > I forgot to have a look at the log files:
> > namenode:
> > 060424 121444 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair:
> > local
> >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> >
> >
> > datanode
> > 060424 121448 10 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> >
> > jobtracker
> > 060424 121455 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121455 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121456 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > Exception in thread "main" java.lang.RuntimeException: Bad
> > mapred.job.tracker: local
> >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> >
> >
> > tasktracker
> > 060424 121502 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > Exception in thread "main" java.lang.RuntimeException: Bad
> > mapred.job.tracker: local
> >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> >
> >
> > What can be the problem?
> > > --- Original Message ---
> > > From: "Peter Swoboda" <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > >
> > > Got the latest nutch-nightly build,
> > > including hadoop-0.1.1.jar.
> > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > started namenode, datanode, jobtracker, tasktracker.
> > > made
> > > bin/hadoop dfs -put seeds seeds
> > >
> > > result:
> > >
> > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > starting namenode, logging to
> > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > >
> > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > starting datanode, logging to
> > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > >
> > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > starting jobtracker, logging to
> > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > >
> > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > starting tasktracker, logging to
> > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > >
> > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > 060424 121512 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121513 No FS indicated, using default:local
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > > 060424 121543 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121544 No FS indicated, using default:local
> > > Found 18 items
> > > /home/../nutch-nightly/docs      <dir>
> > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > /home/../nutch-nightly/webapps   <dir>
> > > /home/../nutch-nightly/CHANGES.txt       17709
> > > /home/../nutch-nightly/build.xml 21433
> > > /home/../nutch-nightly/LICENSE.txt       615
> > > /home/../nutch-nightly/test.log  3447
> > > /home/../nutch-nightly/conf      <dir>
> > > /home/../nutch-nightly/default.properties        3043
> > > /home/../nutch-nightly/plugins   <dir>
> > > /home/../nutch-nightly/lib       <dir>
> > > /home/../nutch-nightly/bin       <dir>
> > > /home/../nutch-nightly/logs      <dir>
> > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > /home/../nutch-nightly/src       <dir>
> > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > /home/../nutch-nightly/seeds     <dir>
> > > /home/../nutch-nightly/README.txt        403
> > >
> > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > 060424 121603 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121603 No FS indicated, using default:local
> > > Found 2 items
> > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > /home/../nutch-nightly/seeds/urls.txt    26
> > >
> > > so far so good, but:
> > >
> > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > 060424 121613 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060424 121613 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121614 crawl started in: crawled
> > > 060424 121614 rootUrlDir = 2
> > > 060424 121614 threads = 10
> > > 060424 121614 depth = 5
> > > Exception in thread "main" java.io.IOException: No valid local directories
> > > in property: mapred.local.dir
> > >         at
> > > org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > bash-3.00$
> > >
> > > I really don't know what to do.
> > > in hadoop-site.xml it's:
> > > ..
> > > <property>
> > >   <name>mapred.local.dir</name>
> > >   <value>/tmp/hadoop/mapred/local</value>
> > >   <description>The local directory where MapReduce stores intermediate
> > >   data files.  May be a space- or comma- separated list of
> > >   directories on different devices in order to spread disk i/o.
> > >   </description>
> > > </property>
> > > ..
> > >
> > >
> > >
> > >
> > > _______________________________________
> > > Is your hadoop-site.xml empty, i.e. it doesn't contain any
> > > configuration, correct? So what you need to do is add your
> > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > another directory for inspection (copy, not move) and unzip it;
> > > you will see a hadoop-default.xml file there. Use that as a
> > > template to edit your hadoop-site.xml under conf. Once you have
> > > edited it, you should start your 'namenode' and 'datanode'. I am
> > > guessing you are using nutch in a distributed way, because you
> > > don't need to use hadoop if you are just running one machine in
> > > local mode!!
> > >
> > > Anyway you need to do the following to start the datanode and namenode
> > >
> > > bin/hadoop-daemon.sh start namenode
> > > bin/hadoop-daemon.sh start datanode
> > >
> > > then you need to start jobtracker and tasktracker before you start
> > > crawling
> > > bin/hadoop-daemon.sh start jobtracker
> > > bin/hadoop-daemon.sh start tasktracker
> > >
> > > then you start your bin/hadoop dfs -put seeds seeds
> > >
> > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > ok. changed to latest nightly build.
> > > > hadoop-0.1.1.jar is existing,
> > > > hadoop-site.xml also.
> > > > now trying
> > > >
> > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > >
> > > > 060421 125154 parsing
> > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > 0.1.1.jar!/hadoop-default.xml
> > > > 060421 125155 parsing
> > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060421 125155 No FS indicated, using default:local
> > > >
> > > > and
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls
> > > >
> > > > 060421 125217 parsing
> > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > 0.1.1.jar!/hadoop-default.xml
> > > > 060421 125217 parsing
> > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060421 125217 No FS indicated, using default:local
> > > > Found 16 items
> > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > >
> > > > also:
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > >
> > > > 060421 133004 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060421 133004 No FS indicated, using default:local
> > > > Found 2 items
> > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > bash-3.00$
> > > >
> > > > but:
> > > >
> > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > >
> > > > 060421 131722 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060421 131723 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060421 131723 crawl started in: crawled
> > > > 060421 131723 rootUrlDir = 2
> > > > 060421 131723 threads = 10
> > > > 060421 131723 depth = 5
> > > > 060421 131724 Injector: starting
> > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > 060421 131724 Injector: urlDir: 2
> > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > 060421 131724 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060421 131724 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131724 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060421 131725 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060421 131726 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131726 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131726 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060421 131727 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060421 131727 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060421 131727 parsing
> > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060421 131727 job_6jn7j8
> > > > java.io.IOException: No input directories specified in: Configuration:
> > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > > hadoop-site.xml
> > > >         at
> > > > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > :90)
> > > >         at
> > > > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > :100)
> > > >         at
> > > > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > 060421 131728 Running job: job_6jn7j8
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > bash-3.00$
> > > >
> > > > Can anyone help?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > --- Original Message ---
> > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > To: [hidden email]
> > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > >
> > > > > Also I have noticed that you are using hadoop-0.1; there was a bug
> > > > > in 0.1, and you should be using 0.1.1. Under your lib catalog you
> > > > > should have the following file:
> > > > >
> > > > > hadoop-0.1.1.jar
> > > > >
> > > > > If that's the case, please download the latest nightly build.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > >
> > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > > directory?
> > > > > > The content of the file is like the following:
> > > > > >
> > > > > > <?xml version="1.0"?>
> > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > >
> > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > >
> > > > > > <configuration>
> > > > > >
> > > > > > </configuration>
> > > > > >
> > > > > > or is it missing? If it's missing, please create a file under the
> > > > > > conf catalog with the name hadoop-site.xml and then try the hadoop
> > > > > > dfs -ls again. You should see something, like a listing from your
> > > > > > local file system.
> > > > > >
> > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > --- Original Message ---
> > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > To: [hidden email]
> > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > >
> > > > > > > > bin/hadoop dfs -ls
> > > > > > > >
> > > > > > > > Can you see your "seeds" directory?
> > > > > > > >
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > 060421 122421 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > >
> > > > > > I think the hadoop-site is missing, because we should be seeing
> > > > > > a message like this here...
> > > > > >
> > > > > > 060421 131014 parsing
> > > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > >
> > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > >
> > > > > > > 060421 122425 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > >
> > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > >
> > > > > > > Found 0 items
> > > > > > >
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > As you can see, I can't.
> > > > > > > What's going wrong?
> > > > > > >
> > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > >
> > > > > > > > Can you see your text file with URLS?
> > > > > > > >
> > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index command. I
> > > > > > > > strongly recommend you take the long route of
> > > > > > > >
> > > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > bin/nutch inject
> > > > > > > > etc..
> > > > > > > > If you just try the inject command without any parameters
> > > > > > > > it will tell you how to use it..
> > > > > > > >
> > > > > > > > Hope this helps.
> > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > hi
> > > > > > > > >
> > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > done the following steps:
> > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > >
> > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > >
> > > > > > > > > 060317 121440 parsing
> > > > > > > > >
> > > > > > > >
> > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > >
> > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > but in crawl.log:
> > > > > > > > > 060419 124302 parsing
> > > > > > > > >
> > > > > > > >
> > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > 060419 124302 parsing
> > > > > > > > >
> > > > > > > >
> > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > 060419 124302 parsing
> > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > 060419 124302 parsing
> > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > java.io.IOException: No input directories specified in:
> > > > > Configuration:
> > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > hadoop-site.xml
> > > > > > > > >     at
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > :84)
> > > > > > > > >     at
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > :94)
> > > > > > > > >     at
> > > > > > > > >
> > > > >
> > > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > >     at
> > > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > >     at
> > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > >
> > > > > > > > > Any ideas?
> > > > > > > > >
> > > > > > > >
> > > > > > >

Re: java.io.IOException: No input directories specified in

Peter Swoboda
Seems to be a bit better, doesn't it?

bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
060425 110124 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 110124 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 110125 Client connection to 127.0.0.1:50000: starting
060425 110125 crawl started in: crawled
060425 110125 rootUrlDir = 2
060425 110125 threads = 10
060425 110125 depth = 5
060425 110126 Injector: starting
060425 110126 Injector: crawlDb: crawled/crawldb
060425 110126 Injector: urlDir: 2
060425 110126 Injector: Converting injected urls to crawl db entries.
060425 110126 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 110126 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060425 110126 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 110127 Client connection to 127.0.0.1:50020: starting
060425 110127 Client connection to 127.0.0.1:50000: starting
060425 110127 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
        at
org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.hadoop.ipc.Client.call(Client.java:303)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
        ... 6 more


The local IP is the same,
but I don't exactly know how to handle the ports.

Step by step (generate, index, ...) caused the same error with
 bin/nutch generate crawl/crawldb crawl/segments
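
Two things may be worth checking here. First, the command line above says
"-depht" rather than "-depth"; the option is apparently not recognized,
which is why the log shows "rootUrlDir = 2" and "depth = 5", i.e. the
stray "2" is taken as the URL directory while the depth stays at its
default, and that would by itself reproduce the earlier "No input
directories" error. Second, the timeout suggests the client cannot reach
the jobtracker at all. A diagnostic sketch, assuming the 127.0.0.1 ports
from the configuration quoted below:

# are the daemons actually listening on the configured ports?
netstat -an | grep -E '50000|50020'
# the daemon logs usually say why a process failed to bind or died
tail logs/hadoop-*-namenode-*.log logs/hadoop-*-jobtracker-*.log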

> --- Original Message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Mon, 24 Apr 2006 13:39:10 +0200
>
> Try the following in your hadoop-site.xml; please change and adjust it
> based on your IP address. The configuration below assumes that you have
> one server and that you are using it as both a namenode and a datanode.
> Note that this is NOT the point of running Hadoopified Nutch! It is
> rather for testing....
>
> --------------------
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <!-- file system properties -->
>
> <property>
>   <name>fs.default.name</name>
>   <value>127.0.0.1:50000</value>
>   <description>The name of the default file system.  Either the
>   literal string "local" or a host:port for DFS.</description>
> </property>
>
> <property>
>   <name>dfs.datanode.port</name>
>   <value>50010</value>
>   <description>The port number that the dfs datanode server uses as a
> starting
>                point to look for a free port to listen on.
> </description>
> </property>
>
> <property>
>   <name>dfs.name.dir</name>
>   <value>/tmp/hadoop/dfs/name</value>
>   <description>Determines where on the local filesystem the DFS name node
>       should store the name table.</description>
> </property>
>
> <property>
>   <name>dfs.data.dir</name>
>   <value>/tmp/hadoop/dfs/data</value>
>   <description>Determines where on the local filesystem an DFS data node
>   should store its blocks.  If this is a comma- or space-delimited
>   list of directories, then data will be stored in all named
>   directories, typically on different devices.</description>
> </property>
>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
>   <description>How many copies we try to have at all times. The actual
>   number of replications is at max the number of datanodes in the
>   cluster.</description>
> </property>
> <!-- map/reduce properties -->
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>127.0.0.1:50020</value>
>   <description>The host and port that the MapReduce job tracker runs
>   at.  If "local", then jobs are run in-process as a single map
>   and reduce task.
>   </description>
> </property>
>
> <property>
>   <name>mapred.job.tracker.info.port</name>
>   <value>50030</value>
>   <description>The port that the MapReduce job tracker info webserver runs
> at.
>   </description>
> </property>
>
> <property>
>   <name>mapred.task.tracker.output.port</name>
>   <value>50040</value>
>   <description>The port number that the MapReduce task tracker output
> server uses as a starting point to look for
> a free port to listen on.
>   </description>
> </property>
>
> <property>
>   <name>mapred.task.tracker.report.port</name>
>   <value>50050</value>
>   <description>The port number that the MapReduce task tracker report
> server uses as a starting
>                point to look for a free port to listen on.
>   </description>
> </property>
>
> <property>
>   <name>mapred.local.dir</name>
>   <value>/tmp/hadoop/mapred/local</value>
>   <description>The local directory where MapReduce stores intermediate
>   data files.  May be a space- or comma- separated list of
>   directories on different devices in order to spread disk i/o.
>   </description>
> </property>
>
> <property>
>   <name>mapred.system.dir</name>
>   <value>/tmp/hadoop/mapred/system</value>
>   <description>The shared directory where MapReduce stores control files.
>   </description>
> </property>
>
> <property>
>   <name>mapred.temp.dir</name>
>   <value>/tmp/hadoop/mapred/temp</value>
>   <description>A shared directory for temporary files.
>   </description>
> </property>
>
> <property>
>   <name>mapred.reduce.tasks</name>
>   <value>1</value>
>   <description>The default number of reduce tasks per job.  Typically set
>   to a prime close to the number of available hosts.  Ignored when
>   mapred.job.tracker is "local".
>   </description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.tasks.maximum</name>
>   <value>2</value>
>   <description>The maximum number of tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> </configuration>
>
> ------
>
> Then execute the following commands:
> - Initialize the HDFS
> bin/hadoop namenode -format
> - Start the namenode/datanode
> bin/hadoop-daemon.sh start namenode
> bin/hadoop-daemon.sh start datanode
> - Let's do some checking...
> bin/hadoop dfs -ls
>
> It should return 0 items!! So let's try to add a file to the DFS:
>
> bin/hadoop dfs -put xyz.html xyz.html
>
> Try
>
> bin/hadoop dfs -ls
>
> You should see one item, which is:
> Found 1 items
> /user/root/xyz.html    21433
>
> bin/hadoop-daemon.sh start jobtracker
> bin/hadoop-daemon.sh start tasktracker
>
> Now you can start off with inject, generate, etc.
>
> Hope this time it works for you..
>
> Cheers
>
>
> On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > I forgot to have a look at the log files:
> > > namenode:
> > > 060424 121444 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > Exception in thread "main" java.lang.RuntimeException: Not a host:port
> pair:
> > > local
> > >         at
> org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > >
> > >
> > > datanode
> > > 060424 121448 10 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121448 10 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121448 10 Can't start DataNode in non-directory:
> /tmp/hadoop/dfs/data
> > >
> > > jobtracker
> > > 060424 121455 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060424 121455 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121456 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > mapred.job.tracker: local
> > >         at
> org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > >         at
> org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > >         at
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > >         at
> org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > >
> > >
> > > tasktracker
> > > 060424 121502 parsing
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > mapred.job.tracker: local
> > >         at
> org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > >         at
> org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > >         at
> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > >
> > >
> > > What can be the problem?
> > > > --- Original Message ---
> > > > From: "Peter Swoboda" <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > >
> > > > Got the latest nutch-nightly build,
> > > > including hadoop-0.1.1.jar.
> > > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > > started namenode, datanode, jobtracker, tasktracker.
> > > > made
> > > > bin/hadoop dfs -put seeds seeds
> > > >
> > > > result:
> > > >
> > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > starting namenode, logging to
> > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > >
> > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > starting datanode, logging to
> > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > >
> > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > starting jobtracker, logging to
> > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > >
> > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > starting tasktracker, logging to
> > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > >
> > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > 060424 121512 parsing
> > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121512 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121513 No FS indicated, using default:local
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls
> > > > 060424 121543 parsing
> > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121543 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121544 No FS indicated, using default:local
> > > > Found 18 items
> > > > /home/../nutch-nightly/docs      <dir>
> > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > /home/../nutch-nightly/webapps   <dir>
> > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > /home/../nutch-nightly/build.xml 21433
> > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > /home/../nutch-nightly/test.log  3447
> > > > /home/../nutch-nightly/conf      <dir>
> > > > /home/../nutch-nightly/default.properties        3043
> > > > /home/../nutch-nightly/plugins   <dir>
> > > > /home/../nutch-nightly/lib       <dir>
> > > > /home/../nutch-nightly/bin       <dir>
> > > > /home/../nutch-nightly/logs      <dir>
> > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > /home/../nutch-nightly/src       <dir>
> > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > /home/../nutch-nightly/seeds     <dir>
> > > > /home/../nutch-nightly/README.txt        403
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > 060424 121603 parsing
> > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121603 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121603 No FS indicated, using default:local
> > > > Found 2 items
> > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > >
> > > > so far so good, but:
> > > >
> > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > 060424 121613 parsing
> > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121613 parsing
> file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060424 121613 parsing
> file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060424 121613 parsing
> > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060424 121613 parsing
> file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060424 121613 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121614 crawl started in: crawled
> > > > 060424 121614 rootUrlDir = 2
> > > > 060424 121614 threads = 10
> > > > 060424 121614 depth = 5
> > > > Exception in thread "main" java.io.IOException: No valid local
> directories
> > > > in property: mapred.local.dir
> > > >         at
> > > > org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > >         at
> org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > bash-3.00$
> > > >
> > > > I really don't know what to do.
> > > > in hadoop-site.xml it's:
> > > > ..
> > > > <property>
> > > >   <name>mapred.local.dir</name>
> > > >   <value>/tmp/hadoop/mapred/local</value>
> > > >   <description>The local directory where MapReduce stores
> intermediate
> > > >   data files.  May be a space- or comma- separated list of
> > > >   directories on different devices in order to spread disk i/o.
> > > >   </description>
> > > > </property>
> > > > ..
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________
> > > > Is your hadoop-site.xml empty, i.e. it doesn't contain any
> > > > configuration, correct? So what you need to do is add your
> > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > another directory for inspection (copy, not move) and unzip it;
> > > > you will see a hadoop-default.xml file there. Use that as a
> > > > template to edit your hadoop-site.xml under conf. Once you have
> > > > edited it, you should start your 'namenode' and 'datanode'. I am
> > > > guessing you are using nutch in a distributed way, because you
> > > > don't need to use hadoop if you are just running one machine in
> > > > local mode!!
> > > >
> > > > Anyway you need to do the following to start the datanode and
> namenode
> > > >
> > > > bin/hadoop-daemon.sh start namenode
> > > > bin/hadoop-daemon.sh start datanode
> > > >
> > > > then you need to start jobtracker and tasktracker before you start
> > > > crawling
> > > > bin/hadoop-daemon.sh start jobtracker
> > > > bin/hadoop-daemon.sh start tasktracker
> > > >
> > > > then you start your bin/hadoop dfs -put seeds seeds
> > > >
> > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > ok. changed to latest nightly build.
> > > > > hadoop-0.1.1.jar is existing,
> > > > > hadoop-site.xml also.
> > > > > now trying
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > >
> > > > > 060421 125154 parsing
> > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > 060421 125155 parsing
> > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 125155 No FS indicated, using default:local
> > > > >
> > > > > and
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > >
> > > > > 060421 125217 parsing
> > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > 060421 125217 parsing
> > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 125217 No FS indicated, using default:local
> > > > > Found 16 items
> > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties      
> 3043
> > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > >
> > > > > also:
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > >
> > > > > 060421 133004 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060421 133004 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 133004 No FS indicated, using default:local
> > > > > Found 2 items
> > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > bash-3.00$
> > > > >
> > > > > but:
> > > > >
> > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > >
> > > > > 060421 131722 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060421 131723 parsing
> file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060421 131723 parsing
> file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060421 131723 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131723 parsing
> file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060421 131723 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 131723 crawl started in: crawled
> > > > > 060421 131723 rootUrlDir = 2
> > > > > 060421 131723 threads = 10
> > > > > 060421 131723 depth = 5
> > > > > 060421 131724 Injector: starting
> > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > 060421 131724 Injector: urlDir: 2
> > > > > 060421 131724 Injector: Converting injected urls to crawl db
> entries.
> > > > > 060421 131724 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060421 131724 parsing
> file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060421 131724 parsing
> file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060421 131724 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131724 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131725 parsing
> file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060421 131725 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 131725 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060421 131726 parsing
> file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060421 131726 parsing
> file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060421 131726 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131726 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131726 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131726 parsing
> file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060421 131727 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 131727 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060421 131727 parsing
> > > > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060421 131727 parsing
> > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > 060421 131727 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060421 131727 job_6jn7j8
> > > > > java.io.IOException: No input directories specified in:
> Configuration:
> > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > > > hadoop-site.xml
> > > > >         at
> > > > >
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > :90)
> > > > >         at
> > > > >
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > :100)
> > > > >         at
> > > > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > 060421 131728 Running job: job_6jn7j8
> > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > >         at
> org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > >         at
> org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > bash-3.00$
> > > > >
> > > > > Can anyone help?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > --- Original Message ---
> > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > To: [hidden email]
> > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > >
> > > > > > Also I have noticed that you are using hadoop-0.1, there was a
> bug in
> > > > > > 0.1; you should be using 0.1.1. Under your lib catalog you should
> have
> > > > > > the following file
> > > > > >
> > > > > > hadoop-0.1.1.jar
> > > > > >
> > > > > > If that's the case, please download the latest nightly build.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > directory?
> > > > > > > The content of the file is like the following:
> > > > > > >
> > > > > > > <?xml version="1.0"?>
> > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > >
> > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > >
> > > > > > > <configuration>
> > > > > > >
> > > > > > > </configuration>
> > > > > > >
> > > > > > > or is it missing... if its missing please create a file under
> the
> > > > conf
> > > > > > > catalog with the name hadoop-site.xml and then try the hadoop
> dfs
> > > > -ls
> > > > > > > again?  you should see something! like listing from your local
> file
> > > > > > > system.
> > > > > > >
> > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > --- Original Message ---
> > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > To: [hidden email]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories
> specified
> > > > in
> > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > >
> > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > >
> > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > >
> > > > > > > >
> > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > 060421 122421 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > >
> > > > > > > I think the hadoop-site is missing cos we should be seeing a
> message
> > > > > > > like this here...
> > > > > > >
> > > > > > > 060421 131014 parsing
> > > > > > >
> file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > >
> > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > >
> > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > >
> > > > > > > > 060421 122425 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > >
> > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > >
> > > > > > > > Found 0 items
> > > > > > > >
> > > > > > > > bash-3.00$
> > > > > > > >
> > > > > > > > As you can see, I can't.
> > > > > > > > What's going wrong?
> > > > > > > >
> > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > >
> > > > > > > > > Can you see your text file with URLS?
> > > > > > > > >
> > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index
> command. I
> > > > > > > > > strongly recommend you take the long route of
> > > > > > > > >
> > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index,
> dedup and
> > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > bin/nutch inject
> > > > > > > > > etc..
> > > > > > > > > If you just try the inject command without any parameters it
> will
> > > > tell
> > > > > > you
> > > > > > > > > how to use it..
> > > > > > > > >
> > > > > > > > > Hope this helps.
> > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> wrote:
> > > > > > > > > > hi
> > > > > > > > > >
> > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > done the following steps:
> > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > >
> > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > >
> > > > > > > > > > 060317 121440 parsing
> > > > > > > > > >
> > > > > > > > >
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > but in crawl.log:
> > > > > > > > > > 060419 124302 parsing
> > > > > > > > > >
> > > > > > > > >
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > 060419 124302 parsing
> > > > > > > > > >
> > > > > > > > >
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > 060419 124302 parsing
> > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > 060419 124302 parsing
> > > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > java.io.IOException: No input directories specified in:
> > > > > > Configuration:
> > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > >
> /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > hadoop-site.xml
> > > > > > > > > >     at
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > :84)
> > > > > > > > > >     at
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > :94)
> > > > > > > > > >     at
> > > > > > > > > >
> > > > > >
> > > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > Exception in thread "main" java.io.IOException: Job
> failed!
> > > > > > > > > >     at
> > > > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > >     at
> > > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > >
> > > > > > > > > > Any ideas?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
>

Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Zaheed Haque
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml

Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
this update will solve your problem, but it might. With the config I
sent you, I could crawl-index-search, so there must be something
else. I am not sure.
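
A quick way to confirm which Hadoop jar the scripts will actually pick
up (a minimal sketch, assuming the standard nutch-nightly layout, where
the launcher scripts put every jar under lib/ on the classpath):

ls lib/hadoop-*.jar
# should list only hadoop-0.1.1.jar; if hadoop-0.1-dev.jar is still
# present, remove it so the old classes cannot shadow the new ones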

Cheers
Zaheed
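
On the "timed out waiting for response" quoted below: a minimal check
that the daemons are really listening on the ports from the
hadoop-site.xml posted earlier in the thread (assumes netstat is
available; 50000 = namenode, 50020 = jobtracker):

netstat -an | grep 50000
netstat -an | grep 50020
# each should show a line in LISTEN state; if not, that daemon is not
# up -- check the logs/hadoop-*-namenode-*.log and jobtracker logs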

On 4/25/06, Peter Swoboda <[hidden email]> wrote:

> Seems to be a bit better, doesn't it?
>
> bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> 060425 110124 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> 060425 110124 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060425 110125 Client connection to 127.0.0.1:50000: starting
> 060425 110125 crawl started in: crawled
> 060425 110125 rootUrlDir = 2
> 060425 110125 threads = 10
> 060425 110125 depth = 5
> 060425 110126 Injector: starting
> 060425 110126 Injector: crawlDb: crawled/crawldb
> 060425 110126 Injector: urlDir: 2
> 060425 110126 Injector: Converting injected urls to crawl db entries.
> 060425 110126 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> 060425 110126 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060425 110126 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060425 110127 Client connection to 127.0.0.1:50020: starting
> 060425 110127 Client connection to 127.0.0.1:50000: starting
> 060425 110127 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
>         at
> org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> Caused by: java.io.IOException: timed out waiting for response
>         at org.apache.hadoop.ipc.Client.call(Client.java:303)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
>         ... 6 more
>
>
> The local IP is the same,
> but I don't exactly know how to handle the ports.
>
> Step by Step (generate, index..) caused same error while
>  bin/nutch generate crawl/crawldb crawl/segments
>
> > --- Original Message ---
> > From: "Zaheed Haque" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Mon, 24 Apr 2006 13:39:10 +0200
> >
> > Try the following in your hadoop-site.xml.. please change and adjust
> > based on your IP address. The following configuration assumes that
> > you have 1 server and you are using it as a namenode as well as a
> > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > It is rather for testing....
> >
> > --------------------
> >
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <configuration>
> >
> > <!-- file system properties -->
> >
> > <property>
> >   <name>fs.default.name</name>
> >   <value>127.0.0.1:50000</value>
> >   <description>The name of the default file system.  Either the
> >   literal string "local" or a host:port for DFS.</description>
> > </property>
> >
> > <property>
> >   <name>dfs.datanode.port</name>
> >   <value>50010</value>
> >   <description>The port number that the dfs datanode server uses as a
> > starting
> >                point to look for a free port to listen on.
> > </description>
> > </property>
> >
> > <property>
> >   <name>dfs.name.dir</name>
> >   <value>/tmp/hadoop/dfs/name</value>
> >   <description>Determines where on the local filesystem the DFS name node
> >       should store the name table.</description>
> > </property>
> >
> > <property>
> >   <name>dfs.data.dir</name>
> >   <value>/tmp/hadoop/dfs/data</value>
> >   <description>Determines where on the local filesystem an DFS data node
> >   should store its blocks.  If this is a comma- or space-delimited
> >   list of directories, then data will be stored in all named
> >   directories, typically on different devices.</description>
> > </property>
> >
> > <property>
> >   <name>dfs.replication</name>
> >   <value>1</value>
> >   <description>How many copies we try to have at all times. The actual
> >   number of replications is at max the number of datanodes in the
> >   cluster.</description>
> > </property>
> > <!-- map/reduce properties -->
> >
> > <property>
> >   <name>mapred.job.tracker</name>
> >   <value>127.0.0.1:50020</value>
> >   <description>The host and port that the MapReduce job tracker runs
> >   at.  If "local", then jobs are run in-process as a single map
> >   and reduce task.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.job.tracker.info.port</name>
> >   <value>50030</value>
> >   <description>The port that the MapReduce job tracker info webserver runs
> > at.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.task.tracker.output.port</name>
> >   <value>50040</value>
> >   <description>The port number that the MapReduce task tracker output
> > server uses as a starting point to look for
> > a free port to listen on.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.task.tracker.report.port</name>
> >   <value>50050</value>
> >   <description>The port number that the MapReduce task tracker report
> > server uses as a starting
> >                point to look for a free port to listen on.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/tmp/hadoop/mapred/local</value>
> >   <description>The local directory where MapReduce stores intermediate
> >   data files.  May be a space- or comma- separated list of
> >   directories on different devices in order to spread disk i/o.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.system.dir</name>
> >   <value>/tmp/hadoop/mapred/system</value>
> >   <description>The shared directory where MapReduce stores control files.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.temp.dir</name>
> >   <value>/tmp/hadoop/mapred/temp</value>
> >   <description>A shared directory for temporary files.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.reduce.tasks</name>
> >   <value>1</value>
> >   <description>The default number of reduce tasks per job.  Typically set
> >   to a prime close to the number of available hosts.  Ignored when
> >   mapred.job.tracker is "local".
> >   </description>
> > </property>
> >
> > <property>
> >   <name>mapred.tasktracker.tasks.maximum</name>
> >   <value>2</value>
> >   <description>The maximum number of tasks that will be run
> >   simultaneously by a task tracker.
> >   </description>
> > </property>
> >
> > </configuration>
> >
> > ------
> >
> > Then execute the following commands
> > - initialize the HDFS
> > bin/hadoop namenode -format
> > - Start the namenode/datanode
> > bin/hadoop-daemon.sh start namenode
> > bin/hadoop-daemon.sh start datanode
> > - Let's do some checking...
> > bin/hadoop dfs -ls
> >
> > Should return 0 items!! So let's try to add a file to the DFS
> >
> > bin/hadoop dfs -put xyz.html xyz.html
> >
> > Try
> >
> > bin/hadoop dfs -ls
> >
> > You should see one item which is
> > Found 1 items
> > /user/root/xyz.html    21433
> >
> > bin/hadoop-daemon.sh start jobtracker
> > bin/hadoop-daemon.sh start tasktracker
> >
> > Now you can start off with inject, generate etc.. etc..
> >
> > Hope this time it works for you..
> >
> > Cheers
> >
> >
> > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > I forgot to have a look at the log files:
> > > > namenode:
> > > > 060424 121444 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port
> > pair:
> > > > local
> > > >         at
> > org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > >
> > > >
> > > > datanode
> > > > 060424 121448 10 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121448 10 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121448 10 Can't start DataNode in non-directory:
> > /tmp/hadoop/dfs/data
> > > >
> > > > jobtracker
> > > > 060424 121455 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060424 121455 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121456 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > mapred.job.tracker: local
> > > >         at
> > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > >         at
> > org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > >         at
> > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > >         at
> > org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > >
> > > >
> > > > tasktracker
> > > > 060424 121502 parsing
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > mapred.job.tracker: local
> > > >         at
> > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > >         at
> > org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > >         at
> > org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > >
> > > >
> > > > What can be the problem?
> > > > > > --- Original Message ---
> > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > To: [hidden email]
> > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > >
> > > > > Got the latest nutch-nightly built,
> > > > > including hadoop-0.1.1.jar.
> > > > > > Copied the content of the hadoop-default.xml into hadoop-site.xml.
> > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > made
> > > > > bin/hadoop dfs -put seeds seeds
> > > > >
> > > > > result:
> > > > >
> > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > starting namenode, logging to
> > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > >
> > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > starting datanode, logging to
> > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > >
> > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > starting jobtracker, logging to
> > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > >
> > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > starting tasktracker, logging to
> > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > 060424 121512 parsing
> > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121512 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121513 No FS indicated, using default:local
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > 060424 121543 parsing
> > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121543 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121544 No FS indicated, using default:local
> > > > > Found 18 items
> > > > > /home/../nutch-nightly/docs      <dir>
> > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > /home/../nutch-nightly/build.xml 21433
> > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > /home/../nutch-nightly/test.log  3447
> > > > > /home/../nutch-nightly/conf      <dir>
> > > > > /home/../nutch-nightly/default.properties        3043
> > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > /home/../nutch-nightly/lib       <dir>
> > > > > /home/../nutch-nightly/bin       <dir>
> > > > > /home/../nutch-nightly/logs      <dir>
> > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > /home/../nutch-nightly/src       <dir>
> > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > /home/../nutch-nightly/README.txt        403
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > 060424 121603 parsing
> > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121603 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121603 No FS indicated, using default:local
> > > > > Found 2 items
> > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > >
> > > > > so far so good, but:
> > > > >
> > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > 060424 121613 parsing
> > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121613 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060424 121613 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060424 121613 parsing
> > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060424 121613 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060424 121613 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121614 crawl started in: crawled
> > > > > 060424 121614 rootUrlDir = 2
> > > > > 060424 121614 threads = 10
> > > > > 060424 121614 depth = 5
> > > > > Exception in thread "main" java.io.IOException: No valid local
> > directories
> > > > > in property: mapred.local.dir
> > > > >         at
> > > > > org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > >         at
> > org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > bash-3.00$
> > > > >
> > > > > I really don't know what to do.
> > > > > in hadoop-site.xml it's:
> > > > > ..
> > > > > <property>
> > > > >   <name>mapred.local.dir</name>
> > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > >   <description>The local directory where MapReduce stores
> > intermediate
> > > > >   data files.  May be a space- or comma- separated list of
> > > > >   directories on different devices in order to spread disk i/o.
> > > > >   </description>
> > > > > </property>
> > > > > ..
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________
> > > > > Is your hadoop-site.xml empty, I mean it doesn't consisit any
> > > > > configuration correct? So what you need to do is add your
> > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > another directory for inspection, copy not move. unzip the
> > > > > hadoop-0.1.1.jar file you will see hadoop-default.xml file there.
> > use
> > > > > that as a template to edit your hadoop-site.xml under conf. Once you
> > > > > have edited it then you should start your 'namenode' and 'datanode'.
> > I
> > > > > am guessing you are using nutch in a distributed way. cos you don't
> > > > > need to use hadoop if you are just running in one machine local
> > mode!!
> > > > >
> > > > > Anyway you need to do the following to start the datanode and
> > namenode
> > > > >
> > > > > bin/hadoop-daemon.sh start namenode
> > > > > bin/hadoop-daemon.sh start datanode
> > > > >
> > > > > then you need to start jobtracker and tasktracker before you start
> > > > > crawling
> > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > bin/hadoop-daemon.sh start tasktracker
> > > > >
> > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > >
> > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > ok. changed to latest nightly build.
> > > > > > hadoop-0.1.1.jar is existing,
> > > > > > hadoop-site.xml also.
> > > > > > now trying
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > >
> > > > > > 060421 125154 parsing
> > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 125155 parsing
> > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 125155 No FS indicated, using default:local
> > > > > >
> > > > > > and
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > >
> > > > > > 060421 125217 parsing
> > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-
> > > > > > 0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 125217 parsing
> > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > Found 16 items
> > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties
> > 3043
> > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > >
> > > > > > also:
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > >
> > > > > > 060421 133004 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 133004 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > Found 2 items
> > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > bash-3.00$
> > > > > >
> > > > > > but:
> > > > > >
> > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > >
> > > > > > 060421 131722 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 131723 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060421 131723 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060421 131723 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131723 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060421 131723 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 131723 crawl started in: crawled
> > > > > > 060421 131723 rootUrlDir = 2
> > > > > > 060421 131723 threads = 10
> > > > > > 060421 131723 depth = 5
> > > > > > 060421 131724 Injector: starting
> > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > 060421 131724 Injector: Converting injected urls to crawl db
> > entries.
> > > > > > 060421 131724 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 131724 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060421 131724 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060421 131724 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131724 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131725 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060421 131725 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 131725 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 131726 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060421 131726 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060421 131726 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131726 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131726 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131726 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060421 131727 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 131727 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060421 131727 parsing
> > > > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060421 131727 parsing
> > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > 060421 131727 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060421 131727 job_6jn7j8
> > > > > > java.io.IOException: No input directories specified in:
> > Configuration:
> > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > > > > hadoop-site.xml
> > > > > >         at
> > > > > >
> > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > :90)
> > > > > >         at
> > > > > >
> > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > :100)
> > > > > >         at
> > > > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > >         at
> > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > >         at
> > org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > bash-3.00$
> > > > > >
> > > > > > Can anyone help?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > --- Original Message ---
> > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > To: [hidden email]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified
> > in
> > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > >
> > > > > > > Also I have noticed that you are using hadoop-0.1, there was a
> > bug in
> > > > > > > 0.1; you should be using 0.1.1. Under your lib catalog you should
> > have
> > > > > > > the following file
> > > > > > >
> > > > > > > hadoop-0.1.1.jar
> > > > > > >
> > > > > > > If that's the case, please download the latest nightly build.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > directory?
> > > > > > > > The content of the file is like the following:
> > > > > > > >
> > > > > > > > <?xml version="1.0"?>
> > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > >
> > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > >
> > > > > > > > <configuration>
> > > > > > > >
> > > > > > > > </configuration>
> > > > > > > >
> > > > > > > > or is it missing... if its missing please create a file under
> > the
> > > > > conf
> > > > > > > > catalog with the name hadoop-site.xml and then try the hadoop
> > dfs
> > > > > -ls
> > > > > > > > again?  you should see something! like listing from your local
> > file
> > > > > > > > system.
> > > > > > > >
> > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > --- Original Message ---
> > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > To: [hidden email]
> > > > > > > > > > Subject: Re: java.io.IOException: No input directories
> > specified
> > > > > in
> > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > >
> > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > 060421 122421 parsing
> > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > >
> > > > > > > > I think the hadoop-site is missing cos we should be seeing a
> > message
> > > > > > > > like this here...
> > > > > > > >
> > > > > > > > 060421 131014 parsing
> > > > > > > >
> > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > >
> > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > >
> > > > > > > > > 060421 122425 parsing
> > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > >
> > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > >
> > > > > > > > > Found 0 items
> > > > > > > > >
> > > > > > > > > bash-3.00$
> > > > > > > > >
> > > > > > > > > As you can see, I can't.
> > > > > > > > > What's going wrong?
> > > > > > > > >
> > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > >
> > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > >
> > > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index
> > command. I
> > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > >
> > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index,
> > dedup and
> > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > bin/nutch inject
> > > > > > > > > > etc..
> > > > > > > > > > If you just try the inject command without any parameters it
> > will
> > > > > tell
> > > > > > > you
> > > > > > > > > > how to use it..
> > > > > > > > > >
> > > > > > > > > > Hope this helps.
> > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> > wrote:
> > > > > > > > > > > hi
> > > > > > > > > > >
> > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > done the following steps:
> > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > >
> > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > >
> > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > 060419 124302 parsing
> > > > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > java.io.IOException: No input directories specified in:
> > > > > > > Configuration:
> > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > >
> > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > > hadoop-site.xml
> > > > > > > > > > >     at
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > :84)
> > > > > > > > > > >     at
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > :94)
> > > > > > > > > > >     at
> > > > > > > > > > >
> > > > > > >
> > > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > Exception in thread "main" java.io.IOException: Job
> > failed!
> > > > > > > > > > >     at
> > > > > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > >     at
> > > > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > >
> > > > > > > > > > > Any ideas?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Peter Swoboda
Sorry, my mistake. Changed to 0.1.1.
Results:

bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
060425 113831 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 113832 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113832 Client connection to 127.0.0.1:50000: starting
060425 113832 crawl started in: crawled
060425 113832 rootUrlDir = 2
060425 113832 threads = 10
060425 113832 depth = 5
060425 113833 Injector: starting
060425 113833 Injector: crawlDb: crawled/crawldb
060425 113833 Injector: urlDir: 2
060425 113833 Injector: Converting injected urls to crawl db entries.
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113833 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113834 Client connection to 127.0.0.1:50020: starting
060425 113834 Client connection to 127.0.0.1:50000: starting
060425 113834 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060425 113838 Running job: job_23a6ra
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
bash-3.00$
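
When runJob only reports "Job failed!", the console rarely shows the
root cause; the daemon logs usually do. A sketch, assuming the log
locations shown earlier in the thread (file names are host- and
user-specific):

tail -50 logs/hadoop-*-jobtracker-*.log
tail -50 logs/hadoop-*-tasktracker-*.log
# the tasktracker log typically carries the per-task stack trace that
# hides behind the generic client-side "Job failed!"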


Step by step: same result, just another job that failed.
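
Worth noting: in each transcript above the option is spelled "-depht",
so the crawl command appears to treat the trailing "2" as a second root
URL directory, which would explain both "rootUrlDir = 2" (and
"Injector: urlDir: 2") and the depth staying at its default of 5. For
the step-by-step route, a minimal sketch of the 0.8-style sequence
(paths are illustrative; each command prints its exact usage when run
without arguments, and on DFS the segment name has to be read from
"bin/hadoop dfs -ls crawl/segments" instead of a local ls):

bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
s=`ls -d crawl/segments/* | tail -1`   # newest segment
bin/nutch fetch $s
bin/nutch updatedb crawl/crawldb $s
bin/nutch invertlinks crawl/linkdb crawl/segments/*
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
bin/nutch dedup crawl/indexes
bin/nutch merge crawl/index crawl/indexes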

> --- Original Message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Tue, 25 Apr 2006 11:34:10 +0200
>
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
>
> Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> this update will solve your problem, but it might. With the config I
> sent you, I could crawl-index-search, so there must be something
> else. I am not sure.
>
> Cheers
> Zaheed
>
> On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > Seems to be a bit better, doesn't it?
> >
> > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > 060425 110124 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060425 110124 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > 060425 110125 crawl started in: crawled
> > 060425 110125 rootUrlDir = 2
> > 060425 110125 threads = 10
> > 060425 110125 depth = 5
> > 060425 110126 Injector: starting
> > 060425 110126 Injector: crawlDb: crawled/crawldb
> > 060425 110126 Injector: urlDir: 2
> > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110126 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > 060425 110127 parsing
> >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > Exception in thread "main"
> java.lang.reflect.UndeclaredThrowableException
> >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown
> Source)
> >         at
> >
> org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> >         at
> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > Caused by: java.io.IOException: timed out waiting for response
> >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> >         ... 6 more
> >
> >
> > The local IP is the same,
> > but I don't exactly know how to handle the ports.
> >
> > Step by Step (generate, index..) caused same error while
> >  bin/nutch generate crawl/crawldb crawl/segments
> >
> > > --- Original Message ---
> > > From: "Zaheed Haque" <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > >
> > > Try the following in your hadoop-site.xml.. please change and adjust
> > > based on your IP address. The following configuration assumes that
> > > you have 1 server and you are using it as a namenode as well as a
> > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > It is rather for testing....
> > >
> > > --------------------
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > >
> > > <configuration>
> > >
> > > <!-- file system properties -->
> > >
> > > <property>
> > >   <name>fs.default.name</name>
> > >   <value>127.0.0.1:50000</value>
> > >   <description>The name of the default file system.  Either the
> > >   literal string "local" or a host:port for DFS.</description>
> > > </property>
> > >
> > > <property>
> > >   <name>dfs.datanode.port</name>
> > >   <value>50010</value>
> > >   <description>The port number that the dfs datanode server uses as a
> > > starting
> > >                point to look for a free port to listen on.
> > > </description>
> > > </property>
> > >
> > > <property>
> > >   <name>dfs.name.dir</name>
> > >   <value>/tmp/hadoop/dfs/name</value>
> > >   <description>Determines where on the local filesystem the DFS name
> node
> > >       should store the name table.</description>
> > > </property>
> > >
> > > <property>
> > >   <name>dfs.data.dir</name>
> > >   <value>/tmp/hadoop/dfs/data</value>
> > >   <description>Determines where on the local filesystem an DFS data
> node
> > >   should store its blocks.  If this is a comma- or space-delimited
> > >   list of directories, then data will be stored in all named
> > >   directories, typically on different devices.</description>
> > > </property>
> > >
> > > <property>
> > >   <name>dfs.replication</name>
> > >   <value>1</value>
> > >   <description>How many copies we try to have at all times. The actual
> > >   number of replications is at max the number of datanodes in the
> > >   cluster.</description>
> > > </property>
> > > <!-- map/reduce properties -->
> > >
> > > <property>
> > >   <name>mapred.job.tracker</name>
> > >   <value>127.0.0.1:50020</value>
> > >   <description>The host and port that the MapReduce job tracker runs
> > >   at.  If "local", then jobs are run in-process as a single map
> > >   and reduce task.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.job.tracker.info.port</name>
> > >   <value>50030</value>
> > >   <description>The port that the MapReduce job tracker info webserver
> runs
> > > at.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.task.tracker.output.port</name>
> > >   <value>50040</value>
> > >   <description>The port number that the MapReduce task tracker output
> > > server uses as a starting point to look for
> > > a free port to listen on.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.task.tracker.report.port</name>
> > >   <value>50050</value>
> > >   <description>The port number that the MapReduce task tracker report
> > > server uses as a starting
> > >                point to look for a free port to listen on.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.local.dir</name>
> > >   <value>/tmp/hadoop/mapred/local</value>
> > >   <description>The local directory where MapReduce stores intermediate
> > >   data files.  May be a space- or comma- separated list of
> > >   directories on different devices in order to spread disk i/o.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.system.dir</name>
> > >   <value>/tmp/hadoop/mapred/system</value>
> > >   <description>The shared directory where MapReduce stores control
> files.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.temp.dir</name>
> > >   <value>/tmp/hadoop/mapred/temp</value>
> > >   <description>A shared directory for temporary files.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.reduce.tasks</name>
> > >   <value>1</value>
> > >   <description>The default number of reduce tasks per job.  Typically
> set
> > >   to a prime close to the number of available hosts.  Ignored when
> > >   mapred.job.tracker is "local".
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapred.tasktracker.tasks.maximum</name>
> > >   <value>2</value>
> > >   <description>The maximum number of tasks that will be run
> > >   simultaneously by a task tracker.
> > >   </description>
> > > </property>
> > >
> > > </configuration>
> > >
> > > ------
> > >
> > > Then execute the following commands
> > > - initialize the HDFS
> > > bin/hadoop namenode -format
> > > - Start the namenode/datanode
> > > bin/hadoop-daemon.sh start namenode
> > > bin/hadoop-daemon.sh start datanode
> > > - Let's do some checking...
> > > bin/hadoop dfs -ls
> > >
> > > Should return 0 items!! So let's try to add a file to the DFS
> > >
> > > bin/hadoop dfs -put xyz.html xyz.html
> > >
> > > Try
> > >
> > > bin/hadoop dfs -ls
> > >
> > > You should see one item which is
> > > Found 1 items
> > > /user/root/xyz.html    21433
> > >
> > > bin/hadoop-daemon.sh start jobtracker
> > > bin/hadoop-daemon.sh start tasktracker
> > >
> > > Now you can start off with inject, generate etc.. etc..
> > >
> > > Hope this time it works for you..
> > >
> > > Cheers
> > >
> > >
> > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > I forgot to have a look at the log files:
> > > > > namenode:
> > > > > 060424 121444 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121444 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Not a
> host:port
> > > pair:
> > > > > local
> > > > >         at
> > > org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > >
> > > > >
> > > > > datanode
> > > > > 060424 121448 10 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121448 10 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121448 10 Can't start DataNode in non-directory:
> > > /tmp/hadoop/dfs/data
> > > > >
> > > > > jobtracker
> > > > > 060424 121455 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121455 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060424 121455 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121456 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060424 121456 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > mapred.job.tracker: local
> > > > >         at
> > > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > >         at
> > > org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > >         at
> > > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > >         at
> > > org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > >
> > > > >
> > > > > tasktracker
> > > > > 060424 121502 parsing
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060424 121503 parsing
> file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > mapred.job.tracker: local
> > > > >         at
> > > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > >         at
> > > org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > >         at
> > > org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > >
> > > > >
> > > > > What can be the problem?
> > > > > > --- Original Message ---
> > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > To: [hidden email]
> > > > > > Subject: Re: java.io.IOException: No input directories specified
> in
> > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > >
> > > > > > Got the latest nutch-nightly built,
> > > > > > including hadoop-0.1.1.jar.
> > > > > > Copied the content of the hadoop-default.xml into
> hadoop-site.xml.
> > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > made
> > > > > > bin/hadoop dfs -put seeds seeds
> > > > > >
> > > > > > result:
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > starting namenode, logging to
> > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > starting datanode, logging to
> > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > starting jobtracker, logging to
> > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > starting tasktracker, logging to
> > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > 060424 121512 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121512 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121513 No FS indicated, using default:local
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060424 121543 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121543 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > Found 18 items
> > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > /home/../nutch-nightly/README.txt        403
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > 060424 121603 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121603 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > Found 2 items
> > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > >
> > > > > > so far so good, but:
> > > > > >
> > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > 060424 121613 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060424 121613 parsing
> > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060424 121613 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121614 crawl started in: crawled
> > > > > > 060424 121614 rootUrlDir = 2
> > > > > > 060424 121614 threads = 10
> > > > > > 060424 121614 depth = 5
> > > > > > Exception in thread "main" java.io.IOException: No valid local
> > > directories
> > > > > > in property: mapred.local.dir
> > > > > >         at
> > > > > >
> org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > >         at
> > > org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > bash-3.00$
> > > > > >
> > > > > > I really don't know what to do.
> > > > > > in hadoop-site.xml it's:
> > > > > > ..
> > > > > > <property>
> > > > > >   <name>mapred.local.dir</name>
> > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > >   <description>The local directory where MapReduce stores
> > > intermediate
> > > > > >   data files.  May be a space- or comma- separated list of
> > > > > >   directories on different devices in order to spread disk i/o.
> > > > > >   </description>
> > > > > > </property>
> > > > > > ..
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________
> > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > configuration, correct? So what you need to do is add your
> > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file there.
> > > > > > Use that as a template to edit your hadoop-site.xml under conf. Once
> > > > > > you have edited it then you should start your 'namenode' and 'datanode'.
> > > > > > I am guessing you are using nutch in a distributed way, because you
> > > > > > don't need to use hadoop if you are just running on one machine in
> > > > > > local mode!!
> > > > > >
> > > > > > Anyway you need to do the following to start the datanode and
> > > namenode
> > > > > >
> > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > bin/hadoop-daemon.sh start datanode
> > > > > >
> > > > > > then you need to start jobtracker and tasktracker before you
> start
> > > > > > crawling
> > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > >
> > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > >
> > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > ok. changed to latest nightly build.
> > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > hadoop-site.xml also.
> > > > > > > now trying
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > >
> > > > > > > 060421 125154 parsing
> > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 125155 parsing
> > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > >
> > > > > > > and
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > >
> > > > > > > 060421 125217 parsing
> > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 125217 parsing
> > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > Found 16 items
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war
> 15541036
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties
> > > 3043
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job
> 18537096
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > >
> > > > > > > also:
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > >
> > > > > > > 060421 133004 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 133004 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > Found 2 items
> > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > but:
> > > > > > >
> > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > >
> > > > > > > 060421 131722 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131723 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131723 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > 060421 131723 threads = 10
> > > > > > > 060421 131723 depth = 5
> > > > > > > 060421 131724 Injector: starting
> > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > 060421 131724 Injector: Converting injected urls to crawl db
> > > entries.
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131724 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131724 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131724 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131725 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131725 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131725 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131726 parsing
> > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060421 131727 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131727 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060421 131727 parsing
> > > > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060421 131727 parsing
> > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > 060421 131727 parsing
> > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060421 131727 job_6jn7j8
> > > > > > > java.io.IOException: No input directories specified in:
> > > Configuration:
> > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > > > > > hadoop-site.xml
> > > > > > >         at
> > > > > > >
> > >
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > > :90)
> > > > > > >         at
> > > > > > >
> > >
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > > :100)
> > > > > > >         at
> > > > > > >
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > >         at
> > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > >         at
> > > org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > Can anyone help?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > --- Original Message ---
> > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > To: [hidden email]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > >
> > > > > > > > > Also I have noticed that you are using hadoop-0.1, there was
> > > > > > > > > a bug in 0.1 so you should be using 0.1.1. Under your lib catalog
> > > > > > > > > you should have the following file
> > > > > > > >
> > > > > > > > hadoop-0.1.1.jar
> > > > > > > >
> > > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > The content of the file is like the following:
> > > > > > > > >
> > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > >
> > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > >
> > > > > > > > > <configuration>
> > > > > > > > >
> > > > > > > > > </configuration>
> > > > > > > > >
> > > > > > > > > > or is it missing... if it's missing please create a file under
> > > > > > > > > > the conf catalog with the name hadoop-site.xml and then try the
> > > > > > > > > > hadoop dfs -ls again? You should see something, like a listing
> > > > > > > > > > from your local file system.
> > > > > > > > >
> > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > 060421 122421 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > >
> > > > > > > > > I think the hadoop-site is missing because we should be seeing
> > > > > > > > > a message like this here...
> > > > > > > > >
> > > > > > > > > 060421 131014 parsing
> > > > > > > > >
> > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > >
> > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > 060421 122425 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > >
> > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > Found 0 items
> > > > > > > > > >
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > As you can see, I can't.
> > > > > > > > > > What's going wrong?
> > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > >
> > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > >
> > > > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index
> > > command. I
> > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > >
> > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index,
> > > dedup and
> > > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > etc..
> > > > > > > > > > > If you just try the inject command without any parameters,
> > > > > > > > > > > it will tell you how to use it..
> > > > > > > > > > >
> > > > > > > > > > > Hope this helps.
> > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> > > wrote:
> > > > > > > > > > > > hi
> > > > > > > > > > > >
> > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > >
> > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > >
> > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > >
> > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >&
> crawl.log
> > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > java.io.IOException: No input directories specified
> in:
> > > > > > > > Configuration:
> > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > >
> > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > > > hadoop-site.xml
> > > > > > > > > > > >     at
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > >
> org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > > :84)
> > > > > > > > > > > >     at
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > >
> org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > > :94)
> > > > > > > > > > > >     at
> > > > > > > > > > > >
> > > > > > > >
> > > > > >
> > >
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job
> > > failed!
> > > > > > > > > > > >     at
> > > > > > > >
> org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > >     at
> > > > > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > >     at
> org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > >
> > > > > > > > > > > > Any ideas?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >

Re: java.io.IOException: No input directories specified in

Zaheed Haque
Hi Could you please post the results for the following commands
bin/hadoop dfs -ls

and

bin/nutch inject crawldb crawled (your urls directory in hadoop)
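
For reference, inject wants the name of the crawldb to create first and then
the DFS directory holding your seed list, so with a seed directory called
urls (adjust the names to whatever you actually used) it would be something
like:

bin/nutch inject crawldb urls

Afterwards, bin/hadoop dfs -ls crawldb should show that the crawldb got
created.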

thanks

On 4/25/06, Peter Swoboda <[hidden email]> wrote:

> Sorry, my mistake. Changed to 0.1.1.
> results:
>
> bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> 060425 113831 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> 060425 113832 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060425 113832 Client connection to 127.0.0.1:50000: starting
> 060425 113832 crawl started in: crawled
> 060425 113832 rootUrlDir = 2
> 060425 113832 threads = 10
> 060425 113832 depth = 5
> 060425 113833 Injector: starting
> 060425 113833 Injector: crawlDb: crawled/crawldb
> 060425 113833 Injector: urlDir: 2
> 060425 113833 Injector: Converting injected urls to crawl db entries.
> 060425 113833 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> 060425 113833 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060425 113833 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060425 113834 Client connection to 127.0.0.1:50020: starting
> 060425 113834 Client connection to 127.0.0.1:50000: starting
> 060425 113834 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060425 113838 Running job: job_23a6ra
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> bash-3.00$
>
>
> Step by step it's the same, just another job that failed.
>
> > --- Original Message ---
> > From: "Zaheed Haque" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Tue, 25 Apr 2006 11:34:10 +0200
> >
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> >
> > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > this update will solve your problem but it might. With the config I
> > sent you, I could crawl-index-search, so there must be something
> > else.. I am not sure.
> >
> > Cheers
> > Zaheed
> >
> > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > Seems to be a bit better, doesn't it?
> > >
> > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > 060425 110124 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060425 110124 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > 060425 110125 crawl started in: crawled
> > > 060425 110125 rootUrlDir = 2
> > > 060425 110125 threads = 10
> > > 060425 110125 depth = 5
> > > 060425 110126 Injector: starting
> > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > 060425 110126 Injector: urlDir: 2
> > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > 060425 110126 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060425 110126 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > 060425 110126 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > 060425 110127 parsing
> > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > Exception in thread "main"
> > java.lang.reflect.UndeclaredThrowableException
> > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown
> > Source)
> > >         at
> > >
> > org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > >         at
> > org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > Caused by: java.io.IOException: timed out waiting for response
> > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > >         ... 6 more
> > >
> > >
> > > The local IP is the same,
> > > but I don't know exactly how to handle the ports.
> > >
> > > Step by step (generate, index, ...) caused the same error during
> > >  bin/nutch generate crawl/crawldb crawl/segments
> > >
> > > > --- Original Message ---
> > > > From: "Zaheed Haque" <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > > >
> > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > based on your ip address. The following configuration assumes that
> > > > you have 1 server and you are using it as a namenode as well as a
> > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > It is rather for testing....
> > > >
> > > > --------------------
> > > >
> > > > <?xml version="1.0"?>
> > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > >
> > > > <configuration>
> > > >
> > > > <!-- file system properties -->
> > > >
> > > > <property>
> > > >   <name>fs.default.name</name>
> > > >   <value>127.0.0.1:50000</value>
> > > >   <description>The name of the default file system.  Either the
> > > >   literal string "local" or a host:port for DFS.</description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>dfs.datanode.port</name>
> > > >   <value>50010</value>
> > > >   <description>The port number that the dfs datanode server uses as a
> > > > starting
> > > >                point to look for a free port to listen on.
> > > > </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>dfs.name.dir</name>
> > > >   <value>/tmp/hadoop/dfs/name</value>
> > > >   <description>Determines where on the local filesystem the DFS name node
> > > >       should store the name table.</description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>dfs.data.dir</name>
> > > >   <value>/tmp/hadoop/dfs/data</value>
> > > >   <description>Determines where on the local filesystem a DFS data node
> > > >   should store its blocks.  If this is a comma- or space-delimited
> > > >   list of directories, then data will be stored in all named
> > > >   directories, typically on different devices.</description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>dfs.replication</name>
> > > >   <value>1</value>
> > > >   <description>How many copies we try to have at all times. The actual
> > > >   number of replications is at max the number of datanodes in the
> > > >   cluster.</description>
> > > > </property>
> > > > <!-- map/reduce properties -->
> > > >
> > > > <property>
> > > >   <name>mapred.job.tracker</name>
> > > >   <value>127.0.0.1:50020</value>
> > > >   <description>The host and port that the MapReduce job tracker runs
> > > >   at.  If "local", then jobs are run in-process as a single map
> > > >   and reduce task.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.job.tracker.info.port</name>
> > > >   <value>50030</value>
> > > >   <description>The port that the MapReduce job tracker info webserver runs
> > > >   at.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.task.tracker.output.port</name>
> > > >   <value>50040</value>
> > > >   <description>The port number that the MapReduce task tracker output
> > > > server uses as a starting point to look for
> > > > a free port to listen on.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.task.tracker.report.port</name>
> > > >   <value>50050</value>
> > > >   <description>The port number that the MapReduce task tracker report
> > > > server uses as a starting
> > > >                point to look for a free port to listen on.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.local.dir</name>
> > > >   <value>/tmp/hadoop/mapred/local</value>
> > > >   <description>The local directory where MapReduce stores intermediate
> > > >   data files.  May be a space- or comma- separated list of
> > > >   directories on different devices in order to spread disk i/o.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.system.dir</name>
> > > >   <value>/tmp/hadoop/mapred/system</value>
> > > >   <description>The shared directory where MapReduce stores control files.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.temp.dir</name>
> > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > >   <description>A shared directory for temporary files.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.reduce.tasks</name>
> > > >   <value>1</value>
> > > >   <description>The default number of reduce tasks per job.  Typically set
> > > >   to a prime close to the number of available hosts.  Ignored when
> > > >   mapred.job.tracker is "local".
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > >   <value>2</value>
> > > >   <description>The maximum number of tasks that will be run
> > > >   simultaneously by a task tracker.
> > > >   </description>
> > > > </property>
> > > >
> > > > </configuration>
> > > >
> > > > ------
> > > >
> > > > Then execute the following commands
> > > > - initialize the HDFS
> > > > bin/hadoop namenode -format
> > > > - Start the namenode/datanode
> > > > bin/hadoop-daemon.sh start namenode
> > > > bin/hadoop-daemon.sh start datanode
> > > > - Lets do some checking...
> > > > bin/hadoop dfs -ls
> > > >
> > > > Should return 0 items!! So let's try to add a file to the DFS
> > > >
> > > > bin/hadoop dfs -put xyz.html xyz.html
> > > >
> > > > Try
> > > >
> > > > bin/hadoop dfs -ls
> > > >
> > > > You should see one item which is
> > > > Found 1 items
> > > > /user/root/xyz.html    21433
> > > >
> > > > bin/hadoop-daemon.sh start jobtracker
> > > > bin/hadoop-daemon.sh start tasktracker
> > > >
> > > > Now you can start off with inject, generate etc.. etc..
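> > > >
> > > > The long route would look roughly like this (a sketch from memory, so
> > > > double-check each command's usage message first - running a command
> > > > without arguments prints it; here urls is whatever DFS directory holds
> > > > your seed list):
> > > >
> > > > bin/nutch inject crawl/crawldb urls
> > > > bin/nutch generate crawl/crawldb crawl/segments
> > > > bin/nutch fetch crawl/segments/<the segment generate just created>
> > > > bin/nutch updatedb crawl/crawldb crawl/segments/<that segment>
> > > > bin/nutch invertlinks crawl/linkdb crawl/segments/*
> > > > bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
> > > > bin/nutch dedup crawl/indexes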
> > > >
> > > > Hope this time it works for you..
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > I forgot to have a look at the log files:
> > > > > > namenode:
> > > > > > 060424 121444 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121444 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > Exception in thread "main" java.lang.RuntimeException: Not a
> > > > > > host:port pair: local
> > > > > >         at
> > > > org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > >
> > > > > >
> > > > > > datanode
> > > > > > 060424 121448 10 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121448 10 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121448 10 Can't start DataNode in non-directory:
> > > > /tmp/hadoop/dfs/data
> > > > > >
> > > > > > jobtracker
> > > > > > 060424 121455 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121455 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060424 121455 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121456 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060424 121456 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > mapred.job.tracker: local
> > > > > >         at
> > > > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > >         at
> > > > org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > >         at
> > > > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > >         at
> > > > org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > >
> > > > > >
> > > > > > tasktracker
> > > > > > 060424 121502 parsing
> > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060424 121503 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > mapred.job.tracker: local
> > > > > >         at
> > > > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > >         at
> > > > org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > >         at
> > > > org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > >
> > > > > >
> > > > > > What can be the problem?
> > > > > > > --- Original Message ---
> > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > To: [hidden email]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > >
> > > > > > > Got the latest nutch-nightly build,
> > > > > > > including hadoop-0.1.1.jar.
> > > > > > > Copied the content of the hadoop-default.xml into
> > hadoop-site.xml.
> > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > made
> > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > >
> > > > > > > result:
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > starting namenode, logging to
> > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > starting datanode, logging to
> > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > starting jobtracker, logging to
> > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > starting tasktracker, logging to
> > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > 060424 121512 parsing
> > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060424 121512 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > 060424 121543 parsing
> > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060424 121543 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > Found 18 items
> > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > >
> > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > 060424 121603 parsing
> > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060424 121603 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > Found 2 items
> > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > >
> > > > > > > so far so good, but:
> > > > > > >
> > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > 060424 121613 parsing
> > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060424 121613 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060424 121613 parsing
> > > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060424 121613 parsing
> > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060424 121613 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060424 121613 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > 060424 121614 threads = 10
> > > > > > > 060424 121614 depth = 5
> > > > > > > Exception in thread "main" java.io.IOException: No valid local
> > > > directories
> > > > > > > in property: mapred.local.dir
> > > > > > >         at
> > > > > > >
> > org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > >         at
> > > > org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > > I really don't know what to do.
> > > > > > > in hadoop-site.xml it's:
> > > > > > > ..
> > > > > > > <property>
> > > > > > >   <name>mapred.local.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > >   <description>The local directory where MapReduce stores
> > > > intermediate
> > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > > ..
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________
> > > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file there.
> > > > > > > Use that as a template to edit your hadoop-site.xml under conf. Once
> > > > > > > you have edited it then you should start your 'namenode' and 'datanode'.
> > > > > > > I am guessing you are using nutch in a distributed way, because you
> > > > > > > don't need to use hadoop if you are just running on one machine in
> > > > > > > local mode!!
> > > > > > >
> > > > > > > Anyway you need to do the following to start the datanode and
> > > > namenode
> > > > > > >
> > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > >
> > > > > > > then you need to start jobtracker and tasktracker before you
> > start
> > > > > > > crawling
> > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > >
> > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > >
> > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > ok. changed to latest nightly build.
> > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > hadoop-site.xml also.
> > > > > > > > now trying
> > > > > > > >
> > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > >
> > > > > > > > 060421 125154 parsing
> > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 125155 parsing
> > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > >
> > > > > > > > and
> > > > > > > >
> > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > >
> > > > > > > > 060421 125217 parsing
> > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 125217 parsing
> > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > Found 16 items
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war
> > 15541036
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties
> > > > 3043
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job
> > 18537096
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > >
> > > > > > > > also:
> > > > > > > >
> > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > >
> > > > > > > > 060421 133004 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 133004 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > Found 2 items
> > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > bash-3.00$
> > > > > > > >
> > > > > > > > but:
> > > > > > > >
> > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > >
> > > > > > > > 060421 131722 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 131723 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060421 131723 parsing
> > > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060421 131723 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131723 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060421 131723 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > 060421 131723 threads = 10
> > > > > > > > 060421 131723 depth = 5
> > > > > > > > 060421 131724 Injector: starting
> > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db
> > > > entries.
> > > > > > > > 060421 131724 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 131724 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060421 131724 parsing
> > > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060421 131724 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131724 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131725 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060421 131725 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 131725 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 131726 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060421 131726 parsing
> > > > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060421 131726 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131726 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131726 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131726 parsing
> > > > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060421 131727 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 131727 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060421 131727 parsing
> > > > > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060421 131727 parsing
> > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > 060421 131727 parsing
> > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > java.io.IOException: No input directories specified in:
> > > > Configuration:
> > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal:
> > > > > > > hadoop-site.xml
> > > > > > > >         at
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > > > :90)
> > > > > > > >         at
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > > > :100)
> > > > > > > >         at
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > >         at
> > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > >         at
> > > > org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > bash-3.00$
> > > > > > > >
> > > > > > > > Can anyone help?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > --- Original Message ---
> > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > To: [hidden email]
> > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > >
> > > > > > > > > > Also I have noticed that you are using hadoop-0.1, there was
> > > > > > > > > > a bug in 0.1 so you should be using 0.1.1. Under your lib catalog
> > > > > > > > > > you should have the following file
> > > > > > > > >
> > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > >
> > > > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > > The content of the file is like the following:
> > > > > > > > > >
> > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > >
> > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > >
> > > > > > > > > > <configuration>
> > > > > > > > > >
> > > > > > > > > > </configuration>
> > > > > > > > > >
> > > > > > > > > > > or is it missing... if it's missing please create a file under
> > > > > > > > > > > the conf catalog with the name hadoop-site.xml and then try the
> > > > > > > > > > > hadoop dfs -ls again? You should see something, like a listing
> > > > > > > > > > > from your local file system.
> > > > > > > > > >
> > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > >
> > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > >
> > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > 060421 122421 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > >
> > > > > > > > > > I think the hadoop-site is missing because we should be seeing
> > > > > > > > > > a message like this here...
> > > > > > > > > >
> > > > > > > > > > 060421 131014 parsing
> > > > > > > > > >
> > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > >
> > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > 060421 122425 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > >
> > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > >
> > > > > > > > > > > Found 0 items
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$
> > > > > > > > > > >
> > > > > > > > > > > As you can see, I can't.
> > > > > > > > > > > What's going wrong?
> > > > > > > > > > >
> > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > >
> > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > >
> > > > > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index
> > > > command. I
> > > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > > >
> > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index,
> > > > dedup and
> > > > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > etc..
> > > > > > > > > > > > If you just try the inject command without any parameters,
> > > > > > > > > > > > it will tell you how to use it..
> > > > > > > > > > > >
> > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> > > > wrote:
> > > > > > > > > > > > > hi
> > > > > > > > > > > > >
> > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > > >
> > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >&
> > crawl.log
> > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > java.io.IOException: No input directories specified
> > in:
> > > > > > > > > Configuration:
> > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > >
> > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > > > > hadoop-site.xml
> > > > > > > > > > > > >     at
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > > > :84)
> > > > > > > > > > > > >     at
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > > > :94)
> > > > > > > > > > > > >     at
> > > > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job
> > > > failed!
> > > > > > > > > > > > >     at
> > > > > > > > >
> > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > >     at
> > > > > > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > >     at
> > org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Echte DSL-Flatrate dauerhaft für 0,- Euro*!
> > > > > > > > > > > "Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > >
> > > > > > > --
> > > > > > > Echte DSL-Flatrate dauerhaft für 0,- Euro*!
> > > > > > > "Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Echte DSL-Flatrate dauerhaft für 0,- Euro*!
> > > > > > "Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl
> > > > > >
> > > > >
> > > >
> > >
> > > --
> > > Echte DSL-Flatrate dauerhaft für 0,- Euro*!
> > > "Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl
> > >
> >
>
> --
> GMX Produkte empfehlen und ganz einfach Geld verdienen!
> Satte Provisionen für GMX Partner: http://www.gmx.net/de/go/partner
>
Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Peter Swoboda
Hi.
Of course I can. Here you are:


> --- Original message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Tue, 25 Apr 2006 12:00:53 +0200
>
> Hi Could you please post the results for the following commands
> bin/hadoop dfs -ls

bash-3.00$ bin/hadoop dfs -ls
060426 085559 parsing
jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060426 085559 No FS indicated, using default:localhost.localdomain:50000
060426 085559 Client connection to 127.0.0.1:50000: starting
Found 1 items
/user/swoboda/urls      <dir>
bash-3.00$
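
A quick sanity check at this point — just a sketch, assuming the seed file is still named urls.txt — is to list that directory itself and confirm the file actually landed in DFS:

bin/hadoop dfs -ls urls    # should list urls.txt under /user/swoboda/urls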


>
> and
>
> bin/nutch inject crawldb crawled (your urls directory in hadoop)
>

bash-3.00$ bin/nutch inject crawldb crawled urls
060426 085723 Injector: starting
060426 085723 Injector: crawlDb: crawldb
060426 085723 Injector: urlDir: crawled
060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060426 085724 Injector: Converting injected urls to crawl db entries.
060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060426 085725 Client connection to 127.0.0.1:50020: starting
060426 085725 Client connection to 127.0.0.1:50000: starting
060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
060426 085730 Running job: job_o6tvpr
060426 085731  map 100%  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Injector.main(Injector.java:138)
bash-3.00$
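
Note the urlDir logged above: because inject was given three arguments here, the Injector took "crawled" as its URL directory — and no such directory exists in DFS (the listing above shows only /user/swoboda/urls), hence the failed job. inject takes two arguments, the crawldb to write and the DFS directory holding the seed list, so a corrected call (a sketch matching the listing above) would be:

bin/nutch inject crawldb urls    # crawldb = db to create, urls = existing DFS seed dir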

 

> thanks
>

Re: java.io.IOException: No input directories specified in

Zaheed Haque
Good. As you can see, all your data will be saved under

/user/swoboda/

and urls is the directory where you have your urls.txt file.

So the inject statement you should use is the following:

bin/nutch inject crawldb urls

Try the above first, then run

bin/hadoop dfs -ls

and you will see the crawldb directory.
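
To continue past inject, a minimal sketch of the following step, using the same names (segments is just a conventional directory name here, not something created yet):

bin/nutch generate crawldb segments    # writes the first fetch list into a new segment under segments/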

Cheers

On 4/26/06, Peter Swoboda <[hidden email]> wrote:

> Hi.
> Of course i can. here you are:
>
>
> > --- Ursprüngliche Nachricht ---
> > Von: "Zaheed Haque" <[hidden email]>
> > An: [hidden email]
> > Betreff: Re: java.io.IOException: No input directories specified in
> > Datum: Tue, 25 Apr 2006 12:00:53 +0200
> >
> > Hi Could you please post the results for the following commands
> > bin/hadoop dfs -ls
>
> bash-3.00$ bin/hadoop dfs -ls
> 060426 085559 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> 060426 085559 Client connection to 127.0.0.1:50000: starting
> Found 1 items
> /user/swoboda/urls      <dir>
> bash-3.00$
>
>
> >
> > and
> >
> > bin/nutch inject crawldb crawled(your urls directory in hadoop)
> >
>
> bash-3.00$ bin/nutch inject crawldb crawled urls
> 060426 085723 Injector: starting
> 060426 085723 Injector: crawlDb: crawldb
> 060426 085723 Injector: urlDir: crawled
> 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hado
> op-0.1.1.jar!/hadoop-default.xml
> 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-d efault.xml
> 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-s ite.xml
> 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop- site.xml
> 060426 085724 Injector: Converting injected urls to crawl db entries.
> 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hado
> op-0.1.1.jar!/hadoop-default.xml
> 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-d efault.xml
> 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hado
> op-0.1.1.jar!/mapred-default.xml
> 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-s ite.xml
> 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop- site.xml
> 060426 085725 Client connection to 127.0.0.1:50020: starting
> 060426 085725 Client connection to 127.0.0.1:50000: starting
> 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hado
> op-0.1.1.jar!/hadoop-default.xml
> 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop- site.xml
> 060426 085730 Running job: job_o6tvpr
> 060426 085731  map 100%  reduce 100%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> bash-3.00$
>
>
> > thanks
> >
> > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > Sorry, my mistake. changed to 0.1.1
> > > results:
> > >
> > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > 060425 113831 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060425 113832 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > 060425 113832 crawl started in: crawled
> > > 060425 113832 rootUrlDir = 2
> > > 060425 113832 threads = 10
> > > 060425 113832 depth = 5
> > > 060425 113833 Injector: starting
> > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > 060425 113833 Injector: urlDir: 2
> > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > 060425 113833 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060425 113833 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060425 113833 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > 060425 113834 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060425 113838 Running job: job_23a6ra
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > bash-3.00$
> > >
> > >
> > > Step by Step, same but another job that failed.
> > >
> > > > --- Ursprüngliche Nachricht ---
> > > > Von: "Zaheed Haque" <[hidden email]>
> > > > An: [hidden email]
> > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > Datum: Tue, 25 Apr 2006 11:34:10 +0200
> > > >
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > >
> > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > > > this update will solve your problem but it might. With the config I
> > > > sent you, I could, crawl-index-serach so there must be something
> > > > else.. I am not sure.
> > > >
> > > > Cheers
> > > > Zaheed
> > > >
> > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > Seems to be a bit better, doesn't it?
> > > > >
> > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > 060425 110124 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > 060425 110124 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060425 110124 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060425 110124 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > 060425 110125 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060425 110125 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > 060425 110125 crawl started in: crawled
> > > > > 060425 110125 rootUrlDir = 2
> > > > > 060425 110125 threads = 10
> > > > > 060425 110125 depth = 5
> > > > > 060425 110126 Injector: starting
> > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > 060425 110126 Injector: urlDir: 2
> > > > > 060425 110126 Injector: Converting injected urls to crawl db
> > entries.
> > > > > 060425 110126 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > 060425 110126 parsing
> > file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060425 110126 parsing
> > file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060425 110126 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > 060425 110126 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > 060425 110126 parsing
> > file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060425 110127 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > 060425 110127 parsing
> > > > >
> > > >
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > 060425 110127 parsing
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > Exception in thread "main"
> > > > java.lang.reflect.UndeclaredThrowableException
> > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown
> > > > Source)
> > > > >         at
> > > > >
> > > >
> > org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > >         at
> > > > org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > >         at
> > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > >         ... 6 more
> > > > >
> > > > >
> > > > > local ip is the same,
> > > > > but don't exactly know how to handle the ports.
> > > > >
> > > > > Step by Step (generate, index..) caused same error while
> > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > >
> > > > > > --- Ursprüngliche Nachricht ---
> > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > An: [hidden email]
> > > > > > Betreff: Re: java.io.IOException: No input directories specified
> > in
> > > > > > Datum: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > >
> > > > > > Try the following in your hadoop-site.xml.. please change and
> > adjust
> > > > > > based on your ip address. The following configuration assumes that
> > the
> > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > datanode. Note this is NOT the reason for running Hadoopified
> > Nutch!
> > > > > > It is rather for testing....
> > > > > >
> > > > > > --------------------
> > > > > >
> > > > > > <?xml version="1.0"?>
> > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > >
> > > > > > <configuration>
> > > > > >
> > > > > > <!-- file system properties -->
> > > > > >
> > > > > > <property>
> > > > > >   <name>fs.default.name</name>
> > > > > >   <value>127.0.0.1:50000</value>
> > > > > >   <description>The name of the default file system.  Either the
> > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>dfs.datanode.port</name>
> > > > > >   <value>50010</value>
> > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > >   starting point to look for a free port to listen on.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>dfs.name.dir</name>
> > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > >   <description>Determines where on the local filesystem the DFS name node
> > > > > >   should store the name table.</description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>dfs.data.dir</name>
> > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > >   <description>Determines where on the local filesystem a DFS data node
> > > > > >   should store its blocks.  If this is a comma- or space-delimited
> > > > > >   list of directories, then data will be stored in all named
> > > > > >   directories, typically on different devices.</description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>dfs.replication</name>
> > > > > >   <value>1</value>
> > > > > >   <description>How many copies we try to have at all times. The actual
> > > > > >   number of replications is at max the number of datanodes in the
> > > > > >   cluster.</description>
> > > > > > </property>
> > > > > >
> > > > > > <!-- map/reduce properties -->
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.job.tracker</name>
> > > > > >   <value>127.0.0.1:50020</value>
> > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > >   and reduce task.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > >   <value>50030</value>
> > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > >   runs at.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > >   <value>50040</value>
> > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > >   <value>50050</value>
> > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.local.dir</name>
> > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > >   data files.  May be a space- or comma- separated list of
> > > > > >   directories on different devices in order to spread disk i/o.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.system.dir</name>
> > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > >   <description>The shared directory where MapReduce stores control files.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.temp.dir</name>
> > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > >   <description>A shared directory for temporary files.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.reduce.tasks</name>
> > > > > >   <value>1</value>
> > > > > >   <description>The default number of reduce tasks per job.  Typically set
> > > > > >   to a prime close to the number of available hosts.  Ignored when
> > > > > >   mapred.job.tracker is "local".
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > <property>
> > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > >   <value>2</value>
> > > > > >   <description>The maximum number of tasks that will be run
> > > > > >   simultaneously by a task tracker.
> > > > > >   </description>
> > > > > > </property>
> > > > > >
> > > > > > </configuration>
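
Hand-edited site files are a common source of silent misconfiguration, so it can
be worth confirming the XML is well-formed before starting the daemons (a sketch;
assumes xmllint from libxml2 is installed):

  xmllint --noout conf/hadoop-site.xml   # prints nothing if the file is well-formed
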
> > > > > >
> > > > > > ------
> > > > > >
> > > > > > Then execute the following commands
> > > > > > - initialize the HDFS
> > > > > > bin/hadoop namenode -format
> > > > > > - Start the namenode/datanode
> > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > - Lets do some checking...
> > > > > > bin/hadoop dfs -ls
> > > > > >
> > > > > > Should return 0 items!! So lets try to add a file to the DFS
> > > > > >
> > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > >
> > > > > > Try
> > > > > >
> > > > > > bin/hadoop dfs -ls
> > > > > >
> > > > > > You should see one item which is
> > > > > > Found 1 items
> > > > > > /user/root/xyz.html    21433
> > > > > >
> > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > >
> > > > > > Now you can start off with inject, generate etc.. etc..
> > > > > >
> > > > > > Hope this time it works for you..
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > I forgot to have a look at the log files:
> > > > > > > > namenode:
> > > > > > > > 060424 121444 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
> > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
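
The "Not a host:port pair: local" failure means the daemon still sees the default
value "local" rather than the host:port values from hadoop-site.xml. A quick way
to check what the daemons actually read (a sketch, assuming the file lives under
conf/):

  grep -A 1 -E 'fs.default.name|mapred.job.tracker' conf/hadoop-site.xml
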
> > > > > > > >
> > > > > > > >
> > > > > > > > datanode
> > > > > > > > 060424 121448 10 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
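
The DataNode refuses to start here because dfs.data.dir points at something that
is not a directory; creating the configured paths up front may help (an
assumption, not something verified in this thread):

  mkdir -p /tmp/hadoop/dfs/name /tmp/hadoop/dfs/data
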
> > > > > > > >
> > > > > > > > jobtracker
> > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060424 121456 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > >
> > > > > > > >
> > > > > > > > tasktracker
> > > > > > > > 060424 121502 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > >
> > > > > > > >
> > > > > > > > What can be the problem?
> > > > > > > > > --- Original Message ---
> > > > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > > > To: [hidden email]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > >
> > > > > > > > > Got the latest nutch-nightly built,
> > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > Copied the content of the hadoop-default.xml into
> > > > hadoop-site.xml.
> > > > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > made
> > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > >
> > > > > > > > > result:
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > starting namenode, logging to
> > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > starting datanode, logging to
> > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > starting jobtracker, logging to
> > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > starting tasktracker, logging to
> > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > 060424 121512 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > 060424 121543 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > Found 18 items
> > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > 060424 121603 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > Found 2 items
> > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > >
> > > > > > > > > so far so good, but:
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories in property: mapred.local.dir
> > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > bash-3.00$
> > > > > > > > >
> > > > > > > > > I really don't know what to do.
> > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > ..
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > > ..
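
If mapred.local.dir is set but the path does not exist or is not writable, this
"No valid local directories" check fails; creating the directories before
starting may help (again an assumption, not verified in this thread):

  mkdir -p /tmp/hadoop/mapred/local /tmp/hadoop/mapred/system /tmp/hadoop/mapred/temp
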
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________
> > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > another directory for inspection (copy, not move). Unzip the
> > > > > > > > > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file
> > > > > > > > > there. Use that as a template to edit your hadoop-site.xml under
> > > > > > > > > conf. Once you have edited it you should start your 'namenode' and
> > > > > > > > > 'datanode'. I am guessing you are using nutch in a distributed
> > > > > > > > > way, because you don't need to use hadoop if you are just running
> > > > > > > > > on one machine in local mode!!
> > > > > > > > >
> > > > > > > > > Anyway you need to do the following to start the datanode and
> > > > > > > > > namenode
> > > > > > > > >
> > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > >
> > > > > > > > > then you need to start jobtracker and tasktracker before you
> > > > > > > > > start crawling
> > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > >
> > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > >
> > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > now trying
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > >
> > > > > > > > > > 060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > and
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > 060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > Found 16 items
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > > > >
> > > > > > > > > > also:
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > >
> > > > > > > > > > 060421 133004 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > Found 2 items
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > but:
> > > > > > > > > >
> > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > >
> > > > > > > > > > 060421 131722 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131723 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > Can anyone help?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > >
> > > > > > > > > > > Also I have noticed that you are using hadoop-0.1, there was a
> > > > > > > > > > > bug in 0.1, you should be using 0.1.1. Under your lib catalog you
> > > > > > > > > > > should have the following file
> > > > > > > > > > >
> > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > >
> > > > > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > > > > >
> > > > > > > > > > > Cheers
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > > > > > > > > directory?
> > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > >
> > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > <?xml-stylesheet type="text/xsl"
> > > > href="configuration.xsl"?>
> > > > > > > > > > > >
> > > > > > > > > > > > <!-- Put site-specific property overrides in this
> > file.
> > > > -->
> > > > > > > > > > > >
> > > > > > > > > > > > <configuration>
> > > > > > > > > > > >
> > > > > > > > > > > > </configuration>
> > > > > > > > > > > >
> > > > > > > > > > > > or is it missing... if it's missing please create a file under
> > > > > > > > > > > > the conf catalog with the name hadoop-site.xml and then try the
> > > > > > > > > > > > hadoop dfs -ls again?  you should see something! like a listing
> > > > > > > > > > > > from your local file system.
> > > > > > > > > > > >
> > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]>
> > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > >
> > > > > > > > > > > > I think the hadoop-site is missing, because we should be seeing
> > > > > > > > > > > > a message like this here...
> > > > > > > > > > > >
> > > > > > > > > > > > 060421 131014 parsing
> > > > > > > > > > > >
> > > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > >
> > > > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > > > >
> > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > >
> > > > > > > > > > > > > As you can see, I can't.
> > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one shot
> > crawl/index
> > > > > > command. I
> > > > > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks,
> > index,
> > > > > > dedup and
> > > > > > > > > > > > > > merge.  You can try the above commands just by
> > typing
> > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > If just try the inject command without any
> > parameters
> > > > it
> > > > > > will
> > > > > > > > > tell
> > > > > > > > > > > you
> > > > > > > > > > > > > > how to use it..
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > > > On 4/21/06, Peter Swoboda
> > <[hidden email]>
> > > > > > wrote:
> > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060317 121441 No FS indicated, using
> > default:local
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >&
> > > > crawl.log
> > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > >
> > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > java.io.IOException: No input directories
> > specified
> > > > in:
> > > > > > > > > > > Configuration:
> > > > > > > > > > > > > > > defaults: hadoop-default.xml ,
> > mapred-default.xml ,
> > > > > > > > > > > > > > >
> > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > > > > > > hadoop-site.xml
> > > > > > > > > > > > > > >     at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java
> > > > > > > > > :84)
> > > > > > > > > > > > > > >     at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java
> > > > > > > > > :94)
> > > > > > > > > > > > > > >     at
> > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException:
> > Job
> > > > > > failed!
> > > > > > > > > > > > > > >     at
> > > > > > > > > > >
> > > > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > > > >     at
> > > > > > > > > org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > > > >     at
> > > > org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Peter Swoboda
> --- Original Message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Wed, 26 Apr 2006 09:12:47 +0200
>
> good. as you can see all your data will be saved under
>
> /user/swoboda/
>
> And urls is the directory where you have your urls.txt file.
>
> so the inject statement you should have is the following:
>
> bin/nutch inject crawldb urls

result:
bash-3.00$ bin/nutch inject crawldb urls
060426 091859 Injector: starting
060426 091859 Injector: crawlDb: crawldb
060426 091859 Injector: urlDir: urls
060426 091900 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 091900 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
060426 091901 Injector: Converting injected urls to crawl db entries.
060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
060426 091901 Client connection to 127.0.0.1:50020: starting
060426 091902 Client connection to 127.0.0.1:50000: starting
060426 091902 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 091902 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
060426 091907 Running job: job_b59xmu
060426 091908  map 100%  reduce 100%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Injector.main(Injector.java:138)
bash-3.00$
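
The client-side "Job failed!" message hides the underlying cause; the task-level
error usually lands in the daemon logs, whose names follow the
hadoop-<user>-<daemon>-<host>.log pattern shown earlier in the thread. A way to
look (a sketch):

  tail -50 logs/hadoop-*-jobtracker-*.log
  tail -50 logs/hadoop-*-tasktracker-*.log
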

>
> so try the above first then try
>
> hadoop dfs -ls you will see crawldb directory.
>

bash-3.00$ bin/hadoop dfs -ls
060426 091842 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 091843 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
060426 091843 Client connection to 127.0.0.1:50000: starting
060426 091843 No FS indicated, using default:localhost.localdomain:50000
Found 1 items
/user/swoboda/urls      <dir>
bash-3.00$
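
Note that relative paths on DFS resolve under the user's home directory, so
"urls" above is really /user/swoboda/urls; both forms should list the same thing
(a sketch, same setup assumed):

  bin/hadoop dfs -ls urls
  bin/hadoop dfs -ls /user/swoboda/urls
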

 

> Cheers
>
> On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > Hi.
> > Of course I can. Here you are:
> >
> >
> > > --- Original Message ---
> > > From: "Zaheed Haque" <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Tue, 25 Apr 2006 12:00:53 +0200
> > >
> > > Hi Could you please post the results for the following commands
> > > bin/hadoop dfs -ls
> >
> > bash-3.00$ bin/hadoop dfs -ls
> > 060426 085559 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > Found 1 items
> > /user/swoboda/urls      <dir>
> > bash-3.00$
> >
> >
> > >
> > > and
> > >
> > > bin/nutch inject crawldb crawled(your urls directory in hadoop)
> > >
> >
> > bash-3.00$ bin/nutch inject crawldb crawled urls
> > 060426 085723 Injector: starting
> > 060426 085723 Injector: crawlDb: crawldb
> > 060426 085723 Injector: urlDir: crawled
> > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085730 Running job: job_o6tvpr
> > 060426 085731  map 100%  reduce 100%
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > bash-3.00$
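
Worth noticing: this invocation passes three arguments, so the injector takes
"crawled" as the url directory (hence "Injector: urlDir: crawled" in the log).
The two-argument form Zaheed gives above is the intended one:

  bin/nutch inject crawldb urls
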
> >
> >
> > > thanks
> > >
> > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > Sorry, my mistake. changed to 0.1.1
> > > > results:
> > > >
> > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > 060425 113831 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060425 113832 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > 060425 113832 crawl started in: crawled
> > > > 060425 113832 rootUrlDir = 2
> > > > 060425 113832 threads = 10
> > > > 060425 113832 depth = 5
> > > > 060425 113833 Injector: starting
> > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > 060425 113833 Injector: urlDir: 2
> > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > 060425 113834 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113838 Running job: job_23a6ra
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > bash-3.00$
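
Worth noticing in the session above: the option is misspelled ("-depht"), so the
crawl tool falls back to the default depth of 5 and treats the stray "2" as an
extra root url directory, which is exactly what "rootUrlDir = 2" and "depth = 5"
in the log show. The intended invocation would presumably be:

  bin/nutch crawl urls -dir crawled -depth 2
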
> > > >
> > > >
> > > > Step by Step, same but another job that failed.
> > > >
> > > > > --- Original Message ---
> > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > To: [hidden email]
> > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > Date: Tue, 25 Apr 2006 11:34:10 +0200
> > > > >
> > > > > >
> > > > >
> > >
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > >
> > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > > > > this update will solve your problem but it might. With the config I
> > > > > sent you, I could crawl-index-search, so there must be something
> > > > > else.. I am not sure.
> > > > >
> > > > > Cheers
> > > > > Zaheed
> > > > >
> > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > Seems to be a bit better, doesn't it?
> > > > > >
> > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > 060425 110125 crawl started in: crawled
> > > > > > 060425 110125 rootUrlDir = 2
> > > > > > 060425 110125 threads = 10
> > > > > > 060425 110125 depth = 5
> > > > > > 060425 110126 Injector: starting
> > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > 060425 110127 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > >         ... 6 more
> > > > > >
> > > > > >
> > > > > > The local IP is the same,
> > > > > > but I don't exactly know how to handle the ports.
> > > > > >
> > > > > > Step by Step (generate, index..) caused same error while
> > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > >
> > > > > > > --- Original Message ---
> > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > To: [hidden email]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > >
> > > > > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > > > > based on your ip address. The following configuration assumes that
> > > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > > > > It is rather for testing....
> > > > > > >
> > > > > > > --------------------
> > > > > > >
> > > > > > > <?xml version="1.0"?>
> > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > >
> > > > > > > <configuration>
> > > > > > >
> > > > > > > <!-- file system properties -->
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>fs.default.name</name>
> > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > >   <description>The name of the default file system.  Either the
> > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.datanode.port</name>
> > > > > > >   <value>50010</value>
> > > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > > >   starting point to look for a free port to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.name.dir</name>
> > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > >   <description>Determines where on the local filesystem the DFS name node
> > > > > > >   should store the name table.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.data.dir</name>
> > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > >   <description>Determines where on the local filesystem a DFS data node
> > > > > > >   should store its blocks.  If this is a comma- or space-delimited
> > > > > > >   list of directories, then data will be stored in all named
> > > > > > >   directories, typically on different devices.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.replication</name>
> > > > > > >   <value>1</value>
> > > > > > >   <description>How many copies we try to have at all times. The actual
> > > > > > >   number of replications is at max the number of datanodes in the
> > > > > > >   cluster.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <!-- map/reduce properties -->
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.job.tracker</name>
> > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > > >   and reduce task.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > >   <value>50030</value>
> > > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > > >   runs at.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > >   <value>50040</value>
> > > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > >   <value>50050</value>
> > > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.local.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.system.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > >   <description>The shared directory where MapReduce stores control files.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.temp.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > >   <description>A shared directory for temporary files.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > >   <value>1</value>
> > > > > > >   <description>The default number of reduce tasks per job.  Typically set
> > > > > > >   to a prime close to the number of available hosts.  Ignored when
> > > > > > >   mapred.job.tracker is "local".
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > >   <value>2</value>
> > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > >   simultaneously by a task tracker.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > </configuration>
> > > > > > >
> > > > > > > ------
> > > > > > >
> > > > > > > Then execute the following commands
> > > > > > > - initialize the HDFS
> > > > > > > bin/hadoop namenode -format
> > > > > > > - Start the namenode/datanode
> > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > - Lets do some checking...
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > Should return 0 items!! So lets try to add a file to the DFS
> > > > > > >
> > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > >
> > > > > > > Try
> > > > > > >
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > You should see one item which is
> > > > > > > Found 1 items
> > > > > > > /user/root/xyz.html    21433
> > > > > > >
> > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > >
> > > > > > > Now you can start off with inject, generate etc.. etc..
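
For reference, the long route named here chains the individual tools; a minimal
sketch of one round (the segment name is whatever generate prints, shown as a
placeholder here, and each tool prints its own usage when run without arguments):

  bin/nutch inject crawldb urls
  bin/nutch generate crawldb segments
  bin/nutch fetch segments/2006...        # the segment directory generate just created
  bin/nutch updatedb crawldb segments/2006...
  bin/nutch invertlinks linkdb segments/2006...
  bin/nutch index indexes crawldb linkdb segments/2006...
  bin/nutch dedup indexes
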
> > > > > > >
> > > > > > > Hope this time it works for you..
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > > namenode:
> > > > > > > > > 060424 121444 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
> > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > datanode
> > > > > > > > > 060424 121448 10 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > >
> > > > > > > > > jobtracker
> > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121456 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > tasktracker
> > > > > > > > > 060424 121502 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What can be the problem?
> > > > > > > > > > --- Original Message ---
> > > > > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > > > > To: [hidden email]
> > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > > >
> > > > > > > > > > Got the latest nutch-nightly build,
> > > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > > ran
> > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > >
> > > > > > > > > > result:
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > > starting namenode, logging to
> > > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > > starting datanode, logging to
> > > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > starting jobtracker, logging to
> > > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > starting tasktracker, logging to
> > > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > 060424 121512 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > 060424 121543 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > > Found 18 items
> > > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > 060424 121603 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > > Found 2 items
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > >
> > > > > > > > > > so far so good, but:
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > 060424 121613 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060424 121613 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories
> > > > > > > > > > in property: mapred.local.dir
> > > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > I really don't know what to do.
> > > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > > ..
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > > ..
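> > > > > > > > > >
> > > > > > > > > > (A quick sanity check worth doing here, just a sketch assuming the
> > > > > > > > > > value above: make sure the directory exists and is writable by the
> > > > > > > > > > user running the job, e.g.
> > > > > > > > > >
> > > > > > > > > > mkdir -p /tmp/hadoop/mapred/local
> > > > > > > > > > ls -ld /tmp/hadoop/mapred/local
> > > > > > > > > >
> > > > > > > > > > "No valid local directories" is typically what you get when the
> > > > > > > > > > configured path cannot be created or written.)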
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > _______________________________________
> > > > > > > > > > Is your hadoop-site.xml empty? I mean, it doesn't contain any
> > > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > > > > > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file
> > > > > > > > > > there. Use that as a template to edit your hadoop-site.xml under
> > > > > > > > > > conf. Once you have edited it, you should start your 'namenode' and
> > > > > > > > > > 'datanode'. I am guessing you are using nutch in a distributed way,
> > > > > > > > > > cos you don't need to use hadoop if you are just running on one
> > > > > > > > > > machine in local mode!!
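> > > > > > > > > >
> > > > > > > > > > (For instance, with an assumed scratch directory:
> > > > > > > > > >
> > > > > > > > > > mkdir /tmp/hadoop-inspect
> > > > > > > > > > cp lib/hadoop-0.1.1.jar /tmp/hadoop-inspect/
> > > > > > > > > > cd /tmp/hadoop-inspect
> > > > > > > > > > unzip hadoop-0.1.1.jar hadoop-default.xml
> > > > > > > > > >
> > > > > > > > > > which extracts just hadoop-default.xml for reading.)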
> > > > > > > > > >
> > > > > > > > > > Anyway you need to do the following to start the datanode and namenode:
> > > > > > > > > >
> > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > >
> > > > > > > > > > then you need to start jobtracker and tasktracker before you start
> > > > > > > > > > crawling:
> > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > >
> > > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > > >
> > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > > now trying
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > >
> > > > > > > > > > > 060421 125154 parsing
> > > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 125155 parsing
> > > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > > >
> > > > > > > > > > > and
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > > Found 16 items
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > > > > >
> > > > > > > > > > > also:
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > >
> > > > > > > > > > > 060421 133004 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > > Found 2 items
> > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > bash-3.00$
> > > > > > > > > > >
> > > > > > > > > > > but:
> > > > > > > > > > >
> > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > >
> > > > > > > > > > > 060421 131722 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > 060421 131723 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 131725 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > bash-3.00$
> > > > > > > > > > >
> > > > > > > > > > > Can anyone help?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > >
> > > > > > > > > > > > Also I have noticed that you are using hadoop-0.1; there was a bug in
> > > > > > > > > > > > 0.1, you should be using 0.1.1. Under your lib catalog you should have
> > > > > > > > > > > > the following file:
> > > > > > > > > > > >
> > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > >
> > > > > > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > > >
> > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > >
> > > > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > > > > >
> > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > >
> > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > >
> > > > > > > > > > > > > or is it missing... if it's missing, please create a file under the
> > > > > > > > > > > > > conf catalog with the name hadoop-site.xml and then try the hadoop
> > > > > > > > > > > > > dfs -ls again? You should see something! Like a listing from your
> > > > > > > > > > > > > local file system.
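> > > > > > > > > > > > >
> > > > > > > > > > > > > (One way to create it, sketched under the assumption that you are
> > > > > > > > > > > > > in the nutch-nightly directory:
> > > > > > > > > > > > >
> > > > > > > > > > > > > cat > conf/hadoop-site.xml <<'EOF'
> > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > EOF
> > > > > > > > > > > > > )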
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > 060421 122421 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think the hadoop-site is missing cos we should be seeing a message
> > > > > > > > > > > > > like this here...
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 131014 parsing
> > > > > > > > > > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 122425 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As you can see, I can't.
> > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index command. I
> > > > > > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > If you just try the inject command without any parameters it will
> > > > > > > > > > > > > > > tell you how to use it..
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>
