java.io.IOException: No input directories specified in


Re: java.io.IOException: No input directories specified in

Zaheed Haque
On 4/26/06, Peter Swoboda <[hidden email]> wrote:

> > --- Original Message ---
> > From: "Zaheed Haque" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Wed, 26 Apr 2006 09:12:47 +0200
> >
> > good. as you can see all your data will be saved under
> >
> > /user/swoboda/
> >
> > And urls is the directory where you have your urls.txt file.
> >
> > so the inject statement you should have is the following:
> >
> > bin/nutch inject crawldb urls
>
> result:
> bash-3.00$ bin/nutch inject crawldb urls
> 060426 091859 Injector: starting
> 060426 091859 Injector: crawlDb: crawldb
> 060426 091859 Injector: urlDir: urls
> 060426 091900 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 091900 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> 060426 091901 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> 060426 091901 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> 060426 091901 Injector: Converting injected urls to crawl db entries.
> 060426 091901 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 091901 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> 060426 091901 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060426 091901 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> 060426 091901 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> 060426 091901 Client connection to 127.0.0.1:50020: starting
> 060426 091902 Client connection to 127.0.0.1:50000: starting
> 060426 091902 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 091902 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> 060426 091907 Running job: job_b59xmu
> 060426 091908  map 100%  reduce 100%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
>         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> bash-3.00$
>
> >
> > so try the above first then try
> >
> > hadoop dfs -ls and you will see the crawldb directory.
> >
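> > If the inject succeeds, the listing should look roughly like this (a
> > sketch; the exact paths depend on your DFS user):
> >
> > Found 2 items
> > /user/swoboda/crawldb   <dir>
> > /user/swoboda/urls      <dir>
> >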
>
> bash-3.00$ bin/hadoop dfs -ls
> 060426 091842 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 091843 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> 060426 091843 Client connection to 127.0.0.1:50000: starting
> 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> Found 1 items
> /user/swoboda/urls      <dir>
> bash-3.00$
>
>
> > Cheers
> >
> > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > Hi.
> > > Of course I can. Here you are:
> > >
> > >
> > > > --- Original Message ---
> > > > From: "Zaheed Haque" <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Tue, 25 Apr 2006 12:00:53 +0200
> > > >
> > > > Hi. Could you please post the results for the following commands:
> > > > bin/hadoop dfs -ls
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > > 060426 085559 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > > Found 1 items
> > > /user/swoboda/urls      <dir>
> > > bash-3.00$
> > >
> > >
> > > >
> > > > and
> > > >
> > > > bin/nutch inject crawldb crawled (your urls directory in hadoop)
> > > >
> > >
> > > bash-3.00$ bin/nutch inject crawldb crawled urls
> > > 060426 085723 Injector: starting
> > > 060426 085723 Injector: crawlDb: crawldb
> > > 060426 085723 Injector: urlDir: crawled
> > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060426 085730 Running job: job_o6tvpr
> > > 060426 085731  map 100%  reduce 100%
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > bash-3.00$
> > >
> > >
> > > > thanks
> > > >
> > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > Sorry, my mistake. changed to 0.1.1
> > > > > results:
> > > > >
> > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > 060425 113831 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060425 113832 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > > 060425 113832 crawl started in: crawled
> > > > > 060425 113832 rootUrlDir = 2
> > > > > 060425 113832 threads = 10
> > > > > 060425 113832 depth = 5
> > > > > 060425 113833 Injector: starting
> > > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > > 060425 113833 Injector: urlDir: 2
> > > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > > 060425 113834 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060425 113838 Running job: job_23a6ra
> > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > bash-3.00$
> > > > >
> > > > >
> > > > > Step by step, the same, but another job that failed.
> > > > >
> > > > > > --- Original Message ---
> > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > To: [hidden email]
> > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > Date: Tue, 25 Apr 2006 11:34:10 +0200
> > > > > >
> > > > > > >
> > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > >
> > > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > > > > > this update will solve your problem, but it might. With the config I
> > > > > > sent you, I could crawl-index-search, so there must be something
> > > > > > else.. I am not sure.
> > > > > >
> > > > > > Cheers
> > > > > > Zaheed
> > > > > >
> > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > Seems to be a bit better, doesn't it?
> > > > > > >
> > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 110125 crawl started in: crawled
> > > > > > > 060425 110125 rootUrlDir = 2
> > > > > > > 060425 110125 threads = 10
> > > > > > > 060425 110125 depth = 5
> > > > > > > 060425 110126 Injector: starting
> > > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 110127 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > > >         ... 6 more
> > > > > > >
> > > > > > >
> > > > > > > The local IP is the same,
> > > > > > > but I don't know exactly how to handle the ports.
> > > > > > >
> > > > > > > Step by step (generate, index..) caused the same error while running
> > > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > > >
> > > > > > > > --- Original Message ---
> > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > To: [hidden email]
> > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > > >
> > > > > > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > > > > > it based on your IP address. The following configuration assumes that
> > > > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > > > > > It is rather for testing....
> > > > > > > >
> > > > > > > > --------------------
> > > > > > > >
> > > > > > > > <?xml version="1.0"?>
> > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > >
> > > > > > > > <configuration>
> > > > > > > >
> > > > > > > > <!-- file system properties -->
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>fs.default.name</name>
> > > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > > >   <description>The name of the default file system.  Either the
> > > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>dfs.datanode.port</name>
> > > > > > > >   <value>50010</value>
> > > > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > > > >   starting point to look for a free port to listen on.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>dfs.name.dir</name>
> > > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > > >   <description>Determines where on the local filesystem the DFS name
> > > > > > > >   node should store the name table.</description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>dfs.data.dir</name>
> > > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > > >   <description>Determines where on the local filesystem an DFS data
> > > > > > > >   node should store its blocks.  If this is a comma- or space-delimited
> > > > > > > >   list of directories, then data will be stored in all named
> > > > > > > >   directories, typically on different devices.</description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>dfs.replication</name>
> > > > > > > >   <value>1</value>
> > > > > > > >   <description>How many copies we try to have at all times. The actual
> > > > > > > >   number of replications is at max the number of datanodes in the
> > > > > > > >   cluster.</description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <!-- map/reduce properties -->
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.job.tracker</name>
> > > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > > > >   and reduce task.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > > >   <value>50030</value>
> > > > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > > > >   runs at.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > > >   <value>50040</value>
> > > > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > > >   <value>50050</value>
> > > > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.system.dir</name>
> > > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > > >   <description>The shared directory where MapReduce stores control
> > > > > > > >   files.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.temp.dir</name>
> > > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > > >   <description>A shared directory for temporary files.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > > >   <value>1</value>
> > > > > > > >   <description>The default number of reduce tasks per job.  Typically
> > > > > > > >   set to a prime close to the number of available hosts.  Ignored when
> > > > > > > >   mapred.job.tracker is "local".
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > <property>
> > > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > > >   <value>2</value>
> > > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > > >   simultaneously by a task tracker.
> > > > > > > >   </description>
> > > > > > > > </property>
> > > > > > > >
> > > > > > > > </configuration>
> > > > > > > >
> > > > > > > > ------
> > > > > > > >
> > > > > > > > Then execute the following commands
> > > > > > > > - initialize the HDFS
> > > > > > > > bin/hadoop namenode -format
> > > > > > > > - Start the namenode/datanode
> > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > - Let's do some checking...
> > > > > > > > bin/hadoop dfs -ls
> > > > > > > >
> > > > > > > > Should return 0 items!! So let's try to add a file to the DFS
> > > > > > > >
> > > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > > >
> > > > > > > > Try
> > > > > > > >
> > > > > > > > bin/hadoop dfs -ls
> > > > > > > >
> > > > > > > > You should see one item which is
> > > > > > > > Found 1 items
> > > > > > > > /user/root/xyz.html    21433
> > > > > > > >
> > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > >
> > > > > > > > Now you can start off with inject, generate etc.. etc..
> > > > > > > >
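> > > > > > > > As a rough sketch of that sequence (names here are assumptions:
> > > > > > > > "urls" as the url directory, a placeholder for the generated
> > > > > > > > segment; check each command's usage output for the exact arguments):
> > > > > > > >
> > > > > > > > bin/nutch inject crawldb urls
> > > > > > > > bin/nutch generate crawldb segments
> > > > > > > > bin/nutch fetch segments/<generated-segment>
> > > > > > > > bin/nutch updatedb crawldb segments/<generated-segment>
> > > > > > > > bin/nutch invertlinks linkdb segments/<generated-segment>
> > > > > > > > bin/nutch index indexes crawldb linkdb segments/<generated-segment>
> > > > > > > >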
> > > > > > > > Hope this time it works for you..
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > >
> > > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > > > namenode:
> > > > > > > > > > 060424 121444 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
> > > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > > >
> > > > > > > > > > datanode
> > > > > > > > > > 060424 121448 10 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > > >
> > > > > > > > > > jobtracker
> > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121456 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > > >
> > > > > > > > > > tasktracker
> > > > > > > > > > 060424 121502 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > What can be the problem?
> > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > > > >
> > > > > > > > > > > Got the latest nutch-nightly build,
> > > > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > > > Started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > > > Ran
> > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > >
> > > > > > > > > > > result:
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > starting namenode, logging to
> > > > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > starting datanode, logging to
> > > > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > starting jobtracker, logging to
> > > > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > starting tasktracker, logging to
> > > > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > 060424 121512 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > 060424 121543 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > > > Found 18 items
> > > > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > 060424 121603 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > > > Found 2 items
> > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > >
> > > > > > > > > > > so far so good, but:
> > > > > > > > > > >
> > > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories in property: mapred.local.dir
> > > > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > > > bash-3.00$
> > > > > > > > > > >
> > > > > > > > > > > I really don't know what to do.
> > > > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > > > ..
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > > ..
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > _______________________________________
> > > > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > > > > > > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file
> > > > > > > > > > > there. Use that as a template to edit your hadoop-site.xml under
> > > > > > > > > > > conf. Once you have edited it, you should start your 'namenode' and
> > > > > > > > > > > 'datanode'. I am guessing you are using nutch in a distributed way,
> > > > > > > > > > > cos you don't need to use hadoop if you are just running one machine
> > > > > > > > > > > in local mode!!
> > > > > > > > > > >
> > > > > > > > > > > Anyway you need to do the following to start the datanode and
> > > > > > > > > > > namenode:
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > >
> > > > > > > > > > > then you need to start jobtracker and tasktracker before you start
> > > > > > > > > > > crawling:
> > > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > >
> > > > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > > > >
> > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > > > now trying
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > >
> > > > > > > > > > > > 060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > > > >
> > > > > > > > > > > > and
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > >
> > > > > > > > > > > > 060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > > > Found 16 items
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > > > > > >
> > > > > > > > > > > > also:
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > >
> > > > > > > > > > > > 060421 133004 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > >
> > > > > > > > > > > > but:
> > > > > > > > > > > >
> > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > >
> > > > > > > > > > > > 060421 131722 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > 060421 131723 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 131725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > >
> > > > > > > > > > > > Can anyone help?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also I have noticed that you are using hadoop-0.1; there was a bug in
> > > > > > > > > > > > > 0.1, you should be using 0.1.1. Under your lib catalog you should have
> > > > > > > > > > > > > the following file
> > > > > > > > > > > > >
> > > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > > >
> > > > > > > > > > > > > If that's the case, please download the latest nightly build.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > or is it missing... if it's missing, please create a file under the
> > > > > > > > > > > > > > conf catalog with the name hadoop-site.xml and then try the hadoop
> > > > > > > > > > > > > > dfs -ls again?  You should see something, like a listing from your
> > > > > > > > > > > > > > local file system.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > > >
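> > > > > > > > > > > > > > > > For example, after a successful put the listing should show roughly
> > > > > > > > > > > > > > > > (a sketch; the path depends on your user and filesystem):
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Found 1 items
> > > > > > > > > > > > > > > > /user/<you>/seeds      <dir>
> > > > > > > > > > > > > > > >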
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think the hadoop-site is missing cos we should be seeing a message
> > > > > > > > > > > > > > like this here...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 131014 parsing file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As you can see, I can't.
> > > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one-shot crawl/index command. I
> > > > > > > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > > If you just try the inject command without any parameters it will
> > > > > > > > > > > > > > > > tell you how to use it..
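> > > > > > > > > > > > > > > > For example, a bare "bin/nutch inject" should print a usage line of
> > > > > > > > > > > > > > > > roughly this shape (exact wording depends on the build):
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Usage: Injector <crawldb> <url_dir>
> > > > > > > > > > > > > > > >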
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > default:local
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>

Re: java.io.IOException: No input directories specified in

Zaheed Haque
hmm.. where is your urls.txt file? Is it in the Hadoop filesystem? I
mean, what happens if you try

bin/hadoop dfs -ls urls
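
If the file is really in DFS, the listing should show something roughly
like this (a sketch; the size and user path will differ):

Found 1 items
/user/swoboda/urls/urls.txt     26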

/Z

On 4/26/06, Zaheed Haque <[hidden email]> wrote:

> On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > --- Original Message ---
> > > From: "Zaheed Haque" <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Wed, 26 Apr 2006 09:12:47 +0200
> > >
> > > good. as you can see all your data will be saved under
> > >
> > > /user/swoboda/
> > >
> > > And urls is the directory where you have your urls.txt file.
> > >
> > > so the inject statement you should have is the following:
> > >
> > > bin/nutch inject crawldb urls
> >
> > result:
> > bash-3.00$ bin/nutch inject crawldb urls
> > 060426 091859 Injector: starting
> > 060426 091859 Injector: crawlDb: crawldb
> > 060426 091859 Injector: urlDir: urls
> > 060426 091900 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 091900 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > 060426 091901 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > 060426 091901 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > 060426 091901 Injector: Converting injected urls to crawl db entries.
> > 060426 091901 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 091901 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > 060426 091901 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060426 091901 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > 060426 091901 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > 060426 091901 Client connection to 127.0.0.1:50020: starting
> > 060426 091902 Client connection to 127.0.0.1:50000: starting
> > 060426 091902 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 091902 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > 060426 091907 Running job: job_b59xmu
> > 060426 091908  map 100%  reduce 100%
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > bash-3.00$
> >
> > >
> > > so try the above first then try
> > >
> > > hadoop dfs -ls you will see crawldb directory.
> > >
> >
> > bash-3.00$ bin/hadoop dfs -ls
> > 060426 091842 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 091843 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > 060426 091843 Client connection to 127.0.0.1:50000: starting
> > 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> > Found 1 items
> > /user/swoboda/urls      <dir>
> > bash-3.00$
> >
> >
> > > Cheers
> > >
> > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > Hi.
> > > > Of course i can. here you are:
> > > >
> > > >
> > > > > > --- Original Message ---
> > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > To: [hidden email]
> > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > Date: Tue, 25 Apr 2006 12:00:53 +0200
> > > > >
> > > > > Hi Could you please post the results for the following commands
> > > > > bin/hadoop dfs -ls
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls
> > > > 060426 085559 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > > > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > > > Found 1 items
> > > > /user/swoboda/urls      <dir>
> > > > bash-3.00$
> > > >
> > > >
> > > > >
> > > > > and
> > > > >
> > > > > bin/nutch inject crawldb crawled(your urls directory in hadoop)
> > > > >
> > > >
> > > > bash-3.00$ bin/nutch inject crawldb crawled urls
> > > > 060426 085723 Injector: starting
> > > > 060426 085723 Injector: crawlDb: crawldb
> > > > 060426 085723 Injector: urlDir: crawled
> > > > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > > > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > > > > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > > > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060426 085730 Running job: job_o6tvpr
> > > > 060426 085731  map 100%  reduce 100%
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > bash-3.00$
> > > >
> > > >
> > > > > thanks
> > > > >
> > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > Sorry, my mistake. Changed to 0.1.1.
> > > > > > > Results:
> > > > > >
> > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > 060425 113831 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 113832 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 113832 crawl started in: crawled
> > > > > > > 060425 113832 rootUrlDir = 2
> > > > > > > 060425 113832 threads = 10
> > > > > > > 060425 113832 depth = 5
> > > > > > > 060425 113833 Injector: starting
> > > > > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > > > > 060425 113833 Injector: urlDir: 2
> > > > > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 113834 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113838 Running job: job_23a6ra
> > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > bash-3.00$
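
Note the command quoted above: -depht is a misspelling of -depth. The 0.8
Crawl tool treats any argument it does not recognise as the root URL
directory, so the trailing "2" ends up as the URL dir. That is exactly why
the log reports rootUrlDir = 2 and the default depth = 5, and why the
injector then finds no input (there is no directory called 2). The
intended invocation would be:

  bin/nutch crawl urls -dir crawled -depth 2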
> > > > > >
> > > > > >
> > > > > > > Step by step: same result, just another job that failed.
> > > > > >
> > > > > > > > --- Original Message ---
> > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > To: [hidden email]
> > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > Date: Tue, 25 Apr 2006 11:34:10 +0200
> > > > > > >
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > >
> > > > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure
> > > > > > > if this update will solve your problem, but it might. With the config
> > > > > > > I sent you I could crawl, index, and search, so there must be
> > > > > > > something else.. I am not sure.
> > > > > > >
> > > > > > > Cheers
> > > > > > > Zaheed
> > > > > > >
> > > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > Seems to be a bit better, doesn't it?
> > > > > > > >
> > > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > > > 060425 110125 crawl started in: crawled
> > > > > > > > 060425 110125 rootUrlDir = 2
> > > > > > > > 060425 110125 threads = 10
> > > > > > > > 060425 110125 depth = 5
> > > > > > > > 060425 110126 Injector: starting
> > > > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > > > 060425 110127 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > > > >         ... 6 more
> > > > > > > >
> > > > > > > >
> > > > > > > > The local IP is the same,
> > > > > > > > but I don't know exactly how to handle the ports.
> > > > > > > >
> > > > > > > > Going step by step (generate, index, ...) caused the same error during
> > > > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > > > >
> > > > > > > > > > --- Original Message ---
> > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > To: [hidden email]
> > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > > > >
> > > > > > > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > > > > > > based on your ip address. The following configuration assumes that
> > > > > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > > > > > > It is rather for testing....
> > > > > > > > >
> > > > > > > > > --------------------
> > > > > > > > >
> > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > >
> > > > > > > > > <configuration>
> > > > > > > > >
> > > > > > > > > <!-- file system properties -->
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>fs.default.name</name>
> > > > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > > > >   <description>The name of the default file system.  Either the
> > > > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>dfs.datanode.port</name>
> > > > > > > > >   <value>50010</value>
> > > > > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > > > > >   starting point to look for a free port to listen on.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>dfs.name.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > > > >   <description>Determines where on the local filesystem the DFS name
> > > > > > > > >   node should store the name table.</description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>dfs.data.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > > > >   <description>Determines where on the local filesystem an DFS data
> > > > > > > > >   node should store its blocks.  If this is a comma- or
> > > > > > > > >   space-delimited list of directories, then data will be stored in
> > > > > > > > >   all named directories, typically on different devices.</description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>dfs.replication</name>
> > > > > > > > >   <value>1</value>
> > > > > > > > >   <description>How many copies we try to have at all times. The actual
> > > > > > > > >   number of replications is at max the number of datanodes in the
> > > > > > > > >   cluster.</description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <!-- map/reduce properties -->
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.job.tracker</name>
> > > > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > > > > >   and reduce task.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > > > >   <value>50030</value>
> > > > > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > > > > >   runs at.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > > > >   <value>50040</value>
> > > > > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > > > > >   server uses as a starting point to look for a free port to listen
> > > > > > > > >   on.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > > > >   <value>50050</value>
> > > > > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > > > > >   server uses as a starting point to look for a free port to listen
> > > > > > > > >   on.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.system.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > > > >   <description>The shared directory where MapReduce stores control
> > > > > > > > >   files.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.temp.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > > > >   <description>A shared directory for temporary files.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > > > >   <value>1</value>
> > > > > > > > >   <description>The default number of reduce tasks per job.  Typically
> > > > > > > > >   set to a prime close to the number of available hosts.  Ignored
> > > > > > > > >   when mapred.job.tracker is "local".
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > > > >   <value>2</value>
> > > > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > > > >   simultaneously by a task tracker.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > >
> > > > > > > > > </configuration>
> > > > > > > > >
> > > > > > > > > ------
> > > > > > > > >
> > > > > > > > > Then execute the following commands
> > > > > > > > > - initialize the HDFS
> > > > > > > > > bin/hadoop namenode -format
> > > > > > > > > - Start the namenode/datanode
> > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > - Let's do some checking...
> > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > >
> > > > > > > > > Should return 0 items!! So let's try to add a file to the DFS
> > > > > > > > >
> > > > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > > > >
> > > > > > > > > Try
> > > > > > > > >
> > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > >
> > > > > > > > > You should see one item which is
> > > > > > > > > Found 1 items
> > > > > > > > > /user/root/xyz.html    21433
> > > > > > > > >
> > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > >
> > > > > > > > > Now you can start off with inject, generate etc.. etc..
> > > > > > > > >
> > > > > > > > > Hope this time it works for you..
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
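
As a reference for the inject/generate sequence just mentioned, a minimal
sketch of the full 0.8-style round trip; the segment name is a placeholder
for the timestamped directory that generate actually creates:

  bin/nutch inject crawldb urls
  bin/nutch generate crawldb segments
  bin/nutch fetch segments/<segment>
  bin/nutch updatedb crawldb segments/<segment>
  bin/nutch invertlinks linkdb segments/<segment>
  bin/nutch index indexes crawldb linkdb segments/<segment>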
> > > > > > > > >
> > > > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > > > > namenode:
> > > > > > > > > > > 060424 121444 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
> > > > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > datanode
> > > > > > > > > > > 060424 121448 10 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > > > >
> > > > > > > > > > > jobtracker
> > > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121456 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > tasktracker
> > > > > > > > > > > 060424 121502 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > What can be the problem?
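
All four logs point the same way: "Not a host:port pair: local" and "Bad
mapred.job.tracker: local" mean the daemons are still seeing the stock
value "local", i.e. the overrides in hadoop-site.xml are not being picked
up. The two properties in question, with the values from the config posted
above:

  <property>
    <name>fs.default.name</name>
    <value>127.0.0.1:50000</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>127.0.0.1:50020</value>
  </property>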
> > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > > > > >
> > > > > > > > > > > > Got the latest nutch-nightly build,
> > > > > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > > > > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > > > > Started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > > > > Ran:
> > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > >
> > > > > > > > > > > > result:
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > starting namenode, logging to
> > > > > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > > starting datanode, logging to
> > > > > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > starting jobtracker, logging to
> > > > > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > > starting tasktracker, logging to
> > > > > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > 060424 121512 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > 060424 121543 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > > > > Found 18 items
> > > > > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > 060424 121603 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > >
> > > > > > > > > > > > so far so good, but:
> > > > > > > > > > > >
> > > > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories in property: mapred.local.dir
> > > > > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > >
> > > > > > > > > > > > I really don't know what to do.
> > > > > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > > > > ..
> > > > > > > > > > > > <property>
> > > > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > > > >   </description>
> > > > > > > > > > > > </property>
> > > > > > > > > > > > ..
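
Given the datanode log above ("Can't start DataNode in non-directory:
/tmp/hadoop/dfs/data"), one plausible cause is that the /tmp/hadoop
directories simply do not exist yet; the 0.1-era code expects usable
directories rather than creating them for you. A hedged sketch:

  mkdir -p /tmp/hadoop/dfs/name /tmp/hadoop/dfs/data
  mkdir -p /tmp/hadoop/mapred/local /tmp/hadoop/mapred/system /tmp/hadoop/mapred/temp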
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > _______________________________________
> > > > > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't consist of any
> > > > > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > > > > > > > hadoop-0.1.1.jar file, you will see the hadoop-default.xml file
> > > > > > > > > > > > there. Use that as a template to edit your hadoop-site.xml under
> > > > > > > > > > > > conf. Once you have edited it you should start your 'namenode'
> > > > > > > > > > > > and 'datanode'. I am guessing you are using nutch in a
> > > > > > > > > > > > distributed way, cos you don't need to use hadoop if you are
> > > > > > > > > > > > just running on one machine in local mode!!
> > > > > > > > > > > >
> > > > > > > > > > > > Anyway you need to do the following to start the datanode and namenode
> > > > > > > > > > > >
> > > > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > >
> > > > > > > > > > > > then you need to start jobtracker and tasktracker before you
> > > > > > > > > > > > start crawling
> > > > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > >
> > > > > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > >
> > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > ok. changed to the latest nightly build.
> > > > > > > > > > > > > hadoop-0.1.1.jar exists,
> > > > > > > > > > > > > hadoop-site.xml too.
> > > > > > > > > > > > > now trying
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > > > > >
> > > > > > > > > > > > > and
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > > > > Found 16 items
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > > > > > > >
> > > > > > > > > > > > > also:
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 133004 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > >
> > > > > > > > > > > > > but:
> > > > > > > > > > > > >
> > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > >
> > > > > > > > > > > > > 060421 131722 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > 060421 131723 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 131725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can anyone help?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also I have noticed that you are using hadoop-0.1; there was a
> > > > > > > > > > > > > > bug in 0.1, you should be using 0.1.1. Under your lib catalog
> > > > > > > > > > > > > > you should have the following file:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If that's not the case, please download the latest nightly build.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf
> > > > > > > > > > > > > > > directory?
> > > > > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > or is it missing... if it's missing please create a file under
> > > > > > > > > > > > > > > the conf catalog with the name hadoop-site.xml and then try the
> > > > > > > > > > > > > > > hadoop dfs -ls again? You should see something, like a listing
> > > > > > > > > > > > > > > from your local file system.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --- Original Message ---
> > > > > > > > > > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > > > To: [hidden email]
> > > > > > > > > > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think the hadoop-site is missing cos we should be seeing a
> > > > > > > > > > > > > > > message like this here...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 131014 parsing file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As you can see, I can't.
> > > > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one-shot crawl/index
> > > > > > > > > > > > > > > > > command. I strongly recommend you take the long route of
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index,
> > > > > > > > > > > > > > > > > dedup and merge.  You can try the above commands just by
> > > > > > > > > > > > > > > > > typing
> > > > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > > > If you just try the inject command without any parameters
> > > > > > > > > > > > > > > > > it will tell you how to use it..
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hope this helps.
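
For what it's worth, on a 0.8-era build the no-argument call prints a usage
line along these lines (quoted from memory, so treat it as approximate):

  bash-3.00$ bin/nutch inject
  Usage: Injector <crawldb> <url_dir>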
> > > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >

Re: java.io.IOException: No input directories specified in

Peter Swoboda
 
> hmm.. where is your urls.txt file? Is it in the Hadoop filesystem? I
> mean, what happens if you try
>
> bin/hadoop dfs -ls urls
>
bash-3.00$ bin/hadoop dfs -ls urls
060426 094810 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 094810 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
060426 094811 Client connection to 127.0.0.1:50000: starting
060426 094811 No FS indicated, using default:localhost.localdomain:50000
Found 3 items
/user/swoboda/urls/urllist.txt  26
/user/swoboda/urls/urllist.txt~ 0
/user/swoboda/urls/urls <dir>
bash-3.00$
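
Worth noting in this listing: besides the seed list there is a zero-byte
editor backup (urllist.txt~) and a nested urls directory, and inject reads
every entry of the url directory as input. If the job keeps failing, one
hedged cleanup (local file names assumed) is to upload a directory
containing only the seed file and inject from that:

  mkdir cleanurls
  cp urllist.txt cleanurls/
  bin/hadoop dfs -put cleanurls cleanurls
  bin/nutch inject crawldb cleanurls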


 

> /Z
>
> On 4/26/06, Zaheed Haque <[hidden email]> wrote:
> > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > --- Original Message ---
> > > > From: "Zaheed Haque" <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Wed, 26 Apr 2006 09:12:47 +0200
> > > >
> > > > good. as you can see all your data will be saved under
> > > >
> > > > /user/swoboda/
> > > >
> > > > And urls is the directory where you have your urls.txt file.
> > > >
> > > > so the inject statement you should have is the following:
> > > >
> > > > bin/nutch inject crawldb urls
> > >
> > > result:
> > > bash-3.00$ bin/nutch inject crawldb urls
> > > 060426 091859 Injector: starting
> > > 060426 091859 Injector: crawlDb: crawldb
> > > 060426 091859 Injector: urlDir: urls
> > > 060426 091900 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 091900 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 091901 Injector: Converting injected urls to crawl db entries.
> > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 091901 Client connection to 127.0.0.1:50020: starting
> > > 060426 091902 Client connection to 127.0.0.1:50000: starting
> > > 060426 091902 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 091902 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 091907 Running job: job_b59xmu
> > > 060426 091908  map 100%  reduce 100%
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > bash-3.00$
> > >
> > > >
> > > > so try the above first then try
> > > >
> > > > hadoop dfs -ls you will see crawldb directory.
> > > >
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > > 060426 091842 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 091843 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 091843 Client connection to 127.0.0.1:50000: starting
> > > 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> > > Found 1 items
> > > /user/swoboda/urls      <dir>
> > > bash-3.00$
> > >
> > >
> > > > Cheers
> > > >
> > > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > > Hi.
> > > > > Of course i can. here you are:
> > > > >
> > > > >
> > > > > > --- Ursprüngliche Nachricht ---
> > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > An: [hidden email]
> > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > Datum: Tue, 25 Apr 2006 12:00:53 +0200
> > > > > >
> > > > > > Hi Could you please post the results for the following commands
> > > > > > bin/hadoop dfs -ls
> > > > >
> > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > 060426 085559 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > > > > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > > > > Found 1 items
> > > > > /user/swoboda/urls      <dir>
> > > > > bash-3.00$
> > > > >
> > > > >
> > > > > >
> > > > > > and
> > > > > >
> > > > > > bin/nutch inject crawldb crawled(your urls directory in hadoop)
> > > > > >
> > > > >
> > > > > bash-3.00$ bin/nutch inject crawldb crawled urls
> > > > > 060426 085723 Injector: starting
> > > > > 060426 085723 Injector: crawlDb: crawldb
> > > > > 060426 085723 Injector: urlDir: crawled
> > > > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > > > > 060426 085724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > > > > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > > > > 060426 085725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > 060426 085730 Running job: job_o6tvpr
> > > > > 060426 085731  map 100%  reduce 100%
> > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > > bash-3.00$
> > > > >
> > > > >
> > > > > > thanks
> > > > > >
> > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > Sorry, my mistake. changed to 0.1.1
> > > > > > > results:
> > > > > > >
> > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > 060425 113831 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 113832 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 113832 crawl started in: crawled
> > > > > > > 060425 113832 rootUrlDir = 2
> > > > > > > 060425 113832 threads = 10
> > > > > > > 060425 113832 depth = 5
> > > > > > > 060425 113833 Injector: starting
> > > > > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > > > > 060425 113833 Injector: urlDir: 2
> > > > > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113833 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > > > > 060425 113834 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > 060425 113838 Running job: job_23a6ra
> > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > bash-3.00$
> > > > > > >
> > > > > > >
> > > > > > > Step by Step, same but another job that failed.
> > > > > > >
> > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > An: [hidden email]
> > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > Datum: Tue, 25 Apr 2006 11:34:10 +0200
> > > > > > > >
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > >
> > > > > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > > > > > > > this update will solve your problem, but it might. With the config I
> > > > > > > > sent you I could crawl-index-search, so there must be something
> > > > > > > > else.. I am not sure.
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > > Zaheed
> > > > > > > >
> > > > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > Seems to be a bit better, doesn't it?
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > 060425 110124 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > > > > 060425 110125 crawl started in: crawled
> > > > > > > > > 060425 110125 rootUrlDir = 2
> > > > > > > > > 060425 110125 threads = 10
> > > > > > > > > 060425 110125 depth = 5
> > > > > > > > > 060425 110126 Injector: starting
> > > > > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > 060425 110126 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > > > > 060425 110127 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > > > > >         ... 6 more
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > local ip is the same,
> > > > > > > > > but don't exactly know how to handle the ports.
> > > > > > > > >
> > > > > > > > > Step by Step (generate, index..) caused same error while
> > > > > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > > > > >
> > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > An: [hidden email]
> > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > Datum: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > > > > >
> > > > > > > > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > > > > > > > it based on your IP address. The following configuration assumes that
> > > > > > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > > > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > > > > > > > It is rather for testing....
> > > > > > > > > >
> > > > > > > > > > --------------------
> > > > > > > > > >
> > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > >
> > > > > > > > > > <configuration>
> > > > > > > > > >
> > > > > > > > > > <!-- file system properties -->
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>fs.default.name</name>
> > > > > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > > > > >   <description>The name of the default file system. Either the
> > > > > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>dfs.datanode.port</name>
> > > > > > > > > >   <value>50010</value>
> > > > > > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > > > > > >   starting point to look for a free port to listen on.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>dfs.name.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > > > > >   <description>Determines where on the local filesystem the DFS name
> > > > > > > > > >   node should store the name table.</description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>dfs.data.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > > > > >   <description>Determines where on the local filesystem a DFS data
> > > > > > > > > >   node should store its blocks.  If this is a comma- or space-delimited
> > > > > > > > > >   list of directories, then data will be stored in all named
> > > > > > > > > >   directories, typically on different devices.</description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>dfs.replication</name>
> > > > > > > > > >   <value>1</value>
> > > > > > > > > >   <description>How many copies we try to have at all times. The actual
> > > > > > > > > >   number of replications is at max the number of datanodes in the
> > > > > > > > > >   cluster.</description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <!-- map/reduce properties -->
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.job.tracker</name>
> > > > > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > > > > > >   and reduce task.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > > > > >   <value>50030</value>
> > > > > > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > > > > > >   runs at.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > > > > >   <value>50040</value>
> > > > > > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > > > > >   <value>50050</value>
> > > > > > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.system.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > > > > >   <description>The shared directory where MapReduce stores control
> > > > > > > > > >   files.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.temp.dir</name>
> > > > > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > > > > >   <description>A shared directory for temporary files.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > > > > >   <value>1</value>
> > > > > > > > > >   <description>The default number of reduce tasks per job.  Typically
> > > > > > > > > >   set to a prime close to the number of available hosts.  Ignored when
> > > > > > > > > >   mapred.job.tracker is "local".
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > <property>
> > > > > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > > > > >   <value>2</value>
> > > > > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > > > > >   simultaneously by a task tracker.
> > > > > > > > > >   </description>
> > > > > > > > > > </property>
> > > > > > > > > >
> > > > > > > > > > </configuration>
> > > > > > > > > >
> > > > > > > > > > ------
> > > > > > > > > >
> > > > > > > > > > Then execute the following commands
> > > > > > > > > > - initialize the HDFS
> > > > > > > > > > bin/hadoop namenode -format
> > > > > > > > > > - Start the namenode/datanode
> > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > - Lets do some checking...
> > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > Should return 0 items!! So lets try to add a file to the DFS
> > > > > > > > > >
> > > > > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > > > > >
> > > > > > > > > > Try
> > > > > > > > > >
> > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > >
> > > > > > > > > > You should see one item which is
> > > > > > > > > > Found 1 items
> > > > > > > > > > /user/root/xyz.html    21433
> > > > > > > > > >
> > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > >
> > > > > > > > > > Now you can start off with inject, generate etc.. etc..
> > > > > > > > > >
> > > > > > > > > > Hope this time it works for you..
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > > > > > namenode:
> > > > > > > > > > > > 060424 121444 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
> > > > > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > > > > >
> > > > > > > > > > > > datanode
> > > > > > > > > > > > 060424 121448 10 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > > > > >
> > > > > > > > > > > > jobtracker
> > > > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > 060424 121455 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121456 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > > > > >
> > > > > > > > > > > > tasktracker
> > > > > > > > > > > > 060424 121502 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
> > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > > > > >
> > > > > > > > > > > > What can be the problem?
> > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > Von: "Peter Swoboda" <[hidden email]>
> > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > Datum: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Got the latest nutch-nightly built,
> > > > > > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > > > > > Copied the content of the hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > > > > > made
> > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > result:
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > > starting namenode, logging to
> > > > > > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > > > starting datanode, logging to
> > > > > > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > > starting jobtracker, logging to
> > > > > > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > > > starting tasktracker, logging to
> > > > > > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > 060424 121512 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > 060424 121543 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > > > > > Found 18 items
> > > > > > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/default.properties       3043
> > > > > > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > 060424 121603 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > >
> > > > > > > > > > > > > so far so good, but:
> > > > > > > > > > > > >
> > > > > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > 060424 121613 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories in property: mapred.local.dir
> > > > > > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > >
> > > > > > > > > > > > > I really don't know what to do.
> > > > > > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > > > > > ..
> > > > > > > > > > > > > <property>
> > > > > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > > > > >   </description>
> > > > > > > > > > > > > </property>
> > > > > > > > > > > > > ..
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > _______________________________________
> > > > > > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > > > > > another directory for inspection, copy not move. Unzip the
> > > > > > > > > > > > > hadoop-0.1.1.jar file; you will see the hadoop-default.xml file there.
> > > > > > > > > > > > > Use that as a template to edit your hadoop-site.xml under conf. Once
> > > > > > > > > > > > > you have edited it then you should start your 'namenode' and
> > > > > > > > > > > > > 'datanode'. I am guessing you are using nutch in a distributed way,
> > > > > > > > > > > > > cos you don't need to use hadoop if you are just running in one
> > > > > > > > > > > > > machine local mode!!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Anyway you need to do the following to start the datanode and namenode
> > > > > > > > > > > > >
> > > > > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > > >
> > > > > > > > > > > > > then you need to start jobtracker and tasktracker before you start
> > > > > > > > > > > > > crawling
> > > > > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > > >
> > > > > > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > > > > > now trying
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 125154 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 125155 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 125217 parsing jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 125217 parsing file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > > > > > Found 16 items
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs     <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt 17709
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt 615
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf     <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties 3043
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt 403
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > also:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 133004 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > but:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 060421 131722 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > 060421 131723 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131724 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 131725 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131726 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060421 131727 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can anyone help?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also I have noticed that you are using hadoop-0.1, and there was a
> > > > > > > > > > > > > > > bug in 0.1; you should be using 0.1.1. Under your lib catalog you
> > > > > > > > > > > > > > > should have the following file
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If that's not the case, please download the latest nightly build.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > or is it missing... if it's missing, please create a file under the
> > > > > > > > > > > > > > > > conf catalog with the name hadoop-site.xml and then try the hadoop
> > > > > > > > > > > > > > > > dfs -ls again?  you should see something! like a listing from your
> > > > > > > > > > > > > > > > local file system.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > > 060421 122421 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think the hadoop-site is missing cos we should be seeing a message
> > > > > > > > > > > > > > > > like this here...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 060421 131014 parsing file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 060421 122421 No FS indicated, using default:local
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 060421 122425 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > As you can see, i can't.
> > > > > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one shot crawl/index command. I
> > > > > > > > > > > > > > > > > > strongly recommend you take the long route of
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > > > > > > > > > > > > merge.  You can try the above commands just by typing
> > > > > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > > > > If you just try the inject command without any parameters it will
> > > > > > > > > > > > > > > > > > tell you how to use it..
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > > > > > created an urls.txt in a dir. named seeds
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 060317 121440 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > > 060419 124302 parsing jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > > > > > 060419 124302 parsing /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal: hadoop-site.xml
> > > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > > > > > > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > > > > > > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > > > > > > > > > > >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Any ideas?
> > > > > > > > > > > > > > > > > > >
Reply | Threaded
Open this post in threaded view
|

Re: java.io.IOException: No input directories specified in

Zaheed Haque
I don't think you can have a directory called "urls" under your "urls"
directory, which is what you have below:

/user/swoboda/urls/urls <dir>

please remove the above directory and try inject again.

bin/hadoop dfs -rm urls/urls

then double-check that there are no directories under your urls directory before
running the inject..
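
A sketch of the whole sequence, assuming your DFS home is /user/swoboda
and urllist.txt is the seed file you want injected:

bin/hadoop dfs -rm urls/urls    # drop the nested directory from DFS
bin/hadoop dfs -ls urls         # should now list plain files only
bin/nutch inject crawldb urls   # re-run the inject on the cleaned directory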

On 4/26/06, Peter Swoboda <[hidden email]> wrote:

>
> > hmm.. where is your urls.txt file? is it in the Hadoop filesystem, I mean
> > what happens if you try
> >
> > bin/hadoop dfs -ls urls
> >
> bash-3.00$ bin/hadoop dfs -ls urls
> 060426 094810 parsing
> jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060426 094810 parsing
> file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> 060426 094811 Client connection to 127.0.0.1:50000: starting
> 060426 094811 No FS indicated, using default:localhost.localdomain:50000
> Found 3 items
> /user/swoboda/urls/urllist.txt  26
> /user/swoboda/urls/urllist.txt~ 0
> /user/swoboda/urls/urls <dir>
> bash-3.00$
>
>
>
> > /Z
> >
> > On 4/26/06, Zaheed Haque <[hidden email]> wrote:
> > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > > --- Ursprüngliche Nachricht ---
> > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > An: [hidden email]
> > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > Datum: Wed, 26 Apr 2006 09:12:47 +0200
> > > > >
> > > > > good. as you can see all your data will be saved under
> > > > >
> > > > > /user/swoboda/
> > > > >
> > > > > And urls is the directory where you have your urls.txt file.
> > > > >
> > > > > so the inject statement you should have is the following:
> > > > >
> > > > > bin/nutch inject crawldb urls
> > > >
> > > > result:
> > > > bash-3.00$ bin/nutch inject crawldb urls
> > > > 060426 091859 Injector: starting
> > > > 060426 091859 Injector: crawlDb: crawldb
> > > > 060426 091859 Injector: urlDir: urls
> > > > 060426 091900 parsing
> > > > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060426 091900 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > 060426 091901 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > 060426 091901 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060426 091901 Injector: Converting injected urls to crawl db entries.
> > > > 060426 091901 parsing
> > > > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060426 091901 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > 060426 091901 parsing
> > > > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060426 091901 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > 060426 091901 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060426 091901 Client connection to 127.0.0.1:50020: starting
> > > > 060426 091902 Client connection to 127.0.0.1:50000: starting
> > > > 060426 091902 parsing
> > > > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060426 091902 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060426 091907 Running job: job_b59xmu
> > > > 060426 091908  map 100%  reduce 100%
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > bash-3.00$
> > > >
> > > > >
> > > > > so try the above first then try
> > > > >
> > > > > hadoop dfs -ls you will see crawldb directory.
> > > > >
> > > >
> > > > bash-3.00$ bin/hadoop dfs -ls
> > > > 060426 091842 parsing
> > > > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060426 091843 parsing
> > > > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > 060426 091843 Client connection to 127.0.0.1:50000: starting
> > > > 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> > > > Found 1 items
> > > > /user/swoboda/urls      <dir>
> > > > bash-3.00$
> > > >
> > > >
> > > > > Cheers
> > > > >
> > > > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > Hi.
> > > > > > Of course i can. here you are:
> > > > > >
> > > > > >
> > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > An: [hidden email]
> > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > Datum: Tue, 25 Apr 2006 12:00:53 +0200
> > > > > > >
> > > > > > > Hi Could you please post the results for the following commands
> > > > > > > bin/hadoop dfs -ls
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060426 085559 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > > > > > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > > > > > Found 1 items
> > > > > > /user/swoboda/urls      <dir>
> > > > > > bash-3.00$
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > and
> > > > > > >
> > > > > > > bin/nutch inject crawldb crawled (your urls directory in hadoop)
> > > > > > >
> > > > > >
> > > > > > bash-3.00$ bin/nutch inject crawldb crawled urls
> > > > > > 060426 085723 Injector: starting
> > > > > > 060426 085723 Injector: crawlDb: crawldb
> > > > > > 060426 085723 Injector: urlDir: crawled
> > > > > > 060426 085724 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > > > > > 060426 085724 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 085725 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > > > > > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > > > > > 060426 085725 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 085730 Running job: job_o6tvpr
> > > > > > 060426 085731  map 100%  reduce 100%
> > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > > > bash-3.00$
> > > > > >
> > > > > >
> > > > > > > thanks
> > > > > > >
> > > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > Sorry, my mistake. changed to 0.1.1
> > > > > > > > results:
> > > > > > > >
> > > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > > 060425 113831 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060425 113832 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > > > > > 060425 113832 crawl started in: crawled
> > > > > > > > 060425 113832 rootUrlDir = 2
> > > > > > > > 060425 113832 threads = 10
> > > > > > > > 060425 113832 depth = 5
> > > > > > > > 060425 113833 Injector: starting
> > > > > > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > > > > > 060425 113833 Injector: urlDir: 2
> > > > > > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > > > > > 060425 113833 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > 060425 113833 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060425 113833 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > > > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > > > > > 060425 113834 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > 060425 113838 Running job: job_23a6ra
> > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > bash-3.00$
> > > > > > > >
> > > > > > > >
> > > > > > > > Step by step it's the same result, just another job that failed.
> > > > > > > >
> > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > An: [hidden email]
> > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > Datum: Tue, 25 Apr 2006 11:34:10 +0200
> > > > > > > > > >
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > >
> > > > > > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure if
> > > > > > > > > this update will solve your problem, but it might. With the config I
> > > > > > > > > sent you, I could crawl-index-search, so there must be something
> > > > > > > > > else.. I am not sure.
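> > > > > > > > >
> > > > > > > > > A quick way to double-check which Hadoop you actually have (a sketch,
> > > > > > > > > run from your nutch-nightly directory):
> > > > > > > > >
> > > > > > > > > ls lib/hadoop-*.jar   # should show hadoop-0.1.1.jar, not hadoop-0.1-dev.jar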
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > > Zaheed
> > > > > > > > >
> > > > > > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > Seems to be a bit better, doesn't it?
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > > > > > 060425 110124 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060425 110124 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > > > > > 060425 110125 crawl started in: crawled
> > > > > > > > > > 060425 110125 rootUrlDir = 2
> > > > > > > > > > 060425 110125 threads = 10
> > > > > > > > > > 060425 110125 depth = 5
> > > > > > > > > > 060425 110126 Injector: starting
> > > > > > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > 060425 110126 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060425 110126 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > 060425 110126 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > > > > > 060425 110127 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > > > > > >         ... 6 more
> > > > > > > > > >
> > > > > > > > > > local ip is the same,
> > > > > > > > > > but don't exactly know how to handle the ports.
> > > > > > > > > >
> > > > > > > > > > Step by Step (generate, index..) caused same error while
> > > > > > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > > > > > >
> > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > Datum: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > > > > > >
> > > > > > > > > > > Try the following in your hadoop-site.xml.. please change and adjust
> > > > > > > > > > > based on your ip address. The following configuration assumes that
> > > > > > > > > > > you have 1 server and you are using it as a namenode as well as a
> > > > > > > > > > > datanode. Note this is NOT the reason for running Hadoopified Nutch!
> > > > > > > > > > > It is rather for testing....
> > > > > > > > > > >
> > > > > > > > > > > --------------------
> > > > > > > > > > >
> > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > >
> > > > > > > > > > > <configuration>
> > > > > > > > > > >
> > > > > > > > > > > <!-- file system properties -->
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>fs.default.name</name>
> > > > > > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > > > > > >   <description>The name of the default file system.  Either the
> > > > > > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>dfs.datanode.port</name>
> > > > > > > > > > >   <value>50010</value>
> > > > > > > > > > >   <description>The port number that the dfs datanode server uses as a
> > > > > > > > > > >   starting point to look for a free port to listen on.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>dfs.name.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > > > > > >   <description>Determines where on the local filesystem the DFS name node
> > > > > > > > > > >   should store the name table.</description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>dfs.data.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > > > > > >   <description>Determines where on the local filesystem an DFS data node
> > > > > > > > > > >   should store its blocks.  If this is a comma- or space-delimited
> > > > > > > > > > >   list of directories, then data will be stored in all named
> > > > > > > > > > >   directories, typically on different devices.</description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>dfs.replication</name>
> > > > > > > > > > >   <value>1</value>
> > > > > > > > > > >   <description>How many copies we try to have at all times.  The actual
> > > > > > > > > > >   number of replications is at max the number of datanodes in the
> > > > > > > > > > >   cluster.</description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <!-- map/reduce properties -->
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.job.tracker</name>
> > > > > > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > > > > > >   <description>The host and port that the MapReduce job tracker runs
> > > > > > > > > > >   at.  If "local", then jobs are run in-process as a single map
> > > > > > > > > > >   and reduce task.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > > > > > >   <value>50030</value>
> > > > > > > > > > >   <description>The port that the MapReduce job tracker info webserver
> > > > > > > > > > >   runs at.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > > > > > >   <value>50040</value>
> > > > > > > > > > >   <description>The port number that the MapReduce task tracker output
> > > > > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > > > > > >   <value>50050</value>
> > > > > > > > > > >   <description>The port number that the MapReduce task tracker report
> > > > > > > > > > >   server uses as a starting point to look for a free port to listen on.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.system.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > > > > > >   <description>The shared directory where MapReduce stores control files.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.temp.dir</name>
> > > > > > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > > > > > >   <description>A shared directory for temporary files.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > > > > > >   <value>1</value>
> > > > > > > > > > >   <description>The default number of reduce tasks per job.  Typically set
> > > > > > > > > > >   to a prime close to the number of available hosts.  Ignored when
> > > > > > > > > > >   mapred.job.tracker is "local".
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > <property>
> > > > > > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > > > > > >   <value>2</value>
> > > > > > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > > > > > >   simultaneously by a task tracker.
> > > > > > > > > > >   </description>
> > > > > > > > > > > </property>
> > > > > > > > > > >
> > > > > > > > > > > </configuration>
> > > > > > > > > > >
> > > > > > > > > > > ------
> > > > > > > > > > >
> > > > > > > > > > > Then execute the following commands
> > > > > > > > > > > - initialize the HDFS
> > > > > > > > > > > bin/hadoop namenode -format
> > > > > > > > > > > - Start the namenode/datanode
> > > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > - Lets do some checking...
> > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > Should return 0 items!! So let's try to add a file to the DFS:
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > > > > > >
> > > > > > > > > > > Try
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > >
> > > > > > > > > > > You should see one item which is
> > > > > > > > > > > Found 1 items
> > > > > > > > > > > /user/root/xyz.html    21433
> > > > > > > > > > >
> > > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > >
> > > > > > > > > > > Now you can start off with inject, generate etc.. etc.. For example:
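> > > > > > > > > > >
> > > > > > > > > > > A sketch of the first two steps, assuming your seed list sits in a DFS
> > > > > > > > > > > directory called seeds (each command prints its usage if you run it
> > > > > > > > > > > without parameters):
> > > > > > > > > > >
> > > > > > > > > > > bin/nutch inject crawldb seeds        # seed the crawl db from DFS
> > > > > > > > > > > bin/nutch generate crawldb segments   # write a fetch list into segments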
> > > > > > > > > > >
> > > > > > > > > > > Hope this time it works for you..
> > > > > > > > > > >
> > > > > > > > > > > Cheers
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > > > > > > namenode:
> > > > > > > > > > > > > 060424 121444 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a host:port pair:
> > > > > > > > > > > > > local
> > > > > > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > > > > > >
> > > > > > > > > > > > > datanode
> > > > > > > > > > > > > 060424 121448 10 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > > > > > >
> > > > > > > > > > > > > jobtracker
> > > > > > > > > > > > > 060424 121455 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > 060424 121455 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121456 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > > > > > > > > mapred.job.tracker: local
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > > > > > >
> > > > > > > > > > > > > tasktracker
> > > > > > > > > > > > > 060424 121502 parsing
> > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > > > > > > > > mapred.job.tracker: local
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > > > > > >
> > > > > > > > > > > > > What can be the problem?
> > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > Von: "Peter Swoboda" <[hidden email]>
> > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > Datum: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Got the latest nutch-nightly built,
> > > > > > > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > > > > > > Copied the content of the hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > > > > > > made
> > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > result:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > > > starting namenode, logging to
> > > > > > > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > > > > starting datanode, logging to
> > > > > > > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > > > starting jobtracker, logging to
> > > > > > > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > > > > starting tasktracker, logging to
> > > > > > > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > 060424 121512 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > 060424 121543 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > > > > > > Found 18 items
> > > > > > > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > 060424 121603 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > so far so good, but:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > > > 060424 121613 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > 060424 121613 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: No valid local directories
> > > > > > > > > > > > > > in property: mapred.local.dir
> > > > > > > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I really don't know what to do.
> > > > > > > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > > > > > > ..
> > > > > > > > > > > > > > <property>
> > > > > > > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > > > > > > >   <description>The local directory where MapReduce stores intermediate
> > > > > > > > > > > > > >   data files.  May be a space- or comma- separated list of
> > > > > > > > > > > > > >   directories on different devices in order to spread disk i/o.
> > > > > > > > > > > > > >   </description>
> > > > > > > > > > > > > > </property>
> > > > > > > > > > > > > > ..
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > _______________________________________
> > > > > > > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > > > > > > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > > > > > > another directory for inspection, copy not move. unzip the
> > > > > > > > > > > > > > hadoop-0.1.1.jar file, you will see the hadoop-default.xml file there. use
> > > > > > > > > > > > > > that as a template to edit your hadoop-site.xml under conf. Once you
> > > > > > > > > > > > > > have edited it then you should start your 'namenode' and 'datanode'. I
> > > > > > > > > > > > > > am guessing you are using nutch in a distributed way. cos you don't
> > > > > > > > > > > > > > need to use hadoop if you are just running on one machine in local mode!!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Anyway you need to do the following to start the datanode and namenode
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > then you need to start jobtracker and tasktracker before you start
> > > > > > > > > > > > > > crawling
> > > > > > > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > > > > > > now trying
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 125154 parsing
> > > > > > > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 125155 parsing
> > > > > > > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > > > > > > Found 16 items
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml 21433
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > > > > > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > also:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 133004 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > > > > > > Found 2 items
> > > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > but:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 060421 131722 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > > 060421 131723 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 131725 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can anyone help?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > > Von: "Zaheed Haque" <[hidden email]>
> > > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Also I have noticed that you are using hadoop-0.1; there was a bug in
> > > > > > > > > > > > > > > > 0.1, you should be using 0.1.1. Under your lib catalog you should have
> > > > > > > > > > > > > > > > the following file
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If that's not the case, please download the latest nightly build.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 4/21/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > > > > > > > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > > > > > > > > > > > > > The content of the file is like the following:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > <!-- Put site-specific property overrides in this file. -->
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > or is it missing... if its missing please create a file under the conf
> > > > > > > > > > > > > > > > > catalog with the name hadoop-site.xml and then try the hadoop dfs -ls
> > > > > > > > > > > > > > > > > again?  you should see something! like listing from your local file
> > > > > > > > > > > > > > > > > system.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > > > > > Von: "Zaheed Haque"
> > <[hidden email]>
> > > > > > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input directories specified in
> > > > > > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > > > 060421 122421 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think the hadoop-site is missing cos we
> > should
> > > > > be
> > > > > > > seeing
> > > > > > > > > a
> > > > > > > > > > > message
> > > > > > > > > > > > > > > > > like this here...
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 060421 131014 parsing
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 060421 122421 No FS indicated, using
> > > > > default:local
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 060421 122425 parsing
> > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 060421 122426 No FS indicated, using
> > > > > default:local
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > As you can see, i can't.
> > > > > > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you see your text file with URLS?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one
> > shot
> > > > > > > crawl/index
> > > > > > > > > > > command. I
> > > > > > > > > > > > > > > > > > > strongly recommend you take the long
> > route of
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > inject, generate, fetch, updatedb,
> > > > > invertlinks,
> > > > > > > index,
> > > > > > > > > > > dedup and
> > > > > > > > > > > > > > > > > > > merge.  You can try the above commands
> > just by
> > > > > > > typing
> > > > > > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > > > > > If just try the inject command without
> > any
> > > > > > > parameters
> > > > > > > > > it
> > > > > > > > > > > will
> > > > > > > > > > > > > > tell
> > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > how to use it..
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hope this helps.
> > > > > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda
> > > > > > > <[hidden email]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > hi
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > i've changed from nutch 0.7 to 0.8
> > > > > > > > > > > > > > > > > > > > done the following steps:
> > > > > > > > > > > > > > > > > > > > created an urls.txt in a dir. named
> > seeds
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 060317 121440 parsing
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > > > 060317 121441 No FS indicated, using
> > > > > > > default:local
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > bin/nutch crawl seeds -dir crawled
> > -depth 2
> > > > > >&
> > > > > > > > > crawl.log
> > > > > > > > > > > > > > > > > > > > but in crawl.log:
> > > > > > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > > > > > > 0.1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-
> > > > > > > > > > > > > > 0.1-dev.jar!/mapred-default.xml
> > > > > > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > > > > >
> > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > > > > > > > > > > > > > 060419 124302 parsing
> > > > > > > > > > > > > > > >
> > file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > > > > > java.io.IOException: No input
> > directories
> > > > > > > specified
> > > > > > > > > in:
> > > > > > > > > > > > > > > > Configuration:
> > > > > > > > > > > > > > > > > > > > defaults: hadoop-default.xml ,
> > > > > > > mapred-default.xml ,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunnerfinal:
> > > > > > > > > > > > > > > > > > > hadoop-site.xml
> > > > > > > > > > > > > > > > > > > >     at
> > > > > > > > > > > > > > > > >
Re: java.io.IOException: No input directories specified in

Peter Swoboda
you're right. there was another dir.
i deleted it, but injecting still doesn't work.
We decided to change to nutch 0.7.2.

Thanks for helping!!
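For anyone landing here with the same symptom, the cleanup discussed below boils down to a short sequence (the /user/swoboda paths are this thread's layout; adjust to yours):

  bin/hadoop dfs -ls urls          # look for stray subdirectories next to the seed file
  bin/hadoop dfs -rm urls/urls     # remove the accidentally nested directory
  bin/nutch inject crawldb urls    # then re-run the inject step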

> --- Original message ---
> From: "Zaheed Haque" <[hidden email]>
> To: [hidden email]
> Subject: Re: java.io.IOException: No input directories specified in
> Date: Wed, 26 Apr 2006 10:06:27 +0200
>
> I don't think you can have a directory called "urls" under your "urls"
> directory? That you have below...
>
> /user/swoboda/urls/urls <dir>
>
> please remove the above directory and try inject again.
>
> bin/hadoop dfs -rm urls/urls
>
> then double-check that there are no directories under your urls
> directory before running inject..
>
> On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > hmm.. where is your urls.txt file? is it in Hadoop filesystem, I
> > > mean what happens if you try
> > >
> > > bin/hadoop dfs -ls urls
> >
> > bash-3.00$ bin/hadoop dfs -ls urls
> > 060426 094810 parsing
> > jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 094810 parsing
> > file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > 060426 094811 Client connection to 127.0.0.1:50000: starting
> > 060426 094811 No FS indicated, using default:localhost.localdomain:50000
> > Found 3 items
> > /user/swoboda/urls/urllist.txt  26
> > /user/swoboda/urls/urllist.txt~ 0
> > /user/swoboda/urls/urls <dir>
> > bash-3.00$
> >
> > > /Z
> > > --- Original message ---
> > > From: "Zaheed Haque" <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: java.io.IOException: No input directories specified in
> > > Date: Tue, 25 Apr 2006 12:00:53 +0200
> > >
> > > Hi Could you please post the results for the following commands
> > >
> > > bin/hadoop dfs -ls
> >
> > bash-3.00$ bin/hadoop dfs -ls
> > 060426 085559 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085559 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085559 No FS indicated, using default:localhost.localdomain:50000
> > 060426 085559 Client connection to 127.0.0.1:50000: starting
> > Found 1 items
> > /user/swoboda/urls      <dir>
> > bash-3.00$
> >
> > > and
> > >
> > > bin/nutch inject crawldb <your urls directory in hadoop>
> >
> > bash-3.00$ bin/nutch inject crawldb crawled urls
> > 060426 085723 Injector: starting
> > 060426 085723 Injector: crawlDb: crawldb
> > 060426 085723 Injector: urlDir: crawled
> > 060426 085724 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085724 Injector: Converting injected urls to crawl db entries.
> > 060426 085724 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060426 085725 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085725 Client connection to 127.0.0.1:50020: starting
> > 060426 085725 Client connection to 127.0.0.1:50000: starting
> > 060426 085725 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060426 085725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060426 085730 Running job: job_o6tvpr
> > 060426 085731  map 100%  reduce 100%
> > Exception in thread "main" java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > bash-3.00$
> >
> > > thanks
> >
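Judging from the Injector log lines above (crawlDb: and urlDir:), inject takes the crawl db path first and the directory of seed files second, so the "urlDir: crawled" line shows the second argument went wrong here. With the layout used in this thread the call would presumably be:

  bin/nutch inject crawldb urls    # crawldb = crawl db to create, urls = DFS dir holding urllist.txt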
> > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > Sorry, my mistake. changed to 0.1.1
> > > > results:
> > > >
> > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > 060425 113831 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113831 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060425 113832 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060425 113832 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113832 Client connection to 127.0.0.1:50000: starting
> > > > 060425 113832 crawl started in: crawled
> > > > 060425 113832 rootUrlDir = 2
> > > > 060425 113832 threads = 10
> > > > 060425 113832 depth = 5
> > > > 060425 113833 Injector: starting
> > > > 060425 113833 Injector: crawlDb: crawled/crawldb
> > > > 060425 113833 Injector: urlDir: 2
> > > > 060425 113833 Injector: Converting injected urls to crawl db entries.
> > > > 060425 113833 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > 060425 113833 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113833 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > 060425 113833 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113834 Client connection to 127.0.0.1:50020: starting
> > > > 060425 113834 Client connection to 127.0.0.1:50000: starting
> > > > 060425 113834 parsing
> > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > 060425 113834 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > 060425 113838 Running job: job_23a6ra
> > > > Exception in thread "main" java.io.IOException: Job failed!
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > bash-3.00$
> > > >
> > > > Step by Step, same but another job that failed.
> > > >
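A detail worth flagging, since nobody in the thread calls it out: the command above says -depht, so the crawler never sees a depth option. That is why the log shows depth = 5 (the default) and rootUrlDir = 2, i.e. the stray 2 was taken as the URL directory. The intended call was presumably:

  bin/nutch crawl urls -dir crawled -depth 2    # "-depth" spelled correctly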
> > > > > --- Original message ---
> > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > To: [hidden email]
> > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > Date: Tue, 25 Apr 2006 11:34:10 +0200
> > > > >
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > >
> > > > > Somehow you are still using hadoop-0.1 and not 0.1.1. I am not sure
> > > > > if this update will solve your problem but it might. With the config
> > > > > I sent you, I could crawl-index-search, so there must be something
> > > > > else.. I am not sure.
> > > > >
> > > > > Cheers
> > > > > Zaheed
> > > > >
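A quick way to check which Hadoop jar a nightly build actually ships (a generic local check, not something given in the thread):

  ls lib/ | grep -i hadoop    # should show hadoop-0.1.1.jar, not hadoop-0.1-dev.jar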
> > > > > On 4/25/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > Seems to be a bit better, doesn't it?
> > > > > >
> > > > > > bash-3.00$ bin/nutch crawl urls -dir crawled -depht 2
> > > > > > 060425 110124 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060425 110124 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060425 110124 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060425 110125 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060425 110125 Client connection to 127.0.0.1:50000: starting
> > > > > > 060425 110125 crawl started in: crawled
> > > > > > 060425 110125 rootUrlDir = 2
> > > > > > 060425 110125 threads = 10
> > > > > > 060425 110125 depth = 5
> > > > > > 060425 110126 Injector: starting
> > > > > > 060425 110126 Injector: crawlDb: crawled/crawldb
> > > > > > 060425 110126 Injector: urlDir: 2
> > > > > > 060425 110126 Injector: Converting injected urls to crawl db entries.
> > > > > > 060425 110126 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > 060425 110126 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110126 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > 060425 110126 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > 060425 110127 Client connection to 127.0.0.1:50020: starting
> > > > > > 060425 110127 Client connection to 127.0.0.1:50000: starting
> > > > > > 060425 110127 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060425 110127 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> > > > > >         at org.apache.hadoop.mapred.$Proxy1.getJobProfile(Unknown Source)
> > > > > >         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:60)
> > > > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:263)
> > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:294)
> > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > Caused by: java.io.IOException: timed out waiting for response
> > > > > >         at org.apache.hadoop.ipc.Client.call(Client.java:303)
> > > > > >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
> > > > > >         ... 6 more
> > > > > >
> > > > > > local ip is the same,
> > > > > > but i don't exactly know how to handle the ports.
> > > > > >
> > > > > > Step by step (generate, index..) caused the same error while
> > > > > >  bin/nutch generate crawl/crawldb crawl/segments
> > > > > >
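Since the root cause above is an RPC timeout, one generic sanity check (standard Linux tooling, not something suggested in the thread) is whether anything is actually listening on the namenode and jobtracker ports:

  netstat -an | grep -E '50000|50020'    # the fs.default.name / mapred.job.tracker ports from the config below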
> > > > > > > --- Original message ---
> > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > To: [hidden email]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Mon, 24 Apr 2006 13:39:10 +0200
> > > > > > >
> > > > > > > Try the following in your hadoop-site.xml.. please change and
> > > > > > > adjust based on your ip address. The following configuration
> > > > > > > assumes that you have 1 server and you are using it as a namenode
> > > > > > > as well as a datanode. Note this is NOT the reason for running
> > > > > > > Hadoopified Nutch! It is rather for testing....
> > > > > > >
> > > > > > > --------------------
> > > > > > > <?xml version="1.0"?>
> > > > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > > > >
> > > > > > > <configuration>
> > > > > > >
> > > > > > > <!-- file system properties -->
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>fs.default.name</name>
> > > > > > >   <value>127.0.0.1:50000</value>
> > > > > > >   <description>The name of the default file system. Either the
> > > > > > >   literal string "local" or a host:port for DFS.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.datanode.port</name>
> > > > > > >   <value>50010</value>
> > > > > > >   <description>The port number that the dfs datanode server uses as
> > > > > > >   a starting point to look for a free port to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.name.dir</name>
> > > > > > >   <value>/tmp/hadoop/dfs/name</value>
> > > > > > >   <description>Determines where on the local filesystem the DFS name
> > > > > > >   node should store the name table.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.data.dir</name>
> > > > > > >   <value>/tmp/hadoop/dfs/data</value>
> > > > > > >   <description>Determines where on the local filesystem an DFS data
> > > > > > >   node should store its blocks.  If this is a comma- or
> > > > > > >   space-delimited list of directories, then data will be stored in
> > > > > > >   all named directories, typically on different devices.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>dfs.replication</name>
> > > > > > >   <value>1</value>
> > > > > > >   <description>How many copies we try to have at all times. The
> > > > > > >   actual number of replications is at max the number of datanodes
> > > > > > >   in the cluster.</description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <!-- map/reduce properties -->
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.job.tracker</name>
> > > > > > >   <value>127.0.0.1:50020</value>
> > > > > > >   <description>The host and port that the MapReduce job tracker
> > > > > > >   runs at.  If "local", then jobs are run in-process as a single
> > > > > > >   map and reduce task.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.job.tracker.info.port</name>
> > > > > > >   <value>50030</value>
> > > > > > >   <description>The port that the MapReduce job tracker info
> > > > > > >   webserver runs at.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.task.tracker.output.port</name>
> > > > > > >   <value>50040</value>
> > > > > > >   <description>The port number that the MapReduce task tracker
> > > > > > >   output server uses as a starting point to look for a free port
> > > > > > >   to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.task.tracker.report.port</name>
> > > > > > >   <value>50050</value>
> > > > > > >   <description>The port number that the MapReduce task tracker
> > > > > > >   report server uses as a starting point to look for a free port
> > > > > > >   to listen on.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.local.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > >   <description>The local directory where MapReduce stores
> > > > > > >   intermediate data files.  May be a space- or comma- separated
> > > > > > >   list of directories on different devices in order to spread
> > > > > > >   disk i/o.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.system.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/system</value>
> > > > > > >   <description>The shared directory where MapReduce stores control
> > > > > > >   files.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.temp.dir</name>
> > > > > > >   <value>/tmp/hadoop/mapred/temp</value>
> > > > > > >   <description>A shared directory for temporary files.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.reduce.tasks</name>
> > > > > > >   <value>1</value>
> > > > > > >   <description>The default number of reduce tasks per job.
> > > > > > >   Typically set to a prime close to the number of available hosts.
> > > > > > >   Ignored when mapred.job.tracker is "local".
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > <property>
> > > > > > >   <name>mapred.tasktracker.tasks.maximum</name>
> > > > > > >   <value>2</value>
> > > > > > >   <description>The maximum number of tasks that will be run
> > > > > > >   simultaneously by a task tracker.
> > > > > > >   </description>
> > > > > > > </property>
> > > > > > >
> > > > > > > </configuration>
> > > > > > >
> > > > > > > ------
> > > > > > > Then execute the following commands
> > > > > > > - initialize the HDFS
> > > > > > > bin/hadoop namenode -format
> > > > > > > - Start the namenode/datanode
> > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > - Lets do some checking...
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > Should return 0 items!! So lets try to add a file to the DFS
> > > > > > >
> > > > > > > bin/hadoop dfs -put xyz.html xyz.html
> > > > > > >
> > > > > > > Try
> > > > > > >
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > You should see one item which is
> > > > > > > Found 1 items
> > > > > > > /user/root/xyz.html    21433
> > > > > > >
> > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > >
> > > > > > > Now you can start off with inject, generate etc.. etc..
> > > > > > >
> > > > > > > Hope this time it works for you..
> > > > > > >
> > > > > > > Cheers
> > > > > > >
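To make the "inject, generate etc.. etc.." concrete, here is a rough sketch of the step-by-step route recommended earlier in the thread. The segment name is a placeholder, and the exact argument shapes are assumptions based on the 0.8-era tools, so check each command's usage output (run it without arguments) first:

  bin/nutch inject crawldb urls                               # seed the crawl db from the urls dir
  bin/nutch generate crawldb segments                         # write a fetchlist into a new segment
  bin/nutch fetch segments/<segment>                          # fetch the listed pages
  bin/nutch updatedb crawldb segments/<segment>               # fold fetch results back into the crawl db
  bin/nutch invertlinks linkdb segments/<segment>             # build the link database
  bin/nutch index indexes crawldb linkdb segments/<segment>   # index the fetched segment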
> > > > > > > On 4/24/06, Zaheed Haque <[hidden email]> wrote:
> > > > > > > > On 4/24/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > I forgot to have a look at the log files:
> > > > > > > > >
> > > > > > > > > namenode:
> > > > > > > > > 060424 121444 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Not a
> > > > > > > > > host:port pair: local
> > > > > > > > >         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
> > > > > > > > >         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
> > > > > > > > >         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
> > > > > > > > >
> > > > > > > > > datanode:
> > > > > > > > > 060424 121448 10 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
> > > > > > > > >
> > > > > > > > > jobtracker:
> > > > > > > > > 060424 121455 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121455 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121456 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > > > > mapred.job.tracker: local
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
> > > > > > > > >
> > > > > > > > > tasktracker:
> > > > > > > > > 060424 121502 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > Exception in thread "main" java.lang.RuntimeException: Bad
> > > > > > > > > mapred.job.tracker: local
> > > > > > > > >         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
> > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
> > > > > > > > >         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
> > > > > > > > >
> > > > > > > > > What can be the problem?
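For what it's worth: "Not a host:port pair: local" and "Bad mapred.job.tracker: local" both mean the daemons were still reading the built-in "local" defaults, i.e. the overrides from the message above had not yet landed in conf/hadoop-site.xml. A quick generic check (not from the thread):

  grep -E -A 1 'fs.default.name|mapred.job.tracker' conf/hadoop-site.xml
  # expect 127.0.0.1:50000 and 127.0.0.1:50020 here, not "local"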
> > > > > > > > > --- Original message ---
> > > > > > > > > From: "Peter Swoboda" <[hidden email]>
> > > > > > > > > To: [hidden email]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > > > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> > > > > > > > >
> > > > > > > > > Got the latest nutch-nightly built,
> > > > > > > > > including hadoop-0.1.1.jar.
> > > > > > > > > Copied the content of the hadoop-default.xml into hadoop-site.xml.
> > > > > > > > > started namenode, datanode, jobtracker, tasktracker.
> > > > > > > > > made
> > > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > > >
> > > > > > > > > result:
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > > > > > > > > starting namenode, logging to
> > > > > > > > > bin/../logs/hadoop-jung-namenode-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > > > > > > > > starting datanode, logging to
> > > > > > > > > bin/../logs/hadoop-jung-datanode-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > starting jobtracker, logging to
> > > > > > > > > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > > > > > > > > starting tasktracker, logging to
> > > > > > > > > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > 060424 121512 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121513 No FS indicated, using default:local
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > 060424 121543 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121544 No FS indicated, using default:local
> > > > > > > > > Found 18 items
> > > > > > > > > /home/../nutch-nightly/docs      <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.war 15541036
> > > > > > > > > /home/../nutch-nightly/webapps   <dir>
> > > > > > > > > /home/../nutch-nightly/CHANGES.txt       17709
> > > > > > > > > /home/../nutch-nightly/build.xml 21433
> > > > > > > > > /home/../nutch-nightly/LICENSE.txt       615
> > > > > > > > > /home/../nutch-nightly/test.log  3447
> > > > > > > > > /home/../nutch-nightly/conf      <dir>
> > > > > > > > > /home/../nutch-nightly/default.properties        3043
> > > > > > > > > /home/../nutch-nightly/plugins   <dir>
> > > > > > > > > /home/../nutch-nightly/lib       <dir>
> > > > > > > > > /home/../nutch-nightly/bin       <dir>
> > > > > > > > > /home/../nutch-nightly/logs      <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.jar 408375
> > > > > > > > > /home/../nutch-nightly/src       <dir>
> > > > > > > > > /home/../nutch-nightly/nutch-nightly.job 18537096
> > > > > > > > > /home/../nutch-nightly/seeds     <dir>
> > > > > > > > > /home/../nutch-nightly/README.txt        403
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > 060424 121603 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121603 No FS indicated, using default:local
> > > > > > > > > Found 2 items
> > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > >
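A side observation on the listing above, not raised in the thread: the editor backup file urls.txt~ sits in the seeds directory too, and the injector apparently reads every file in its input directory, so it is safer to remove it before crawling:

  rm seeds/urls.txt~    # keep only real seed files in the directory handed to inject/crawl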
> > > > > > > > > so far so good, but:
> > > > > > > > >
> > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > 060424 121613 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > 060424 121613 parsing
> > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > 060424 121614 crawl started in: crawled
> > > > > > > > > 060424 121614 rootUrlDir = 2
> > > > > > > > > 060424 121614 threads = 10
> > > > > > > > > 060424 121614 depth = 5
> > > > > > > > > Exception in thread "main" java.io.IOException: No valid local
> > > > > > > > > directories in property: mapred.local.dir
> > > > > > > > >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> > > > > > > > >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > > > > > > > > bash-3.00$
> > > > > > > > >
> > > > > > > > > I really don't know what to do.
> > > > > > > > > in hadoop-site.xml it's:
> > > > > > > > > ..
> > > > > > > > > <property>
> > > > > > > > >   <name>mapred.local.dir</name>
> > > > > > > > >   <value>/tmp/hadoop/mapred/local</value>
> > > > > > > > >   <description>The local directory where MapReduce stores
> > > > > > > > >   intermediate data files.  May be a space- or comma- separated
> > > > > > > > >   list of directories on different devices in order to spread
> > > > > > > > >   disk i/o.
> > > > > > > > >   </description>
> > > > > > > > > </property>
> > > > > > > > > ..
> > > > > > > > >
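Both this failure and the earlier "Can't start DataNode in non-directory" suggest that the /tmp/hadoop directories named in the configuration simply did not exist yet. Creating them up front (paths as configured above; an assumption, since the thread never confirms the fix) should clear that class of error:

  mkdir -p /tmp/hadoop/dfs/name /tmp/hadoop/dfs/data /tmp/hadoop/mapred/local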
> > > > > > > > > _______________________________________
> > > > > > > > > Is your hadoop-site.xml empty, I mean it doesn't consist of any
> > > > > > > > > configuration, correct? So what you need to do is add your
> > > > > > > > > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > > > > > > > > another directory for inspection, copy not move. unzip the
> > > > > > > > > hadoop-0.1.1.jar file, you will see the hadoop-default.xml file
> > > > > > > > > there. use that as a template to edit your hadoop-site.xml under
> > > > > > > > > conf. Once you have edited it then you should start your
> > > > > > > > > 'namenode' and 'datanode'. I am guessing you are using nutch in
> > > > > > > > > a distributed way, cos you don't need to use hadoop if you are
> > > > > > > > > just running in one machine local mode!!
> > > > > > > > >
> > > > > > > > > Anyway you need to do the following to start the datanode and
> > > > > > > > > namenode
> > > > > > > > >
> > > > > > > > > bin/hadoop-daemon.sh start namenode
> > > > > > > > > bin/hadoop-daemon.sh start datanode
> > > > > > > > >
> > > > > > > > > then you need to start jobtracker and tasktracker before you
> > > > > > > > > start crawling
> > > > > > > > > bin/hadoop-daemon.sh start jobtracker
> > > > > > > > > bin/hadoop-daemon.sh start tasktracker
> > > > > > > > >
> > > > > > > > > then you start your bin/hadoop dfs -put seeds seeds
> > > > > > > > >
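The inspect-the-jar step described above, spelled out as commands (the /tmp/hadoop-inspect directory is just for illustration):

  mkdir -p /tmp/hadoop-inspect
  cp lib/hadoop-0.1.1.jar /tmp/hadoop-inspect/     # copy, not move
  cd /tmp/hadoop-inspect
  unzip hadoop-0.1.1.jar hadoop-default.xml        # extract only the default config
  # then copy the properties you want to override into conf/hadoop-site.xml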
> > > > > > > > > On 4/21/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > > > ok. changed to latest nightly build.
> > > > > > > > > > hadoop-0.1.1.jar is existing,
> > > > > > > > > > hadoop-site.xml also.
> > > > > > > > > > now trying
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > > > > > 060421 125154 parsing
> > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 125155 parsing
> > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 125155 No FS indicated, using default:local
> > > > > > > > > >
> > > > > > > > > > and
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 125217 parsing
> > > > > > > > > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 125217 No FS indicated, using default:local
> > > > > > > > > > Found 16 items
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/docs       <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war  15541036
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/webapps    <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt        17709
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/build.xml  21433
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt        615
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/conf       <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/default.properties 3043
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/plugins    <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/lib        <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/bin        <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar  408375
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/src        <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job  18537096
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/seeds      <dir>
> > > > > > > > > > /home/stud/jung/Desktop/nutch-nightly/README.txt 403
> > > > > > > > > >
> > > > > > > > > > also:
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > > > > > > > > 060421 133004 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 133004 No FS indicated, using default:local
> > > > > > > > > > Found 2 items
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt~   0
> > > > > > > > > > /home/../nutch-nightly/seeds/urls.txt    26
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > but:
> > > > > > > > > >
> > > > > > > > > > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > > > > > > > > > 060421 131722 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131723 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131723 crawl started in: crawled
> > > > > > > > > > 060421 131723 rootUrlDir = 2
> > > > > > > > > > 060421 131723 threads = 10
> > > > > > > > > > 060421 131723 depth = 5
> > > > > > > > > > 060421 131724 Injector: starting
> > > > > > > > > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > > > > > > > > 060421 131724 Injector: urlDir: 2
> > > > > > > > > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131724 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131725 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > > > > > 060421 131727 parsing
> > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > > > > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > > > > > > > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > > > 060421 131727 job_6jn7j8
> > > > > > > > > > java.io.IOException: No input directories specified in:
> > > > > > > > > > Configuration: defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xmlfinal: hadoop-site.xml
> > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > > > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > > > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > > > > > > > > 060421 131728 Running job: job_6jn7j8
> > > > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > > > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > > > > bash-3.00$
> > > > > > > > > >
> > > > > > > > > > Can anyone help?
> > > > > > > > > >
> > > > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > > > Von: "Zaheed Haque"
> <[hidden email]>
> > > > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No input
> > > > > > directories
> > > > > > > > > > specified
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 13:18:37 +0200
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Also I have noticed that you are using
> > > hadoop-0.1,
> > > > > > there
> > > > > > > > was
> > > > > > > > > > a
> > > > > > > > > > > > bug in
> > > > > > > > > > > > > > > > > 0.1 you should be using 0.1.1. Under you
> lib
> > > catalog
> > > > > > you
> > > > > > > > > > should
> > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > the following file
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > hadoop-0.1.1.jar
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If thats the case. Please download the
> latest
> > > > > > nightly
> > > > > > > > build.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On 4/21/06, Zaheed Haque
> > > <[hidden email]>
> > > > > > wrote:
> > > > > > > > > > > > > > > > > > Do you have a file called
> "hadoop-site.xml"
> > > under
> > > > > > your
> > > > > > > > > > conf
> > > > > > > > > > > > > > > directory?
> > > > > > > > > > > > > > > > > > The content of the file is like the
> > > following:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > <?xml version="1.0"?>
> > > > > > > > > > > > > > > > > > <?xml-stylesheet type="text/xsl"
> > > > > > > > > > href="configuration.xsl"?>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > <!-- Put site-specific property
> overrides in
> > > this
> > > > > > > > file.
> > > > > > > > > > -->
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > <configuration>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > </configuration>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > or is it missing... if its missing
> please
> > > create a
> > > > > > > > file
> > > > > > > > > > under
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > conf
> > > > > > > > > > > > > > > > > > catalog with the name hadoop-site.xml
> and
> > > then try
> > > > > > the
> > > > > > > > > > hadoop
> > > > > > > > > > > > dfs
> > > > > > > > > > > > > > > -ls
> > > > > > > > > > > > > > > > > > again?  you should see something! like
> > > listing
> > > > > > from
> > > > > > > > your
> > > > > > > > > > local
> > > > > > > > > > > > file
> > > > > > > > > > > > > > > > > > system.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On 4/21/06, Peter Swoboda
> > > > > > <[hidden email]>
> > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > --- Ursprüngliche Nachricht ---
> > > > > > > > > > > > > > > > > > > > Von: "Zaheed Haque"
> > > <[hidden email]>
> > > > > > > > > > > > > > > > > > > > An: [hidden email]
> > > > > > > > > > > > > > > > > > > > Betreff: Re: java.io.IOException: No
> > > input
> > > > > > > > directories
> > > > > > > > > > > > specified
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > Datum: Fri, 21 Apr 2006 09:48:38
> +0200
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you see your "seeds" directory?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -put seeds
> seeds
> > > > > > > > > > > > > > > > > > > 060421 122421 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think the hadoop-site is missing cos
> we
> > > should
> > > > > > be
> > > > > > > > seeing
> > > > > > > > > > a
> > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > > like this here...
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 060421 131014 parsing
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > >
> file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 060421 122421 No FS indicated, using
> > > > > > default:local
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 060421 122425 parsing
> > > > > > > > > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.
> > > > > > > > > > > > > > > > > > > 1-dev.jar!/hadoop-default.xml
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 060421 122426 No FS indicated, using
> > > > > > default:local
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Found 0 items
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > bash-3.00$
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > As you can see, i can't.
> > > > > > > > > > > > > > > > > > > What's going wrong?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > bin/hadoop dfs -ls seeds
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you see your text file with
> URLS?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Furthermore bin/nutch crawl is a one
> > > shot
> > > > > > > > crawl/index
> > > > > > > > > > > > command. I
> > > > > > > > > > > > > > > > > > > > strongly recommend you take the long
> > > route of
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > inject, generate, fetch, updatedb,
> > > > > > invertlinks,
> > > > > > > > index,
> > > > > > > > > > > > dedup and
> > > > > > > > > > > > > > > > > > > > merge.  You can try the above
> commands
> > > just by
> > > > > > > > typing
> > > > > > > > > > > > > > > > > > > > bin/nutch inject
> > > > > > > > > > > > > > > > > > > > etc..
> > > > > > > > > > > > > > > > > > > > If just try the inject command
> without
> > > any
> > > > > > > > parameters
> > > > > > > > > > it
> > > > > > > > > > > > will
> > > > > > > > > > > > > > > tell
> > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > how to use it..
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hope
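The long route recommended above maps onto a sequence of bin/nutch commands roughly like the following sketch, assuming the crawl/ layout used elsewhere in this thread. The segment timestamp is illustrative, since generate creates a fresh timestamped directory, and exact argument order can vary between 0.8 nightlies:

bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
# fetch the segment that generate just created, then fold the results into the crawl db
bin/nutch fetch crawl/segments/20060421131723
bin/nutch updatedb crawl/crawldb crawl/segments/20060421131723
# build the link database and the index, then de-duplicate and merge
bin/nutch invertlinks crawl/linkdb crawl/segments/*
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
bin/nutch dedup crawl/indexes
bin/nutch merge crawl/index crawl/indexes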

Re: java.io.IOException: No input directories specified in

Chris Fellows-3
Hello,

I'm having what appears to be the same issue on 0.8
trunk. I can get through inject, generate, fetch and
updatedb, but am getting the IOException: No input
directories on invertlinks and cannot figure out why.
I'm only using Nutch on a single local Windows
machine. Any ideas? Configuration has not changed
since checking out from svn.

Here's the output from invertlinks:

Owner@RMAKRSHA1 /cygdrive/c/app/nutch$ bin/nutch invertlinks crawl/linkdb crawl/segments
060426 105413 LinkDb: starting
060426 105413 LinkDb: linkdb: crawl\linkdb
060426 105414 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105414 parsing file:/C:/APP/nutch/conf/nutch-default.xml
060426 105414 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105414 parsing file:/C:/APP/nutch/conf/nutch-site.xml
060426 105414 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105414 LinkDb: adding segment: crawl\segments
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105416 parsing file:/C:/APP/nutch/conf/nutch-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing file:/C:/APP/nutch/conf/nutch-site.xml
060426 105416 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/hadoop-default.xml
060426 105416 parsing jar:file:/C:/APP/nutch/lib/hadoop-0.1.1.jar!/mapred-default.xml
060426 105416 parsing c:\tmp\hadoop\mapred\local\localRunner\job_dhieiq.xml
060426 105416 parsing file:/C:/APP/nutch/conf/hadoop-site.xml
060426 105416 Running job: job_dhieiq
060426 105416 job_dhieiq
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , c:\tmp\hadoop\mapred\local\localRunner\job_dhieiq.xmlfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listFiles(SequenceFileInputFormat.java:37)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
060426 105417  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:151)

--- Peter Swoboda <[hidden email]> wrote:

> you're right. there was another dir.
> i deleted it, but injecting still doesn't work.
> We decided to change to nutch 0.7.2
>
> Thanks for helping!!
>
> > --- Original Message ---
> > From: "Zaheed Haque" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Wed, 26 Apr 2006 10:06:27 +0200
> >
> > I don't think you can have a directory called "urls" under your "urls"
> > directory? That you have below...
> >
> > /user/swoboda/urls/urls <dir>
> >
> > please remove the above directory and try inject again.
> >
> > bin/hadoop dfs -rm urls/urls
> >
> > then double check that there are no directories under your urls
> > directory before running inject..
> >
> > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > >
> > > > hmm.. where is your urls.txt file? is it in Hadoop filesystem, I mean
> > > > what happen if you try
> > > >
> > > > bin/hadoop dfs -ls urls
> > > >
> > > bash-3.00$ bin/hadoop dfs -ls urls
> > > 060426 094810 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060426 094810 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060426 094811 Client connection to 127.0.0.1:50000: starting
> > > 060426 094811 No FS indicated, using default:localhost.localdomain:50000
> > > Found 3 items
> > > /user/swoboda/urls/urllist.txt  26
> > > /user/swoboda/urls/urllist.txt~ 0
> > > /user/swoboda/urls/urls <dir>
> > > bash-3.00$
> > >
> > > > /Z
> > > >
> > > > On 4/26/06, Zaheed Haque <[hidden email]> wrote:
> > > > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > --- Original Message ---
> > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > To: [hidden email]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Wed, 26 Apr 2006 09:12:47 +0200
> > > > > > >
> > > > > > > good. as you can see all your data will be saved under
> > > > > > >
> > > > > > > /user/swoboda/
> > > > > > >
> > > > > > > And urls is the directory where you have your urls.txt file.
> > > > > > >
> > > > > > > so the inject statement you should have is the following:
> > > > > > >
> > > > > > > bin/nutch inject crawldb urls
> > > > > >
> > > > > > result:
> > > > > > bash-3.00$ bin/nutch inject crawldb urls
> > > > > > 060426 091859 Injector: starting
> > > > > > 060426 091859 Injector: crawlDb: crawldb
> > > > > > 060426 091859 Injector: urlDir: urls
> > > > > > 060426 091900 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091900 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091901 Injector: Converting injected urls to crawl db entries.
> > > > > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-default.xml
> > > > > > 060426 091901 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/nutch-site.xml
> > > > > > 060426 091901 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091901 Client connection to 127.0.0.1:50020: starting
> > > > > > 060426 091902 Client connection to 127.0.0.1:50000: starting
> > > > > > 060426 091902 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091902 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091907 Running job: job_b59xmu
> > > > > > 060426 091908  map 100%  reduce 100%
> > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > >         at org.apache.nutch.crawl.Injector.main(Injector.java:138)
> > > > > > bash-3.00$
> > > > > >
> > > > > > >
> > > > > > > so try the above first then try
> > > > > > >
> > > > > > > hadoop dfs -ls you will see crawldb directory.
> > > > > > >
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060426 091842 parsing jar:file:/home/stud/swoboda/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > > > > 060426 091843 parsing file:/home/stud/swoboda/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > > > > 060426 091843 Client connection to 127.0.0.1:50000: starting
> > > > > > 060426 091843 No FS indicated, using default:localhost.localdomain:50000
> > > > > > Found 1 items
> > > > > > /user/swoboda/urls      <dir>
> > > > > > bash-3.00$
> > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On 4/26/06, Peter Swoboda <[hidden email]> wrote:
> > > > > > > > Hi.
> > > > > > > > Of course i can. here you are:
> > > > > > > >
> > > > > > > > > --- Original Message ---
> > > > > > > > > From: "Zaheed Haque" <[hidden email]>
> > > > > > > > > To: [hidden email]
> > > > > > > > > Subject: Re: java.io.IOException: No input directories specified in

=== message truncated ===

Re: java.io.IOException: No input directories specified in

Doug Cutting
Chris Fellows wrote:
> I'm having what appears to be the same issue on 0.8
> trunk. I can get through inject, generate, fetch and
> updatedb, but am getting the IOException: No input
> directories on invertlinks and cannot figure out why.
> I'm only using Nutch on a single local Windows
> machine. Any ideas? Configuration has not changed
> since checking out from svn.

The handling of Windows pathnames is still buggy in Hadoop 0.1.1.  You
might try replacing your lib/hadoop-0.1.1.jar file with the latest
Hadoop nightly jar, from:

http://cvs.apache.org/dist/lucene/hadoop/nightly/

The file name code has been extensively re-written.  The next Hadoop
release (0.2), containing these fixes, will be made in around a week.

Doug
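
If you try the nightly jar, the swap itself is just a file replacement under lib/ (a minimal sketch; hadoop-nightly.jar stands in for whatever the downloaded nightly is actually called):

cd nutch-nightly
# keep the old jar around in case you need to roll back
mv lib/hadoop-0.1.1.jar lib/hadoop-0.1.1.jar.bak
cp /path/to/hadoop-nightly.jar lib/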

Re: java.io.IOException: No input directories specified in

Chris Fellows-3
Thanks for the response.

I did get it going by specifying the segment (i.e.
crawl/segments/20060425173804)

Per your last email, that's probably a bug, as it looks
like it is supposed to invert links on all the segments
(LinkDb.java:147). I'll wait for the 0.2 release, for
now this is okay for me.

As quick feedback on the tutorials, a few short lines
on these commands might really help out. The commands
that took me a few minutes to figure out were:

bin/nutch inject db urls (where db is the database
directory and urls is the url directory, not the
actual url.txt file)

and the line in indexing -

wiki shows:

bin/nutch index indexes crawl/linkdb crawl/segments/*

should be:

bin/nutch index crawl/index crawl/crawldb crawl/linkdb
crawl/segments/*

Again, as you said, maybe this is just the Windows
path names bug. In which case I'll try again on Hadoop
0.2.

Otherwise, everything else is fairly self-explanatory.
I'm definitely enjoying the product. When I tried
0.7.2, I was up and running in under an hour!

--- Doug Cutting <[hidden email]> wrote:

> Chris Fellows wrote:
> > I'm having what appears to be the same issue on 0.8
> > trunk. I can get through inject, generate, fetch and
> > updatedb, but am getting the IOException: No input
> > directories on invertlinks and cannot figure out why.
> > I'm only using Nutch on a single local Windows
> > machine. Any ideas? Configuration has not changed
> > since checking out from svn.
>
> The handling of Windows pathnames is still buggy in
> Hadoop 0.1.1.  You might try replacing your
> lib/hadoop-0.1.1.jar file with the latest Hadoop
> nightly jar, from:
>
> http://cvs.apache.org/dist/lucene/hadoop/nightly/
>
> The file name code has been extensively re-written.
> The next Hadoop release (0.2), containing these
> fixes, will be made in around a week.
>
> Doug


Re: java.io.IOException: No input directories specified in

Peter Swoboda
Mhh. But it seems to be nearly the same problem as mine.
And I'm running Unix.


Chris Fellows wrote:

> Thanks for the response.
>
> I did get it going by specifying the segment (i.e.
> crawl/segments/20060425173804)
>
> Per your last email, that's probably a bug, as it looks
> like it is supposed to invert links on all the segments
> (LinkDb.java:147). I'll wait for the 0.2 release, for
> now this is okay for me.
>
> As quick feedback on the tutorials, a few short lines
> on these commands might really help out. The commands
> that took me a few minutes to figure out were:
>
> bin/nutch inject db urls (where db is the database
> directory and urls is the url directory, not the
> actual url.txt file)
>
> and the line in indexing -
>
> wiki shows:
>
> bin/nutch index indexes crawl/linkdb crawl/segments/*
>
> should be:
>
> bin/nutch index crawl/index crawl/crawldb crawl/linkdb
> crawl/segments/*
>
> Again, as you said, maybe this is just the Windows
> path names bug. In which case I'll try again on Hadoop
> 0.2.
>
> Otherwise, everything else is fairly self-explanatory.
> I'm definitely enjoying the product. When I tried
> 0.7.2, I was up and running in under an hour!
>
> --- Doug Cutting <[hidden email]> wrote:
>
>> Chris Fellows wrote:
>>> I'm having what appears to be the same issue on 0.8
>>> trunk. I can get through inject, generate, fetch and
>>> updatedb, but am getting the IOException: No input
>>> directories on invertlinks and cannot figure out why.
>>> I'm only using Nutch on a single local Windows
>>> machine. Any ideas? Configuration has not changed
>>> since checking out from svn.
>>
>> The handling of Windows pathnames is still buggy in
>> Hadoop 0.1.1.  You might try replacing your
>> lib/hadoop-0.1.1.jar file with the latest Hadoop
>> nightly jar, from:
>>
>> http://cvs.apache.org/dist/lucene/hadoop/nightly/
>>
>> The file name code has been extensively re-written.
>> The next Hadoop release (0.2), containing these
>> fixes, will be made in around a week.
>>
>> Doug


Re: java.io.IOException: No input directories specified in

Chris Fellows-3
using crawl/segments/* does work...

--- Peter Swoboda <[hidden email]> wrote:

> Mhh. But it seems to be nearly the same problem as
> mine.
> And I'm running Unix.
>
> Chris Fellows wrote:
> > Thanks for the response.
> >
> > I did get it going by specifying the segment (i.e.
> > crawl/segments/20060425173804)
> >
> > Per your last email, that's probably a bug, as it
> > looks like it is supposed to invert links on all the
> > segments (LinkDb.java:147). I'll wait for the 0.2
> > release, for now this is okay for me.
> >
> > As quick feedback on the tutorials, a few short
> > lines on these commands might really help out. The
> > commands that took me a few minutes to figure out
> > were:
> >
> > bin/nutch inject db urls (where db is the database
> > directory and urls is the url directory, not the
> > actual url.txt file)
> >
> > and the line in indexing -
> >
> > wiki shows:
> >
> > bin/nutch index indexes crawl/linkdb crawl/segments/*
> >
> > should be:
> >
> > bin/nutch index crawl/index crawl/crawldb crawl/linkdb
> > crawl/segments/*
> >
> > Again, as you said, maybe this is just the Windows
> > path names bug. In which case I'll try again on
> > Hadoop 0.2.
> >
> > Otherwise, everything else is fairly self-explanatory.
> > I'm definitely enjoying the product. When I tried
> > 0.7.2, I was up and running in under an hour!
> >
> > --- Doug Cutting <[hidden email]> wrote:
> >
> >> Chris Fellows wrote:
> >>> I'm having what appears to be the same issue on 0.8
> >>> trunk. I can get through inject, generate, fetch and
> >>> updatedb, but am getting the IOException: No input
> >>> directories on invertlinks and cannot figure out why.
> >>> I'm only using Nutch on a single local Windows
> >>> machine. Any ideas? Configuration has not changed
> >>> since checking out from svn.
> >>
> >> The handling of Windows pathnames is still buggy in
> >> Hadoop 0.1.1.  You might try replacing your
> >> lib/hadoop-0.1.1.jar file with the latest Hadoop
> >> nightly jar, from:
> >>
> >> http://cvs.apache.org/dist/lucene/hadoop/nightly/
> >>
> >> The file name code has been extensively re-written.
> >> The next Hadoop release (0.2), containing these
> >> fixes, will be made in around a week.
> >>
> >> Doug


Re: java.io.IOException: No input directories specified in

Andrzej Białecki-2
Chris Fellows wrote:
> using crawl/segments/* does work...
>  

I _think_ your problem may have been caused by a bug in LinkDb updating.
I just committed a fixed version, please give it a try.
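
Trying the committed fix from a trunk checkout is just an update and rebuild (a sketch, assuming an svn working copy of trunk built with Ant):

svn up
ant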

Ah, and don't forget to use a -dir switch if you point it at the top
segments directory - this is a change that is incompatible with the
instructions in the tutorial and other docs... sorry about that, it is
more consistent this way with other tools.
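
With the layout used earlier in this thread, the two accepted invocations would look roughly like this (a sketch; the segment timestamp is illustrative):

# name one or more segments explicitly
bin/nutch invertlinks crawl/linkdb crawl/segments/20060425173804
# or point -dir at the top-level segments directory
bin/nutch invertlinks crawl/linkdb -dir crawl/segments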

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: java.io.IOException: No input directories specified in

vis
In reply to this post by Peter Swoboda
Sorry, I am on holiday until the 8th of May.

Please contact the [hidden email] for urgent matters.

Kind regards, Herman.
