"Input path does not exist" on temporal inject files


Hi ppl,

I'm trying to set up a local dev environment, but I'm stuck at this point:

hadoop@localhost:~/nutch/trunk$ bin/nutch crawl urls -depth 1
crawl started in: crawl-20081229105849
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawl-20081229105849/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: hdfs://localhost:9000/home/hadoop/hdfs/mapred/temp/inject-temp-979701318
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:169)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)

Following the archives, I've disabled "mapred.speculative.execution",
but it still shows up as active in each of the generated job .xml files.
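For reference, here is a sketch of what the override would look like in conf/hadoop-site.xml (the property name is the one mentioned above; note that some Hadoop versions split this into separate mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution properties, so the exact names may differ on your release):

<!-- conf/hadoop-site.xml: disable speculative execution -->
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
  <!-- mark the property final so per-job configs cannot override it -->
  <final>true</final>
</property>

Marking the property final should keep individual job .xml files from flipping it back to true.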


In addition I'm setting the environment variables before launching
hadoop & nutch like this (as advised):

export NUTCH_CONF_DIR=/home/hadoop/nutch/trunk/conf
export NUTCH_HOME=/home/hadoop/nutch/trunk
export CLASSPATH=$CLASSPATH:/home/hadoop/nutch/trunk/conf

I've also rm -rf'd the DFS data directories and re-run hadoop namenode -format.

What am I doing wrong? Previous releases worked out of the box for me :/