Amazon S3 and EC2


Milica Bogicevic
Hi,

I'm trying to save crawled data on S3.
I am using Nutch 1.4 and Hadoop 0.20.2, and everything works fine on my
local machine. When I try to do the same thing on EC2 using EMR and store
the data on S3, I get the following exception:

Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: hdfs://domU-12-31-39-0B-00-88.compute-1.internal:9000/data/crawl/flowers/urls/seed.txt
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:858)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:829)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:777)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1297)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)


I'm sure that I've set up the input path correctly.
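For context: the stack trace shows the Injector looking for seed.txt on the cluster's HDFS, not on S3, so either the seed file has to be copied into HDFS first or the job has to be pointed at an S3 URI directly. A rough sketch of both approaches (the bucket name and paths below are hypothetical examples, not anything from my setup):

```shell
# Option 1 (sketch): stage the seed list from S3 into the cluster's HDFS
# before running the crawl, so the hdfs:// input path actually exists.
hadoop fs -mkdir /data/crawl/flowers/urls
hadoop fs -cp s3n://example-bucket/crawl/flowers/urls/seed.txt \
    /data/crawl/flowers/urls/seed.txt

# Option 2 (sketch): pass an s3n:// URI straight to Nutch, assuming
# fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are configured
# in core-site.xml on the EMR cluster.
bin/nutch crawl s3n://example-bucket/crawl/flowers/urls \
    -dir s3n://example-bucket/crawl/flowers/out -depth 3
```

I haven't confirmed which of these EMR expects, so treat this as a guess at the cause rather than a fix.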

If you have any ideas, they will be more than welcome.

Or... if I succeed in my attempts, I'll let you know.

Milica