Amazon S3 and EC2

Milica Bogicevic

I'm trying to save crawled data to S3.
I am using Nutch 1.4 and Hadoop 0.20.2, and everything works fine on my
local machine. When I try to do the same thing on EC2 using EMR and store
the data on S3, I get the following exception:

Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: hdfs://domU-12-31-39-0B-00-88.compute-1.internal:9000/data/crawl/flowers/urls/seed.txt
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(
        at org.apache.hadoop.mapred.JobClient.submitJob(
        at org.apache.hadoop.mapred.JobClient.runJob(
        at org.apache.nutch.crawl.Injector.inject(

I'm sure that I've set up the input path correctly.
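
For reference, the path in the exception starts with hdfs://, which suggests it was resolved against the cluster's default filesystem rather than S3. Below is a minimal sketch of that kind of resolution, using only java.net.URI. The class PathQualifyDemo, the helper qualify, the host namenode.example, and the bucket example-bucket are all hypothetical names made up for illustration; this is not Hadoop's actual code, only an approximation of how a scheme-less path might pick up the default filesystem's scheme and host.

```java
import java.net.URI;

public class PathQualifyDemo {
    // Hypothetical helper (not a Hadoop API): qualifies a path against a
    // default filesystem URI when the path itself names no scheme.
    static URI qualify(String path, URI defaultFs) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            // The path already names its filesystem, e.g. s3n://bucket/...
            return p;
        }
        // No scheme: the default filesystem's scheme and authority are used.
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        // Assumed default filesystem; on a real cluster this comes from
        // the Hadoop configuration.
        URI defaultFs = URI.create("hdfs://namenode.example:9000/");

        // Scheme-less path: ends up on HDFS.
        System.out.println(qualify("/data/crawl/flowers/urls/seed.txt", defaultFs));

        // Fully qualified path (hypothetical bucket): left untouched.
        System.out.println(qualify(
            "s3n://example-bucket/data/crawl/flowers/urls/seed.txt", defaultFs));
    }
}
```

If this is what happens here, a path given without a scheme would be looked up on HDFS even though the file itself sits in an S3 bucket.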

If you have any ideas, they would be more than welcome.

Or... If I succeed in my attempts, I'll let you know.