Hadoop Filesystem not "seeing" my input files


Mark Meissonnier
Hi,
I don't know if this is the right place to ask for help, but I've been
struggling with hadoop configuration for a few days and I'm running out
of ideas.
I'm trying to run a Perl map/reduce job using Hadoop Streaming, but I get
an error:
07/05/14 20:23:38 ERROR streaming.StreamJob: Error Launching job : Input
path doesnt exist : /tmp/mark/mrtest/input
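For reference, the command I'm running is along these lines (the jar path and script names below are placeholders, not my exact ones):

bin/hadoop jar hadoop-streaming.jar \
    -input /tmp/mark/mrtest/input \
    -output /tmp/mark/mrtest/output \
    -mapper mapper.pl \
    -reducer reducer.pl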
 
I checked and there's stuff in there.
Now the weird part: I unjarred the code in Eclipse, compiled it, and
somehow I can make the job work by launching it from the Eclipse UI.
One thing I noticed that differs between the two runs (I added some print
statements, recompiled, and ran from the command line) is that
isLocalHadoop() in the StreamJob class of the streaming package returns a
different value, which in turn sends execution down a different code
path: the Eclipse UI version returns "true" and doesn't go through
packageJar etc., while the command-line version returns "false".
My guess (and it's only a guess) is that the Eclipse run isn't picking up
my hadoop-site.xml because the conf directory isn't on Eclipse's
classpath, so it falls back to local mode, where paths resolve against
the local disk instead of DFS.
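If that guess is right, the two runs aren't even resolving the same path string against the same filesystem, roughly:

# what the local-mode (Eclipse) run sees: the local disk
ls /tmp/mark/mrtest/input
# what the command-line run sees: DFS, resolved via fs.default.name
bin/hadoop dfs -ls /tmp/mark/mrtest/input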
 
I found a page discussing my error message, and one thing it suggests is
to run
bin/hadoop dfs -ls /tmp/mark/mrtest/input
 
I tried it both from the Eclipse UI and from the command line. The
Eclipse UI returns
Found 5 items ....
so the bottom line is that it does see the input, which is consistent
with streaming working from there,
while the command line returns
 
> hadoop_install/hadoop/bin/hadoop dfs -ls /tmp/mark/mrtest/input
Found 0 items
 
I tried changing hadoop-site.xml to use "local",
but got an error...
<configuration>
<property>
  <name>fs.default.name</name>
  <value>local</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>
 
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>
...
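For comparison, my understanding from the description above is that the non-local form of fs.default.name is a host:port pointing at the DFS master, something like the following (localhost:9000 here is just an example, not my actual setup):

<property>
  <name>fs.default.name</name>
  <value>localhost:9000</value>
</property>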
 
 
Any clue on what I'm doing wrong?
Do I have to put my input somewhere else?
I tried putting it in my DFS filesystem root
/tmp/mark/nutch/filesystem
 
and in other places as well, but to no avail...
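One thing I'm not sure I've done correctly is copying the files into DFS in the first place. Would something like this be the right way, assuming the files currently live on the local disk?

bin/hadoop dfs -put /tmp/mark/mrtest/input /tmp/mark/mrtest/input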
 
Any help or pointers would be greatly appreciated.
Thanks a mil.
Mark