Configuration and Hadoop cluster setup

Re: Configuration and Hadoop cluster setup

Doug Cutting
Phantom wrote:

> (1) Set my fs.default.name to hdfs://<host>:<port> and also specify it
> in the JobConf configuration. Copy my sample input file into HDFS using
> "bin/hadoop fs -put" from my local file system. I then need to specify this
> file to my WordCount sample as input. Should I specify this file with the
> hdfs:// prefix?
>
> (2) Set my fs.default.name to file://<host>:<port> and also specify it
> in the JobConf configuration. Just specify the input path to the WordCount
> sample, and everything should work as long as the path is available to all
> machines in the cluster?
>
> Which way should I go?

Either should work.  So should a third option, which is to have your job
input in the non-default filesystem, but there's currently a bug that
prevents that from working.  But the above two should work.  The second
assumes that the input is available on the same path in the native
filesystem on all nodes.

When naming files in the default filesystem you do not need to specify
their filesystem, since it is the default, but it is not an error to
specify it.
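
For example (a sketch; the namenode host, port, and path below are
placeholders), with fs.default.name set to hdfs://namenode:9000, these
two commands refer to the same HDFS file:

  bin/hadoop fs -ls /user/me/input.txt
  bin/hadoop fs -ls hdfs://namenode:9000/user/me/input.txt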

The most common mode of distributed operation is (1): use an HDFS
filesystem as your fs.default.name, copy your initial input into that
filesystem with 'bin/hadoop fs -put localPath hdfsPath', then specify
'hdfsPath' as your job's input.  The "hdfs://host:port" is not required
at this point, since it is the default.
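
Concretely, option (1) looks roughly like this (a sketch only; the
namenode host/port, paths, jar and class names are placeholders for
whatever your setup uses):

  # hadoop-site.xml on every node:
  #   <property>
  #     <name>fs.default.name</name>
  #     <value>hdfs://namenode:9000</value>
  #   </property>

  # Copy the local input into HDFS, then point the job at the HDFS path.
  bin/hadoop fs -put /tmp/input.txt /user/me/input.txt
  bin/hadoop jar wordcount.jar WordCount /user/me/input.txt /user/me/output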

Doug




Re: Configuration and Hadoop cluster setup

Avinash Lakshman-2
I did run it the way you suggested. But I am running into a slew of
ClassNotFoundExceptions for the MapClass. Exporting the CLASSPATH doesn't
seem to fix it. How do I get around it?

Thanks
Avinash
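
For reference, a sketch of the usual fix, assuming the classes simply
aren't reaching the task nodes (jar and class names are placeholders):
the CLASSPATH exported on the submitting machine is not propagated to
the task JVMs on the cluster, so MapClass has to travel inside the job
jar that gets submitted.

  # Bundle the compiled job classes (including MapClass) into a jar;
  # Hadoop ships the job jar to the task nodes.
  jar cf wordcount.jar -C build/classes .
  bin/hadoop jar wordcount.jar WordCount /user/me/input.txt /user/me/output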


On 5/29/07 1:30 PM, "Doug Cutting" <[hidden email]> wrote:

> [...]

