Submitting jobs into Hadoop

Submitting jobs into Hadoop

alakshman
Hi

When I submit jobs in Hadoop, how do the physical class files get
distributed to the nodes on which the Map/Reduce jobs run? Is some kind
of dynamic class loading used, or are the jar files copied to the
machines where they are needed?

Thanks
Avinash

Re: Submitting jobs into Hadoop

Doug Cutting
Phantom wrote:
> When I submit jobs in Hadoop, how do the physical class files get
> distributed to the nodes on which the Map/Reduce jobs run? Is some
> kind of dynamic class loading used, or are the jar files copied to
> the machines where they are needed?

The job's jar file is unpacked into the working directory that tasks
run in. The classpath of the task JVM includes all jars in the lib/
directory of the job's jar, the classes/ directory from the jar, and
the top-level directory of the jar.

Doug
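
To make the packaging concrete, here is a minimal driver sketch using
the classic org.apache.hadoop.mapred API; MyDriver, the job name, and
the argument paths are illustrative, not from the thread. Building the
JobConf from the driver class is what tells Hadoop which jar to ship.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {
      public static void main(String[] args) throws Exception {
        // Passing the driver class tells Hadoop which jar to ship to the
        // cluster; that jar's lib/ jars, classes/ dir, and top level all
        // end up on each task's classpath, as described above.
        JobConf conf = new JobConf(MyDriver.class);
        conf.setJobName("classpath-example");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // stages the jar and submits the job
      }
    }

Packaged as myjob.jar, a driver like this would typically be launched
with "bin/hadoop jar myjob.jar MyDriver <in> <out>"; the jar named on
that command line is the one that gets unpacked for the tasks.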

Re: Submitting jobs into Hadoop

Owen O'Malley
On Jul 12, 2007, at 2:10 PM, Doug Cutting wrote:

> The job's jar file is unpacked into the working directory that tasks
> run in. The classpath of the task JVM includes all jars in the lib/
> directory of the job's jar, the classes/ directory from the jar, and
> the top-level directory of the jar.

Just a few more details:
   1. The JobClient copies the jar from the local file system into HDFS
      under the "system" directory.
   2. The first task for a given job that is run on a TaskTracker reads
      the jar out of HDFS, writes it locally, and expands it.
   3. When the job is done, the jar is deleted from all of the
      TaskTrackers and the system directory.
(Steps 1 and 2 are sketched in code below.)

-- Owen
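
As a rough illustration of steps 1 and 2 in terms of the public
FileSystem API: the system-directory path, job id, and local paths
below are invented for the example, and the real JobClient/TaskTracker
internals are more involved.

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.util.RunJar;

    public class JarStagingSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // default FS, typically HDFS

        // Step 1: the client copies the job jar into the shared
        // "system" directory (all paths here are hypothetical).
        Path localJar = new Path("file:///tmp/myjob.jar");
        Path stagedJar = new Path("/tmp/hadoop/mapred/system/job_0001/job.jar");
        fs.copyFromLocalFile(localJar, stagedJar);

        // Step 2: the first task on a TaskTracker fetches the jar,
        // writes it locally, and expands it; tasks run in the
        // expanded directory.
        Path localCopy = new Path("file:///tmp/tasktracker/job_0001/job.jar");
        fs.copyToLocalFile(stagedJar, localCopy);
        RunJar.unJar(new File("/tmp/tasktracker/job_0001/job.jar"),
                     new File("/tmp/tasktracker/job_0001/work"));
      }
    }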

RE: Submitting jobs into Hadoop

Mahajan, Neeraj
Just out of curiosity, how does it work if I am not using HDFS?

~ Neeraj


Re: Submitting jobs into Hadoop

Owen O'Malley

On Jul 12, 2007, at 6:53 PM, Mahajan, Neeraj wrote:

> Just out of curiosity, how does it work if I am not using HDFS?

Ah. I was presenting the typical case. In reality, the "system"
directory may be in any of the supported file systems, provided that it
is available to the entire cluster. For example, you could use an NFS
directory for the system file system. In general, though, the best
aggregate bandwidth will be from HDFS.

-- Owen
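
For instance, the shared directory is just configuration. A small
sketch, assuming the classic property names (fs.default.name and
mapred.system.dir); the NFS mount path is purely illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class NonHdfsSystemDirSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // No HDFS: use the local/NFS filesystem as the default FS, and
        // point the system directory at a shared mount (illustrative path).
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.system.dir", "/mnt/shared/mapred/system");
        System.out.println("system dir: " + conf.get("mapred.system.dir"));
      }
    }

Whichever filesystem is used, the directory named by mapred.system.dir
has to be visible to the JobTracker and every TaskTracker under the
same path.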