Running tasks in the TaskTracker VM

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Running tasks in the TaskTracker VM

Philippe Gassmann
Hi Hadoop guys,

At the moment, for each task (map or reduce) a new JVM is created by the
TaskTracker to run the Job.

We have in our Hadoop cluster a high number of small files thus
requiring a high number of map tasks. I know this is suboptimal, but
aggregating those small files is not possible now. So an idea came to us
: launching jobs in the task tracker JVM so the overhead of creating a
new vm will disappear.

I already have a working patch against the 0.10.1 release of Hadoop that
launch tasks inside the TaskTracker JVM if a specific parameter is set
in the JobConf of the launched Job (for job we trust ;) ). Each new task
have a specific class loader which basically load every needed class by
the Task, as it was running in a brand new JVM. (the same "classpath" is
used)

For that to work, an upgrade of commons-logging to the 1.1 version is
needed in order to circumvent class loader / memory leaks issues. I've
done some profiling using jprofiler on the task tracker to find and to
remove mem leaks. So I'm pretty confident with this code.

If you are interested with that, please let me know.
If so, I will provide a patch against the current Hadoop trunk in Jira
as soon as possible.

--
Philippe.



Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Torsten Curdt

On 19.03.2007, at 15:46, Philippe Gassmann wrote:

> Hi Hadoop guys,
>
> At the moment, for each task (map or reduce) a new JVM is created  
> by the
> TaskTracker to run the Job.
>
> We have in our Hadoop cluster a high number of small files thus
> requiring a high number of map tasks. I know this is suboptimal, but
> aggregating those small files is not possible now. So an idea came  
> to us
> : launching jobs in the task tracker JVM so the overhead of creating a
> new vm will disappear.

Cool stuff! :)

cheers
--
Torsten


Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Doug Cutting
In reply to this post by Philippe Gassmann
Philippe Gassmann wrote:
> At the moment, for each task (map or reduce) a new JVM is created by the
> TaskTracker to run the Job.
>
> We have in our Hadoop cluster a high number of small files thus
> requiring a high number of map tasks. I know this is suboptimal, but
> aggregating those small files is not possible now. So an idea came to us
> : launching jobs in the task tracker JVM so the overhead of creating a
> new vm will disappear.

A simpler approach might be to develop an InputFormat that includes
multiple files per split.

> I already have a working patch against the 0.10.1 release of Hadoop that
> launch tasks inside the TaskTracker JVM if a specific parameter is set
> in the JobConf of the launched Job (for job we trust ;) ).

Ideally this could be through a task-running interface, that permits one
to plug in different implementations.  For example, sometimes it may
make sense to run tasks in-process, sometimes to run them in a child
JVM, and sometimes to fork a non-Java sub-process.  So, rather than
specifying a flag on the job, one would specify the runner
implementation class.

Doug
Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Philippe Gassmann
Doug Cutting a écrit :

> Philippe Gassmann wrote:
>> At the moment, for each task (map or reduce) a new JVM is created by the
>> TaskTracker to run the Job.
>>
>> We have in our Hadoop cluster a high number of small files thus
>> requiring a high number of map tasks. I know this is suboptimal, but
>> aggregating those small files is not possible now. So an idea came to us
>> : launching jobs in the task tracker JVM so the overhead of creating a
>> new vm will disappear.
>
> A simpler approach might be to develop an InputFormat that includes
> multiple files per split.
>

Yes, but the issue remains present if you have to deal with a high
number of map tasks to distribute the load on many machines. Launching a
JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
have to do 2000 map, there will be 2000 seconds lost in launching JVMs...

>> I already have a working patch against the 0.10.1 release of Hadoop that
>> launch tasks inside the TaskTracker JVM if a specific parameter is set
>> in the JobConf of the launched Job (for job we trust ;) ).
>
> Ideally this could be through a task-running interface, that permits
> one to plug in different implementations.  For example, sometimes it
> may make sense to run tasks in-process, sometimes to run them in a
> child JVM, and sometimes to fork a non-Java sub-process.  So, rather
> than specifying a flag on the job, one would specify the runner
> implementation class.
>

A bit of refactoring of the TaskRunner hierarchy is needed for this to
work : the code that launch tasks in the JVM or in a separate process is
very similar and it would have a sense that the TaskRunner would be the
superclass of a InJVMRunner and a ChildJVMRunner.
But what can we do with MapTaskRunner and ReduceTaskRunner ? It is not
acceptable to have let's say : 2 or more implementation of the
MapTaskRunner (one for in a child JVM execution, one for a in tracker
JVM execution...). It would be painful to maintain and very complicated.

> Doug

Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Doug Cutting
Philippe Gassmann wrote:
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines. Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching JVMs...

The InputFormat controls the number of map tasks.  So, if 2000 is too
many, so that JVM startup time dominates, then you can develop an
InputFormat that splits things into fewer tasks so that this is not a
problem.

> A bit of refactoring of the TaskRunner hierarchy is needed for this to
> work : the code that launch tasks in the JVM or in a separate process is
> very similar and it would have a sense that the TaskRunner would be the
> superclass of a InJVMRunner and a ChildJVMRunner.
> But what can we do with MapTaskRunner and ReduceTaskRunner ? It is not
> acceptable to have let's say : 2 or more implementation of the
> MapTaskRunner (one for in a child JVM execution, one for a in tracker
> JVM execution...). It would be painful to maintain and very complicated.

Perhaps it is too complicated for now, but I think we will want
something like that long-term, so it is worth thinking about.

Doug
Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Milind Bhandarkar
In reply to this post by Philippe Gassmann
>
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines.  
> Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching  
> JVMs...
>

Executing users' code in system daemons is a security risk. In my  
experience, security always wins in when pitted against performance.  
IMHO, there is a happy middle ground, i.e. to maintain a pool of  
running JVMs that are launched when the tasktracker starts up. Even  
then, care has to be taken against memory leaks etc.

- Milind

--
Milind Bhandarkar
(mailto:[hidden email])
(phone: 408-349-2136 W)


Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Torsten Curdt
>> Yes, but the issue remains present if you have to deal with a high
>> number of map tasks to distribute the load on many machines.  
>> Launching a
>> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
>> have to do 2000 map, there will be 2000 seconds lost in launching  
>> JVMs...
>>
>
> Executing users' code in system daemons is a security risk.

Of course there is security benefit in starting the jobs in a  
different JVM but if you don't trust the code you are executing this  
is probably not for you either. So bottom line is - if you weight up  
the performance penalty against the gained security I am still no  
excited about the JVM spawning idea.

If you really consider security that big of a problem - come up with  
your own language to ease and restrict the jobs.

My 2 cents
--
Torsten



Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Owen O'Malley-5
In reply to this post by Philippe Gassmann

On Mar 19, 2007, at 10:51 AM, Philippe Gassmann wrote:

> Doug Cutting a écrit :
>> A simpler approach might be to develop an InputFormat that includes
>> multiple files per split.
>>
>
> Yes, but the issue remains present if you have to deal with a high
> number of map tasks to distribute the load on many machines.  
> Launching a
> JVM is costly, let's say it costs 1 second (i'm optimistic) , if you
> have to do 2000 map, there will be 2000 seconds lost in launching  
> JVMs...

For task granularity, the most that makes sense is roughly 10-50  
tasks/node. Given that a node runs at least 2 tasks at once, it maps  
into 5-25 seconds of wallclock time. It is noticeable, but shouldn't  
be the dominant factor.

>>> I already have a working patch against the 0.10.1 release of  
>>> Hadoop that
>>> launch tasks inside the TaskTracker JVM if a specific parameter  
>>> is set
>>> in the JobConf of the launched Job (for job we trust ;) ).

Another possible direction would be to have the Task JVM ask for  
another Task before exiting. I believe that Ben Reed experimented  
with that and the changes were not too extensive. For security, you  
would want to limit the JVM reuse to tasks within the same job.

As a side note, we've already seen cases of client code that killed  
the task trackers. So it is hardly an abstract concern. *smile* (The  
client code managed to send kill signals to the entire process group,  
which included the task tracker. It was hard to debug and I'm not  
very interested in making it easier for client code to take out the  
servers.)

-- Owen
Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Stephane Bailliez-3
In reply to this post by Torsten Curdt
Torsten Curdt wrote:

>> Executing users' code in system daemons is a security risk.
>
> Of course there is security benefit in starting the jobs in a different
> JVM but if you don't trust the code you are executing this is probably
> not for you either. So bottom line is - if you weight up the performance
> penalty against the gained security I am still no excited about the JVM
> spawning idea.
>
> If you really consider security that big of a problem - come up with
> your own language to ease and restrict the jobs.

I think security here was more about 'taking down the whole task
tracker' risk.

Being a complete idiot for distributed computing, I would say it is easy
to explode a JVM when doing such distributed jobs, (should it be for OOM
or anything).

If you run within the task tracker vm you'll have to carefully size the
tracker vm to accommodate potentially the resources of all possibles
jobs running at the same time or simply allocate a gigantic amount of
resources 'just in case', which kind of offset the benefits of any
performance improvement to stability.

Not mentioning cleaning up all the mess left by running jobs including
flushing the introspection cache to avoid leaks, which will then impact
performance of other jobs since it is not a selective flush.

Failing jobs are not exactly uncommon and running things in a sandboxed
environment with less risk for the tracker seems like a perfectly
reasonable choice. So yeah, vm pooling certainly makes perfect sense for
it or should probably look at what Doug suggests as well.

My 0.01 kopek ;)

-- stephane

Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Torsten Curdt

On 20.03.2007, at 11:19, Stephane Bailliez wrote:

> Torsten Curdt wrote:
>>> Executing users' code in system daemons is a security risk.
>> Of course there is security benefit in starting the jobs in a  
>> different JVM but if you don't trust the code you are executing  
>> this is probably not for you either. So bottom line is - if you  
>> weight up the performance penalty against the gained security I am  
>> still no excited about the JVM spawning idea.
>> If you really consider security that big of a problem - come up  
>> with your own language to ease and restrict the jobs.
>
> I think security here was more about 'taking down the whole task  
> tracker' risk.

Well, the same applies

> Being a complete idiot for distributed computing, I would say it is  
> easy to explode a JVM when doing such distributed jobs, (should it  
> be for OOM or anything).

Then restrict what people can do - at least Google went that route.

> If you run within the task tracker vm you'll have to carefully size  
> the tracker vm to accommodate potentially the resources of all  
> possibles jobs running at the same time or simply allocate a  
> gigantic amount of resources 'just in case', which kind of offset  
> the benefits of any performance improvement to stability.

Question is whether the task tracker should have access to that  
gigantic amount of resources. In one jvm or the other.

> Not mentioning cleaning up all the mess left by running jobs  
> including flushing the introspection cache to avoid leaks, which  
> will then impact performance of other jobs since it is not a  
> selective flush.
>
> Failing jobs are not exactly uncommon and running things in a  
> sandboxed environment with less risk for the tracker seems like a  
> perfectly reasonable choice. So yeah, vm pooling certainly makes  
> perfect sense for it

I am still not convinced - sorry

It's a bit like you would like to run JSPs in a separate JVM because  
they might take down the servlet container.

cheers
--
Torsten
Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Stephane Bailliez-3
Torsten Curdt wrote:
>
>> Being a complete idiot for distributed computing, I would say it is
>> easy to explode a JVM when doing such distributed jobs, (should it be
>> for OOM or anything).
>
> Then restrict what people can do - at least Google went that route.

I don't know what Google did on the specifics :)

If you want to do that with Java and restrict memory usage, cpu usage
and descriptor access within each inVM instance. That's a considerable
amount of work that likely implies writing a specific agent for the vm
(or an agent for a specific vm that is, because it's pretty unlikely
that you will get the same results across vms), assuming that can then
really be done at the classloader level for each task (which is pretty
insanely complex to me if you have to consider allocation done at the
parent classloader level, etc..)

At least by forking a vm you can afford to get some reasonably bound
control over the resources usage (or at least memory) without bringing
down everything since a vm is already bound to some degrees.


>> Failing jobs are not exactly uncommon and running things in a
>> sandboxed environment with less risk for the tracker seems like a
>> perfectly reasonable choice. So yeah, vm pooling certainly makes
>> perfect sense for it
>
> I am still not convinced - sorry
>
> It's a bit like you would like to run JSPs in a separate JVM because
> they might take down the servlet container.

it is  a bit too extreme in granularity. I think it is more about like
running n different webapps within the same VM or not. So if one webapp
is resource hog, separating it would not harm the n-1 other applications
and you would either create another server instance or move it away to
another node.

I know of environment with large number of nodes (not related to hadoop)
where they also reboot a set of nodes daily to ensure that all machines
are really in working conditions (it's usually when the machine reboots
due to failure or whatever that someone has to rush to it because some
service forgot to be registered or things like that, so doing this
periodic check gives some people better ideas of their response time to
failure). That depends of operational procedures for sure.

I don't think it should be done in the spirit that everything is perfect
in the perfect world because we know it is not like that. So there will
be compromise between safety and performance and having something
reasonably tolerant to failure is also a performance advantage.

Doing simple things in a task like a deleteOnExit is enough to leak on
some VMs a few kbs each time and stay there until the vm dies (fixed in
1.5.0_10 if I remember well). Figuring out things like that in the end
is likely to take a severe amount of time considering it is an internal
leak and will not appear in your favorite java profiler either.

Bottom line is that even if you're 100% sure of your code which is quite
unlikely (at least for me as far as I'm concerned ), you don't know
third-party code. So without being totally paranoid, this is something
that cannot be ignored.

-- stephane

Reply | Threaded
Open this post in threaded view
|

Re: Running tasks in the TaskTracker VM

Sylvain Wallez
Stephane Bailliez wrote:
> Torsten Curdt wrote:
>>
>>> Being a complete idiot for distributed computing, I would say it is
>>> easy to explode a JVM when doing such distributed jobs, (should it
>>> be for OOM or anything).
>>
>> Then restrict what people can do - at least Google went that route.
>
> I don't know what Google did on the specifics :)

They came up with their own language for mapreduce jobs:
http://labs.google.com/papers/sawzall.html

> If you want to do that with Java and restrict memory usage, cpu usage
> and descriptor access within each inVM instance. That's a considerable
> amount of work that likely implies writing a specific agent for the vm
> (or an agent for a specific vm that is, because it's pretty unlikely
> that you will get the same results across vms), assuming that can then
> really be done at the classloader level for each task (which is pretty
> insanely complex to me if you have to consider allocation done at the
> parent classloader level, etc..)
>
> At least by forking a vm you can afford to get some reasonably bound
> control over the resources usage (or at least memory) without bringing
> down everything since a vm is already bound to some degrees.
>
>
>>> Failing jobs are not exactly uncommon and running things in a
>>> sandboxed environment with less risk for the tracker seems like a
>>> perfectly reasonable choice. So yeah, vm pooling certainly makes
>>> perfect sense for it
>>
>> I am still not convinced - sorry
>>
>> It's a bit like you would like to run JSPs in a separate JVM because
>> they might take down the servlet container.
>
> it is  a bit too extreme in granularity. I think it is more about like
> running n different webapps within the same VM or not. So if one
> webapp is resource hog, separating it would not harm the n-1 other
> applications and you would either create another server instance or
> move it away to another node.
>
> I know of environment with large number of nodes (not related to
> hadoop) where they also reboot a set of nodes daily to ensure that all
> machines are really in working conditions (it's usually when the
> machine reboots due to failure or whatever that someone has to rush to
> it because some service forgot to be registered or things like that,
> so doing this periodic check gives some people better ideas of their
> response time to failure). That depends of operational procedures for
> sure.

This can be another implementation of the TaskTracker: a single JVM that
forks a "replacement JVM" after either a given time or a given amount of
tasks executed. This can avoid JVM fork overhead while also avoiding
memory leak problems.

The forked JVM could even be pre-forked and monitor the active one,
taking over if it no more responds (and eventually killing it).

Sylvain

--
Sylvain Wallez - http://bluxte.net