is a monolithic reduce task the right model?

is a monolithic reduce task the right model?

Joydeep Sen Sarma
In thinking about Aaron's use case and our own problems with fair sharing of the Hadoop cluster, one thing that was obvious is that reduces are a stumbling block for fair sharing. It's easy to imagine a fair scheduling algorithm doing a good job of scheduling small map tasks, but the reduces are a problem: they are too big and, once scheduled, last forever.

Another obvious thing is that reduce failures are expensive: all the map outputs need to be refetched and merged again, even though, in many cases, the failure is in the reduce logic itself. Putting two and two together:

- What if the current reduce tasks were broken into separate copy, sort, and reduce tasks?

We would get much smaller units of recovery and scheduling.
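As a toy illustration of the recovery angle (made-up code, not actual Hadoop interfaces; just a sketch assuming each phase checkpoints its output to local disk):

import java.util.*;
import java.util.stream.*;

public class SplitReduceSketch {

    // Copy phase: gather this partition's records from every map output.
    static List<String> copy(List<List<String>> mapOutputs) {
        return mapOutputs.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    // Sort phase: merge/sort the fetched segments.
    static List<String> sort(List<String> fetched) {
        List<String> sorted = new ArrayList<>(fetched);
        Collections.sort(sorted);
        return sorted;
    }

    // Reduce phase: the user's reduce logic (here, count records per key).
    static Map<String, Long> reduce(List<String> sortedKeys) {
        return sortedKeys.stream()
            .collect(Collectors.groupingBy(k -> k, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = Arrays.asList(
            Arrays.asList("b", "a", "b"), Arrays.asList("a", "c"));

        List<String> fetched = copy(mapOutputs);  // checkpointed by the copy task
        List<String> sorted = sort(fetched);      // checkpointed by the sort task
        System.out.println(reduce(sorted));       // a failed reduce restarts here,
                                                  // not from the map outputs
    }
}

If the three pieces really were separate tasks with those checkpoints, a failed reduce would be rescheduled from the sorted checkpoint rather than starting over from the map outputs.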

Thoughts?

Joydeep

Re: is a monolithic reduce task the right model?

Doug Cutting
Joydeep Sen Sarma wrote:
> - what if current reduce tasks were broken into separate copy, sort and reduce tasks?
>
> we would get much smaller units of recovery and scheduling.
>
> thoughts?

If copy, sort and reduce are not scheduled together then it would be
very hard to ensure they run on the same node, and if they do not all
run on the same node then we'd have to move their data around, which
would substantially affect throughput, not to mention adding another
copy phase...

Please see https://issues.apache.org/jira/browse/HADOOP-2573 for another
proposed solution to this.

Doug

Re: is a monolithic reduce task the right model?

Ted Dunning-3


Actually, all of my jobs tend to have one of these phases dominate the running time. It isn't always the same phase that dominates, though, so the consideration isn't simple.

The fact (if it is a fact) that one phase or another dominates means,
however, that splitting them won't help much.


RE: is a monolithic reduce task the right model?

Devaraj Das
In reply to this post by Joydeep Sen Sarma
By the way, I had created https://issues.apache.org/jira/browse/HADOOP-2568 some time back. The proposal is basically to have one shuffle task per job per node and to assign reduces with consecutive task IDs to a particular node. The shuffle task would fetch multiple consecutive outputs in one go from any map task node. This would reduce the number of seeks into the map output files by a factor of #maps * #consecutive-reduces for any mapnode-reducenode pair, and should generally improve the usage of system resources (e.g., fewer socket connections for transferring files and improved disk usage).
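
To make the seek saving concrete, here is an illustrative sketch (made-up offsets, not the HADOOP-2568 code), assuming the usual single-file map output layout where partitions are stored in reduce-ID order with an index of byte offsets:

import java.util.Arrays;

public class ConsecutivePartitionRange {

    // Byte offset of each reduce partition inside one map output file; the
    // last entry marks the end of the file. The numbers are made up.
    static final long[] PARTITION_OFFSETS = {0, 4096, 9216, 13312, 20480};

    // One seek + one sequential read covers partitions [first, first + count).
    static long[] rangeFor(int firstReduce, int count) {
        return new long[] {PARTITION_OFFSETS[firstReduce],
                           PARTITION_OFFSETS[firstReduce + count]};
    }

    public static void main(String[] args) {
        // A shuffle task serving reduces 1, 2 and 3 on this node issues one
        // ranged fetch per map output instead of three separate fetches.
        System.out.println(Arrays.toString(rangeFor(1, 3)));   // [4096, 20480]
    }
}

The shuffle task would issue one such ranged read per map output file on a given map node, over a single connection, instead of one fetch per reduce per map output.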
