starting merges before shuffle completion

starting merges before shuffle completion

Joydeep Sen Sarma
Hi folks,

I searched around JIRA and didn't find anything that resembled this. Is
this something on the roadmap?

For normal aggregations, this is never an issue. But in some cases
(typically joins), the map phase can emit a lot of data and take quite a
bit of time doing it. Meanwhile the reducers seem to sit around copying
data slowly when they could be merging the map outputs that have already
been copied over.

I'm curious whether I have an outlier application, or whether this is
generally useful/doable.

Thx,

Joydeep


Re: starting merges before shuffle completion

Sameer Paranjpye
The reduce phase does do merges as it's shuffling. It does a round of
in-memory merges because individual map outputs tend to be small enough
that several of them can be kept in RAM (if they're too large they're
spilt to disk). The results of the in-memory merges are spilt to disk
and merged in their turn. The fan-in to the merge is configurable and
determines how many merges happen.

This is how it *ought* to work. Have you observed anything different? We
may have a bug or 3 to fix here.
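
To make the role of the fan-in concrete, here is a minimal sketch of a multi-pass merge over sorted runs. It is not Hadoop's merge code; the function name `merge_with_fan_in` and the list-based runs are assumptions made for illustration only. It shows how a smaller fan-in leads to more merge operations over the same inputs.

```python
import heapq


def merge_with_fan_in(runs, fan_in):
    """Merge sorted runs, combining at most `fan_in` runs per merge.

    Returns (merged_list, number_of_merge_operations).
    Hypothetical helper for illustration; not a Hadoop API.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = [list(r) for r in runs]
    merges = 0
    while len(runs) > 1:
        next_round = []
        for i in range(0, len(runs), fan_in):
            group = runs[i:i + fan_in]
            if len(group) == 1:
                # A leftover run has nothing to merge with in this pass.
                next_round.append(group[0])
            else:
                next_round.append(list(heapq.merge(*group)))
                merges += 1
        runs = next_round
    return (runs[0] if runs else []), merges
```

With four runs, a fan-in of 2 needs three merges across two passes, while a fan-in of 4 needs a single merge.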




Re: starting merges before shuffle completion

Sameer Paranjpye
Digging some more, it looks like we do the in RAM merges, but don't do
any merges with the data on disk until the map phase finishes.
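
As a rough illustration of the difference this makes, the toy model below counts how many on-disk files are still waiting to be merged when the last map output arrives, with and without merging during the shuffle. It is only a sketch: the function `files_left_at_shuffle_end` is hypothetical, it ignores file sizes and I/O cost, and it is not how the reduce task is actually implemented.

```python
def files_left_at_shuffle_end(num_spills, fan_in, eager):
    """Count on-disk files awaiting merge when the last map output arrives.

    num_spills: number of map outputs that land on disk during the shuffle
    fan_in:     how many files a single merge combines
    eager:      if True, merge as soon as `fan_in` files have accumulated;
                if False, defer all on-disk merges until the shuffle is done
    Hypothetical model for illustration; not Hadoop code.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    on_disk = 0
    for _ in range(num_spills):
        on_disk += 1
        if eager and on_disk >= fan_in:
            # fan_in files are replaced by one merged file.
            on_disk -= fan_in - 1
    return on_disk
```

For 100 on-disk outputs and a fan-in of 10, the deferred policy leaves all 100 files to merge after the map phase, whereas the eager policy leaves a single file.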




RE: starting merges before shuffle completion

Joydeep Sen Sarma
In this case the map data is large enough that in-memory merges probably had no effect (but thanks for pointing that out). The map.out files were about 256-512 MB in size (block-compressed SequenceFiles).

If we could initiate the on-disk merges, that would be awesome.

I am curious whether people think this will help the sort benchmark as well?


-----Original Message-----
From: Sameer Paranjpye [mailto:[hidden email]]
Sent: Tue 11/20/2007 2:12 PM
To: [hidden email]
Subject: Re: starting merges before shuffle completion
 
Digging some more, it looks like we do the in RAM merges, but don't do
any merges with the data on disk until the map phase finishes.





RE: starting merges before shuffle completion

Devaraj Das
There is an issue open for this one:
https://issues.apache.org/jira/browse/HADOOP-910
We never got to benchmark this.
Whether or not sort will benefit depends on the number of reducers
configured for the sort job (assuming that HADOOP-910 helps in general).
The fewer reducers configured for the job, the lower the probability of
fitting the output from a map in the reducer's ramfs. We have configs
(#reducers, ramfs size, etc.) that will ensure most (~95%) of the map
outputs end up in the ramfs.
The thing to watch out for is disk contention in cases where a couple
(2-3) of maps are running in parallel generating huge outputs and a
couple (2) of reducers on the same node are doing merges for the on-disk
map outputs.
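
The dependence on the reducer count can be worked through with simple arithmetic. The sketch below assumes map output is partitioned evenly across reducers and treats the ramfs as a single size limit; both are simplifications, and the names `segment_bytes` and `fits_in_ramfs` and the sizes used are hypothetical, not Hadoop settings.

```python
MB = 1024 * 1024


def segment_bytes(map_output_bytes, num_reducers):
    """Average bytes one reducer fetches from one map (even partitioning assumed)."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    return map_output_bytes / num_reducers


def fits_in_ramfs(map_output_bytes, num_reducers, ramfs_bytes):
    """True if one map's segment for a reducer fits within the assumed ramfs size."""
    return segment_bytes(map_output_bytes, num_reducers) <= ramfs_bytes


# Hypothetical numbers: a 512 MB map output and a 64 MB ramfs.
few_reducers = fits_in_ramfs(512 * MB, 4, 64 * MB)    # 128 MB per segment
many_reducers = fits_in_ramfs(512 * MB, 64, 64 * MB)  # 8 MB per segment
```

Under these assumptions, 4 reducers give 128 MB segments that do not fit, while 64 reducers give 8 MB segments that do.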

> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[hidden email]]
> Sent: Thursday, November 22, 2007 1:40 AM
> To: [hidden email]
> Subject: RE: starting merges before shuffle completion
>
> in this case the map data is large enough that in-memory
> merges probably had no effect (but thanks for pointing that
> out). (the map.out files were about 256-512MB in size - block
> compressed sequencefiles).
>
> if we could initiate the on-disk merges - that would be awesome.
>
> i am curious whether people think this will help the sort
> benchmark as well?