How can the reducer be invoked lazily?

Rui Shi
Hi,

How can we specify that reducers should be invoked lazily? For instance, I know there are no partitions in the range 200-300. How can I tell Hadoop that it does not need to invoke reduce tasks for those partitions?

Thanks,

Rui



RE: How can the reducer be invoked lazily?

Devaraj Das
This is not possible. The framework always creates a reduce task for every
partition from 0 to num_reduces - 1.

> -----Original Message-----
> From: Rui Shi [mailto:[hidden email]]
> Sent: Saturday, December 15, 2007 7:34 AM
> To: [hidden email]
> Subject: How can the reducer be invoked lazily?
>
> Hi,
>
> How can we specify so that the reducers can be invoked
> lazily? For instance, I know there are no partitions in the
> range of 200-300. How can I let the hadoop know that no need
> to invoke reduce tasks for those partitions?
>
> Thanks,
>
> Rui

Re: How can the reducer be invoked lazily?

Ted Dunning-3

Devaraj is correct that there is no mechanism to create reduce tasks only as
necessary, but remember that each reducer does many reductions.  This means
that empty ranges rarely have a large, unbalanced effect.

If this is still a problem, you can do a few things:

- first, you can use the hash of the real key (put the real key in the
value).  That will cause empty ranges to be spread all over hither and yon,
giving you the balance you seek (this behavior may actually be the default).

- secondly, you can use lots of reducers.  If the number of reducers is
large, then the lost resources due to empty ranges will be small since each
reducer is doing very little work.  If the number of reducers exceeds the
number of available task slots, then you get even better balancing because
machines that finish empty ranges (quickly) will ask for more work.

- conversely, you can use just a few reducers.  This way the empty ranges
will only be a small part of any given reducer's workload.
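
A minimal sketch of the first suggestion, assuming plain Java values rather than Hadoop Writables. The hashPartition method mirrors the formula used by Hadoop's HashPartitioner (mask off the sign bit, then take the modulus by the number of reducers). Everything else here (rangePartition, sampleKeys, the 0-999 key space, and the gap at 200-299) is a hypothetical setup chosen only to illustrate the effect.

```java
import java.util.Arrays;

class HashPartitionSketch {
    // Hypothetical key space: keys 0..999, with no keys in 200..299.
    static final int KEY_SPACE = 1000;

    // Same formula as Hadoop's HashPartitioner, applied to a plain Java object.
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical range partitioner: contiguous key ranges map to one partition.
    static int rangePartition(int key, int numReduceTasks) {
        return (int) ((long) key * numReduceTasks / KEY_SPACE);
    }

    // Builds the sample keys, skipping the empty range 200..299.
    static int[] sampleKeys() {
        int[] keys = new int[KEY_SPACE - 100];
        int n = 0;
        for (int k = 0; k < KEY_SPACE; k++) {
            if (k < 200 || k >= 300) {
                keys[n++] = k;
            }
        }
        return keys;
    }

    // Counts how many keys land in each partition under the chosen scheme.
    static int[] countPerPartition(int[] keys, int numReduceTasks, boolean useHash) {
        int[] counts = new int[numReduceTasks];
        for (int key : keys) {
            int p = useHash ? hashPartition(key, numReduceTasks)
                            : rangePartition(key, numReduceTasks);
            counts[p]++;
        }
        return counts;
    }

    // Number of partitions that received no keys at all.
    static int emptyPartitions(int[] counts) {
        int empty = 0;
        for (int c : counts) {
            if (c == 0) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        int[] keys = sampleKeys();
        System.out.println("range: " + Arrays.toString(countPerPartition(keys, 10, false)));
        System.out.println("hash:  " + Arrays.toString(countPerPartition(keys, 10, true)));
    }
}
```

With ten reducers in this setup, range partitioning leaves one partition with no keys, while hashing the same keys gives every partition an equal share.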

Do you have evidence that this is a real problem?


On 12/16/07 4:31 AM, "Devaraj Das" <[hidden email]> wrote:

> This is not possible. The framework always creates reduce tasks from 0 -
> num_reduces.

Re: How can the reducer be invoked lazily?

Rui Shi
Hi,

This should not be a real problem and can be remedied in the ways you proposed. I was just curious whether Hadoop can support this. I have a case where I hash all invalid records to a partition with a large number, and found that many empty reducers have to be invoked.
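
One possible way to avoid the gap in a case like this, sketched under assumptions: send invalid records to the partition immediately after the ones used for valid records, so that no unused partition numbers sit in between. The class name, the isValid rule (null or empty keys count as invalid), and the use of String keys are all hypothetical; a real job would apply the same logic inside its own partitioner.

```java
class InvalidRecordPartitionSketch {
    // Hypothetical validity rule, for illustration only.
    static boolean isValid(String key) {
        return key != null && !key.isEmpty();
    }

    // Valid keys are hashed over partitions 0..numReduceTasks-2;
    // invalid keys all go to the last partition, numReduceTasks-1.
    static int partitionFor(String key, int numReduceTasks) {
        if (numReduceTasks < 2) {
            return 0; // a single reducer has to take everything
        }
        int last = numReduceTasks - 1;
        if (!isValid(key)) {
            return last;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % last;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5;
        String[] keys = {"alpha", "", "beta", null, "gamma"};
        for (String k : keys) {
            System.out.println("\"" + k + "\" -> partition " + partitionFor(k, numReduceTasks));
        }
    }
}
```

Because the invalid-record partition is adjacent to the others, the job needs only one reducer more than the valid records use, and none of them is guaranteed to be empty.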

Thanks,

Rui

----- Original Message ----
From: Ted Dunning <[hidden email]>
To: [hidden email]
Sent: Sunday, December 16, 2007 1:55:02 PM
Subject: Re: How can the reducer be invoked lazily?


