Combiner function

Combiner function

Jackob Carlsson
Hi everyone,
Could anyone please help me understand the function of a combiner?

Thanks in advance
Jackob

Re: Combiner function

Jones, Nick
Hi Jackob,
A combiner acts a lot like a reduce step, but it's executed on the map
side against the mapper's in-memory output.  I've seen job execution
time drop after adding one.  The one caveat to keep in mind is that it
may or may not run on a particular map attempt.
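
To make the "reduce step on the mapper" idea concrete, here is a minimal
pure-Python sketch of a word count. It is a simulation, not Hadoop code:
the names `map_words`, `combine`, `reduce_counts`, `splits` and `run` are
all made up for illustration. It counts how many records would cross the
shuffle with and without a per-mapper combine step.

```python
from collections import defaultdict

def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum counts per key within ONE map task's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_counts(all_pairs):
    """Reducer: sum counts per key across the output of all map tasks."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

# Each inner list stands for the input split of one map task (hypothetical data).
splits = [
    ["a b a", "a c"],
    ["b b a"],
]

def run(use_combiner):
    """Return (number of records shuffled, final reduce result)."""
    shuffled = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)  # local to this map task only
        shuffled.extend(pairs)
    return len(shuffled), reduce_counts(shuffled)

records_plain, result_plain = run(use_combiner=False)
records_combined, result_combined = run(use_combiner=True)
print(records_plain, records_combined, result_combined)
```

In this toy run the final counts are identical either way; only the
number of records handed to the reducer changes.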

Nick




Re: Combiner function

zaki rahaman
In reply to this post by Jackob Carlsson
From the Wiki: http://wiki.apache.org/hadoop/HadoopMapReduce

In simple cases, your combiner may simply be your reduce function/code
applied to your map output before it's shuffled, sorted, and available for
reduce tasks. (This is often the case with counting/simple aggregation).
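
A small sketch of why this works for counting or summing but not for
every reduce function. This is plain Python, not Hadoop code, and the
helper names (`with_combiner`, `without_combiner`, `mapper_outputs`) are
invented for the example. Reusing the reducer as a combiner is only safe
when applying it to partial results gives the same answer as applying it
once to everything, which holds for a sum but not for a mean.

```python
def reduce_sum(values):
    """A reduce function that is safe to reuse as a combiner."""
    return sum(values)

def reduce_mean(values):
    """A reduce function that is NOT safe to reuse as a combiner."""
    values = list(values)
    return sum(values) / len(values)

# Values for a single key, as emitted by two separate map tasks (made-up data).
mapper_outputs = [[1, 2, 3], [10]]

def with_combiner(fn):
    """Apply fn per map task (combine), then again on the partials (reduce)."""
    partials = [fn(values) for values in mapper_outputs]
    return fn(partials)

def without_combiner(fn):
    """Apply fn once over all values (reduce only)."""
    return fn([v for values in mapper_outputs for v in values])

sum_ok = with_combiner(reduce_sum) == without_combiner(reduce_sum)
mean_ok = with_combiner(reduce_mean) == without_combiner(reduce_mean)
print(sum_ok, mean_ok)
```

Here the mean of the per-mapper means differs from the true mean because
the two map tasks contributed different numbers of values.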




--
Zaki Rahaman

Re: Combiner function

Harsh J
In reply to this post by Jackob Carlsson
As others have pointed out, it's mostly applied as an optimization
step. In most cases a 'Mapper' emits at least a small group of records
sharing the same key, all of which go on to the reducer after a copy
and a sort phase. Reducing them locally (in memory) via a 'Combiner'
cuts the amount of data passing through the copy and sort stages
before the 'Reducer' operation kicks in.

Do note that, implementation-wise, a 'combiner' class must always
consume and emit the same key-value pair types as the mapper's output.
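
The type constraint can be illustrated with a mean computation. The
following is a sketch in plain Python, not the Hadoop API; `map_reading`,
`merge`, `reduce_mean` and `run` are hypothetical names. The mapper emits
(sum, count) pairs, and the combine step takes and returns that same
shape, so the reducer gets a correct answer whether the combine step
runs zero, one, or several times.

```python
from collections import defaultdict

def map_reading(key, value):
    """Mapper: emit (key, (sum, count)) so partial results can be merged later."""
    return key, (float(value), 1)

def merge(pairs):
    """Combine step: merge (key, (sum, count)) pairs per key.

    Input and output have the same shape, which is what lets the
    framework apply this step any number of times.
    """
    totals = defaultdict(lambda: (0.0, 0))
    for key, (s, c) in pairs:
        old_s, old_c = totals[key]
        totals[key] = (old_s + s, old_c + c)
    return list(totals.items())

def reduce_mean(pairs):
    """Reducer: merge once more, then turn (sum, count) into a mean."""
    return {key: s / c for key, (s, c) in merge(pairs)}

# Output of two map tasks for the same key (made-up data).
mapper_outputs = [
    [map_reading("t", v) for v in (1, 2, 3)],
    [map_reading("t", 10)],
]

def run(combiner_passes):
    """Run the job with the combine step applied combiner_passes times per map task."""
    shuffled = []
    for pairs in mapper_outputs:
        for _ in range(combiner_passes):
            pairs = merge(pairs)
        shuffled.extend(pairs)
    return reduce_mean(shuffled)

means = [run(n) for n in (0, 1, 2)]
print(means)
```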




--
Harsh J
www.harshj.com

Re: Combiner function

Jackob Carlsson
In reply to this post by Jones, Nick
Thanks Nick, but "in-memory" means a combiner can only be used over a single
mapper, right? Is there a way to use it across several mappers as well? Also,
what do you mean by "it may or may not run on a particular map attempt"?

Br,
Jackob


Re: Combiner function

Edward Capriolo

> Is there a way we use it for several mappers as well?
No. That is the exact opposite of the combiner's goal. It runs locally.
> it may or may not run on a particular map attempt
It only runs when certain thresholds in the framework are reached.

http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
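
As a rough illustration of threshold-driven behaviour, here is a toy
model in plain Python. It is not Hadoop's actual spill logic:
`spill_threshold` is a hypothetical parameter, and the rule that
leftover records are written uncombined is an assumption made only for
this sketch. The point is that the same map task may run the combiner
several times or not at all, and the reduce result must come out the
same either way.

```python
from collections import defaultdict

def combine(pairs):
    """Combiner: sum values per key within one buffer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def run_map_task(records, spill_threshold):
    """Buffer map output; each time the buffer fills, pass it through the combiner."""
    buffer, spilled, combiner_runs = [], [], 0
    for record in records:
        buffer.append(record)
        if len(buffer) >= spill_threshold:
            spilled.extend(combine(buffer))
            combiner_runs += 1
            buffer = []
    # Assumption for this sketch: leftovers below the threshold go out uncombined.
    spilled.extend(buffer)
    return spilled, combiner_runs

def reduce_counts(pairs):
    """Reducer: final sum per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Made-up map output: five records for "a", then two for "b".
records = [("a", 1)] * 5 + [("b", 1)] * 2

small_out, small_runs = run_map_task(records, spill_threshold=3)
large_out, large_runs = run_map_task(records, spill_threshold=100)
print(small_runs, len(small_out), large_runs, len(large_out))
```

With the small threshold the combiner runs twice and fewer records leave
the map task; with the large one it never runs, yet `reduce_counts`
gives the same totals for both outputs.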

Re: Combiner function

Jackob Carlsson
Thanks Edward.

> Is there a way we use it for several mappers as well?

> No. That is the exact opposite goal of the combiner. It runs locally.


OK, let's take a silly scenario: say one mapper is late producing its
results, which keeps a reducer task waiting. How can this case be
optimized?



> >it may or may not run on a particular map attempt
> It only runs when certain thresholds in the framework are reached.
>
> http://philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/
>


What are these thresholds that determine whether it runs on a particular
map attempt?