[bug] combiner class never used

[bug] combiner class never used

Stefan Groschupf-2
Hi,
just to add this to the archive: I noticed that the combiner class, as it
is set in the regex demo, is never used in the reduce task.
Also, only the map task checks whether a combiner class is set, in order
to use the combining collector for map tasks.

So we could either remove the getter and setter for the combiner class in
JobConf, or add the usage of the combiner class. :)

Stefan

Re: [bug] combiner class never used

Bryan A. P. Pendleton
I'm pretty new to this stuff, too, but I can't see why the current use of
the combiner is a problem. The combiner class is basically there as an
option to reduce the amount of intermediate output that gets written into
NDFS before the reduce is started, so there's no reason to call it from the
reducer too. In general, it probably even implements the same logic: I've
been building test scenarios that work exactly that way, using the same
class for reduce as for combine.
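
To make that concrete, here is a minimal Python sketch of a toy word count. It is not Nutch code: `run_map_task`, `run_job`, and `sum_reduce` are made-up names, and the "job" is just an in-memory simulation. It only illustrates that applying the reduce logic to each map task's local output shrinks the intermediate data without changing the final result, as long as the operation (here, summing) can safely be applied more than once.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def sum_reduce(key, values):
    """Reduce step: sum the counts. Summing is safe to reuse as a combiner."""
    return [(key, sum(values))]


def group(pairs):
    """Group values by key, as the framework would before calling reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_map_task(lines, combiner=None):
    """Run one simulated map task; combine its local output if a combiner is set."""
    output = [pair for line in lines for pair in map_words(line)]
    if combiner is None:
        return output
    combined = []
    for key, values in group(output).items():
        combined.extend(combiner(key, values))
    return combined


def run_job(splits, combiner=None):
    """Run all map tasks, then reduce. Returns (result, intermediate record count)."""
    intermediate = []
    for split in splits:
        intermediate.extend(run_map_task(split, combiner))
    result = {}
    for key, values in group(intermediate).items():
        for out_key, out_value in sum_reduce(key, values):
            result[out_key] = out_value
    return result, len(intermediate)


# Two input splits, i.e. two simulated map tasks.
splits = [["a b a", "b a"], ["a c", "c c a"]]
plain_result, plain_records = run_job(splits)
combined_result, combined_records = run_job(splits, combiner=sum_reduce)
```

With these two splits, both runs produce the same counts, but the combined run hands far fewer intermediate records to the reduce step.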

On 1/29/06, Stefan Groschupf <[hidden email]> wrote:

>
> Hi,
> just to add this to the archive: I noticed that the combiner class, as it
> is set in the regex demo, is never used in the reduce task.
> Also, only the map task checks whether a combiner class is set, in order
> to use the combining collector for map tasks.
>
> So we could either remove the getter and setter for the combiner class in
> JobConf, or add the usage of the combiner class. :)
>
> Stefan
>

--
Bryan A. Pendleton
Ph: (877) geek-1-bp

Re: [bug] combiner class never used

Stefan Groschupf-2
I agree.
But in general we could remove the option to configure combiner
classes in the JobConf if custom combiner classes are never used.
It is just about cleaning up the API.
Stefan

On 30.01.2006 at 20:51, Bryan A. Pendleton wrote:

> I'm pretty new to this stuff, too, but I can't see why the current use of
> the combiner is a problem. The combiner class is basically there as an
> option to reduce the amount of intermediate output that gets written into
> NDFS before the reduce is started, so there's no reason to call it from the
> reducer too. In general, it probably even implements the same logic: I've
> been building test scenarios that work exactly that way, using the same
> class for reduce as for combine.


Re: [bug] combiner class never used

Bryan A. P. Pendleton
Remove the combiner configuration, and just use the configured Reducer as a
combiner? I think the functionality is too valuable to remove entirely, if
that's what you're suggesting. Perhaps it would then be worth exposing some
property to the Reducer.configure call that specifies which kind of behavior
is wanted (local "combine" or distributed "reduce").
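
As a rough illustration of that idea, here is a small Python sketch of a reducer that is told at configure time which phase it is running in. Everything in it is hypothetical: the `reduce.phase` key is a made-up property, not an existing JobConf setting, and `SumReducer` is a toy class, not the Nutch `Reducer` interface.

```python
class SumReducer:
    """Toy reducer that learns at configure time which phase it runs in."""

    def configure(self, conf):
        # "reduce.phase" is a made-up key used only for this sketch;
        # it is not a real JobConf property.
        self.phase = conf.get("reduce.phase", "reduce")

    def reduce(self, key, values):
        total = sum(values)
        if self.phase == "combine":
            # Local combine: emit a raw partial sum for the real reduce.
            return (key, total)
        # Distributed reduce: emit the final, formatted record.
        return (key, f"{key}\t{total}")


def make_reducer(phase):
    """Create and configure a reducer for the given (hypothetical) phase."""
    reducer = SumReducer()
    reducer.configure({"reduce.phase": phase})
    return reducer


combiner = make_reducer("combine")
reducer = make_reducer("reduce")

# Two map tasks each combine their local counts for key "a".
partials = [combiner.reduce("a", [1, 1, 1])[1], combiner.reduce("a", [1, 1])[1]]
final = reducer.reduce("a", partials)
```

The same class serves both phases, and only the configuration decides whether it emits partial values or final output.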

On 1/30/06, Stefan Groschupf <[hidden email]> wrote:

>
> I agree.
> But in general we could remove the option to configure combiner
> classes in the JobConf if custom combiner classes are never used.
> It is just about cleaning up the API.
> Stefan


--
Bryan A. Pendleton
Ph: (877) geek-1-bp

Re: [bug] combiner class never used

Andrew McNabb
On Mon, Jan 30, 2006 at 12:53:30PM -0800, Bryan A. Pendleton wrote:
> Remove the combiner configuration, and just use the configured Reducer as a
> combiner? I think the functionality is too valuable to remove entirely, if
> that's what you're suggesting. Perhaps it would then be worth exposing some
> property to the Reducer.configure call that specifies which kind of behavior
> is wanted (local "combine" or distributed "reduce").
>

Not all Reducers can necessarily be run on the same data several times.
Having them as two separate options (as it is currently) is probably the
safest way, in my personal opinion.
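
A small Python sketch of why this matters, using an average as the example. The function names are invented for illustration and nothing here is Nutch code. A reducer that computes a mean gives a wrong answer if it is also run as a combiner on each map task's output, while a separate combiner that emits (sum, count) pairs stays correct.

```python
def mean_reduce(values):
    """Reducer that computes the mean of all values for a key."""
    return sum(values) / len(values)


# Values for one key, as produced by two separate map tasks.
task_outputs = [[1, 2, 3], [10]]

# Correct answer: the reducer sees every value exactly once.
true_mean = mean_reduce([v for task in task_outputs for v in task])

# Reusing the reducer as a combiner: a mean of per-task means.
# This weights the single value 10 as heavily as the three other values.
naive_mean = mean_reduce([mean_reduce(task) for task in task_outputs])


def mean_combine(values):
    """Separate combiner: emit a (sum, count) partial instead of a mean."""
    return (sum(values), len(values))


def mean_reduce_partials(partials):
    """Reducer that merges (sum, count) partials into the final mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


safe_mean = mean_reduce_partials([mean_combine(task) for task in task_outputs])
```

Here the true mean is 4.0, the naive combine gives 6.0, and the separate combiner recovers 4.0.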

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
