What is the class that launches the reducers?

xeonmailinglist-gmail
I am trying to implement a mechanism in MapReduce v2 that allows a job to be
suspended and resumed. I need to suspend a job once all the mappers have
finished and resume it from that point after some time. I want to do this so
that I can verify the integrity of the map output before the reducers run.

I am looking for the class that decides when the reduce tasks should start.
Does anyone know where this is?

Re: What is the class that launches the reducers?

xeonmailinglist-gmail
I am using MapReduce v2.

On Aug 25, 2016 8:18 PM, "xeon Mailinglist" <[hidden email]>
wrote:

> I am trying to implement a mechanism in MapReduce v2 that allows to
> suspend and resume a job. I must suspend a job when all the mappers finish,
> and resume the job from that point after some time. I do this, because I
> want to verify the integrity of the map output before executing the
> reducers.
>
> I am looking for the class that tells when the Reduce tasks should start.
> Does anyone know where is this?
>

Re: What is the class that launches the reducers?

Haibo Chen
One thing you can try is to write a map-only job first and then verify the
map output.

On Thu, Aug 25, 2016 at 1:18 PM, xeon Mailinglist <[hidden email]> wrote:

> I am using Mapreduce v2.

Re: What is the class that launches the reducers?

xeonmailinglist-gmail
But then I need to set up identity maps to run the reducers. If I suspend the
job after the maps finish, I don't need identity maps at all. I want to
suspend the job so that I can skip the identity maps and get better
performance.

On Aug 25, 2016 10:12 PM, "Haibo Chen" <[hidden email]> wrote:

One thing you can try is to write a map-only job first and then verify the
map out.


Re: What is the class that launches the reducers?

Haibo Chen
Mappers and reducers are all scheduled when the job starts running. I don't
think there is a knob to suspend the job once all mappers finish.

On Thu, Aug 25, 2016 at 4:10 PM, xeon Mailinglist <[hidden email]> wrote:

> But then I need to set identity maps to run the reducers. If I suspend a
> job after the maps finish, I don't need to set identity maps up. I want to
> suspend a job so that I don't run identity maps and get better performance.

Re: What is the class that launches the reducers?

Daniel Templeton
In reply to this post by xeonmailinglist-gmail
How are you intending to verify the map output?  It's only partially dumped
to disk.  None of the intermediate data goes into HDFS.

Daniel

On Aug 25, 2016 4:10 PM, "xeon Mailinglist" <[hidden email]>
wrote:

> But then I need to set identity maps to run the reducers. If I suspend a
> job after the maps finish, I don't need to set identity maps up. I want to
> suspend a job so that I don't run identity maps and get better performance.

Re: What is the class that launches the reducers?

xeonmailinglist-gmail
Right now the map and reduce tasks produce digests of their output. This
logic is inside the map and reduce functions. I need to pause the execution
when all maps finish because an external program will be synchronizing
several MapReduce runtimes. When all map tasks from the several jobs have
finished, the map output will be verified. Then this external program will
resume the execution.
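
To make the digest comparison concrete, here is a minimal sketch in plain Java of how an external verifier might compare digests reported by two runtimes. The class name MapOutputDigest and both method names are hypothetical and are not part of Hadoop. The sketch also assumes SHA-256 and assumes that each runtime emits its map output records in the same order, which the thread does not state.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Hadoop class): digests map output records. */
final class MapOutputDigest {

    /** Returns the hex-encoded SHA-256 digest of the records, fed in order. */
    static String digest(List<String> records) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String record : records) {
                md.update(record.getBytes(StandardCharsets.UTF_8));
                // Delimiter so that ["ab", "c"] and ["a", "bc"] hash differently.
                md.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True when the digests reported by two runtimes are identical. */
    static boolean matches(String first, String second) {
        return MessageDigest.isEqual(
                first.getBytes(StandardCharsets.UTF_8),
                second.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String a = digest(Arrays.asList("k1\t1", "k2\t2"));
        String b = digest(Arrays.asList("k1\t1", "k2\t2"));
        System.out.println("digests match: " + matches(a, b));
    }
}
```

In this sketch the external program would collect one digest per runtime and resume the jobs only if every pair matches. If record order can differ between runtimes, an order-independent scheme would be needed instead.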

I really want to create a knob in MapReduce by modifying the source code,
because with this knob I can skip the identity-map execution and improve
performance. I think the devs should add this feature.

Anyway, I am looking in the source code for the part where reduce tasks are
set to launch. Does anyone know which class launches the reduce tasks in
MapReduce v2?

On Aug 26, 2016 02:07, "Daniel Templeton" <[hidden email]> wrote:

> How are you intending to verify the map output?  It's only partially
> dumped to disk.  None of the intermediate data goes into HDFS.
>
> Daniel

Re: What is the class that launches the reducers?

Haibo Chen
If you really want to do this, I guess you can move the scheduling of
reducers to MRClientProtocol, so that the MR AM continues to run reducers
only when a client notifies it to proceed. The scheduling of reducers is
currently done in JobImpl.java, in SetupCompletedTransition.transition(). I
am not sure whether this change would break something.
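
The "proceed only when notified" idea can be sketched independently of Hadoop as a one-shot gate. Everything below is hypothetical: ReducePhaseGate, resume(), and awaitResume() are illustrative names, not Hadoop APIs, and the sketch does not show how the notification would actually travel over MRClientProtocol. It also blocks the waiting thread, which may not suit an event-driven component such as the MR AM; a real change would more likely defer the reducer scheduling until a resume event arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot gate (not a Hadoop class) guarding the reduce phase. */
final class ReducePhaseGate {
    private final CountDownLatch proceed = new CountDownLatch(1);

    /** Called when the client signals that the map output was verified. Idempotent. */
    void resume() {
        proceed.countDown();
    }

    /** Waits up to the timeout; returns true only if resume() has been called. */
    boolean awaitResume(long timeout, TimeUnit unit) {
        try {
            return proceed.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Non-blocking check of the gate state. */
    boolean isResumed() {
        return proceed.getCount() == 0;
    }

    public static void main(String[] args) {
        ReducePhaseGate gate = new ReducePhaseGate();
        // Stand-in for the external program that verifies the map output.
        new Thread(gate::resume).start();
        if (gate.awaitResume(5, TimeUnit.SECONDS)) {
            System.out.println("resumed: reducers may be scheduled now");
        }
    }
}
```

The latch makes resume() safe to call more than once, and the timeout keeps a job from waiting forever if the external program never responds.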

On Thu, Aug 25, 2016 at 11:07 PM, xeon Mailinglist <
[hidden email]> wrote:

> Right now the map and reduce task produces digests of the output. This
> logic is inside the map and reduce functions. I need to pause the execution
> when all maps finish because there will be an external program that is
> synchronizing several mapreduce runtimes. When all map tasks finish from
> the several jobs, the map output will be verified. Then, this external
> program will resume the execution.
>
> I really want to create a knob in mapreduce by modifying the source code,
> because with this knob I can exclude the identity maps execution and boost
> the performance. I think the devs should create this feature.
>
> Anyway, I am looking in the source code for the part where reduce tasks are
> set to launch. Does anyone know which class launches the reduce tasks in
> mapreduce v2?