[VOTE] Merge YARN-3926 (resource profile) to trunk

Wangda Tan
Hi folks,

Per earlier discussion [1], I'd like to start a formal vote to merge
feature branch YARN-3926 (Resource profile) to trunk. The vote will run for
7 days and will end August 30 10:00 AM PDT.

Briefly, YARN-3926 extends the YARN resource model to support resource
types other than CPU and memory, so it will be the cornerstone of features
such as GPU support (YARN-6223), disk scheduling/isolation (YARN-2139), FPGA
support (YARN-5983), and network IO scheduling/isolation (YARN-2140). In
addition, YARN-3926 allows admins to preconfigure resource profiles in the
cluster. For example, m3.large could mean <2 vcores, 8 GB memory, 64 GB
disk>, so applications can request the "m3.large" profile instead of
specifying a value for every resource type.
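To make the profile idea concrete, here is a minimal, self-contained Java sketch of a profile table: an admin registers a profile name mapped to a set of named resource amounts, and an application request by profile name expands to those amounts. Everything in it is a hypothetical illustration, not the YARN API: the `ProfileTable` class, its `define`/`expand` methods, the map-based representation and the "disk-mb" resource name are assumptions. The boolean flag only stands in for the off-by-default yarn.resourcemanager.resource-profiles.enabled property discussed in this thread; how the ResourceManager actually reads and enforces that property is not modelled. Only the m3.large values come from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of named resource profiles; not the YARN API.
public class ProfileTable {
    // Profile name -> (resource type name -> amount).
    private final Map<String, Map<String, Long>> profiles = new LinkedHashMap<>();
    // Stand-in for an off-by-default "profiles enabled" switch (assumption).
    private final boolean profilesEnabled;

    public ProfileTable(boolean profilesEnabled) {
        this.profilesEnabled = profilesEnabled;
    }

    // Admin side: register a named profile.
    public void define(String name, Map<String, Long> resources) {
        profiles.put(name, new LinkedHashMap<>(resources));
    }

    // Application side: expand a profile name into concrete resource values.
    public Map<String, Long> expand(String name) {
        if (!profilesEnabled) {
            throw new IllegalStateException("Resource profiles are disabled");
        }
        Map<String, Long> resources = profiles.get(name);
        if (resources == null) {
            throw new IllegalArgumentException("Unknown profile: " + name);
        }
        return new LinkedHashMap<>(resources); // copy, so callers can't mutate the table
    }

    public static void main(String[] args) {
        ProfileTable table = new ProfileTable(true);
        Map<String, Long> m3Large = new LinkedHashMap<>();
        m3Large.put("vcores", 2L);
        m3Large.put("memory-mb", 8L * 1024); // 8 GB
        m3Large.put("disk-mb", 64L * 1024);  // 64 GB; hypothetical resource name
        table.define("m3.large", m3Large);

        // prints {vcores=2, memory-mb=8192, disk-mb=65536}
        System.out.println(table.expand("m3.large"));
    }
}
```

The table copies maps on the way in and out so that a profile, once defined by the admin, cannot be changed by an application that expanded it.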

32 subtasks were completed as part of this effort.

This feature must be explicitly turned on before use. We paid close
attention to its compatibility, performance, and scalability. As
mentioned in [1], we didn't see any observable performance regression in
large-scale SLS (Scheduler Load Simulator) runs, and saw less than 5%
performance regression with the micro-benchmark added by YARN-6775.

This feature works end-to-end (including UI/CLI/application/server). We
have set up a cluster with this feature turned on and run it for several
weeks, and we haven't seen any issues so far.

Merge JIRA: YARN-7013 (Jenkins gave +1 already).
Documentation: YARN-7056

Special thanks to the team of folks who worked hard and contributed to
this effort, including design discussion, development, reviews, etc.: Varun
Vasudev, Sunil Govind, Daniel Templeton, Vinod Vavilapalli, Yufei Gu,
Karthik Kambatla, Jason Lowe, Arun Suresh.

Regards,
Wangda Tan

[1]
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3CCAD%2B%2BeCnjEHU%3D-M33QdjnND0ZL73eKwxRua4%3DBbp4G8inQZmaMg%40mail.gmail.com%3E

Re: [VOTE] Merge YARN-3926 (resource profile) to trunk

Sunil G
Thanks, Arun, for checking out the feature.

* Can you folks point me to any test application/framework, or has this
been integrated with MapReduce?
Currently this feature is integrated with Distributed Shell (reference:
YARN-5588). It is not yet integrated with MapReduce; that work is ongoing
in YARN-6504.

* Can you maybe comment a bit on the type of scale testing done?
We have done scale testing using SLS with this feature turned off, and
also turned on with only memory and vcores. Performance was on par with
trunk, within a variance of ~2%. I will let Wangda add more color here
with data.

* Is there a plan to merge this with branch-2?
We had a discussion with a few folks here in Bangalore from MS and Huawei,
and will look into it once this branch is merged to trunk.

- Sunil



On Sat, Aug 26, 2017 at 9:47 AM Arun Suresh <[hidden email]> wrote:

> Really looking forward to getting this in.
>
> Couple of questions:
> * Can you maybe comment a bit on the type of scale testing done?
> Specifically, the number of resources tested with, and any point where you
> discovered that performance might take a hit. Also, given that we do not
> have AMs that currently use this feature, can you folks point me to any
> test application/framework, or has this been integrated with MapReduce?
> * Is there a plan to merge this with branch-2? Since we would like to
> see this in 2.9.0 as well.
>
> Just to clarify, I am a +1 for merging, irrespective of the above - given
> that this is an opt-in feature after all. I am just eager to start using it
> :)
>
> Cheers
> -Arun
>
>
> On Thu, Aug 24, 2017 at 10:54 AM, Sunil G <[hidden email]> wrote:
>
>> Thank you very much Varun Vasudev, Wangda Tan, Daniel and all the folks
>> who helped in getting this feature to this level.
>>
>> Starting with my +1 (binding).
>>
>>
>> # Tested a 5-node cluster with resource profiles enabled/disabled (the
>> feature is disabled by default)
>>
>> # All APIs added are marked as Unstable/Evolving (very few)
>>
>> # There is no compatibility break with older versions (we have also added
>> unit test cases to ensure this)
>>
>> # Performance tests were done using SLS and also with some tight-loop unit
>> tests. There is not much regression compared to current trunk.
>>
>> # Latest Jenkins +1 on YARN-7013 for the whole branch code.
>>
>> # Verified the old RM UI and the new YARN UI (newly added resources could
>> be seen easily)
>>
>>
>> Once again, thanks to all the folks who helped with this feature. Kudos!
>>
>>
>> Thanks
>>
>> - Sunil
>>
>>
>> On Thu, Aug 24, 2017 at 12:20 AM Wangda Tan <[hidden email]> wrote:
>
>

Re: [VOTE] Merge YARN-3926 (resource profile) to trunk

Wangda Tan
In reply to this post by Wangda Tan
Hi all,

Given that we have 3 binding +1s, the vote passes. I just pushed the
changes to trunk and will update the JIRAs accordingly.

Thanks, everybody, for helping with this feature and for voting!

Best,
Wangda


On Sat, Aug 26, 2017 at 8:58 AM, Sunil G <[hidden email]> wrote:

> Hi Daniel
>
> Thank you very much for the support.
>
> * When you say that the feature can be turned
> off, do you mean resource types or resource profiles?  I know there's an
> off-by-default property that governs resource profiles, but I didn't see
> any way to turn off resource types.
> Yes, *yarn.resourcemanager.resource-profiles.enabled* is false by default
> and turns this feature on or off. As for new resource types, they are
> loaded from *resource-types.xml*, and by default this XML file is not
> included in the package. This prevents any issues in the default case. Once
> this file is added to a cluster, new resources will be loaded from it.
>
> * Even if only CPU and memory are configured, i.e. no additional resource
> types, the code path is different than it was.
> Earlier, primitive data types were used to represent vcores and memory.
> With the resource profile work, all resources in YARN are represented as
> ResourceInformation entries placed under the existing Resource object. So
> memory and vcores remain accessible and operable through the same set of
> public APIs from Resources or ResourceCalculator (DRC) as before, even
> when the feature is off. (The code path is the same, but improved to
> support a unified ResourceInformation class instead of memory/vcores
> primitive types.)
>
> Thanks
> Sunil
>
>
>
>
> On Sat, Aug 26, 2017 at 8:10 PM Daniel Templeton <[hidden email]> wrote:
>
> > Quick question, Wangda.  When you say that the feature can be turned
> > off, do you mean resource types or resource profiles?  I know there's an
> > off-by-default property that governs resource profiles, but I didn't see
> > any way to turn off resource types.  Even if only CPU and memory are
> > configured, i.e. no additional resource types, the code path is
> > different than it was.  Specifically, where CPU and memory were
> > primitives before, they're now entries in an array whose indexes have to
> > be looked up through the ResourceUtils class.  Did I miss something?
> >
> > For those who haven't followed the feature closely, there are really two
> > features here.  Resource types allows for declarative extension of the
> > resource system in YARN.  Resource profiles builds on top of resource
> > types to allow a user to request a group of resources as a profile, much
> > like EC2 instance types, e.g. "fast-compute" might mean 32GB RAM, 8
> > vcores, and 2 GPUs.
> >
> > Daniel
> >
> > On 8/23/17 11:49 AM, Wangda Tan wrote:
> >
> >
> >
> >
>
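Daniel's and Sunil's descriptions of the internal change (CPU and memory moving from primitive fields into an array of resource entries whose indexes are looked up by name) can be illustrated with a minimal, self-contained Java model. Everything here is hypothetical: `ResourceModel`, `indexOf`, `newVector`, `fitsIn` and the "gpu" resource name are stand-ins for the idea, not the real `Resource`, `ResourceInformation`, `ResourceUtils` or `ResourceCalculator` APIs. The sketch shows why memory and vcores keep working when no extra types are configured (they always occupy the first two slots), while an additional type only gets a slot once it is declared, which in this thread corresponds to adding resource-types.xml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an array-backed resource model; not the YARN API.
public class ResourceModel {
    // Resource type name -> slot index in every vector built by this model.
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    public ResourceModel(List<String> extraTypes) {
        // Memory and vcores always take slots 0 and 1, so they exist even
        // when no extra resource types are configured.
        addType("memory-mb");
        addType("vcores");
        for (String type : extraTypes) {
            addType(type);
        }
    }

    private void addType(String name) {
        if (!index.containsKey(name)) {
            index.put(name, names.size());
            names.add(name);
        }
    }

    // Name-to-index lookup that replaces direct access to a primitive field.
    public int indexOf(String name) {
        Integer i = index.get(name);
        if (i == null) {
            throw new IllegalArgumentException("Unknown resource type: " + name);
        }
        return i;
    }

    // A zeroed vector with one slot per known resource type.
    public long[] newVector() {
        return new long[names.size()];
    }

    // Component-wise check over every resource type, not just memory and CPU.
    // Assumes both vectors were created by the same model (same length).
    public static boolean fitsIn(long[] ask, long[] available) {
        for (int i = 0; i < ask.length; i++) {
            if (ask[i] > available[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Default-like setup: only memory and vcores.
        ResourceModel plain = new ResourceModel(Collections.<String>emptyList());
        // One extra hypothetical type, as if declared in resource-types.xml.
        ResourceModel withGpu = new ResourceModel(Arrays.asList("gpu"));

        long[] node = withGpu.newVector();
        node[withGpu.indexOf("memory-mb")] = 8192;
        node[withGpu.indexOf("vcores")] = 8;
        node[withGpu.indexOf("gpu")] = 2;

        long[] ask = withGpu.newVector();
        ask[withGpu.indexOf("memory-mb")] = 4096;
        ask[withGpu.indexOf("vcores")] = 2;
        ask[withGpu.indexOf("gpu")] = 3;

        System.out.println(plain.newVector().length); // prints 2
        System.out.println(fitsIn(ask, node));        // prints false (3 GPUs > 2)
    }
}
```

The extra name-to-index lookup is the cost Daniel points out relative to plain primitive fields; in exchange, operations such as the fit check loop over whatever types are configured instead of hard-coding memory and CPU.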