[VOTE] Release candidate 0.20.203.0-rc0

Owen O'Malley
I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.

Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?

-- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc0

Eli Collins
Hey Owen,

I took a quick look at the changes in the branch (specifically the
range of 200 or so changes where the first line of the commit doesn't
reference a jira).  Most of these look like backports of patches on
jira, however there also seem to be changes that don't correspond to
changes in trunk or patches on jiras.  Some of these introduce new
configuration options (eg hadoop.security.uid.cache.secs) or public
classes (eg QueueProcessingStatistics) that don't exist in trunk.

How do we ensure future releases won't violate compatibility with
respect to this release? Do we plan to have jiras with patches against
trunk for these changes, at least for the set of changes that affect
public APIs? If so, should that come first?

Thanks,
Eli

On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley <[hidden email]> wrote:
> I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.
>
> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>
> -- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc0

Owen O'Malley
On Fri, Apr 29, 2011 at 7:21 PM, Eli Collins <[hidden email]> wrote:

> I took a quick look at the changes in the branch


Thanks for taking a look. Please continue to inspect and test out the
release and vote. I'm really excited that we'll have a release that has
security and the "fred" user limits. Users desperately need these
improvements. Furthermore, it is really important that Hadoop gets back on
to a regular release cycle with releases coming out frequently. The current
stable release of 0.20.2 was released a year ago, which is much much too
long.


> How do we ensure future releases won't violate compatibility with
> respect to this release?


We are still in catchup mode in terms of making sure that everything gets
committed to trunk. Of course our users will correctly complain if the later
releases have regressions relative to this release. Thanks for pointing out
the issue.


> Do we plan to have jiras with patches against
> trunk for these changes, at least for the set of changes that affect
> public APIs?


Yes, I'll work on ensuring the necessary patches get applied to trunk.


> If so, should that come first?
>

The most important question is whether this release candidate is a usable
replacement for, and improvement on, the current stable release 0.20.2. I
believe it is a huge improvement and should be released.

-- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc0

Eli Collins
On Sat, Apr 30, 2011 at 7:17 AM, Owen O'Malley <[hidden email]> wrote:

> On Fri, Apr 29, 2011 at 7:21 PM, Eli Collins <[hidden email]> wrote:
>
>> I took a quick look at the changes in the branch
>
>
> Thanks for taking a look. Please continue to inspect and test out the
> release and vote. I'm really excited that we'll have a release that has
> security and the "fred" user limits. Users desperately need these
> improvements. Furthermore, it is really important that Hadoop gets back on
> to a regular release cycle with releases coming out frequently. The current
> stable release of 0.20.2 was released a year ago, which is much much too
> long.
>

Agree.  Is branch-0.20 dead now? I.e., will the 20.3 and 20.4 fix versions
never be released?  Are all the patches that have been committed
there since 0.20.2 (expecting that they'd be released) in
branch-0.20-security-203?  Users that saw jiras closed out with issues
committed to branch-0.20 with fix version 20.3 probably expect those
to come out in this release.

>> How do we ensure future releases won't violate compatibility with
>> respect to this release?
>
>
> We are still in catchup mode in terms of making sure that everything gets
> committed to trunk. Of course our users will correctly complain if the later
> releases have regressions relative to this release. Thanks for pointing out
> the issue.
>

Seems like we need to do this before releasing 0.20.203 to prevent
blocking the upcoming 0.22 release, due to it being a regression
against the stable release (the sooner a release from trunk can
replace a 0.20 based release the better).

>
>> Do we plan to have jiras with patches against
>> trunk for these changes, at least for the set of changes that affect
>> public APIs?
>
>
> Yes, I'll work on ensuring the necessary patches get applied to trunk.
>

Awesome, thanks.

>
>> If so, should that come first?
>>
>
> The most important question is whether this release candidate is a usable
> replacement for, and improvement on, the current stable release 0.20.2. I
> believe it is a huge improvement and should be released.
>

Definitely a huge improvement, thanks for all the work. I had a
specific concern that it might block 0.22, and a general concern that
we don't want to set a precedent of releasing code that didn't go
through the normal code change (patch review and vote on jira/list).
Thank you for addressing this via getting the patches on trunk.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc0

Devaraj Das
In reply to this post by Owen O'Malley
+1 based on some single node tests I did (with security ON).


On 4/29/11 4:09 PM, "Owen O'Malley" <[hidden email]> wrote:

I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.

Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?

-- Owen


Re: [VOTE] Release candidate 0.20.203.0-rc0

Chris Douglas
In reply to this post by Owen O'Malley
+1

Signature matches, md5/sha1 match. Also tried a basic HDFS upgrade
from 0.20.2 to 0.20.203 with fresh configs on a single node: all OK,
including rollback to 0.20.2. -C

On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley <[hidden email]> wrote:
> I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.
>
> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>
> -- Owen
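
The md5/sha1 comparison Chris describes can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper names `file_digest` and `checksums_match` are hypothetical, the caller supplies the tarball path and the published digests (none are taken from the release candidate), and the GPG signature check is not covered.

```python
import hashlib


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of the file at `path` for `algorithm` (e.g. "md5", "sha1")."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so a large tarball does not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(path, expected):
    """Check the file against every published digest in `expected`.

    `expected` maps an algorithm name to its published hex string;
    surrounding whitespace and letter case are ignored.
    """
    return all(
        file_digest(path, algorithm) == value.strip().lower()
        for algorithm, value in expected.items()
    )
```

A reviewer would call `checksums_match` with the downloaded tarball and the contents of the `.md5`/`.sha1` files published next to it, and treat a `False` result as a reason to withhold a +1.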

Re: [VOTE] Release candidate 0.20.203.0-rc0

Arun C Murthy-2
In reply to this post by Eli Collins
Eli,

On Apr 30, 2011, at 7:19 PM, "Eli Collins" <[hidden email]> wrote:
> Seems like we need to do this before releasing 0.20.203 to prevent
> blocking the upcoming 0.22 release, due to it being a regression
> against the stable release (the sooner a release from trunk can
> replace a 0.20 based release the better).
>

I don't see the issue - we can just mark the appropriate jiras as blockers on 0.22 or 0.23 as necessary and release 0.20.203, correct? The RMs for the releases and the rest of us can help make that distinction. As everyone agrees we need to get back into the habit of making timely and progressive releases, both 0.20.203 & 0.22 are steps in the same direction.

thanks,
Arun


Re: [VOTE] Release candidate 0.20.203.0-rc0

Nigel Daley-2
In reply to this post by Devaraj Das
I would like to see CI set up on this branch before we release anything from it.  I've copied the 0.20 build config and tried running it on this branch, but I'm getting a native compile failure: https://builds.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-0.20.203-Build/1/console

Nige

On Apr 30, 2011, at 8:11 PM, Devaraj Das wrote:

> +1 based on some single node tests I did (with security ON).
>
>
> On 4/29/11 4:09 PM, "Owen O'Malley" <[hidden email]> wrote:
>
> I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.
>
> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>
> -- Owen
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

Eli Collins
In reply to this post by Arun C Murthy-2
On Sun, May 1, 2011 at 7:20 PM, Arun C Murthy <[hidden email]> wrote:

> Eli,
>
> On Apr 30, 2011, at 7:19 PM, "Eli Collins" <[hidden email]> wrote:
>> Seems like we need to do this before releasing 0.20.203 to prevent
>> blocking the upcoming 0.22 release, due to it being a regression
>> against the stable release (the sooner a release from trunk can
>> replace a 0.20 based release the better).
>>
>
> I don't see the issue - we can just mark the appropriate jiras as blockers on 0.22 or 0.23 as necessary and release 0.20.203, correct? The RMs for the releases and the rest of us can help make that distinction. As everyone agrees we need to get back into the habit of making timely and progressive releases, both 0.20.203 & 0.22 are steps in the same direction.

Marking those issues as blockers for the next release (0.22) slows
down the next release. As you say, we should be doing things that help
us make timely and progressive releases from trunk; this does the
opposite, if I understand correctly.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc0

Owen O'Malley
In reply to this post by Nigel Daley-2

On May 1, 2011, at 8:52 PM, Nigel Daley wrote:

> I would like to see CI set up on this branch before we release anything from it.  I've copied the 0.20 build config and tried running it on this branch, but I'm getting a native compile failure: https://builds.apache.org/hudson/view/G-L/view/Hadoop/job/Hadoop-0.20.203-Build/1/console

An Apache CI build is a nice to have, but clearly isn't required.

It looks to be failing on the standard libc functions. Which distribution and version of Linux is it? Which version of gcc and libc are you using? You are probably going to need to log in and build by hand to see what is going on in that environment.

I compiled it on:
  RedHat Enterprise Linux Server release 5.4
  gcc 4.1.2
  automake 1.9.6
  autoconf 2.59

-- Owen

Re: [VOTE] Release candidate 0.20.203.0-rc0

Doug Cutting
In reply to this post by Owen O'Malley
On 04/29/2011 04:09 PM, Owen O'Malley wrote:
> I think everything is ready to go on the 0.20.203.0 release. It
> includes security and a lot of improvements in the capacity scheduler
> and JobTracker.

This does not appear to include the 0.20-append work?  So it's not
advisable to use HBase with this revision, correct?

> Should we release
> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?

The patch selection process for this branch did not appear to be a
community process.  A massive patch set was committed en-masse with no
public discussion before or after about its specific composition.

Long-term the users of a project benefit from a community that
collaborates using open, interactive processes.  If a particular set of
patches, not created through such a process, is valuable to end users,
then it can be distributed on github or elsewhere under a different
name, but should not be granted the imprimatur of a community product.

Doug

Re: [VOTE] Release candidate 0.20.203.0-rc0

Alan Gates
In reply to this post by Owen O'Malley
From the viewpoint of a downstream user, I'd like to see this
released.  Right now Hive 0.7 and soon HCatalog 0.1 have to depend on
a Cloudera distribution because they need security.  Having Apache
products depend on 3rd party distributions of Apache products is
bogus.  The sooner this is out the sooner we can fix this.

Alan.

On Apr 29, 2011, at 4:09 PM, Owen O'Malley wrote:

> I think everything is ready to go on the 0.20.203.0 release. It  
> includes security and a lot of improvements in the capacity  
> scheduler and JobTracker.
>
> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>
> -- Owen


Re: [VOTE] Release candidate 0.20.203.0-rc0

Doug Cutting
On 05/02/2011 11:37 AM, Alan Gates wrote:
> From the viewpoint of a downstream user, I'd like to see this released.
> Right now Hive 0.7 and soon HCatalog 0.1 have to depend on a Cloudera
> distribution because they need security.  Having Apache products depend
> on 3rd party distributions of Apache products is bogus.  The sooner this
> is out the sooner we can fix this.

Alan,

Cloudera could upload its CDH3 patchset to a branch in Apache subversion
and call a release vote on it and I would vote against it.  The
interactive community process is to me what makes it Apache.

Releases should branch from trunk or use an existing release branch.  A
release branch should be open for patches from the general community.
Neither was the case here.  This is neither a subset nor a superset of
the 0.20 branch that the community has invested in.  The change log for
this includes around 500 changes, yet only 24 issues are assigned to it
in Jira, the community's issue tracker.

Yes, the current situation is bad, but shortcutting the community
process doesn't fix it, it just hides it.

Cheers,

Doug

Re: [VOTE] Release candidate 0.20.203.0-rc0

Eli Collins
In reply to this post by Owen O'Malley
On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley <[hidden email]> wrote:
> I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.
>
> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>

Based on the discussion I still have the following questions:

1. Does this release replace subsequent releases from branch-0.20? Ie
is the goal to replace the 0.20.3 or 0.20.4 release with releases from
the branch-0.20-security branches?  If not, where does this release
fit in?  If so, I think we need to do the following before releasing
from branch-0.20-security:

- Make sure branch-0.20-security-203 contains the patches from 0.20.2.
Since this branch is based on 0.20.1, it's not clear that it doesn't
regress against the current stable 0.20 release. Perhaps the best way
to do this is via a rebase.

- Make sure branch-0.20-security-203 (and future 0.20 based release
branches) contain the patches that were checked in for 0.20.3 and
0.20.4. These branches contain important bug fixes (eg HDFS-1258,
HDFS-909, etc) that are not present in this branch, and should be. The
expectation of people that checked in patches to branch-0.20 and the
users who filed the jiras is that they be fixed in the next stable
release.

- Remove the 0.20.3 and 0.20.4 fix versions from jira to make it clear
what the next release is.


2. What are the compatibility implications?  Specifically, do we need
to block the next major release (0.22) on getting patches in this
release committed to trunk?  Should the pace of major version releases
be slowed down by minor version releases?


3. Patches normally go through jira, get reviewed, committed to trunk,
and then merged to a release branch.  Why not use the same process
here?  I'm concerned that we're setting a precedent that patches don't
need to be reviewed and voted on.


Given that we're releasing common, hdfs and mapreduce perhaps general@
is a better place than common-dev@ for release discussion.

Thanks,
Eli

Re: [VOTE] Release candidate 0.20.203.0-rc0

Tom White-2
On Mon, May 2, 2011 at 12:16 PM, Eli Collins <[hidden email]> wrote:

> On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley <[hidden email]> wrote:
>> I think everything is ready to go on the 0.20.203.0 release. It includes security and a lot of improvements in the capacity scheduler and JobTracker.
>>
>> Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>
>
> Based on the discussion I still have the following questions:
>
> 1. Does this release replace subsequent releases from branch-0.20? Ie
> is the goal to replace the 0.20.3 or 0.20.4 release with releases from
> the branch-0.20-security branches?  If not, where does this release
> fit in?  If so, I think we need to do the following before releasing
> from branch-0.20-security:
>
> - Make sure branch-0.20-security-203 contains the patches from 0.20.2,
> since this branch is based on 0.20.1 it's not clear that it doesn't
> regress against the current stable 0.20 release. Perhaps the best way
> to do this is via a rebase.

I just did a quick search, and these are the JIRAs that are in 0.20.2
but appear not to be in 0.20.203.0.

HADOOP-5611
HADOOP-5612
HADOOP-5623
HADOOP-5759
HADOOP-6269
HADOOP-6315
HADOOP-6386
HADOOP-6428
HADOOP-6575
HADOOP-6576
HDFS-579
HDFS-596
HDFS-723
HDFS-732
HDFS-792
MAPREDUCE-623
MAPREDUCE-1070
MAPREDUCE-1163
MAPREDUCE-1251

>
> - Make sure branch-0.20-security-203 (and future 0.20 based release
> branches) contain the patches that were checked in for 0.20.3 and
> 0.20.4. These branches contain important bug fixes (eg HDFS-1258,
> HDFS-909, etc) that are not present in this branch, and should be. The
> expectation of people that checked in patches to branch-0.20 and the
> users who filed the jiras is that they be fixed in the next stable
> release.

These JIRAs are the ones committed to the 0.20 branch (for 0.20.3) but
are not marked as being in 0.20.203.0

HADOOP-6724
HADOOP-6833
HADOOP-6881
HADOOP-6923
HADOOP-6928
HADOOP-7116
HDFS-1024
HDFS-1041
HDFS-1240
HDFS-1258
HDFS-1377
HDFS-1404
HDFS-1406
HDFS-727
HDFS-908
HDFS-909
MAPREDUCE-1280
MAPREDUCE-1407
MAPREDUCE-1734
MAPREDUCE-1832
MAPREDUCE-1880
MAPREDUCE-2262

Tom

>
> - Remove the 0.20.3 and 0.20.4 fix versions from jira to make it clear
> what the next release is.
>
>
> 2. What are the compatibility implications?  Specifically, do we need
> to block the next major release (0.22) on getting patches in this
> release committed to trunk?  Should the pace of major version releases
> be slowed down by minor version releases?
>
>
> 3. Patches normally go through jira, get reviewed, committed to trunk,
> and then merged to a release branch.  Why not use the same process
> here?  I'm concerned that we're setting a precedent that patches don't
> need to be reviewed and voted on.
>
>
> Given that we're releasing common, hdfs and mapreduce perhaps general@
> is a better place than common-dev@ for release discussion.
>
> Thanks,
> Eli
>

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Eric Baldeschwieler
In reply to this post by Doug Cutting

Hi folks,

This strikes me as a bit odd. I think we have already discussed this at length and agreed that a release could proceed.

Since then, Arun and Owen have worked actively to incorporate community feedback into this release.

All parties making Hadoop releases other than Apache have already incorporated most of the patches in this release into their products, including Doug's organization. I don't see how Hadoop's users benefit from Apache not incorporating them into an Apache release.

As previously discussed, all parties are welcome to champion alternative releases from Apache if they want to invest in making Apache Hadoop better.

Thanks!!

E14

---
E14 - typing on glass

On May 2, 2011, at 12:16 PM, "Ian Holsman" <[hidden email]> wrote:

> moving this thread to general@
>
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
>
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>
>> The patch selection process for this branch did not appear to be a
>> community process.  A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
>
> guys...
> 1. do we agree this is an issue
> 2. if it is, how do we get the communication & discussion on list?
>
> what do people think are the major issues that are stopping people from talking about stuff on list?

Re: [VOTE] Release candidate 0.20.203.0-rc0

Arun C Murthy
In reply to this post by Doug Cutting
Doug,

On May 2, 2011, at 10:58 AM, Doug Cutting wrote:
> The patch selection process for this branch did not appear to be a
> community process.  A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.

Let's review:

# You first proposed to release off the Yahoo security patchset in
April 2010: http://s.apache.org/5Gv
# I started this discussion again in January 2011: http://s.apache.org/uf
# We went through several iterations:
  - I first committed a jumbo patch, about which some reservations were
expressed.
  - Owen then broke it up and committed individual patches to
incorporate the feedback provided.
# Roy clarified the way forward: http://s.apache.org/tD4
(which Owen has since incorporated by breaking the jumbo patch into
individual patches).

Given this history, your current stance is surprising, to say the
least; we have already discussed this. It is clear that the
community (including downstream Apache projects like Pig, Hive and
HCatalog) will substantially benefit from an Apache release of this
improved codebase.

thanks,
Arun


Re: [VOTE] Release candidate 0.20.203.0-rc0

stack-3
In reply to this post by Tom White-2
How hard would it be to get the patches Tom lists below into
branch-0.20-security-203?  I'd think it'd be an easier sell if it were
a superset of all in 0.20, especially since it bears its name.

Otherwise, glad to see the release candidate.

St.Ack

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

Eli Collins
In reply to this post by Eric Baldeschwieler
Hey Eric,

I don't have any objections to a release from
branch-0.20-security-203.  However, when I examined the specific patch
set I noticed there are important implications with respect to
compatibility (for 0.20.2 and 0.22), a question about project model
(eg not reviewing patches on jira before committing them, not having
patches go through trunk, etc), and some open questions for users (eg
is this the next dot release of the stable branch?).

I agree this is a valuable artifact, but that doesn't mean it's OK to
ignore compatibility concerns, etc.

I've listed specific questions/comments here:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201105.mbox/%3CBANLkTinZ=xb6kJ5PTeLN5KKD9b-cwaM0OQ@...%3E

Thanks,
Eli

On Mon, May 2, 2011 at 1:05 PM, Eric Baldeschwieler
<[hidden email]> wrote:

>
> Hi folks,
>
> This strikes me as a bit odd. I think we have already discussed this at length and agreed that a release could proceed.
>
> Since then, Arun and Owen have worked actively to incorporate community feedback into this release.
>
> All parties making Hadoop releases other than Apache have already incorporated most of the patches in this release into their products, including Doug's organization. I don't see how Hadoop's users benefit from Apache not incorporating them into an Apache release.
>
> As previously discussed, all parties are welcome to champion alternative releases from Apache if they want to invest in making Apache Hadoop better.
>
> Thanks!!
>
> E14
>
> ---
> E14 - typing on glass
>
> On May 2, 2011, at 12:16 PM, "Ian Holsman" <[hidden email]> wrote:
>
>> moving this thread to general@
>>
>> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
>>
>>>> Should we release
>>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>>
>>> The patch selection process for this branch did not appear to be a
>>> community process.  A massive patch set was committed en-masse with no
>>> public discussion before or after about its specific composition.
>>
>> guys...
>> 1. do we agree this is an issue
>> 2. if it is, how do we get the communication & discussion on list?
>>
>> what do people think are the major issues that are stopping people from talking about stuff on list?
>

Re: [VOTE] Release candidate 0.20.203.0-rc0

Arun C Murthy
In reply to this post by Tom White-2

On May 2, 2011, at 12:31 PM, Tom White wrote:
> I just did a quick search, and these are the JIRAs that are in 0.20.2
> but appear not to be in 0.20.203.0.

Thanks Tom.

I did a quick analysis:

# Remaining for 0.20.203
  * HADOOP-5611
  * HADOOP-5612
  * HADOOP-5623
  * HDFS-596
  * HDFS-723
  * HDFS-732
  * HDFS-579
  * MAPREDUCE-1070
  * HADOOP-6315
  * MAPREDUCE-1163

# Fixed, missing in CHANGES.txt
  * HADOOP-5759
  * HADOOP-6269
  * HADOOP-6386
  * HADOOP-6428

# Build, not necessary
  * MAPREDUCE-1251 (fixed)

# Broken tests, fixed already
  * HADOOP-6575
  * HADOOP-6576
  * HDFS-792
  * MAPREDUCE-623

I'll work on fixing the 'remaining' ones.

thanks,
Arun
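
The bookkeeping behind this triage can be cross-checked mechanically. The sketch below is not something anyone in the thread ran; it simply treats Tom's list and Arun's four groups as Python sets and confirms that every reported JIRA lands in exactly one group. The JIRA ids are copied from the two messages, while the variable names and the `check_triage` helper are made up for illustration.

```python
# JIRAs Tom reported as being in 0.20.2 but apparently not in 0.20.203.0.
reported = {
    "HADOOP-5611", "HADOOP-5612", "HADOOP-5623", "HADOOP-5759",
    "HADOOP-6269", "HADOOP-6315", "HADOOP-6386", "HADOOP-6428",
    "HADOOP-6575", "HADOOP-6576", "HDFS-579", "HDFS-596", "HDFS-723",
    "HDFS-732", "HDFS-792", "MAPREDUCE-623", "MAPREDUCE-1070",
    "MAPREDUCE-1163", "MAPREDUCE-1251",
}

# Arun's triage of that list, one set per heading in his message.
triage = {
    "remaining": {
        "HADOOP-5611", "HADOOP-5612", "HADOOP-5623", "HDFS-596",
        "HDFS-723", "HDFS-732", "HDFS-579", "MAPREDUCE-1070",
        "HADOOP-6315", "MAPREDUCE-1163",
    },
    "fixed_missing_in_changes": {
        "HADOOP-5759", "HADOOP-6269", "HADOOP-6386", "HADOOP-6428",
    },
    "build_not_necessary": {"MAPREDUCE-1251"},
    "broken_tests_fixed": {
        "HADOOP-6575", "HADOOP-6576", "HDFS-792", "MAPREDUCE-623",
    },
}


def check_triage(reported, triage):
    """Return (unclassified, unexpected, duplicated) sets of JIRA ids."""
    seen = set()
    duplicated = set()
    for ids in triage.values():
        duplicated |= seen & ids  # ids already placed in an earlier group
        seen |= ids
    return reported - seen, seen - reported, duplicated


unclassified, unexpected, duplicated = check_triage(reported, triage)
print("unclassified:", sorted(unclassified))
print("unexpected:", sorted(unexpected))
print("duplicated:", sorted(duplicated))
print("still to port:", sorted(triage["remaining"]))
```

The same set-difference approach would apply to Tom's second list (patches committed to branch-0.20 for 0.20.3), which has not been triaged in the messages above.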