[VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Matt Foley-2
>> Python as a runtime requirement. Are you planning to migrate all
>> BASH scripts provided by Hadoop (or dynamically created ones, i.e.
>> launcher scripts) to Python?

I don't intend to mandate use of Python.  Rather, I want there to be a
cross-platform option available.  Things that are best done in a
platform-specific manner should be done in shell for Linux, and PowerShell
for Windows.  But things that are best done in a platform-independent way
can be, with a lower long-term maintenance cost than using different
scripts per platform.

This means that some, but not all, existing scripts may naturally migrate
to Python as the overall system is ported to Windows.  Hopefully when
someone is porting a script that can be well done in a platform-independent
way, they will be able to choose Python and write a single script that can
replace the shell script and make it unnecessary to maintain two scripts
(doing the same job but in different languages!) going forward.
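As a rough illustration of that split, here is a minimal Python sketch, assuming a Hadoop-1-style layout where configuration lives in HADOOP_HOME/conf: the path and environment handling is written once for every platform, and only a small, explicitly isolated step is dispatched per OS. The HADOOP_CONF_DIR / HADOOP_LOG_DIR overrides mirror variables the existing shell scripts honor; the function names and the placeholder platform steps are hypothetical and not part of any Hadoop script.

```python
import os


def conf_dir(hadoop_home, env=None):
    """Config directory: $HADOOP_CONF_DIR if set, else <hadoop_home>/conf."""
    env = os.environ if env is None else env
    return env.get("HADOOP_CONF_DIR") or os.path.join(hadoop_home, "conf")


def log_dir(hadoop_home, env=None):
    """Log directory: $HADOOP_LOG_DIR if set, else <hadoop_home>/logs."""
    env = os.environ if env is None else env
    return env.get("HADOOP_LOG_DIR") or os.path.join(hadoop_home, "logs")


def _posix_step():
    # Placeholder for the few steps that genuinely need a shell on Linux.
    return "shell"


def _windows_step():
    # Placeholder for the few steps that genuinely need PowerShell on Windows.
    return "powershell"


# os.name is "posix" on Linux/Mac and "nt" on Windows.
_PLATFORM_STEPS = {"posix": _posix_step, "nt": _windows_step}


def platform_specific_step(os_name=None):
    """Run the one platform-specific step; everything above is shared code."""
    name = os.name if os_name is None else os_name
    if name not in _PLATFORM_STEPS:
        raise NotImplementedError("no platform-specific step for %r" % name)
    return _PLATFORM_STEPS[name]()
```

The design point is that a reviewer only has to compare the two tiny placeholder functions across platforms, not two complete scripts.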

>> What else in the current build, besides saveVersion.sh, do you see
>> as a candidate for migration to Python?

I have a greatly improved version of src/docs/relnotes.py that I would like
to submit, for auto-gen of release notes.
That's all that I have on my hotlist right now, although I anticipate that
some of the shell scripts invoked by ant may be natural candidates.
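To make the saveVersion.sh case concrete: a version-stamping script mostly gathers a revision, the building user, a date and a checksum of the sources, and shell scripts of this kind typically lean on tools (md5sum, find/sort, date formatting) whose availability and flags differ between Linux, Mac and Windows. Below is a minimal Python sketch of just those portable pieces, assuming git as the source-control tool. The function names and the key=value output are hypothetical; the checksum is not byte-for-byte compatible with any existing one, and the output format of the existing saveVersion.sh is not reproduced here.

```python
import getpass
import hashlib
import os
import subprocess
import time


def source_checksum(roots, suffix=".java"):
    """MD5 over the contents of every *suffix file under roots, in sorted order."""
    paths = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffix):
                    paths.append(os.path.join(dirpath, name))
    digest = hashlib.md5()
    # Sort on "/"-separated paths so the order does not depend on os.sep.
    for path in sorted(paths, key=lambda p: p.replace(os.sep, "/")):
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


def git_revision(cwd="."):
    """Current commit id, or "Unknown" if git or the repository is missing."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=cwd, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        return "Unknown"
    return out.decode("ascii").strip()


def version_info(version, revision, user=None, date=None, checksum=""):
    """Collect the fields a version stamp records."""
    return {
        "version": version,
        "revision": revision,
        "user": user if user is not None else getpass.getuser(),
        "date": date if date is not None else time.strftime("%a %b %d %H:%M:%S %Z %Y"),
        "srcChecksum": checksum,
    }


def render_properties(info):
    """Hypothetical output format: one key=value line per field, sorted by key."""
    return "".join("%s=%s\n" % (key, info[key]) for key in sorted(info))
```

Because hashlib, os.walk and subprocess behave the same on every platform, there is no second script to keep in sync.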

>> How are you planning to define which Python modules can be used?
>> Will developers have to install them manually?

That's something the community will work out, the same way they decide what
library jars to include, and when to upgrade those versions.  But first,
let's get an agreement in principle that this is the direction we want to
go.

Cheers,
--Matt

On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <[hidden email]> wrote:

> Matt, thanks for the clarification.
>
> I may have missed the main point of the PROPOSAL thread then. I personally
> want to continue the discussion before voting.
>
> * Python as a runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created ones, i.e. launcher
> scripts) to Python?
> * What else in the current build, besides saveVersion.sh, do you see as
> a candidate for migration to Python?
> * How are you planning to define which Python modules can be used? Will
> developers have to install them manually?
>
> Cheers
>
>
> On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <[hidden email]>
> wrote:
>
> > Hi Alejandro,
> > Please see in-line below.
> >
> > On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <[hidden email]>
> >  wrote:
> >
> > > Matt,
> > >
> > > The scope of this vote seems different from what was discussed in the
> > > PROPOSAL thread.
> > > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
> > > ANT based. And the main reason was to remove saveVersion.sh.
> > > Your #3 was not discussed in the proposal, was it?
> > >
> >
> > The item #3 was in my original statement of the problem, with which I
> > started the proposal thread.  In fact, the thread title was "[PROPOSAL]
> > introduce Python as build-time and run-time dependency for Hadoop and
> > throughout Hadoop stack".  It is true that only one or two people chose
> > to discuss #3 further in that thread.
> >
> > The point is not just to replace a single script, but to provide a means
> > to do cross-platform scripts, which will over time replace many
> > non-platform-specific scripts written in platform-specific languages.
> >
> >
> > >
> > > It seems this vote is dragging in much more than was originally
> > > discussed.
> > > I think you should suspend the vote, recap the motivation and then
> > > restart the vote.
> > >
> >
> > I respectfully disagree.  I believe a careful reading of the cited
> > discussion thread, plus my own statement of the vote, provides sufficient
> > background for a thoughtful decision on the subject.  Presumably so do
> > the ten other people who had already voted before you made that comment.
> >
> > If several other people want more discussion first, please speak up.
> > Thanks,
> > --Matt
> >
> > > As things are laid out at the moment my vote is:
> > >
> > > -1 (It still seems overkill to introduce a new runtime requirement
> > > for building to replace a script.)
> > > +1 (I think this is the right way to simplify the build)
> > > -1 (AFAIK there is no such requirement at the moment, and if it comes
> > > it would be in the form of an AM, which I'd argue should be left
> > > outside of Hadoop)
> > >
> > > Thx
> > >
> > >
> > > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
> > > [hidden email]> wrote:
> > >
> > > > +1, +1, +1
> > > >
> > > > -Giri
> > > >
> > > >
> > > > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <[hidden email]> wrote:
> > > >
> > > > > For discussion, please see previous thread "[PROPOSAL] introduce
> > > > > Python as build-time and run-time dependency for Hadoop and
> > > > > throughout Hadoop stack".
> > > > >
> > > > > This vote consists of three separate items:
> > > > >
> > > > > 1. Contributors shall be allowed to use Python as a
> > > > > platform-independent scripting language for build-time tasks, and
> > > > > add Python as a build-time dependency.
> > > > > Please vote +1, 0, -1.
> > > > >
> > > > > 2. Contributors shall be encouraged to use Maven tasks in
> > > > > combination with either plug-ins or Groovy scripts to do
> > > > > cross-platform build-time tasks, even under ant in Hadoop-1.
> > > > > Please vote +1, 0, -1.
> > > > >
> > > > > 3. Contributors shall be allowed to use Python as a
> > > > > platform-independent scripting language for run-time tasks, and
> > > > > add Python as a run-time dependency.
> > > > > Please vote +1, 0, -1.
> > > > >
> > > > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES
> > > > > contributors to use Maven plug-ins or Groovy as the only means of
> > > > > cross-platform build-time tasks, or to simply continue using
> > > > > platform-dependent scripts as is being done today.
> > > > >
> > > > > Vote closes at 12:30pm PST on Saturday 1 December.
> > > > > ---------
> > > > > Personally, my vote is +1, +1, +1.
> > > > > I think #2 is preferable to #1, but still has many unknowns in it,
> > > > > and until those are worked out I don't want to delay moving to
> > > > > cross-platform scripts for build-time tasks.
> > > > >
> > > > > Best regards,
> > > > > --Matt
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Alejandro
> > >
> >
>
>
>
> --
> Alejandro
>

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Mahadevan Venkatraman
In reply to this post by Ivan Mitic
+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform-independent language such as Python (or Maven in certain cases) will greatly help reduce build breaks and improve build-script maintainability.

Run-time scripts: Most run-time scripts are visible to end users. They are run either by admins, e.g. to start/stop a Hadoop cluster (hadoop-daemons), or by developers, e.g. to submit a job (hadoop.cmd). There seem to be two types of script files:
    - Scripts intended for a cluster admin or an IT admin:
        - It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment, IT admins won't like having to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell wrapper that accepts the right parameters and passes them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-through. This way the Python scripts, like any other Java code, are hidden behind a well-known API surface. IT admins can't easily debug or modify them, but that is fine, since for scripts like these there is no requirement that IT admins be able to view or modify the underlying code.
        - For Windows-specific things not natively supported by Python, such as setting ACLs or starting/stopping Windows services, it should be possible to refactor the code appropriately. But a little PowerShell/cmd for these call-outs would be unavoidable.

    - Scripts intended for developers/cluster users:
        - Most of these scripts (e.g. hadoop.cmd) would sit behind other API surfaces such as WebHDFS, ODBC, JDBC, Templeton, etc. So the advantage of having a common script across platforms outweighs the benefit of using cmd/PowerShell as a native Windows feature. Again, it should also be possible to provide simple PowerShell wrappers for a Windows environment.
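The pass-through arrangement above puts the whole command-line contract in the Python script, with the Windows-only call-outs kept small and separate. A minimal sketch of that shape, assuming the Hadoop 1.x daemon names; the script name hadoop-daemon.py and the Windows service names (hadoop-namenode etc.) are hypothetical, and sc.exe is used only as an example of such a call-out. The sketch describes what would run rather than running it.

```python
import argparse
import os

# Hadoop 1.x daemon names; the script name and service names are hypothetical.
DAEMONS = ("namenode", "secondarynamenode", "datanode", "jobtracker", "tasktracker")


def build_parser():
    """The command-line contract shared by every wrapper (bash or PowerShell)."""
    parser = argparse.ArgumentParser(prog="hadoop-daemon.py")
    parser.add_argument("action", choices=("start", "stop"))
    parser.add_argument("daemon", choices=DAEMONS)
    parser.add_argument("--config", help="alternate configuration directory")
    return parser


def windows_service_command(action, daemon):
    """The isolated Windows-only call-out: sc.exe start|stop <service>."""
    return ["sc.exe", action, "hadoop-" + daemon]


def plan(argv, os_name=None):
    """Parse argv and describe what would run, without running anything."""
    args = build_parser().parse_args(argv)
    name = os.name if os_name is None else os_name
    callout = windows_service_command(args.action, args.daemon) if name == "nt" else None
    return {
        "action": args.action,
        "daemon": args.daemon,
        "config": args.config,
        "callout": callout,
    }
```

In this scheme a PowerShell wrapper would do little more than forward its parameters (e.g. python hadoop-daemon.py start namenode), so the Python CLI remains the single well-known API surface.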

Thanks, Mahadevan.

-----Original Message-----
From: Ivan Mitic [mailto:[hidden email]]
Sent: Thursday, November 29, 2012 3:41 PM
To: [hidden email]; [hidden email]
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: [hidden email]
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1 and 2 in combination make total sense. I have ported a few scripts to Python, and so far it has proven up to the task and has satisfied the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.
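Agreeing on a version can also be enforced mechanically at the top of each script. A minimal sketch, assuming the community settles on some supported range; the bounds below (2.6 up to, but not including, 3.0) are placeholders chosen only because 3.x is where the breaking changes mentioned above appear, not a decision from this thread.

```python
import sys

# Example policy only: these bounds are placeholders, not an agreed range.
SUPPORTED_MIN = (2, 6)
SUPPORTED_BELOW = (3, 0)


def is_supported(version_info=None, minimum=SUPPORTED_MIN, below=SUPPORTED_BELOW):
    """True if (major, minor) lies in the half-open range [minimum, below)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return minimum <= major_minor < below


def require_supported_python():
    """Exit early with a clear message instead of failing mid-script."""
    if not is_supported():
        sys.stderr.write(
            "This script needs Python >= %d.%d and < %d.%d; found %d.%d\n"
            % (SUPPORTED_MIN + SUPPORTED_BELOW + tuple(sys.version_info[:2])))
        sys.exit(1)
```

A script would call require_supported_python() before doing anything else; the alternative of writing to the common subset of 2.x and 3.x would replace the hard upper bound with a coding convention.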


3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.


Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt





Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Matt Foley-2
In reply to this post by Alejandro Abdelnur
Hello again.  Crossed in the mail.

> * What kinds of tasks do you envision Python scripts enabling that are
> not possible today?


The point isn't to open brave new worlds.  The point is to avoid the
nightmare of having to maintain multiple "parallel" scripts doing the SAME
THING in multiple scripting languages.  I know from experience that they
never get maintained right.  It's just a huge source of bugs, because when
they are in different languages, it can be quite difficult to determine
that they are *really* doing the same thing.  And in a case like shell vs
powershell, it will be very common to have contributors who are not experts
in both.

I care deeply about having a high-quality release in both Linux and
Windows.  And having a cross-platform scripting language will make it much
easier to maintain that quality over time, without "slip" between the two
platforms.

> * Will the requirement of Python be pushed to clients using the
> hadoop script? If so, this would affect all downstream projects that use
> the hadoop script one way or the other, right?


If question #3 passes, then Python will become a run-time dependency for
Hadoop.  That means it would need to be installed as part of the Hadoop
install preparation, just like all the other Hadoop run-time dependencies.

> Is the main motivation of the proposal to make things easier for Windows,
> so there is no need for Cygwin? If that is the case, have you considered
> writing BAT scripts directly? If you take Tomcat, for example, they have
> BAT scripts and SH scripts and things work quite nicely.


Of course it is sufficient, from the simple implementation perspective, to
translate all the shell scripts into bat or (better) PowerShell scripts.
That is, in fact, the most evident alternative to my proposals #1 and #3.

However, I ask -- beg! -- the community to consider it from the software
engineering perspective.  We aren't here to just implement something once
and be done.  It has to be maintained, as most of you on this list are well
aware, for years and years, across multiple generations.  And trying to
maintain parallel scripts in multiple languages, when not necessitated by
genuine platform-specific requirements, is just creating bug generators in
the system.

> Personally, I wouldn't be thrilled to see the logic in the scripts get
> more complex; rather the opposite. IMO, scripts should be trimmed to set
> env vars (with no voodoo logic), build the classpath (with no voodoo
> logic, just from a set of dirs) and call Java.


See the first item above.  The point is to enable cross-platform scripting
of the things we already have to script.  IMO, scripts should get out of
the env var business entirely, but that's unrelated to this question :-)
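"Build the classpath from a set of dirs and call Java" is also a fair description of what a portable launcher needs to do. A minimal sketch, assuming a conf directory plus a list of directories holding jars; the function names are hypothetical. The only platform difference in this part, the classpath separator, is handled by os.pathsep instead of by a second script.

```python
import glob
import os


def build_classpath(conf_dir, jar_dirs):
    """conf dir first, then every *.jar in each dir (sorted), joined portably."""
    entries = [conf_dir]
    for jar_dir in jar_dirs:
        entries.extend(sorted(glob.glob(os.path.join(jar_dir, "*.jar"))))
    # os.pathsep is ":" on Linux/Mac and ";" on Windows.
    return os.pathsep.join(entries)


def java_command(java_home, main_class, classpath, args=()):
    """argv for launching the JVM; pass it to subprocess.call() to run it."""
    java = os.path.join(java_home, "bin", "java")
    return [java, "-classpath", classpath, main_class] + list(args)
```

Returning an argv list rather than a command string avoids shell quoting differences between bash and cmd entirely.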

> Finally, this is a code change, so I'm not sure why we are doing a vote.


I view this as a tools issue that affects questions that go beyond the
one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
(atm) recommended that I bring it to the list.  So here we are :-)

Cheers,
--Matt

On Thu, Nov 29, 2012 at 5:25 PM, Alejandro Abdelnur <[hidden email]> wrote:

> Matt,
>
> Let me repost my previous questions and a few more. I'd appreciate your
> answers, as it will help me understand the full impact this would have in
> Hadoop and related projects.
>
> * Python as a runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created ones, i.e. launcher
> scripts) to Python?
> * What else in the current build, besides saveVersion.sh, do you see as
> a candidate for migration to Python?
> * How are you planning to define which Python modules can be used? Will
> developers have to install them manually?
> * What kinds of tasks do you envision Python scripts enabling that are not
> possible today?
> * Will the requirement of Python be pushed to clients using the hadoop
> script? If so, this would affect all downstream projects that use the
> hadoop script one way or the other, right?
>
> Is the main motivation of the proposal to make things easier for Windows,
> so there is no need for Cygwin? If that is the case, have you considered
> writing BAT scripts directly? If you take Tomcat, for example, they have
> BAT scripts and SH scripts and things work quite nicely.
>
> Personally, I wouldn't be thrilled to see the logic in the scripts get
> more complex; rather the opposite. IMO, scripts should be trimmed to set
> env vars (with no voodoo logic), build the classpath (with no voodoo
> logic, just from a set of dirs) and call Java.
>
> Finally, this is a code change, so I'm not sure why we are doing a vote.
>
> Thx.
>

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Chuan Liu
+1 +1 +1

Agree with Matt on the code maintainability.

I think that on one side we have shell, which is a scripting language and OS-dependent (e.g. bash vs. PowerShell);
on the other side we have Java, which is not a scripting language and is OS-independent.
I would accept any scripting language that can bridge the gap as an OS-independent scripting language.
Personally, I also prefer Python over Ruby.

Thanks,
Chuan

________________________________________
From: [hidden email] on behalf of Matt Foley
Sent: Thursday, November 29, 2012 6:26 PM
To: [hidden email]
Subject: Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Bikas Saha
+1, +1, +1 (non-binding)

We have had promising results for 1 and 2 when porting to Windows. 3 would
allow us to remove platform dependencies from test code. Agree that there
might be some nuanced operations that require OS-specific environments, but
this would keep them to a minimum.

Bikas

On 11/29/12 7:22 PM, "Chuan Liu" <[hidden email]> wrote:

>+1 +1 +1
>
>Agree with Matt on the code maintainability.
>
>I think on one side we have shell, which is a scripting language but OS
>dependent, e.g. bash vs powershell;
>on the other side we have Java, which is not a scripting language but is
>OS independent.
>I would accept any scripting language that can fill the gap as an OS
>independent scripting language.
>Personally, I also prefer Python over Ruby.
>
>Thanks,
>Chuan
>
>________________________________________
>From: [hidden email] on behalf of Matt Foley
>Sent: Thursday, November 29, 2012 6:26 PM
>To: [hidden email]
>Subject: Re: [VOTE] introduce Python as build-time and run-time
>dependency for Hadoop and throughout Hadoop stack
>
>Hello again.  Crossed in the mail.
>
>> * What kind of tasks do you envision Python scripts will enable that are
>> not possible today?
>
>
>The point isn't to open brave new worlds.  The point is to avoid the
>nightmare of having to maintain multiple "parallel" scripts doing the SAME
>THING in multiple scripting languages.  I know from experience that they
>never get maintained right.  It's just a huge source of bugs, because when
>they are in different languages, it can be quite difficult to determine
>that they are *really* doing the same thing.  And in a case like shell vs
>powershell, it will be very common to have contributors who are not
>experts in both.
>
>I care deeply about having a high-quality release in both Linux and
>Windows.  And having a cross-platform scripting language will make it much
>easier to maintain that quality over time, without "slip" between the two
>platforms.
>
>> * Will the requirement of Python be pushed to clients using the
>> hadoop script? If so, this would affect all downstream projects that use
>> the hadoop script in one way or the other, right?
>
>
>If question #3 passes, then Python will become a run-time dependency for
>Hadoop.  That means it would need to be installed as part of the Hadoop
>install preparation, just like all the other Hadoop run-time dependencies.
>
>> Is the main motivation of the proposal to make things easier for Windows,
>> so there is no need for cygwin? If that is the case, have you considered
>> doing BAT scripts directly? If you take Tomcat for example, they have BAT
>> scripts and SH scripts and things work quite nicely.
>
>
>Of course it is sufficient, from the simple implementation perspective, to
>translate all the shell scripts into bat or (better) powershell scripts.
>That is, in fact, the most evident alternative to my proposals #1 and #3.
>
>However, I ask -- beg! -- the community to consider it from the software
>engineering perspective.  We aren't here to just implement something once
>and be done.  It has to be maintained, as most of you on this list are
>well aware, for years and years, across multiple generations.  And trying
>to maintain parallel scripts in multiple languages, when not necessitated
>by genuine platform-specific requirements, is just creating bug generators
>in the system.
>
>> Personally, I wouldn't be thrilled to see the logic in the scripts get
>> more complex; I'd go in the opposite direction. IMO, scripts should be
>> trimmed to set env vars (with no voodoo logic), build the classpath (with
>> no voodoo logic, just from a set of dirs) and call Java.
>
>
>See the first item above.  The point is to enable cross-platform scripting
>of the things we already have to script.  IMO, scripts should get out of
>the env var business entirely, but that's unrelated to this question :-)
>
>> Finally, this is a code change, so I'm not sure why we are doing a vote.
>
>
>I view this as a tools issue, that affects questions that go beyond the
>one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
>(atm) recommended that I bring it to the list.  So here we are :-)
>
>Cheers,
>--Matt
>
>On Thu, Nov 29, 2012 at 5:25 PM, Alejandro Abdelnur <[hidden email]> wrote:
>
>> Matt,
>>
>> Let me repost my previous questions and a few more. I'd appreciate your
>> answers, as it will help me understand the full impact this would have
>> in Hadoop and related projects.
>>
>> * Python as runtime requirement. Are you planning to migrate all BASH
>> scripts provided by Hadoop (or dynamically created, i.e. launcher
>> scripts) to Python?
>> * What else in the current build, besides saveVersion.sh, do you see as
>> a candidate to be migrated to Python?
>> * How are you planning to define what Python modules can be used? Will
>> developers have to install them manually?
>> * What kind of tasks do you envision Python scripts will enable that are
>> not possible today?
>> * Will the requirement of Python be pushed to clients using the hadoop
>> script? If so, this would affect all downstream projects that use the
>> hadoop script in one way or the other, right?
>>
>> Is the main motivation of the proposal to make things easier for Windows,
>> so there is no need for cygwin? If that is the case, have you considered
>> doing BAT scripts directly? If you take Tomcat for example, they have BAT
>> scripts and SH scripts and things work quite nicely.
>>
>> Personally, I wouldn't be thrilled to see the logic in the scripts get
>> more complex; I'd go in the opposite direction. IMO, scripts should be
>> trimmed to set env vars (with no voodoo logic), build the classpath (with
>> no voodoo logic, just from a set of dirs) and call Java.
>>
>> Finally, this is a code change, so I'm not sure why we are doing a vote.
>>
>> Thx.
>>
>> On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <[hidden email]> wrote:
>>
>> > Matt, thanks for the clarification.
>> >
>> > I may have missed the main point of the PROPOSAL thread then. I
>> > personally want to continue the discussion before voting.
>> >
>> > * Python as runtime requirement. Are you planning to migrate all BASH
>> > scripts provided by Hadoop (or dynamically created, i.e. launcher
>> > scripts) to Python?
>> > * What else in the current build, besides saveVersion.sh, do you see as
>> > a candidate to be migrated to Python?
>> > * How are you planning to define what Python modules can be used? Will
>> > developers have to install them manually?
>> >
>> > Cheers
>> >
>> >
>> > On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <[hidden email]> wrote:
>> >
>> >> Hi Alejandro,
>> >> Please see in-line below.
>> >>
>> >> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <[hidden email]> wrote:
>> >>
>> >> > Matt,
>> >> >
>> >> > The scope of this vote seems different from what was discussed in
>> >> > the PROPOSAL thread.
>> >> > In the PROPOSAL thread you indicated this was for Hadoop1 because it
>> >> > is ANT based. And the main reason was to remove saveVersion.sh.
>> >> > Your #3 was not discussed in the proposal, was it?
>> >> >
>> >>
>> >> The item #3 was in my original statement of the problem, with which I
>> >> started the proposal thread.  In fact, the thread title was "[PROPOSAL]
>> >> introduce Python as build-time and run-time dependency for Hadoop and
>> >> throughout Hadoop stack".  It is true that only one or two people chose
>> >> to discuss #3 further in that thread.
>> >>
>> >> The point is not just to replace a single script, but to provide a
>> >> means to do cross-platform scripts, which will over time replace many
>> >> non-platform-specific scripts written in platform-specific languages.
>> >>
>> >>
>> >> >
>> >> > It seems this vote is dragging in much more than was originally
>> >> > discussed.
>> >> > I think you should suspend the vote, recap the motivation and then
>> >> > restart the vote.
>> >> >
>> >>
>> >> I respectfully disagree.  I believe a careful reading of the cited
>> >> discussion thread, plus my own statement of the vote, provides
>> >> sufficient background for a thoughtful decision on the subject.
>> >> Presumably so do the ten other people who had already voted before you
>> >> made that comment.
>> >>
>> >> If several other people want more discussion first, please speak up.
>> >> Thanks,
>> >> --Matt
>> >>
>> >> > As things are laid out at the moment, my vote is:
>> >> >
>> >> > -1 (It still seems overkill to introduce a new runtime requirement
>> >> > for building just to replace a script.)
>> >> > +1 (I think this is the right way to simplify the build)
>> >> > -1 (AFAIK there is no such requirement at the moment, and if one
>> >> > comes it would be in the form of an AM, which I'd argue should be
>> >> > left outside of Hadoop)
>> >> >
>> >> > Thx
>> >> >
>> >> >
>> >> > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <[hidden email]> wrote:
>> >> >
>> >> > > +1, +1, +1
>> >> > >
>> >> > > -Giri
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Alejandro
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Alejandro
>> >
>>
>>
>>
>> --
>> Alejandro
>>
>
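Alejandro's quoted suggestion above, that launcher scripts should only set env vars, build the classpath from a set of dirs, and call Java, is small enough to show concretely in a single cross-platform script. The following is a minimal Python sketch of that shape, assuming a hypothetical layout with `conf/` and `lib/` under `HADOOP_HOME`; the helper names (`build_classpath`, `java_command`, `main`) are illustrative and not part of Hadoop.

```python
# Hypothetical sketch of a thin launcher: set env vars, build the classpath
# from a fixed set of directories, and call Java. The conf/ and lib/ layout,
# the defaults and the helper names are illustrative assumptions.
import glob
import os
import subprocess


def build_classpath(dirs):
    """Join each directory and the *.jar files directly inside it.

    os.pathsep is ':' on Unix and ';' on Windows, which is the kind of
    per-platform difference that parallel .sh/.bat scripts each re-implement.
    """
    entries = []
    for d in dirs:
        entries.append(d)
        entries.extend(sorted(glob.glob(os.path.join(d, "*.jar"))))
    return os.pathsep.join(entries)


def java_command(java_home, classpath, main_class, args):
    """Return the argv list that launches main_class with the classpath."""
    java = os.path.join(java_home, "bin", "java")
    return [java, "-classpath", classpath, main_class] + list(args)


def main(argv, environ=os.environ):
    """Launch argv[0] as the main class; return the JVM's exit code."""
    if not argv:
        print("usage: launcher.py MAIN_CLASS [ARGS...]")
        return 2
    home = environ.get("HADOOP_HOME", os.getcwd())
    classpath = build_classpath([os.path.join(home, "conf"),
                                 os.path.join(home, "lib")])
    cmd = java_command(environ.get("JAVA_HOME", "/usr"), classpath,
                       argv[0], argv[1:])
    return subprocess.call(cmd)
```

A script file would end with `sys.exit(main(sys.argv[1:]))`. The point of the sketch is that the same file runs unchanged on Linux and Windows, so there is no second script to keep in sync; whether that justifies a Python dependency is exactly what the vote is about.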



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Luke Lu
In reply to this post by Matt Foley-2
Thanks for the voting thread. Otherwise, many committers would have missed
it.

I agree that this is more than a code change: it has a larger impact than
a typical code change.

On Thu, Nov 29, 2012 at 6:26 PM, Matt Foley <[hidden email]> wrote:

> > Finally, this is code change, so I'm not sure why we are doing a vote.
>
>
> I view this as a tools issue, that affects questions that go beyond the
> one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
> (atm) recommended that I bring it to the list.  So here we are :-)
>
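saveVersion.sh is the concrete script that started this discussion. Hadoop's version info includes a source checksum, so, as a hypothetical illustration of one step a Python port might contain, here is a platform-independent checksum over the Java sources. The function names, the `.java` filter and the hashing scheme are assumptions; this sketch is not claimed to reproduce the value the existing shell script computes.

```python
# Hypothetical sketch of a platform-independent source checksum, the kind of
# step a Python port of saveVersion.sh might include. It is NOT guaranteed to
# match the value the existing shell script produces.
import hashlib
import os


def _rel(path, root):
    """Path relative to root with '/' separators, identical on every OS."""
    return os.path.relpath(path, root).replace(os.sep, "/")


def source_checksum(root, suffix=".java"):
    """MD5 over the relative names and contents of all matching files."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                paths.append(os.path.join(dirpath, name))
    # Sort by normalised relative path so neither os.walk order nor the OS
    # path separator can change the result.
    paths.sort(key=lambda p: _rel(p, root))
    digest = hashlib.md5()
    for path in paths:
        digest.update(_rel(path, root).encode("utf-8"))
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()
```

Files are read in binary mode, so a checkout with CRLF line endings (common on Windows) would still hash differently from an LF checkout; a real port would need to decide whether to normalise that.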

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Luke Lu-2
I'd like to change my binding vote to -1, -0, -1.

Considering the Hadoop stack/ecosystem as a whole, I think the best
cross-platform scripting language to adopt is JRuby, for the following
reasons:

1. HBase already adopted JRuby for the HBase shell, which all current
platform vendors support.
2. We can control the version of the language implementation on a
per-release basis.
3. We don't have to introduce new dependencies in the de facto Hadoop
stack (see 1).

I'm all for improving multi-platform support. I think the best way to do
this is to have thin native wrapper scripts (using env vars) that call the
cross-platform JRuby scripts.

__Luke


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Steve Loughran-3
In reply to this post by Radim Kolar-2
On 30 November 2012 00:29, Radim Kolar <[hidden email]> wrote:

>
> * What else in the current build, besides saveVersion.sh, you see as
> candidate to be migrated to Phyton?
>
> inline ant scripts
>

=0. Ant's versioning is stricter; you can pull down the exact Jar versions,
and some of us in the Ant team worked very hard to get it going everywhere.
You don't gain anything by going to .py

-steve

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Steve Loughran-3
In reply to this post by Luke Lu-2
On 30 November 2012 12:57, Luke Lu <[hidden email]> wrote:

> I'd like to change my binding vote to -1, -0, -1.
>
> Considering the hadoop stack/ecosystem as a whole, I think the best cross
> platform scripting language to adopt is jruby for following reasons:
>
> 1. HBase already adopted jruby for HBase shell, which all current platform
> vendors support.
> 2. We can control the version of language implementation at a per release
> basis.
> 3. We don't have to introduce new dependencies in the de facto hadoop
> stack. (see 1).
>
>
I don't see why these arguments should have any impact on using python at
build time, as it doesn't introduce any dependencies downstream. Yes, you
need python at build time, but that's no worse than having a protoc
compiler, gcc and the automake toolchain.



> I'm all for improving multi-platform support. I think the best way to do
> this is to have a thin native script wrappers (using env vars) to call the
> cross-platform jruby scripts.
>
>
Were it not for the env-var configuration hierarchy mess that things are in
today, I'd agree. Where do you set your env vars? hadoop-env.sh? Where does
that come from? The hadoop conf dir? How do you find that? An env variable,
or a ../../conf from bin/hadoop.sh, which breaks once you start symlinking
to hadoop/bin? Or do you assume a root installation in /etc/hadoop/conf,
which points to /etc/alternatives/hadoop-conf, which can then point back to
/etc/hadoop/conf.pseudo? And what about JAVA_HOME?

Those env vars are something I'd like to see the back of.
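The conf-dir question Steve raises is the kind of logic that is easy to get subtly wrong twice, once in .sh and once in .bat. As a hedged illustration only, this Python sketch resolves the directory in one place and follows symlinks, so a `bin/hadoop` linked elsewhere still finds its own installation. `HADOOP_CONF_DIR` and `HADOOP_HOME` are real Hadoop variable names, but the lookup order and the `find_conf_dir` helper are assumptions, not Hadoop's actual behaviour.

```python
# Hypothetical sketch: resolve the configuration directory in one place,
# following symlinks, instead of re-deriving it in every .sh/.bat script.
# The lookup order below is an assumption for illustration.
import os


def find_conf_dir(script_path, environ=None):
    """Return the first candidate config directory that exists, or None.

    Order: $HADOOP_CONF_DIR, $HADOOP_HOME/conf, <install>/conf next to the
    real (symlink-resolved) bin directory, then /etc/hadoop/conf.
    """
    env = os.environ if environ is None else environ
    candidates = []
    if env.get("HADOOP_CONF_DIR"):
        candidates.append(env["HADOOP_CONF_DIR"])
    if env.get("HADOOP_HOME"):
        candidates.append(os.path.join(env["HADOOP_HOME"], "conf"))
    # realpath() follows the symlink, so a bin/hadoop linked into another
    # bin directory still finds the conf dir of the install it lives in.
    bin_dir = os.path.dirname(os.path.realpath(script_path))
    candidates.append(os.path.join(os.path.dirname(bin_dir), "conf"))
    candidates.append("/etc/hadoop/conf")
    for candidate in candidates:
        if os.path.isdir(candidate):
            return os.path.realpath(candidate)
    return None
```

Returning `realpath()` of the winner also collapses an `/etc/hadoop/conf -> /etc/alternatives/... -> /etc/hadoop/conf.pseudo` chain, so callers see the directory actually in use.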

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Radim Kolar-2
In reply to this post by Steve Loughran-3

>> inline ant scripts
>>
>> =0. Ant's versioning is stricter; you can pull down the exact Jar versions,
>> and some of us in the Ant team worked very hard to get it going everywhere.
>> You don't gain anything by going to .py
There are sh scripts inside the Maven Ant plugin stuff.

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Luke Lu
In reply to this post by Steve Loughran-3
On Fri, Nov 30, 2012 at 5:29 AM, Steve Loughran <[hidden email]>wrote:

> where do you set your env vars... and what about JAVA_HOME
>

There should be only two env vars (JAVA_HOME and HADOOP_HOME) to deal with
in the native scripts (.bat on Windows and .sh on Unix platforms) to
bootstrap the JRuby scripts, which deal with the rest of the env vars.

__Luke
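Luke's design splits the work so that the native wrapper only passes `JAVA_HOME` and `HADOOP_HOME`, and the cross-platform script derives everything else. He proposes JRuby for that script; the logic is language-neutral, so here is a minimal sketch of the cross-platform half in Python, the language this vote concerns. The `derive_env` helper and the derived names and paths are illustrative assumptions.

```python
# Hypothetical sketch of the cross-platform side of a thin-wrapper design:
# the native .sh/.bat wrapper passes only JAVA_HOME and HADOOP_HOME, and the
# portable script derives the rest. Derived names and paths are assumptions.
import os

REQUIRED = ("JAVA_HOME", "HADOOP_HOME")


def derive_env(environ):
    """Validate the two bootstrap variables and fill in derived defaults.

    Values already present in environ are kept, so users can still override.
    """
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise SystemExit("missing required environment variable(s): "
                         + ", ".join(missing))
    home = environ["HADOOP_HOME"]
    env = dict(environ)
    env.setdefault("HADOOP_CONF_DIR", os.path.join(home, "conf"))
    env.setdefault("HADOOP_LOG_DIR", os.path.join(home, "logs"))
    env.setdefault("JAVA", os.path.join(environ["JAVA_HOME"], "bin", "java"))
    return env
```

With this split, the per-platform .sh and .bat wrappers shrink to a few lines each, and only they would need to be maintained in parallel.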

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Luke Lu
In reply to this post by Steve Loughran-3
On Fri, Nov 30, 2012 at 5:29 AM, Steve Loughran <[hidden email]>wrote:

> Yes, you need python at build time, but that's no worse than having a
> protoc compiler, gcc and the automake toolchain.
>

The problem is that Python is known to have _backward_ compatibility issues
on various platforms. It would be very annoying and time-consuming to deal
with support issues regarding Python versions on various platforms.

I agree that autotools is a nightmare and should be converted (in branch-1
as well) to cmake (which has good versioning support :) The goal is to have
fewer external dependencies, not more, again mostly due to support issues.
If we want to introduce an external dependency, we need to pick something
that is easy to support, compatibility-wise.

__Luke
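One common mitigation for the version-compatibility concern, though not a full answer to it, is a fail-fast interpreter check at the top of each script, so an unsupported Python produces a clear message instead of an obscure error deep inside the build. A minimal sketch, in which the supported range is a hypothetical choice the thread never settled:

```python
# Hypothetical sketch of a fail-fast interpreter check at the top of a build
# script. The supported range is an illustrative assumption; the thread does
# not settle on any Python version.
import sys


def check_python(version_info=sys.version_info, minimum=(2, 6), below=(3, 0)):
    """Return (major, minor) if minimum <= version < below, else exit."""
    current = tuple(version_info[:2])
    if not (minimum <= current < below):
        raise SystemExit(
            "this script supports Python %d.%d up to (not including) %d.%d; "
            "found %d.%d" % (minimum + below + current))
    return current
```

A script would call `check_python()` before doing any work. The guard only helps if the file still parses on the interpreter being checked: syntax added in a later release (f-strings, for example, arrived in Python 3.6) fails before the check runs, so the check usually lives in a small entry script written with conservative syntax.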

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Doug Cutting
In reply to this post by Matt Foley-2
-1, +1, -1

Run- & build-time scripting should be limited to operations that are
impossible in Java.  These should not be complex nor should we
encourage more complexity in them.  A parallel set of simple .bat
files for such operations seems preferable to adding a Python
dependency.

Doug


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Jitendra Pandey
In reply to this post by Radim Kolar-2
+1, +1, +1

On Fri, Nov 30, 2012 at 5:40 AM, Radim Kolar <[hidden email]> wrote:

>
>  inline ant scripts
>>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar
>>> versions,
>>> and some of us in the Ant team worked very hard to get it going
>>> everywhere.
>>> You don't gain anything by going to .py
>>>
>> there are sh scripts inside maven ant plugin stuff
>




Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Raja Aluri
In reply to this post by Matt Foley-2
+1, +1, +1 (non binding)

It makes it a lot easier to make build tools (that cannot be developed
easily using Maven) work across non-Unix-like platforms (especially
Windows).

Raja


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Eli Collins
In reply to this post by Matt Foley-2
-1, 0, -1

IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Steve Loughran-3
On 1 December 2012 01:08, Eli Collins <[hidden email]> wrote:

> -1, 0, -1
>
> IIUC the only platform we plan to add support for that we can't easily
> support today (w/o an emulation layer like cygwin) is Windows, and it
> seems like making the bash scripts simpler and having parallel bat
> files is IMO a better approach.
>
>
WinNT Bat/CMD files are the worst possible scripting language invented. At
the very least, .py should be the language of choice there

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Steve Loughran-3
In reply to this post by Radim Kolar-2
On 30 November 2012 13:40, Radim Kolar <[hidden email]> wrote:

>
>  inline ant scripts
>>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar
>>> versions,
>>> and some of us in the Ant team worked very hard to get it going
>>> everywhere.
>>> You don't gain anything by going to .py
>>>
>> there are sh scripts inside maven ant plugin stuff
>

That's because there are some things you can't do in Java: running rpmbuild
to pick up file permissions and dangling symlinks that only become valid on
deployment.

The reason Ant is used to start them is that Maven views trying to run
native scripts as a forbidden action, probably popping up some patronising
text: "you are trying to run a shell script, please look at
maven.apache.org/wiki/whymavenwontletyoudothings/ to understand this". They
also view building RPMs as not something to encourage either.

(But we digress into an Ant vs Maven argument. I do actually appreciate the
consistent target naming across projects and the ability for the IDE to set
up structure; it's just the entire underlying architecture and
implementation that I dislike.)

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Doug Cutting
In reply to this post by Steve Loughran-3
On Sat, Dec 1, 2012 at 2:44 AM, Steve Loughran <[hidden email]> wrote:
> WinNT Bat/CMD files are the worst possible scripting language invented. At
> the very least, .py should be the language of choice there

The scripts should not have so much logic that .bat files are a problem.

Doug

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Joep Rottinghuis
In reply to this post by Matt Foley-2
0, 0, -1 (non-binding)

Joep
