[DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

[DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Alejandro Abdelnur
On the '[VOTE] introduce Python as build-time and run-time dependency for
Hadoop and throughout Hadoop stack' thread, the option of using Maven plugins
for the build received overwhelming acceptance (several +1s, a few 0s, and no
-1s).

The current build tasks to be 'pluginized' are:

1. cmake (HADOOP-8887)
2. protoc (HADOOP-9117)
3. saveVersion (HADOOP-8924) (but this one has received a -1)

I've tested #2 and #3 on Windows and they worked as expected. Regarding #1,
if cmake is supported (and used for Hadoop) on Windows, I believe it should
work with minimal changes.

As I mentioned in HADOOP-8924, "Writing Maven plugins is more complex than
writing scripts, I don't dispute that. The main motivation for using Maven
plugins is to keep things in the POM declarative and to hide (if necessary)
the different handling needed for different platforms."
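
To make the 'declarative POM' point concrete, here is a rough sketch of how a
module might bind such a custom protoc plugin into its build. The plugin
coordinates, goal name, and configuration parameters below are hypothetical,
chosen only to show the shape; they are not taken from HADOOP-9117:

  <plugin>
    <!-- hypothetical coordinates, for illustration only -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-build-maven-plugins</artifactId>
    <version>1.0.0</version>
    <executions>
      <execution>
        <id>compile-protoc</id>
        <phase>generate-sources</phase>
        <goals>
          <goal>protoc</goal>
        </goals>
        <configuration>
          <!-- platform-specific handling (locating protoc, path separators,
               etc.) lives inside the plugin, not in shell fragments here -->
          <protocCommand>${protoc.path}</protocCommand>
          <sourceDir>${basedir}/src/main/proto</sourceDir>
        </configuration>
      </execution>
    </executions>
  </plugin>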

In order to enable the use of custom plugins for the Hadoop build, we need
to have such plugins available in a Maven repo for Maven to download and
use them. Note that they cannot be part of the same build that is building
Hadoop itself.

While one could argue that these plugins are general purpose and should be
developed in a separate standalone project, doing so would definitely delay
when we can start using them in Hadoop.

Because of this, I propose we create a Hadoop subproject 'hadoop-build' to
host custom Maven plugins (and custom Ant tasks for branch-1) for Hadoop.

Being a sub-project will allow us to release the plugins we need for the
Hadoop build independently of Hadoop releases, and to use them immediately
in the Hadoop build.

Eventually, if the Maven community picks up some of these plugins we could
simply remove them from our hadoop-build project and change Hadoop POMs to
use the new ones.

Looking forward to hearing what others think.

Thanks.

--
Alejandro

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Colin McCabe
I think this is a good idea, for a few different reasons.

* We all know Java, so this code will be readable by all.
* It will benefit the wider community of Maven users, not just our project.
* It gets rid of the shell dependency.  The shell dependency is
problematic because Windows doesn't support most UNIX shell code, and
also because different Linux distributions have different default
shells.  The /bin/dash versus /bin/bash issue has caused problems in
the past.

Is there any way this code could be used in branch-1 to solve some of
the issues there?  I know the protoc code, at least, would be useful
there.

regards,
Colin



Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Alejandro Abdelnur
On Wed, Dec 5, 2012 at 10:35 AM, Colin McCabe <[hidden email]> wrote:

>
> Is there any way this code could be used in branch-1 to solve some of
> the issues there?  I know the protoc code, at least, would be useful
> there.
>

We could implement the logic now done by the plugins in standalone classes,
and then have Ant Task and Maven Plugin wrapper classes wiring them into the
corresponding build framework. This means we could produce build JARs with
both Ant Tasks and Maven Plugins for the same encapsulated logic.
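
As a rough sketch of that layering (the class and package names here are
hypothetical, not taken from any of the JIRAs above), the shared logic would
live in a plain class, with thin wrappers on each side:

// --- shared logic: plain Java, no build-framework dependencies ---
package org.apache.hadoop.build;

import java.io.File;
import java.io.IOException;

public class VersionInfoGenerator {
  /** Gathers version/revision/build metadata and writes it under outputDir
      (kept as a stub in this sketch). */
  public void generate(File sourceDir, File outputDir) throws IOException {
  }
}

// --- Maven Plugin wrapper (old-style javadoc mojo annotations) ---
package org.apache.hadoop.build.maven;

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.build.VersionInfoGenerator;
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;

/**
 * @goal version-info
 * @phase generate-sources
 */
public class VersionInfoMojo extends AbstractMojo {
  /** @parameter default-value="${basedir}" */
  private File sourceDir;
  /** @parameter default-value="${project.build.directory}/generated-sources" */
  private File outputDir;

  public void execute() throws MojoExecutionException {
    try {
      new VersionInfoGenerator().generate(sourceDir, outputDir);
    } catch (IOException e) {
      throw new MojoExecutionException("version-info generation failed", e);
    }
  }
}

// --- Ant Task wrapper for branch-1 ---
package org.apache.hadoop.build.ant;

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.build.VersionInfoGenerator;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;

public class VersionInfoTask extends Task {
  private File sourceDir;
  private File outputDir;

  public void setSourceDir(File f) { this.sourceDir = f; }
  public void setOutputDir(File f) { this.outputDir = f; }

  public void execute() throws BuildException {
    try {
      new VersionInfoGenerator().generate(sourceDir, outputDir);
    } catch (IOException e) {
      throw new BuildException(e);
    }
  }
}

The build JARs would then only differ in which wrapper classes they include
(or could ship both), while the actual behavior stays in one place.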

--
Alejandro

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Doug Cutting
In reply to this post by Alejandro Abdelnur
On Wed, Dec 5, 2012 at 9:24 AM, Alejandro Abdelnur <[hidden email]> wrote:
> The current build tasks to be 'pluginized' are:
>
> 1. cmake (HADOOP-8887)
> 2. protoc (HADOOP-9117)
> 3. saveVersion (HADOOP-8924) (but this one has received a -1)
> [ ... ]
> While one could argue that these plugins are general purpose and should be
> developed in a separate standalone project, doing so would definitely
> delay when we can start using them in Hadoop.

Yes, these examples do seem general-purpose and not Hadoop-specific.
Unless we have Hadoop-specific plugins that Codehaus won't accept, this
seems like it might just end up as a staging area where we put
plugins until they get released somewhere else.  So it might be
interesting to start exploring just how hard it is to get a new plugin
added at Codehaus.  They make it sound pretty easy.

http://mojo.codehaus.org/contribution/submitting-a-plugin.html

In the meantime, or if that gets stalled, publishing them ourselves
seems reasonable.  +1

Doug

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Radim Kolar-2
1. cmake and protoc Maven plugins already exist. Why do you want to write
new ones?
2. Groovy accepts Java syntax. Just rewrite saveVersion.sh in Java (it's
already done in JIRA) and put it in pom.xml - no overhaul of the build
infrastructure needed.

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Colin McCabe
On Fri, Dec 7, 2012 at 5:31 PM, Radim Kolar <[hidden email]> wrote:
> 1. cmake and protoc Maven plugins already exist. Why do you want to write
> new ones?

This has already been discussed; see
https://groups.google.com/forum/?fromgroups=#!topic/cmake-maven-project-users/5FpfUHmg5Ho

Actually the situation is even worse than it might seem from that
thread, since it turns out that com.googlecode.cmake-maven-project has
no support for any platforms but Windows.  It also has no support for
running native unit tests, which is a big motivation behind
HADOOP-8887.

> 2. Groovy accepts Java syntax. Just rewrite saveVersion.sh in Java (it's
> already done in JIRA) and put it in pom.xml - no overhaul of the build
> infrastructure needed.

Part of the reason for this thread is so that we can come up with a
solution for both branch-1 and later branches.  This would not be
accomplished by putting all the logic into a pom.xml file, since
branch-1 doesn't use Maven.

best,
Colin

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Colin McCabe
On Mon, Dec 10, 2012 at 10:50 AM, Colin McCabe <[hidden email]> wrote:

> On Fri, Dec 7, 2012 at 5:31 PM, Radim Kolar <[hidden email]> wrote:
>> 1. cmake and protoc Maven plugins already exist. Why do you want to write
>> new ones?
>
> This has already been discussed; see
> https://groups.google.com/forum/?fromgroups=#!topic/cmake-maven-project-users/5FpfUHmg5Ho
>
> Actually the situation is even worse than it might seem from that
> thread, since it turns out that com.googlecode.cmake-maven-project has
> no support for any platforms but Windows.  It also has no support for
> running native unit tests, which is a big motivation behind
> HADOOP-8887.

To clarify this a little bit, some of the later versions of the
googlecode plugin add rudimentary support for gcc, but I wasn't able
to get it to work locally.  I spent a lot of time on this when I did
the cmake conversion, and it didn't really pay off.

As I commented, the plugin from HADOOP-8887 also does a lot more,
including adding the ability to support things like "mvn test
-Dtest=my_native_test".

cheers,
Colin


Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Radim Kolar-2
What is the proposed build workflow?

Is it enough to do mvn install to get the custom plugins available to Maven
in the build phase, or must you set up your own Maven repo and do mvn deploy?

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Alejandro Abdelnur
Radim,

You can do mvn install in the plugins project, and then you'll be able to
use the plugins from the project that consumes them.

If the plugins are available in a Maven repo, then you don't need to do that.

Thx


--
Alejandro

Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Colin McCabe
In reply to this post by Radim Kolar-2
Hi Radim,

In general, Maven plugins are built and deployed to a repository.
Maven then fetches the precompiled binaries from that repository
based on a specific version number in the POM.  This is how Maven
plugins work in general, not something specific to this proposal.

I did experiment with bundling the Maven plugins with the main Hadoop
source in HADOOP-8887; however, this is not really feasible.  Hence
the proposal for a separate sub-project here.

So, to summarize, the Maven workflow wouldn't change.

Hope this helps.
Colin



Re: [DISCUSS] create a hadoop-build subproject (a follow up on the thread 'introduce Python as build-time...')

Doug Cutting
In reply to this post by Alejandro Abdelnur
On Wed, Dec 12, 2012 at 10:20 AM, Alejandro Abdelnur <[hidden email]> wrote:
> You can do mvn install in the plugins project, and then you'll be able to
> use the plugins from the project that consumes them.

Avro has its Maven plugins in a module that's used when compiling
other modules.  You can build all of Avro without installing any of
it.  So I do not think that Maven plugins need to be installed to be
used, but can just be in a separate Maven module.  The Hadoop plugins
might, e.g., be placed in a sub-module of Common.

Doug