HADOOP-14163 proposal for new hadoop.apache.org

HADOOP-14163 proposal for new hadoop.apache.org

Marton Elek

Hi,

In the previous thread, the current Forrest-based Hadoop site was identified as one of the pain points of the release process.

I created a new version of the site with exactly the same content.

Because it uses a newer site generator (Hugo):

1. Only one new Markdown file needs to be created per release; all the documentation and download links are added automatically.
2. Rendering requires only a single binary.
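A minimal sketch of what point 1 means in practice, assuming a hypothetical layout where each release is one Markdown file under content/release/. The title and date front-matter keys are standard Hugo fields; the version key, the directory, and the helper name new_release_file are illustrative assumptions, not necessarily what the proposal uses:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: writes the single Markdown file needed for a release.
# Directory layout and the custom "version" key are assumptions.
new_release_file() {
  local version="$1" date="$2" dir="${3:-content/release}"
  mkdir -p "$dir"
  # YAML front matter between "---" lines, followed by the announcement body.
  cat > "$dir/$version.md" <<EOF
---
title: "Release $version available"
date: $date
version: "$version"
---
Announcement text for Apache Hadoop $version goes here.
EOF
  # Print the path of the file that was written.
  printf '%s\n' "$dir/$version.md"
}
```

For example, new_release_file 2.8.0 "$(date +%F)" writes content/release/2.8.0.md; the site templates would then be responsible for turning such files into the documentation and download links.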


A preview version is temporarily hosted at

     http://hadoop.anzix.net/ 

to make it easier to review.


For more details, see my comments on the issue https://issues.apache.org/jira/browse/HADOOP-14163

I would be grateful for any feedback or review.

Cheers,
Marton



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: HADOOP-14163 proposal for new hadoop.apache.org

Owen O'Malley
Thanks for addressing this. Getting rid of Hadoop's use of forrest is a
good thing.

In terms of content, the documentation links should be sorted by version number
with only the latest from each minor release line (e.g. 3.0, 2.7, 2.6).
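One way to produce that list, sketched as a shell function (the name latest_per_minor is hypothetical). It assumes GNU sort's -V version ordering; note that -V orders 3.0.0-alpha2 after a final 3.0.0, so pre-release suffixes would need extra handling once a final release exists:

```shell
#!/usr/bin/env bash

# Print only the newest release of each minor line (major.minor),
# newest line first. Relies on GNU coreutils `sort -V`.
latest_per_minor() {
  printf '%s\n' "$@" \
    | sort -V -r \
    | awk -F. '!seen[$1 "." $2]++'   # keep the first (newest) hit per x.y
}
```

For example, latest_per_minor 2.6.4 2.7.3 2.6.5 3.0.0-alpha2 2.7.2 3.0.0-alpha1 prints 3.0.0-alpha2, 2.7.3 and 2.6.5, one per line.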

The download page points to the mirrors for checksums and signatures. It
should use the direct links, such as

https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz.asc
https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz.mds
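The direct links follow a simple pattern, so a download page could derive them from the version alone. A sketch, assuming the hadoop-<version>-src.tar.gz naming used in Owen's two examples (dist_links is a hypothetical helper name):

```shell
#!/usr/bin/env bash

# Print the direct (non-mirror) signature and checksum URLs for a release.
dist_links() {
  local version="$1"
  local base="https://dist.apache.org/repos/dist/release/hadoop/common"
  local artifact="hadoop-${version}-src.tar.gz"
  local ext
  for ext in asc mds; do
    printf '%s/hadoop-%s/%s.%s\n' "$base" "$version" "$artifact" "$ext"
  done
}
```

To check a downloaded tarball against its signature, the usual GnuPG invocation is gpg --verify hadoop-2.7.3-src.tar.gz.asc hadoop-2.7.3-src.tar.gz, after importing the release managers' public keys (Apache projects publish these in a KEYS file).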

Speaking of which, Hadoop's dist directory is huge and should be heavily
pruned. We should probably take it down to just hadoop-2.6.5, hadoop-2.7.3,
and hadoop-3.0.0-alpha2.
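A cautious first step for the pruning is to list what would go without deleting anything. A sketch (prune_candidates is a hypothetical name; the directory names in the usage example are illustrative input, not an actual listing of dist):

```shell
#!/usr/bin/env bash

# Print the release directories that are NOT on the keep list.
# Only prints candidates; it does not remove anything.
prune_candidates() {
  local keep="$1"; shift   # space-separated list of directories to keep
  local dir
  for dir in "$@"; do
    case " $keep " in
      *" $dir "*) ;;                  # on the keep list: skip it
      *) printf '%s\n' "$dir" ;;      # otherwise: candidate for removal
    esac
  done
}
```

For example, prune_candidates "hadoop-2.6.5 hadoop-2.7.3 hadoop-3.0.0-alpha2" hadoop-2.6.4 hadoop-2.6.5 hadoop-2.7.2 hadoop-2.7.3 hadoop-3.0.0-alpha2 prints hadoop-2.6.4 and hadoop-2.7.2. The actual removal would then be a separate, reviewed svn rm against the dist repository.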

You might also want to move us to git-pubsub so that we can use a branch in
our source code git repository to publish the html. Typically this uses the
asf-site branch.
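A rough sketch of publishing rendered HTML to the asf-site branch, assuming the branch already exists, is not checked out in the current worktree, and that the site was rendered into ./public (Hugo's default output directory). publish_site is a hypothetical helper; the git-pubsub hookup itself is set up by INFRA and is not shown:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: replace the content of the publish branch with the
# freshly rendered site and commit it. Run from inside the source repository.
publish_site() {
  local src="${1:-public}" branch="${2:-asf-site}"
  local wt
  # A not-yet-existing path inside a fresh temp dir, so that
  # `git worktree add` can create it.
  wt="$(mktemp -d)/site"
  git worktree add "$wt" "$branch" >/dev/null
  # Drop every tracked file on the branch, then copy in the rendered pages.
  git -C "$wt" rm -r -q --ignore-unmatch .
  cp -R "$src"/. "$wt"/
  git -C "$wt" add -A
  # Commit only if the rendered output actually changed.
  if ! git -C "$wt" diff --cached --quiet; then
    git -C "$wt" commit -q -m "Publish rendered site"
  fi
  # Remove the temporary worktree and its bookkeeping entry.
  rm -rf "$wt"
  git worktree prune
}
```

After publish_site public asf-site, the branch still has to be pushed, e.g. git push origin asf-site.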

.. Owen

On Mon, Mar 13, 2017 at 7:28 AM, Marton Elek <[hidden email]> wrote:


Re: HADOOP-14163 proposal for new hadoop.apache.org

Marton Elek


Thank you for all of the feedback. I have addressed all of it (except one item, see the comment below) and updated the http://hadoop.anzix.net preview site.

The next steps:

0. Let me know if you have any comments about the latest version.

1. I will wait for the 2.8.0 announcement and then migrate it as well. (I wouldn't like to complicate the 2.8.0 release with the site change.)

2. I like Owen's suggestion to move the site to a dedicated git branch. I don't want to block on it if it takes too much time, but if any of the committers could pick it up, I would wait for it.

I tested it, and it seems to be easy:

git svn clone https://svn.apache.org/repos/asf/hadoop/common/site/main
cd main
git remote add elek [hidden email]:elek/hadoop.git
git push elek master:asf-site

According to the blog entry, an INFRA issue should be opened (I guess by a committer or maybe a PMC member):

https://blogs.apache.org/infra/entry/git_based_websites_available

3. After that I can submit the new site as a regular patch against the asf-site branch.

4. Once it's merged, I can update the release wiki pages.

Marton

ps:

The only suggestion I have not implemented is the short version names in the documentation menu (2.7 instead of 2.7.3).

I think there are two competing forces: the usability of the site and the simplicity of site generation. Ideally, adding a new release to the site should be as easy as possible (that was one of the motivations for the migration).

While a new tag could be added to the header of the Markdown files (e.g. versionLine: 3.0), it would require updating multiple files for each new release. And if one of those updates were missed, multiple "2.7" menu items could be displayed (one for 2.7.3 and one for 2.7.4). So the current method is less elegant, but much less error-prone.

I would prefer to keep the current content in this step (if possible). Once the site is migrated, we can submit new patches (hopefully against a git branch) in the normal way and further improve the site.


________________________________________
From: Owen O'Malley <[hidden email]>
Sent: Monday, March 13, 2017 6:15 PM
To: Marton Elek
Cc: [hidden email]
Subject: Re: HADOOP-14163 proposal for new hadoop.apache.org

Re: HADOOP-14163 proposal for new hadoop.apache.org

Andrew Wang
Thanks again for working on this Marton!

Based on my read of the blog post you linked, we should have the git branch
ready before asking infra to switch it over.

I can do a more detailed review on the JIRA once you rev, and can help with
the INFRA ticket once it's ready. We'll also have to update BUILDING.txt
and the wiki instructions as part of this.

Best,
Andrew

On Fri, Mar 24, 2017 at 3:06 AM, Marton Elek <[hidden email]> wrote:

>
>
> Thank you all of the feedbacks, I fixed all of them (except one, see the
> comment below) and updated the http://hadoop.anzix.net preview site.
>
> So the next steps:
>
> 0. Let me know if you have any comment about the latest version
>
> 1. I wait for the 2.8.0 announcement, and migrate the new announcement as
> well. (wouldn't like to complicate the 2.8.0 with the site change)
>
> 2. I like the suggestion of Owen to move the site to a specific git
> branch. I wouldn't like to pending on it if it's too much time, but if any
> of the commiters could pick it up, I would wait for it.
>
> I tested it, and seems to be easy:
>
> git svn clone https://svn.apache.org/repos/asf/hadoop/common/site/main
> cd main
> git remote add elek [hidden email]:elek/hadoop.git
> git push elek master:asf-site
>
> According to the blog entry, an INFRA issue should be opened (I guess by a
> commiter or maybe a pmc member):
>
> https://blogs.apache.org/infra/entry/git_based_websites_available
>
> 3. After that I can submit the new site as a regular patch against the
> asf-site branch.
>
> 4. If it's merged, I can update the release wiki pages
>
> Marton
>
> ps:
>
> The only suggested item which is not implemented is the short version
> names in the documentation menu (2.7 instead of 2.7.3).
>
> I think there are two forces: usability of the site and the simplicity of
> the site generation. Ideally a new release could be added to the site as
> easy as possible (that was one of the motivation of the migration).
>
> While a new tag could be added to the header of the markdown files (eg:
> versionLine: 3.0), it requires multiple files update during a new release.
> And if something would be missed, there could be displayed multiple "2.7"
> menu item (one for 2.7.3 and for 2.7.4). So the current method is not so
> nice, but much more bug-safe.
>
> I prefer to keep the current/content in this step (if possible) and if the
> site is migrated we can submit new patches (hopefully against a git branch)
> in the normal way and further improve the site.
>
>
> ________________________________________
> From: Owen O'Malley <[hidden email]>
> Sent: Monday, March 13, 2017 6:15 PM
> To: Marton Elek
> Cc: [hidden email]
> Subject: Re: HADOOP-14163 proposal for new hadoop.apache.org
>
> Thanks for addressing this. Getting rid of Hadoop's use of forrest is a
> good thing.
>
> In terms of content, the documentation links should be sorted by number
> with only the latest from each minor release line (eg. 3.0, 2.7, 2.6).
>
> The download page points to the mirrors for checksums and signatures. It
> should use the direct links, such as
>
> https://dist.apache.org/repos/dist/release/hadoop/common/
> hadoop-2.7.3/hadoop-2.7.3-src.tar.gz.asc
> https://dist.apache.org/repos/dist/release/hadoop/common/
> hadoop-2.7.3/hadoop-2.7.3-src.tar.gz.mds
>
> Speaking of which, Hadoop's dist directory is huge and should be heavily
> pruned. We should probably take it down to just hadoop-2.6.5, hadoop-2.7.3,
> and hadoop-3.0.0-alpha2.
>
> You might also want to move us to git-pubsub so that we can use a branch in
> our source code git repository to publish the html. Typically this uses the
> asf-site branch.
>
> .. Owen
>
> On Mon, Mar 13, 2017 at 7:28 AM, Marton Elek <[hidden email]>
> wrote:
>
> >
> > Hi,
> >
> > In the previous thread the current forrest based hadoop site is
> identified
> > as one of the pain points of the release process.
> >
> > I created a new version of the site with exactly the same content.
> >
> >  As it uses newer site generator (hugo), now:
> >
> > 1. It’s enough to create one new markdown file per release, and all the
> > documentation/download links will be automatically added.
> > 2. It requires only one single binary to render.
> >
> >
> > A preview version is temporary hosted at
> >
> >      http://hadoop.anzix.net/
> >
> > to make it easier to review.
> >
> >
> > For more details, you can check my comments on the issue
> > https://issues.apache.org/jira/browse/HADOOP-14163
> >
> > I would be thankful to get any feedback/review.
> >
> > Cheers,
> > Marton
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: HADOOP-14163 proposal for new hadoop.apache.org

Allen Wittenauer

> On Aug 8, 2017, at 12:36 AM, Akira Ajisaka <[hidden email]> wrote:
>
> Now I'm okay with not creating another repo.
> I'm thinking the following procedures may work:
>
> 1. Create ./asf-site directory
> 2. Add the content of https://github.com/elek/hadoop-site-proposal to the directory
> 3. Generate web pages and push them to asf-site branch
> 4. Create a CI job to run 3. automatically when ./asf-site directory is changed


        Yup.  To be more specific on the Jenkins part:

        MultiSCM build. Build should be set to poll SCM, probably @daily or equally reasonable.

        first SCM: clone hadoop/trunk to one dir
        second SCM: clone hadoop/asf-site to another dir

        (Letting Jenkins manage those dirs takes quite a bit of the work out of it)

        Run a (modified?) form of create-release so that you get an exact replica of what a released site looks like.

        Take site tarball and unpack it into asf-site/.../trunk (or current? or whatever?)

        Build the main site, then commit back to asf-site.

        Make an empty commit to asf-site to work around INFRA-10751. Recommend that the commit message be the git hash of the current hadoop/trunk.
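The empty-commit step could look roughly like this, assuming Jenkins cloned hadoop/trunk into ./trunk and hadoop/asf-site into ./asf-site (both directory names, and the helper name record_trunk_hash, are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Record which trunk commit the site was built from as an empty commit on
# the asf-site checkout, using the trunk hash as the commit message.
record_trunk_hash() {
  local trunk_dir="${1:-trunk}" site_dir="${2:-asf-site}"
  local hash
  hash="$(git -C "$trunk_dir" rev-parse HEAD)"
  git -C "$site_dir" commit -q --allow-empty -m "$hash"
  # Echo the hash so the job log shows what was recorded.
  printf '%s\n' "$hash"
}
```

The job would then push the result, e.g. git -C asf-site push origin HEAD:asf-site.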

        FWIW, in Yetus we actually keep the source for the main Yetus site in our source tree. It gets built as part of the release.