HADOOP_CLIENT_OPTS getting set multiple times (is this a bug?)


Tom Brown
I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK).

I noticed that my task tracker processes have multiple "-Xmx" flags on
their command lines, and that the later ones (-Xmx128m) were overriding
the one I had intended to be used (-Xmx500m).

After digging through the various scripts, I found that the problem
happens because "hadoop-env.sh" gets invoked multiple times. The .deb
package creates a link from "/etc/profile.d/" to hadoop-env.sh, so the
file runs whenever I log in. The "hadoop" script also invokes
hadoop-env.sh (via "hadoop-config.sh"). The following sequence causes
the problem:

1. The first time hadoop-env.sh is invoked (when the user logs in),
HADOOP_CLIENT_OPTS is set to "-Xmx128m ...".

2. The second time hadoop-env.sh is invoked (when a Hadoop process is
started), HADOOP_OPTS is set to "... $HADOOP_CLIENT_OPTS" (thereby
including the memory setting for all Hadoop processes in general)

3. Also during the second execution, HADOOP_CLIENT_OPTS is set again to
"-Xmx128m $HADOOP_CLIENT_OPTS", prepending a second copy (so it now
contains "-Xmx128m -Xmx128m").

4. When the actual hadoop process is started, it always includes both
JAVA_HEAP_SIZE and HADOOP_OPTS (in that order), but since HADOOP_OPTS
also has a memory setting and is later in the command line, it takes
precedence.
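The doubling in steps 1-3 can be reproduced with a small sketch (the function body paraphrases the unconditional prepend described above; it is not the exact stock script):

```shell
#!/bin/sh
# Sketch: simulate hadoop-env.sh being sourced twice. The function body
# paraphrases the prepend from the thread; it is not the real script.
fake_hadoop_env() {
    # The script prepends unconditionally, so each sourcing adds a copy:
    HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
}

fake_hadoop_env    # first sourcing: login shell, via /etc/profile.d/
fake_hadoop_env    # second sourcing: via hadoop-config.sh
echo "$HADOOP_CLIENT_OPTS"    # prints: -Xmx128m -Xmx128m
```

Since the JVM honors the last -Xmx it sees on the command line, whichever duplicate lands later silently wins.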

I couldn't find any bug that matched this, so I thought I'd reach out
to the community: Is this a known bug? Do the scripts and deb file
belong to Hadoop in general, or is this the responsibility of a
specific distribution?
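For what it's worth, the usual shell idiom for a file that may be sourced more than once is an include guard; a sketch (the HADOOP_ENV_SOURCED guard name is my own invention, not anything in the stock script):

```shell
# Sketch of an include guard at the top of an env file such as
# hadoop-env.sh. HADOOP_ENV_SOURCED is a hypothetical guard variable.
if [ -n "${HADOOP_ENV_SOURCED:-}" ]; then
    # Already sourced once in this environment; skip the re-prepends.
    return 0 2>/dev/null || exit 0
fi
export HADOOP_ENV_SOURCED=1

# The prepends below now run at most once per environment:
export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
```

Because the guard is exported, a child process that sources the file a second time (as hadoop-config.sh does) skips the prepends and simply inherits the values already set at login.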

Thanks in advance!

--Tom
Re: HADOOP_CLIENT_OPTS getting set multiple times (is this a bug?)

Matt Foley
>> I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK). ...
>> Do the scripts and deb file belong to Hadoop in general, or is this the responsibility of a specific distribution?

Hi Tom,
Good description.  I searched Jira for "HADOOP_CLIENT_OPTS", and it appears there are at least two open bugs on this issue (albeit in the later context of 2.0.2 and 3.0): HADOOP-9211 and HADOOP-9351.  I encourage you to follow and/or contribute to those jiras if you are interested in improving this usability issue.

Regarding whether you are looking at Apache stuff or something specific to a distro:
I'm going to get a little pedantic here, sorry; there's no other way to explain it, and the distinctions are actually important from a legal standpoint.

As members of this community, we wear multiple "hats".  I'm a committer and PMC member for the Apache Hadoop project, and wearing that "hat" I was also the release manager for Hadoop-1.0.2.  I think you found those deb packages in the Apache artifact repositories.  If so, they were compiled by me, as Release Manager for that release of Hadoop-1.  But they weren't compiled by Hortonworks -- even though I am also an employee of Hortonworks and Hortonworks supports my work on behalf of the community.

Hortonworks makes releases of HDP, their supported product which includes or is "powered by" Apache Hadoop and related projects.  Other companies also publish distributions powered by Hadoop.  But those distros are available from their respective companies' web sites.  Anything you download from Apache is provided by members of the Apache Hadoop community on a non-commercial basis.  All of our companies are proud to support this work, as it is part of the open-source "virtuous circle" between the community and the companies, the technology and the commerce.

Hope that helps.  Feel free to contact me off-list if you want to discuss more.
Regards,
--Matt


On Tue, Mar 12, 2013 at 12:50 PM, Tom Brown <[hidden email]> wrote:
I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK). ...