Issue with Hadoop Job History Server


Issue with Hadoop Job History Server

Gao, Yunlong
To whom it may concern,

I am using Hadoop 2.7.1.2.3.6.0-3796, from the Hortonworks distribution HDP-2.3.6.0-3796, and I have a question about the Hadoop Job History Server.

After I set everything up, the ResourceManager, NameNodes, and DataNodes all seem to be running fine, but the Job History Server is not working correctly: its UI does not show any jobs, and REST calls to it do not work either. I also notice that there are no history files in HDFS under the directory configured as "mapreduce.jobhistory.done-dir".

I have tried several things, including restarting the Job History Server and monitoring its log -- no errors or exceptions are observed. I also renamed /hadoop/mapreduce/jhs/mr-jhs-state, which the Job History Server uses for state recovery, and restarted it, but again no particular error appears. I have also tried a few other suggestions borrowed from online blogs and documents, with no luck.


Any help would be very much appreciated.

Thanks,
Yunlong


Re: Issue with Hadoop Job History Server

Rohith Sharma K S-3
MR jobs and the JHS should use the same done-dir configuration if it is set explicitly; otherwise, the staging-dir should be the same for both. Make sure the job and the JHS see the same configuration values.

What usually happens is that the MR ApplicationMaster writes the job history file in one location while the HistoryServer tries to read it from a different one, which causes the JHS to display no jobs.
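
For concreteness, here is a minimal sketch of the history-related properties that should agree between the mapred-site.xml used by job clients/AMs and the one used by the Job History Server; the paths shown are illustrative examples, not values taken from this thread:

    <!-- These three values must match on the job submission side and on the JHS.
         Example paths only. -->
    <property>
      <name>yarn.app.mapreduce.am.staging-dir</name>
      <value>/user</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/mr-history/tmp</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/mr-history/done</value>
    </property>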

Thanks & Regards
Rohith Sharma K S




RE: Issue with Hadoop Job History Server

Benjamin Ross
Rohith,
Thanks - we're still having issues.  Can you help out with this?

How do you specify the done directory for an MR job?  The Job History Server done dir is mapreduce.jobhistory.done-dir.  I specified the per-job one as mapreduce.jobtracker.jobhistory.location, as per the documentation here.
drwx------   - yarn          hadoop          0 2016-07-14 16:39 /ats/done
drwxr-xr-x   - yarn          hadoop          0 2016-07-14 16:39 /ats/done/1468528507723
drwxr-xr-x   - yarn          hadoop          0 2016-07-14 16:39 /ats/done/1468528507723/0000
drwxr-xr-x   - yarn          hadoop          0 2016-07-25 20:10 /ats/done/1468528507723/0000/000
drwxrwxrwx   - mapred        hadoop          0 2016-07-19 14:47 /mr-history/done
drwxrwx---   - mapred        hadoop          0 2016-07-19 14:47 /mr-history/done/2016
drwxrwx---   - mapred        hadoop          0 2016-07-19 14:47 /mr-history/done/2016/07
drwxrwx---   - mapred        hadoop          0 2016-07-27 13:49 /mr-history/done/2016/07/19
drwxrwxrwt   - bross         hdfs            0 2016-08-15 22:39 /tmp/hadoop-yarn/staging/history/done_intermediate
       =========> lots of recent data in /tmp/hadoop-yarn/staging/history/done_intermediate
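
For reference, and assuming the stock Hadoop 2.7 defaults from mapred-default.xml (not anything pasted in this thread): when the history directories are not set in the configuration a job actually runs with, they fall back to the values below, which would be consistent with history files piling up under /tmp/hadoop-yarn/staging/history/done_intermediate rather than the /mr-history paths configured here:

    <!-- Stock mapred-default.xml values in Hadoop 2.7.x, shown only for comparison;
         they are not part of the configuration pasted below. -->
    <property>
      <name>yarn.app.mapreduce.am.staging-dir</name>
      <value>/tmp/hadoop-yarn/staging</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
    </property>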
  <configuration>
    
    <property>
      <name>mapreduce.admin.map.child.java.opts</name>
      <value>-server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.6.0-3796</value>
    </property>
    
    <property>
      <name>mapreduce.admin.reduce.child.java.opts</name>
      <value>-server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.6.0-3796</value>
    </property>
    
    <property>
      <name>mapreduce.admin.user.env</name>
      <value>LD_LIBRARY_PATH=/usr/hdp/2.3.6.0-3796/hadoop/lib/native:/usr/hdp/2.3.6.0-3796/hadoop/lib/native/Linux-amd64-64</value>
    </property>
    
    <property>
      <name>mapreduce.am.max-attempts</name>
      <value>2</value>
    </property>
    
    <property>
      <name>mapreduce.application.classpath</name>
      <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.3.6.0-3796/hadoop/lib/hadoop-lzo-0.6.0.2.3.6.0-3796.jar:/etc/hadoop/conf/secure</value>
    </property>
    
    <property>
      <name>mapreduce.application.framework.path</name>
      <value>/hdp/apps/2.3.6.0-3796/mapreduce/mapreduce.tar.gz#mr-framework</value>
    </property>
    
    <property>
      <name>mapreduce.cluster.administrators</name>
      <value> hadoop</value>
    </property>
    
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    
    <property>
      <name>mapreduce.job.counters.max</name>
      <value>130</value>
    </property>
    
    <property>
      <name>mapreduce.job.emit-timeline-data</name>
      <value>false</value>
    </property>
    
    <property>
      <name>mapreduce.job.reduce.slowstart.completedmaps</name>
      <value>0.05</value>
    </property>
    
    <property>
      <name>mapreduce.job.user.classpath.first</name>
      <value>true</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>bodcdevhdp6.dev.lattice.local:10020</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.bind-host</name>
      <value>0.0.0.0</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/mr-history/done</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/mr-history/tmp</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.recovery.enable</name>
      <value>true</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.recovery.store.class</name>
      <value>org.apache.hadoop.mapreduce.v2.hs.HistoryServerLeveldbStateStoreService</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.recovery.store.leveldb.path</name>
      <value>/hadoop/mapreduce/jhs</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>bodcdevhdp6.dev.lattice.local:19888</value>
    </property>
    
    <property>
      <name>mapreduce.jobtracker.jobhistory.completed.location</name>
      <value>/mr-history/done</value>
    </property>
    
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx4915m</value>
    </property>
    
    <property>
      <name>mapreduce.map.log.level</name>
      <value>INFO</value>
    </property>
    
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>6144</value>
    </property>
    
    <property>
      <name>mapreduce.map.output.compress</name>
      <value>false</value>
    </property>
    
    <property>
      <name>mapreduce.map.sort.spill.percent</name>
      <value>0.7</value>
    </property>
    
    <property>
      <name>mapreduce.map.speculative</name>
      <value>false</value>
    </property>
    
    <property>
      <name>mapreduce.output.fileoutputformat.compress</name>
      <value>false</value>
    </property>
    
    <property>
      <name>mapreduce.output.fileoutputformat.compress.type</name>
      <value>BLOCK</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.input.buffer.percent</name>
      <value>0.0</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx9830m</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.log.level</name>
      <value>INFO</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>12288</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.fetch.retry.enabled</name>
      <value>1</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.fetch.retry.interval-ms</name>
      <value>1000</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.fetch.retry.timeout-ms</name>
      <value>30000</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
      <value>0.7</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.merge.percent</name>
      <value>0.66</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.shuffle.parallelcopies</name>
      <value>30</value>
    </property>
    
    <property>
      <name>mapreduce.reduce.speculative</name>
      <value>false</value>
    </property>
    
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>13562</value>
    </property>
    
    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>100</value>
    </property>
    
    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>2047</value>
    </property>
    
    <property>
      <name>mapreduce.task.timeout</name>
      <value>300000</value>
    </property>
    
    <property>
      <name>yarn.app.mapreduce.am.admin-command-opts</name>
      <value>-Dhdp.version=2.3.6.0-3796</value>
    </property>
    
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx4915m -Dhdp.version=${hdp.version}</value>
    </property>
    
    <property>
      <name>yarn.app.mapreduce.am.log.level</name>
      <value>INFO</value>
    </property>
    
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>6144</value>
    </property>
    
    <property>
      <name>yarn.app.mapreduce.am.staging-dir</name>
      <value>/user</value>
    </property>
    
  </configuration>

Thanks,
Ben








RE: Issue with Hadoop Job History Server

Benjamin Ross
Turns out we made a stupid mistake - our system was mixing configuration between an old cluster and a new cluster.  So things are working now.

Thanks,
Ben
