Hadoop JobHistory
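Reposted from: http://www.cnblogs.com/luogankun/p/4019303.html
The Hadoop JobHistory Server records information about MapReduce jobs that have finished running and stores it in a specified HDFS directory. It is not started by default: you have to configure it and then start the service manually.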
Add the following configuration to mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop000:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop000:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/history/done</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/history/done_intermediate</value>
</property>

Start the history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

Stop the history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

Once the history server is running, its Web UI can be opened in a browser at hadoop000:19888.
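You can check the configuration before starting the daemon with a short script. The Python sketch below does three things:

- parses mapred-site.xml using only the standard library;
- confirms that the four JobHistory properties are present;
- derives the Web UI URL from `mapreduce.jobhistory.webapp.address`.

It assumes the usual Hadoop layout, where `<configuration>` wraps `<property>` elements. The helper names are hypothetical. The `ws/v1/history/info` path used by the optional probe is assumed to be the JobHistory Server's REST "info" endpoint, so verify it against your Hadoop version.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The four JobHistory properties set in mapred-site.xml in this post.
REQUIRED_KEYS = (
    "mapreduce.jobhistory.address",
    "mapreduce.jobhistory.webapp.address",
    "mapreduce.jobhistory.done-dir",
    "mapreduce.jobhistory.intermediate-done-dir",
)


def read_hadoop_conf(xml_text: str) -> dict:
    """Hypothetical helper: return {name: value} for every <property> in a *-site.xml."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def check_jobhistory_conf(conf: dict) -> str:
    """Raise KeyError if a JobHistory property is missing; otherwise return the Web UI URL."""
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise KeyError(f"missing in mapred-site.xml: {missing}")
    return "http://" + conf["mapreduce.jobhistory.webapp.address"] + "/"


def probe_history_server(web_ui_url: str, timeout: float = 5.0) -> int:
    """GET the (assumed) REST info endpoint of a running server and return the HTTP status."""
    with urllib.request.urlopen(web_ui_url + "ws/v1/history/info", timeout=timeout) as resp:
        return resp.status
```

With the configuration above, `check_jobhistory_conf(read_hadoop_conf(text))` returns `http://hadoop000:19888/`. `probe_history_server` only succeeds after the `start historyserver` command has been run.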
Two directories are created on HDFS:

hadoop fs -ls /history
drwxrwx--- - spark supergroup 0 2014-10-11 15:11 /history/done
drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate

mapreduce.jobhistory.done-dir (/history/done): Directory where history files are managed by the MR JobHistory Server (records of completed jobs).
mapreduce.jobhistory.intermediate-done-dir (/history/done_intermediate): Directory where history files are written by MapReduce jobs (a job's history file lands here when the job finishes, and the JobHistory Server then moves it into done-dir).
Test:
Query the city table through Hive, then look at the HDFS directories and at hadoop000:19888.

hive> select id, name from city;

Check the HDFS directories:
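1) Job history records are stored in directories organized by year/month/day (/history/done/2014/10/11/000000);
2) Each job has two records with different suffixes: .jhist and .xml (the job configuration, *_conf.xml).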
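The two rules above can be made concrete by decoding a .jhist file name and recomputing the directory it should sit in. This is a minimal sketch, and several parts of it are assumptions:

- **Field order.** It is inferred from the sample name `job_..._0002-<submit>-<user>-<job name>-<finish>-<maps>-<reduces>-<status>-<queue>-<start>.jhist`. Fields are URL-encoded, so spaces appear as `+` and a literal `-` as `%2D`, which makes splitting on `-` safe.
- **Date directory.** It is assumed to come from the job's finish time, in the history server's time zone (the `tz` parameter).
- **`000000` component.** It is assumed to be a bucket formed by zero-padding the job sequence number to 9 digits and keeping the first 6.

The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import unquote_plus


@dataclass
class JobIndexInfo:
    job_id: str
    submit_time_ms: int
    user: str
    job_name: str
    finish_time_ms: int
    num_maps: int
    num_reduces: int
    status: str
    queue: str
    start_time_ms: int


def parse_jhist_name(file_name: str) -> JobIndexInfo:
    """Split a *.jhist file name (assumed 10 '-'-separated, URL-encoded fields)."""
    base = file_name.rsplit("/", 1)[-1]
    if not base.endswith(".jhist"):
        raise ValueError(f"not a .jhist file: {file_name}")
    parts = [unquote_plus(p) for p in base[: -len(".jhist")].split("-")]
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    job_id, submit, user, name, finish, maps, reduces, status, queue, start = parts
    return JobIndexInfo(job_id, int(submit), user, name, int(finish),
                        int(maps), int(reduces), status, queue, int(start))


def done_subdir(info: JobIndexInfo, done_dir: str = "/history/done",
                tz: timezone = timezone.utc) -> str:
    """Expected directory under done-dir: YYYY/MM/DD plus an assumed 6-digit bucket."""
    day = datetime.fromtimestamp(info.finish_time_ms / 1000, tz)
    seq = int(info.job_id.rsplit("_", 1)[-1])  # job_<clusterTimestamp>_<sequence>
    bucket = f"{seq:09d}"[:6]  # assumption: one bucket per 1000 jobs
    return f"{done_dir}/{day:%Y/%m/%d}/{bucket}"
```

For the sample file from this test run, the decoded job name is `select id, name from city(Stage-1)`. The job ran 1 map and 0 reduces and ended SUCCEEDED in queue `root.spark`. `done_subdir` gives `/history/done/2014/10/11/000000`, which matches the directory listed in the output that follows.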
hadoop fs -ls /history/done/2014/10/11/000000
-rwxrwx--- 1 spark supergroup 22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist
-rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml

Check the Web UI at hadoop000:19888:
For each job, the Web UI shows the number of map and reduce tasks, the submit time, start time and finish time, the job ID, the submitting user, the queue, and so on.
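The same per-job columns can also be pulled in a script. The sketch below assumes a few things, all worth checking against the REST API documentation for your Hadoop version:

- **Endpoint.** The JobHistory Server is assumed to expose them at `/ws/v1/history/mapreduce/jobs`.
- **Payload shape.** The response is assumed to look like `{"jobs": {"job": [...]}}`.
- **Field names.** These are `id`, `user`, `queue`, `state`, `mapsTotal`, `reducesTotal`, `submitTime`, `startTime` and `finishTime`, with times in milliseconds.

Timestamps are rendered in UTC here, and the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed REST path of the JobHistory Server's job list.
JOBS_PATH = "/ws/v1/history/mapreduce/jobs"


def fetch_jobs_json(host_port: str = "hadoop000:19888", timeout: float = 5.0) -> dict:
    """GET the job list from a running JobHistory Server and decode the JSON body."""
    with urllib.request.urlopen(f"http://{host_port}{JOBS_PATH}", timeout=timeout) as resp:
        return json.load(resp)


def _fmt_ms(ms: int) -> str:
    """Format a millisecond epoch timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(ms / 1000, timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def summarize_jobs(payload: dict) -> list:
    """Flatten the assumed {"jobs": {"job": [...]}} payload into Web-UI-like rows."""
    jobs = (payload.get("jobs") or {}).get("job") or []  # "jobs" may be null when empty
    rows = []
    for job in jobs:
        rows.append({
            "id": job["id"],
            "user": job["user"],
            "queue": job["queue"],
            "state": job["state"],
            "maps": job["mapsTotal"],
            "reduces": job["reducesTotal"],
            "submitted": _fmt_ms(job["submitTime"]),
            "started": _fmt_ms(job["startTime"]),
            "finished": _fmt_ms(job["finishTime"]),
        })
    return rows
```

Typical use is `summarize_jobs(fetch_jobs_json())`. Keeping the parsing separate from the HTTP call lets you test it on a saved response.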
Clicking [job_1413011730351_0002] opens a page with a message like: Aggregation is not enabled. Try the nodemanager at ......
Fix: add the following configuration to yarn-site.xml:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Then restart YARN.
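If you manage yarn-site.xml with scripts, the property can be set idempotently. The sketch below overwrites an existing value or adds the `<property>` if it is missing. It is a minimal standard-library version with hypothetical helper names.

Note that ElementTree's default parser drops XML comments and the declaration line, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Set name=value in a Hadoop *-site.xml, appending a <property> if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: add a new <property> directly under <configuration>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def enable_log_aggregation(yarn_site_xml: str) -> str:
    """Turn on yarn.log-aggregation-enable in the given yarn-site.xml text."""
    return set_hadoop_property(yarn_site_xml, "yarn.log-aggregation-enable", "true")
```

Log aggregation is carried out by the NodeManagers, so the updated file needs to reach every NodeManager host before YARN is restarted.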