Namenode startup error: Journal Storage Directory /var/bigdata/hadoop/full/dfs/jn/dmgeo not formatted
While testing Flink HA, I rebooted a node that hosted both a JobManager and a NameNode. After the reboot, the NameNode failed to start with an error roughly like this:
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /tmp/hadoop/dfs/journalnode/xxxx not formatted
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:457)

Cause: the metadata kept by the JournalNodes had become inconsistent with the NameNode's; two of the three machines reported this error.
Fix: start the JournalNode on nn1, then run `hdfs namenode -initializeSharedEdits` to bring the JournalNodes back in sync with the NameNode. After that, the NameNode restarted without any problem.
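Before re-initializing, it can help to confirm which JournalNode directories actually lost their format. A formatted journal storage directory contains a `current/VERSION` file; this is a minimal sketch (the default path below is this cluster's directory and should be adjusted to your `dfs.journalnode.edits.dir` setting):

```shell
# Sketch: check whether a JournalNode storage directory looks formatted.
# A formatted journal dir contains a current/VERSION file with the
# namespace metadata; its absence triggers JournalNotFormattedException.
check_jn_formatted() {
    if [ -f "$1/current/VERSION" ]; then
        echo "formatted: $1"
    else
        echo "NOT formatted: $1"
    fi
}

# Example against this cluster's journal dir (adjust to your
# dfs.journalnode.edits.dir value):
check_jn_formatted /var/bigdata/hadoop/full/dfs/jn/dmgeo
```

Running this on each of the three machines shows which JournalNodes need to be brought back in sync.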
But then the Flink JobManager failed to start, with an error like the following:
ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id xxxxxxxxxxxxxxxxxxxxxxxxxx
... Caused by: java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /xxxxx. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
... Caused by: java.io.FileNotFoundException: File does not exist: /xxxx/submittedJobGraphe439cfc979db

A job was still running when the node was rebooted, and the `initializeSharedEdits` step on the JournalNodes caused some files to be lost, so the JobManager hit this error while reading that submitted job. Deleting the job's references in ZooKeeper resolves it:
./zkCli.sh -server <zookeeper host>
set /flink/default/running_job_registry/xxxxx DONE
delete /flink/default/jobgraphs/xxxx

After this cleanup, the JobManager and TaskManagers restarted without problems, and new jobs could be submitted again.
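For context, the `/flink/default/...` znode paths above are determined by the Flink HA configuration: the root and cluster-id keys form the ZooKeeper path prefix, and the `submittedJobGraph...` file from the FileNotFoundException lives under the HA storage directory. A rough, illustrative `flink-conf.yaml` fragment (hosts and paths here are assumptions, not this cluster's actual values):

```yaml
# ZooKeeper-based HA settings; values are illustrative
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.zookeeper.path.root: /flink     # first component of the znode paths
high-availability.cluster-id: /default            # second component of the znode paths
high-availability.storageDir: hdfs:///flink/ha/   # where submittedJobGraph files are stored
```

Knowing these keys makes it easy to locate the right znodes to clean when a stored JobGraph handle is broken.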