(Ultra-Detailed) Setting Up a Hadoop 2.7.1 Cluster on Linux (3 Machines as an Example)
1. Basic Environment
Before installing Hadoop on Linux, two programs need to be installed first:
1.1 Installation Notes
1. JDK 1.6 or later (this article installs JDK 1.7). The JDK bundled with Red Hat is generally not used; remove it and install the version you need.
2. SSH (Secure Shell). As a client, MobaXterm_Personal is recommended (feature-rich and easy to use).
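The cluster start/stop scripts used later log into each node over SSH, so passwordless SSH from redHat1 to all three machines is usually configured first. A minimal sketch, run on redHat1 as the user that will start Hadoop:
ssh-keygen -t rsa
ssh-copy-id redHat1
ssh-copy-id redHat2
ssh-copy-id redHat3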
2. Host Configuration
Since the Hadoop cluster built here consists of three machines, the hosts file on each machine needs to be adjusted. Edit /etc/hosts to map hostnames to IP addresses:
vim /etc/hosts
If you do not have sufficient permissions, switch to the root user.
Add the same host entries to the hosts file on all three machines:
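Only redHat1's address (192.168.92.140) appears later in this article, so the other two addresses below are assumptions; substitute your machines' real IPs:
192.168.92.140 redHat1
192.168.92.141 redHat2
192.168.92.142 redHat3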
You can use the hostname command to set the server names to redHat1, redHat2, and redHat3, for example:
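On the first machine this would be (to make the change survive a reboot, use hostnamectl set-hostname on systemd-based systems, or edit /etc/sysconfig/network on older Red Hat releases):
hostname redHat1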
3. Installing and Configuring Hadoop
3.1 Creating the Directory Structure
For easier management, create the following directories on redHat1 for the HDFS NameNode, DataNode, and temporary files:
/data/hdfs/name
/data/hdfs/data
/data/hdfs/tmp
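All three directories can be created with a single command:
mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp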
Then copy these directories to the same locations on redHat2 and redHat3 with scp.
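For example (run from redHat1; this assumes /data already exists on the other machines):
scp -r /data/hdfs redHat2:/data
scp -r /data/hdfs redHat3:/data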
3.2 Download
First, download Hadoop from the Apache website, picking one of the recommended mirrors. I chose the hadoop-2.7.1 release and downloaded it into the /data directory on redHat1 with the following command:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Then extract hadoop-2.7.1.tar.gz into /data:
tar -zxvf hadoop-2.7.1.tar.gz
3.3 Configuring Environment Variables
Back in the /data directory, configure the Hadoop environment variables:
vim /etc/profile
Add the Hadoop variables to /etc/profile.
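A typical addition, assuming Hadoop was extracted to /data/hadoop-2.7.1 as in the previous step:
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin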
Run the following command to make the environment variables take effect immediately:
source /etc/profile
Running the hadoop command now prints its usage help, which confirms the configuration has taken effect:
hadoop
3.4 Configuring Hadoop
Enter the hadoop-2.7.1 configuration directory:
cd /data/hadoop-2.7.1/etc/hadoop
Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file in turn.
3.4.1 Modify core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://redHat1:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
Note: the value of hadoop.tmp.dir must point at the tmp directory created earlier. (fs.default.name is the deprecated spelling of fs.defaultFS in Hadoop 2.x; both still work.)
3.4.2 Modify hdfs-site.xml
vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>redHat1:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must point at the name and data directories created earlier.
3.4.3 Modify mapred-site.xml
Copy the template to create the xml file:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
3.4.4 Modify yarn-site.xml
vim yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>redHat1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>redHat1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>redHat1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>redHat1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>redHat1:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Note: in Hadoop 2.x the yarn.nodemanager.aux-services value must be mapreduce_shuffle (with an underscore); the mapreduce.shuffle spelling from older guides makes the NodeManager fail to start.
3.4.5 Modify /data/hadoop-2.7.1/etc/hadoop/slaves
Delete the original localhost line and replace it with the worker hostnames:
vim /data/hadoop-2.7.1/etc/hadoop/slaves
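The replacement entries are not shown above; for this cluster the slaves file would typically list the nodes that should run DataNodes and NodeManagers, one per line:
redHat2
redHat3
Some setups also include redHat1 so that the master doubles as a worker.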
Finally, copy the entire hadoop-2.7.1 folder and its subfolders to the same directory on redHat2 and redHat3 with scp:
scp -r /data/hadoop-2.7.1 redHat2:/data
scp -r /data/hadoop-2.7.1 redHat3:/data
4. Running Hadoop
First, format the NameNode:
hadoop namenode -format
Then start all of the daemons from /data/hadoop-2.7.1/sbin:
sh ./start-all.sh
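Note that start-all.sh is deprecated in Hadoop 2.x; the documented equivalent is to run, from the same directory:
sh ./start-dfs.sh
sh ./start-yarn.sh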
Check the cluster status:
/data/hadoop-2.7.1/bin/hdfs dfsadmin -report
Test YARN through the ResourceManager web UI (port 18088 matches yarn.resourcemanager.webapp.address above):
http://192.168.92.140:18088/cluster/cluster
Check HDFS through the NameNode web UI:
http://192.168.92.140:50070/dfshealth.html#tab-overview
Key points: problems encountered while configuring and running Hadoop
1. JAVA_HOME is not set
The startup scripts report this error.
Fix: edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and add the JAVA_HOME path.
Write it as an absolute path; do not keep the default form that tries to pick it up automatically.
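A sketch of the resulting line, assuming the JDK is installed under /usr/java/jdk1.7.0_79 (substitute your real install path):
export JAVA_HOME=/usr/java/jdk1.7.0_79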
2. FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)
Cause:
The data files under the local dfs.data.dir directory are out of sync with what the namenode expects, so the datanode is refused by the namenode.
Fix:
1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories (with the layout above: rm -rf /data/hdfs/name/* /data/hdfs/data/*).
2. Fix the hosts file:
cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost
3. Reformat the NameNode: bin/hadoop namenode -format
4. Restart the cluster.
Reprinted from: https://my.oschina.net/yjktpd/blog/1808168