Hadoop 3 Pseudo-Distributed Installation
I. Installing Hadoop
1. Pseudo-distributed mode
The NameNode, DataNode, ResourceManager, NodeManager, and related daemons all run on a single server.
2. Passwordless SSH
Run: ssh-keygen -t rsa and press Enter through every prompt.
Append the public key to the local authorized list: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify: ssh <local IP> should log in without asking for a password.
3. Disable the firewall
systemctl stop firewalld.service      # stop firewalld now
systemctl disable firewalld.service   # keep firewalld from starting at boot
4. Install and configure the JDK
Install it however you prefer; any standard JDK package works.
Verify: java -version
5. Hadoop setup
Extract the tarball and rename the directory:
Command: tar xvf hadoop-3.1.2.tar -C /ecapp ; cd /ecapp ; mv hadoop-3.1.2 hadoop
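The extract-and-rename pattern above can be sketched end to end. The block below uses a stand-in tarball under /tmp so it can run anywhere; in the real install the archive is hadoop-3.1.2.tar and the target root is /ecapp.

```shell
# Build a tiny stand-in tarball (demo only; the real one comes from the Hadoop download).
mkdir -p /tmp/hadoop_demo/src/hadoop-3.1.2 /tmp/hadoop_demo/ecapp
touch /tmp/hadoop_demo/src/hadoop-3.1.2/README.txt
tar cf /tmp/hadoop_demo/hadoop-3.1.2.tar -C /tmp/hadoop_demo/src hadoop-3.1.2

# Extract into the install root, then rename to a version-free path,
# mirroring the `tar xvf ... -C /ecapp ; mv hadoop-3.1.2 hadoop` step above.
tar xf /tmp/hadoop_demo/hadoop-3.1.2.tar -C /tmp/hadoop_demo/ecapp
mv /tmp/hadoop_demo/ecapp/hadoop-3.1.2 /tmp/hadoop_demo/ecapp/hadoop
ls /tmp/hadoop_demo/ecapp/hadoop
```

The rename keeps later configuration (HADOOP_HOME, classpaths) free of version numbers, so upgrading only means extracting a new tarball and repeating the mv.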
二、配置文件
1. core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- URI (schema) of the filesystem Hadoop uses; this is the NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.0.143:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/hdfs/meta</value>
    </property>
</configuration>
Create the directory referenced by hadoop.tmp.dir:
mkdir -p /data/hadoop/hdfs/meta
2. hdfs-site.xml
The DataNode is given multiple storage directories, each holding different blocks, which behaves much like running multiple DataNodes.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- dfs.name.dir is deprecated in Hadoop 3; dfs.namenode.name.dir is the current key -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/hdfs/datanode1,/data/hadoop/hdfs/datanode2</value>
    </property>
</configuration>
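All of the local storage directories named in core-site.xml and hdfs-site.xml should exist before the NameNode is formatted. A small sketch that creates them in one pass (the /data base path comes from the configs above):

```shell
# Create the metadata, namenode, and datanode directories referenced
# in core-site.xml and hdfs-site.xml above.
for d in meta namenode datanode1 datanode2; do
  mkdir -p "/data/hadoop/hdfs/$d"
done
ls /data/hadoop/hdfs
```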
3. yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.0.143</value>
    </property>
    <!-- How reducers fetch data; both MapReduce and Spark shuffles are configured here.
         If Spark is not installed, drop the spark_shuffle entry. -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <!-- Spark shuffle service class -->
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <!-- Total NodeManager memory; the server has 8 GB, so about 6 GB is allotted here -->
    <property>
        <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>6000</value>
    </property>
    <!-- Minimum container memory allocation -->
    <property>
        <description>The minimum allocation for every container request at the RM,
                     in MBs. Memory requests lower than this won't take effect,
                     and the specified value will get allocated at minimum.</description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <!-- Maximum container memory allocation -->
    <property>
        <description>The maximum allocation for every container request at the RM,
                     in MBs. Memory requests higher than this won't take effect,
                     and will get capped to this value.</description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>6000</value>
    </property>
    <!-- Aggregate logs, needed later for the history server -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>2592000</value>
    </property>
    <!-- History server web address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://192.168.0.143:8988/jobhistory/logs</value>
    </property>
    <!-- HDFS location for the aggregated logs -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>hdfs://192.168.0.143:9000/user/root/yarn-logs/</value>
    </property>
    <!-- vcores: this machine has 4 cores; do not configure more than the machine has,
         and slightly fewer than the physical core count is recommended -->
    <property>
        <description>Number of vcores that can be allocated
        for containers. This is used by the RM scheduler when allocating
        resources for containers. This is not used to limit the number of
        CPUs used by YARN containers. If it is set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
        automatically determined from the hardware in case of Windows and Linux.
        In other cases, number of vcores is 8 by default.</description>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>
    <!-- Minimum vcore allocation -->
    <property>
        <description>The minimum allocation for every container request at the RM
        in terms of virtual CPU cores. Requests lower than this will be set to the
        value of this property. Additionally, a node manager that is configured to
        have fewer virtual cores than this value will be shut down by the resource
        manager.</description>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>
    <!-- Maximum vcore allocation -->
    <property>
        <description>The maximum allocation for every container request at the RM
        in terms of virtual CPU cores. Requests higher than this will throw an
        InvalidResourceRequestException.</description>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>3</value>
    </property>
    <!-- classpath -->
    <property>
        <name>yarn.application.classpath</name>
        <value>/ecapp/hadoop/etc/hadoop:/ecapp/hadoop/share/hadoop/common/lib/*:/ecapp/hadoop/share/hadoop/common/*:/ecapp/hadoop/share/hadoop/hdfs:/ecapp/hadoop/share/hadoop/hdfs/lib/*:/ecapp/hadoop/share/hadoop/hdfs/*:/ecapp/hadoop/share/hadoop/mapreduce/lib/*:/ecapp/hadoop/share/hadoop/mapreduce/*:/ecapp/hadoop/share/hadoop/yarn:/ecapp/hadoop/share/hadoop/yarn/lib/*:/ecapp/hadoop/share/hadoop/yarn/*</value>
    </property>
</configuration>
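The memory and vcore settings above bound how many containers this NodeManager can run at once: the scheduler hands out containers in multiples of the minimum allocation, up to the node's total. A quick sanity check of the numbers used in this config:

```shell
# Container-count bounds implied by the yarn-site.xml values above
NM_MEM_MB=6000        # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB=512      # yarn.scheduler.minimum-allocation-mb
NM_VCORES=4           # yarn.nodemanager.resource.cpu-vcores
MIN_VCORES=1          # yarn.scheduler.minimum-allocation-vcores

# Most minimum-size containers that fit in memory (integer division)
echo "max containers by memory: $(( NM_MEM_MB / MIN_ALLOC_MB ))"
# Most minimum-size containers that fit in vcores
echo "max containers by vcores: $(( NM_VCORES / MIN_VCORES ))"
```

With these values, memory allows up to 11 minimum-size containers but only 4 vcores are available, so CPU is the effective limit for small containers on this box.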
4. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- History server: sync completed MapReduce jobs to the history server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.0.143:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.0.143:8988</value>
    </property>
    <!-- Number of jobs the history server caches -->
    <property>
        <name>mapreduce.jobhistory.joblist.cache.size</name>
        <value>5000</value>
    </property>
</configuration>
5. workers
localhost    # the local machine is the only worker in this setup
6. hadoop-env.sh
Append your JAVA_HOME at the end of the file:
export JAVA_HOME=/ecapp/jdk
III. Linux Environment
1. /etc/profile
export JAVA_HOME=/ecapp/jdk
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Hadoop-related settings
export HADOOP_HOME=/ecapp/hadoop
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#export HADOOP_ROOT_LOGGER=DEBUG,console
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

# Hadoop 3.x refuses to start without the following five variables; Hadoop 2.x does not seem to need them
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
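After editing /etc/profile, it is worth confirming that the key variables expand as expected before launching anything. A minimal check, assuming the /ecapp layout used throughout this post:

```shell
# Re-create the two key exports from /etc/profile and confirm their values
export JAVA_HOME=/ecapp/jdk
export HADOOP_HOME=/ecapp/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```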
2. Apply the environment: source /etc/profile
IV. Starting the Services
1. Format the NameNode
Command: hdfs namenode -format
If no errors appear and the output ends like this, formatting succeeded:
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ecs-6531-0002
************************************************************/
If Hadoop has been run after an earlier format and you want to format the NameNode again, first delete the VERSION file produced by that run, or the format will fail.
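The VERSION files live inside the current/ subdirectories of the storage paths configured earlier. A sketch of the cleanup before a re-format, assuming the /data paths from core-site.xml and hdfs-site.xml above:

```shell
# Remove stale HDFS state so a second `hdfs namenode -format` starts clean.
# rm -rf is a no-op on paths that do not exist yet, so this is safe to re-run.
for d in /data/hadoop/hdfs/meta /data/hadoop/hdfs/namenode \
         /data/hadoop/hdfs/datanode1 /data/hadoop/hdfs/datanode2; do
  rm -rf "$d"/current
done
```

Clearing the DataNode directories along with the NameNode one avoids the common "clusterID mismatch" error, where DataNodes refuse to join a freshly formatted namespace.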
2. Start the daemons
start-all.sh starts every service; startup logs are written under logs/ in the Hadoop install directory.
3. Check the processes with jps
6662 Jps
9273 DataNode            # HDFS worker
5465 SecondaryNameNode   # HDFS checkpoint helper
9144 NameNode            # HDFS master
9900 NodeManager         # YARN worker
9575 ResourceManager     # YARN master
4. Start the job history server
Command: mapred --daemon start historyserver
jps should now also show:
12710 JobHistoryServer
V. Web UIs
HDFS: http://192.168.0.143:9870/dfshealth.html#tab-overview
Reposted from: https://www.cnblogs.com/charon2/p/11315433.html
總結
以上是生活随笔為你收集整理的hadoop3伪分布式安装的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: postgresql语句
- 下一篇: Spark 系列(一)—— Spark简