
Spark-1.4.0 Cluster Setup


Contents

  • Ubuntu 10.04 system setup
  • ZooKeeper cluster setup
  • Hadoop-2.4.1 cluster setup
  • Spark 1.4.0 cluster setup

This guide assumes the Ubuntu operating system is already installed.

Ubuntu 10.04 Setup

1. Host planning

Hostname      IP address       Processes
SparkMaster   192.168.1.103    ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain
SparkSlave01  192.168.1.101    ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain, NameNode, DFSZKFailoverController (zkfc)
SparkSlave02  192.168.1.102    DataNode, NodeManager, JournalNode, QuorumPeerMain, NameNode, DFSZKFailoverController (zkfc)

Notes:
1. In Hadoop 2.0, HDFS normally runs with two NameNodes, one active and one standby. The active NameNode serves client requests; the standby does not serve requests but mirrors the active NameNode's state so it can take over quickly if the active one fails.
Hadoop 2.0 ships two official HDFS HA solutions, NFS and QJM; here we use the simpler QJM. In that scheme the active and standby NameNodes share edit-log metadata through a group of JournalNodes, and a write is considered successful once a majority of JournalNodes have accepted it, so an odd number of JournalNodes is normally configured.
A ZooKeeper ensemble is also set up for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, the standby NameNode is automatically promoted to active.
2. Hadoop 2.2.0 still had a single ResourceManager, which was a single point of failure. Hadoop 2.4.1 fixes this with two ResourceManagers, one active and one standby, whose state is coordinated through ZooKeeper. (A quick way to inspect the HA state once the cluster is running is shown below.)
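As a quick illustration of these HA roles (a sketch added here, not part of the original walkthrough; the nn1/nn2 and rm1/rm2 ids match the hdfs-site.xml and yarn-site.xml configured later in this guide), you can query which node is currently active once everything is started:

hdfs haadmin -getServiceState nn1     # prints "active" or "standby" for NameNode nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1     # the same check for ResourceManager rm1
yarn rmadmin -getServiceState rm2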

2. Set the hostnames
Edit /etc/hostname with vi on each machine and set its hostname (an example follows below).
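For example, on the master (a minimal sketch; repeat on each node with its own name from the host plan above):

echo SparkMaster | sudo tee /etc/hostname   # persist the new hostname
sudo hostname SparkMaster                   # apply it to the running system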

3. Set the host IP addresses
Edit /etc/network/interfaces with vi to give each host a static IP.

Contents of /etc/network/interfaces on each host:

SparkMaster:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.1.103
netmask 255.255.255.0
gateway 192.168.1.1

SparkSlave01:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.1.101
netmask 255.255.255.0
gateway 192.168.1.1

SparkSlave02:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.1.102
netmask 255.255.255.0
gateway 192.168.1.1
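To apply the new addresses without waiting for the reboot below (an optional extra step, not in the original; the interface name eth0 matches the config above):

sudo ifdown eth0 && sudo ifup eth0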

4. Configure the DNS resolver
Since we need internet access to install OpenSSH (used below for passwordless login), configure a DNS name server on each host.

Contents of /etc/resolv.conf on each host (identical on SparkMaster, SparkSlave01 and SparkSlave02):

domain localdomain
search localdomain
nameserver 8.8.8.8
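A quick way to confirm name resolution works before installing packages (an optional check, assuming the network is already up):

ping -c 2 archive.ubuntu.com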

5. Map hostnames to IP addresses

Contents of /etc/hosts on each host:
SparkMaster:
127.0.0.1 SparkMaster localhost.localdomain localhost
192.168.1.101 SparkSlave01
192.168.1.102 SparkSlave02
192.168.1.103 SparkMaster
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

SparkSlave01:
127.0.0.1 SparkSlave01 localhost.localdomain localhost
192.168.1.101 SparkSlave01
192.168.1.102 SparkSlave02
192.168.1.103 SparkMaster
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

SparkSlave02:
127.0.0.1 SparkSlave02 localhost.localdomain localhost
192.168.1.101 SparkSlave01
192.168.1.102 SparkSlave02
192.168.1.103 SparkMaster
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

After completing the steps above, reboot each machine.
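After the reboot, a quick sanity check (an optional addition; the hostnames come from the /etc/hosts entries above) confirms the mapping works on each machine:

ping -c 1 SparkMaster
ping -c 1 SparkSlave01
ping -c 1 SparkSlave02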
6. Install SSH (run the same commands on all three hosts)

sudo apt-get install openssh-server

Then confirm that the SSH server is running:
ps -e | grep ssh

7. Set up passwordless login (run the same commands on all three hosts)
Run: ssh-keygen -t rsa
This generates two files under ~/.ssh: id_rsa (private key) and id_rsa.pub (public key).
Add the public key to the local authorized keys, then copy it to the machines you want to reach without a password (a verification example follows):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh-copy-id -i SparkMaster
ssh-copy-id -i SparkSlave02
ssh-copy-id -i SparkSlave01
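As a quick check (a minimal sketch; the hostnames assume the /etc/hosts entries above), each command should print the remote hostname without prompting for a password:

ssh SparkSlave01 hostname
ssh SparkSlave02 hostname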

ZooKeeper Cluster Setup

This cluster uses ZooKeeper 3.4.5. Rename zoo_sample.cfg in the /hadoopLearning/zookeeper-3.4.5/conf directory to zoo.cfg,
then open it with vi conf/zoo.cfg and fill in the following:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# ZooKeeper data directory
dataDir=/hadoopLearning/zookeeper-3.4.5/zookeeper_data
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=SparkSlave01:2888:3888
server.2=SparkSlave02:2888:3888
server.3=SparkMaster:2888:3888

Create a zookeeper_data directory under /hadoopLearning/zookeeper-3.4.5/,
then cd zookeeper_data and run:
touch myid
echo 3 > myid
Copy the installation to the other servers:
scp -r zookeeper-3.4.5 root@SparkSlave01:/hadoopLearning/
scp -r zookeeper-3.4.5 root@SparkSlave02:/hadoopLearning/
Then go into the zookeeper_data directory on each of them and set its id:
on SparkSlave01 run echo 1 > myid, and on SparkSlave02 run echo 2 > myid.

root@SparkMaster:/hadoopLearning/zookeeper-3.4.5/bin# ./zkServer.sh start
(run the same command on the other two machines)
root@SparkMaster:/hadoopLearning/zookeeper-3.4.5/bin# ./zkServer.sh status
JMX enabled by default
Using config: /hadoopLearning/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
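An optional extra check (not from the original; it assumes netcat is installed): a healthy ZooKeeper node answers "imok" to the ruok four-letter command.

echo ruok | nc SparkMaster 2181
echo ruok | nc SparkSlave01 2181
echo ruok | nc SparkSlave02 2181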

The ZooKeeper cluster is now set up.

Hadoop-2.4.1 Cluster Setup

Add the Hadoop installation path HADOOP_HOME=/hadoopLearning/hadoop-2.4.1 to the environment variables in /etc/profile:

export JAVA_HOME=/hadoopLearning/jdk1.7.0_67
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/hadoopLearning/hadoop-2.4.1
export SCALA_HOME=/hadoopLearning/scala-2.10.4
export ZOOKEEPER_HOME=/hadoopLearning/zookeeper-3.4.5
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${SCALA_HOME}/bin:/hadoopLearning/idea-IC-141.1532.4/bin:$PATH

Edit hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_55

Edit core-site.xml:

<configuration>
  <!-- Set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoopLearning/hadoop-2.4.1/tmp</value>
  </property>
  <!-- ZooKeeper quorum addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>SparkMaster:2181,SparkSlave01:2181,SparkSlave02:2181</value>
  </property>
</configuration>

Edit hdfs-site.xml:

<configuration>
  <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>SparkSlave01:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>SparkSlave01:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>SparkSlave02:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>SparkSlave02:50070</value>
  </property>
  <!-- Where the NameNode metadata is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://SparkMaster:8485;SparkSlave01:8485;SparkSlave02:8485/ns1</value>
  </property>
  <!-- Where each JournalNode stores data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoopLearning/hadoop-2.4.1/journal</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence requires passwordless SSH -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>

Edit mapred-site.xml:

<configuration>
  <!-- Use YARN as the MapReduce framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Edit yarn-site.xml:

<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- ResourceManager cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>SparkCluster</value>
  </property>
  <!-- ResourceManager ids -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- ResourceManager hostnames -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>SparkMaster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>SparkSlave01</value>
  </property>
  <!-- ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>SparkMaster:2181,SparkSlave01:2181,SparkSlave02:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Edit slaves:

SparkMaster
SparkSlave01
SparkSlave02

Copy the configured hadoop-2.4.1 to the other servers:

scp -r /etc/profile root@SparkSlave01:/etc/profile
scp -r /hadoopLearning/hadoop-2.4.1/ root@SparkSlave01:/hadoopLearning/
scp -r /etc/profile root@SparkSlave02:/etc/profile
scp -r /hadoopLearning/hadoop-2.4.1/ root@SparkSlave02:/hadoopLearning/

Start the JournalNodes:

hadoop-daemons.sh start journalnode
# check with jps: SparkMaster, SparkSlave01 and SparkSlave02 should each now show a JournalNode process

Format HDFS:

# run on SparkSlave01:
hdfs namenode -format
# formatting creates files under the hadoop.tmp.dir configured in core-site.xml
# (here /hadoopLearning/hadoop-2.4.1/tmp); copy that directory to SparkSlave02:
scp -r tmp/ sparkslave02:/hadoopLearning/hadoop-2.4.1/

Format ZK (run on SparkSlave01 only):

hdfs zkfc -formatZK

Start HDFS (on SparkSlave01):

sbin/start-dfs.sh

Start YARN (note: run start-yarn.sh on SparkMaster; the NameNode and ResourceManager are kept on separate machines for performance, since both are resource-hungry, so each is started on its own host):

sbin/start-yarn.sh
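One step that is easy to miss (a small reminder, not spelled out in the original): before running the start commands above on a node, reload the copied /etc/profile in that shell so the new variables take effect.

source /etc/profile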

Open a browser and check the web UIs:
http://sparkmaster:8088 shows the YARN ResourceManager page.

http://sparkslave01:50070 shows the web UI of NameNode nn1.

http://sparkslave02:50070 shows the web UI of NameNode nn2.

Upload a file to HDFS with the following command:
hadoop fs -put /etc/profile /

You can then browse the uploaded file on the active NameNode's web UI, or check it from the command line as shown below.
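The same check from the shell (an equivalent alternative to the web UI):

hadoop fs -ls /            # the uploaded "profile" file should be listed
hadoop fs -cat /profile | head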

The Hadoop cluster is now up and running.

Spark-1.4.0 Cluster Setup
This section uses Spark Standalone mode as the example.

1. On SparkMaster, install Scala 2.10.4 and spark-1.4.0-bin-hadoop2.4 by unpacking the archives into the /hadoopLearning directory, then edit /etc/profile so that it contains:

export JAVA_HOME=/hadoopLearning/jdk1.7.0_67
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/hadoopLearning/hadoop-2.4.1
export SCALA_HOME=/hadoopLearning/scala-2.10.4
export SPARK_HOME=/hadoopLearning/spark-1.4.0-bin-hadoop2.4
export ZOOKEEPER_HOME=/hadoopLearning/zookeeper-3.4.5
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${SCALA_HOME}/bin:/hadoopLearning/idea-IC-141.1532.4/bin:${SPARK_HOME}/bin:${SPARK_HOME}/sbin:$PATH

2. Go to the /hadoopLearning/spark-1.4.0-bin-hadoop2.4/conf directory:
cp spark-defaults.conf.template spark-defaults.conf
cp spark-env.sh.template spark-env.sh
Add the following to spark-env.sh:

export JAVA_HOME=/hadoopLearning/jdk1.7.0_67
export HADOOP_CONF_DIR=/hadoopLearning/hadoop-2.4.1/etc/hadoop

Add the following to spark-defaults.conf:

spark.master=spark://sparkmaster:7077
spark.eventLog.enabled=true
# hdfs://ns1 is the HDFS nameservice defined in core-site.xml above
spark.eventLog.dir=hdfs://ns1/user/spark/applicationHistory
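The event-log directory should already exist in HDFS before applications are started, otherwise they can fail on startup (an added reminder, not one of the original steps):

hadoop fs -mkdir -p /user/spark/applicationHistory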

3. Copy the installation and configuration from sparkmaster to sparkslave01 and sparkslave02:

scp -r /hadoopLearning/scala-2.10.4 sparkslave01:/hadoopLearning/
scp -r /hadoopLearning/scala-2.10.4 sparkslave02:/hadoopLearning/
scp -r /hadoopLearning/spark-1.4.0-bin-hadoop2.4 sparkslave01:/hadoopLearning/
scp -r /hadoopLearning/spark-1.4.0-bin-hadoop2.4 sparkslave02:/hadoopLearning/
scp -r /etc/profile sparkslave01:/etc/profile
scp -r /etc/profile sparkslave02:/etc/profile

4. On sparkmaster, go to /hadoopLearning/spark-1.4.0-bin-hadoop2.4/sbin and run:

./start-all.sh

Check each host with jps: sparkmaster should now show a Master process, while sparkslave01 and sparkslave02 each show a Worker process.

In a browser, open http://sparkmaster:8080 to see the Spark master web UI.

That page shows the cluster's runtime information, which indicates the basic cluster setup succeeded. (A further smoke test is sketched below.)
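As an additional smoke test (a sketch, not part of the original walkthrough; the examples jar ships with the prebuilt spark-1.4.0-bin-hadoop2.4 distribution, though its exact file name may differ, so check the lib/ directory), run the bundled SparkPi example against the standalone master:

cd /hadoopLearning/spark-1.4.0-bin-hadoop2.4
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://sparkmaster:7077 \
  lib/spark-examples-1.4.0-hadoop2.4.0.jar 10
# the driver output should end with a line like "Pi is roughly 3.14..."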

5. Spark-1.4.0 cluster test run
Upload README.md to the HDFS /user/root directory:
root@sparkmaster:/hadoopLearning/spark-1.4.0-bin-hadoop2.4# hadoop fs -put README.md /user/root

On the sparkmaster node, go to /hadoopLearning/spark-1.4.0-bin-hadoop2.4/bin and run spark-shell; after refreshing http://sparkmaster:8080 you should see the shell listed as a running application.

Enter the following statement:

val textCount = sc.textFile("README.md").filter(line => line.contains("Spark")).count()

The shell prints the number of lines in README.md that contain "Spark".

At this point, the Spark cluster is set up successfully.
