Basic Environment Installation: CentOS 7 + Java + Hadoop + Spark + HBase + ES + Azkaban
1. The method for installing VMware Workstation 14 is covered in "Big pitfalls of working across platforms" under the Artificial Intelligence tag.
2. CentOS partition layout:
/boot: 1024 MB, created as a standard partition.
swap: 4096 MB, created as a standard partition.
/: all remaining space, created as an LVM volume group.

Configure anything else as needed. Once the install finishes, set up the network connection with vi /etc/sysconfig/network-scripts/ifcfg-eno16777736:
HWADDR=00:0C:29:B3:AE:0E
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=2cb8e76d-0626-4f8e-87e5-7e0743e4555f
ONBOOT=yes
IPADDR=192.168.10.186
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=192.168.10.1
The gateway and IP must match the Windows host network — do not pick them arbitrarily!
Run ipconfig /all on the Windows host and configure the VM to match its network settings.
If pinging the host from the VM hangs, adjust the Windows firewall inbound rules and enable the File and Printer Sharing ICMPv4-In rule for the Public profile.
Restart the network with service network restart so the new IP address takes effect, then ping www.baidu.com to test connectivity.
Detailed configuration reference: http://www.cnblogs.com/wcwen1990/p/7630545.html
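A quick way to confirm the static IP, gateway, and DNS took effect (a minimal sketch; the addresses assume the example ifcfg file above):

ip addr show eno16777736        # interface should carry 192.168.10.186/24
ip route | grep default         # default route should point at 192.168.10.1
ping -c 3 192.168.10.1          # is the gateway reachable?
ping -c 3 www.baidu.com         # DNS resolution and outbound traffic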
Set the hostname and hosts mapping:
vi /etc/hostname
pc.apache
vi /etc/hosts
192.168.1.186 pc.apache
192.168.1.187 pc.apache2
...
reboot   # takes effect after reboot
Configure passwordless SSH login. Many companies change the default SSH port; set it with vi ~/.ssh/config (note: config must not be more permissive than 600, so chmod -R 600 config). Its contents:
Host *
Port <your port>

0. Install the lrzsz tools: yum install -y lrzsz
1. As root, edit vim /etc/ssh/sshd_config:
StrictModes no
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
2. Create the hadoop user: useradd hadoop ; passwd hadoop
3. Switch to the hadoop user: su - hadoop
4. Generate a key pair on all three machines: ssh-keygen -t rsa
5. Append each machine's public key into one shared file, so that authorized_keys ends up containing the keys of all three machines: cat id_rsa.pub >> authorized_keys
6. Distribute the combined authorized_keys file to ~/.ssh/authorized_keys on all three machines (rz to upload, sz to download).
7. Then set the .ssh/ directory permission to 700 and authorized_keys to 644 on all three machines:
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
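An alternative to collecting the keys by hand with rz/sz is ssh-copy-id; a minimal sketch run as the hadoop user on each node, assuming the pc1.hadoop/pc2.hadoop/pc3.hadoop hostnames used later in this guide:

for host in pc1.hadoop pc2.hadoop pc3.hadoop; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host   # prompts once for the hadoop password
done
ssh hadoop@pc2.hadoop hostname                      # should print pc2.hadoop without a password prompt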
3. Install the JDK
After extracting the JDK into the software directory, add it to the environment variables:
vi /etc/profile
export JAVA_HOME=/opt/software/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
Once Java is installed, the jps command becomes available.
Install ZooKeeper: in the ZooKeeper directory,
mkdir zkData
cp conf/zoo_sample.cfg conf/zoo.cfg
vim conf/zoo.cfg
dataDir=/data/software/zookeeper-3.4.5/zkData
server.1=pc1.hadoop:2888:3888
server.2=pc2.hadoop:2888:3888
server.3=pc3.hadoop:2888:3888
# Repeat the same edits on all three machines, then write each node's id, e.g. on pc1:
vim zkData/myid
1      # 1 corresponds to server.1
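A hedged sketch of writing every node's myid in one pass from the first machine (assumes the pc1/pc2/pc3 hostnames above, the zkData path used later in this guide, and the passwordless SSH set up in step 2):

id=1
for host in pc1.hadoop pc2.hadoop pc3.hadoop; do
    ssh hadoop@$host "echo $id > /data/software/zookeeper-3.4.5/zkData/myid"
    id=$((id + 1))
done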
After configuring, copy java, zookeeper, and hadoop to all other nodes, and update the environment variables on each node.
Configure ZooKeeper + Hadoop HA:
ZooKeeper + Hadoop HA:
rm -rf /data/software/zookeeper-3.4.5/zkData/*
vim /data/software/zookeeper-3.4.5/zkData/myid
rm -rf /data/dataware/*
zkServer.sh start
zkServer.sh status
zkServer.sh stop
First clear the configuration on the other nodes, then push the new one:
rm -rf /data/software/hadoop-2.7.3/etc/*
scp -r hadoop/ hadoop@app-003:/data/software/hadoop-2.7.3/etc/
First-time initialization:
Start the JournalNodes on every node: hadoop-daemon.sh start journalnode
On the primary node, format HDFS: hdfs namenode -format
Copy the hadoop.tmp.dir directory (from core-site.xml) from the primary node to the second NameNode node: scp -r /data/dataware/hadoop/tmp/ hadoop@app-002:/data/dataware/hadoop/
On the primary node, format ZKFC: hdfs zkfc -formatZK
On the primary node: start-dfs.sh
# On the primary node: yarn-daemon.sh start resourcemanager
On the primary node: start-yarn.sh
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
From the second startup onward you can start DFS directly without first starting the JournalNodes, but this raises a problem: the JournalNodes take time to come up, and the NameNode only stabilizes once they do; otherwise you keep getting connection failures to app:8485. Fix: set ipc.client.connect.max.retries to 20 and ipc.client.connect.retry.interval to 5000, as shown in the snippet below.
Shutdown: on the primary node run stop-yarn.sh and stop-dfs.sh, then on each node run zkServer.sh stop.
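A sketch of the corresponding core-site.xml entries for that retry fix (the values come from the note above; the file lives in the Hadoop conf directory used throughout this guide):

vim core-site.xml
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>20</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>5000</value>
</property>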
4. Install single-node Hadoop (skip this if you are building the cluster), and configure the firewall and Linux security mode
First add Hadoop to the environment variables ($HADOOP_HOME/bin on the PATH).
# Stop the firewall
service iptables stop
# Disable the firewall at boot
chkconfig iptables off
# Disable Linux security mode (SELinux)
/etc/sysconfig/selinux
# On CentOS 7, stop firewalld
systemctl stop firewalld.service      # stop firewalld
systemctl disable firewalld.service   # disable firewalld at boot

Error: "The authenticity of host 'XXXX' can't be established". Fix by appending to vim /etc/ssh/ssh_config:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null

If the VM cannot connect to MKS, open all VM service processes from the Task Manager.

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
Web UIs: pc.apache:50070 and pc.apache:8088
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
Shutdown: besides the stop scripts, as a last resort you can run: killall java

The relevant configuration is as follows:

vim hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/opt/software/jdk1.8.0_191

vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://pc.apache:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.7.3/data</value>
  </property>
</configuration>

vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/software/hadoop-2.7.3/data/name</value>
  </property>
  <property>
    <name>dfs.webhdfs.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions.enable</name>
    <value>false</value>
  </property>
</configuration>

vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Host and RPC port of the jobhistory service -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- set the actual host name and port -->
    <value>pc.apache:10020</value>
  </property>
  <!-- Host and port of the jobhistory web UI -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>pc.apache:19888</value>
  </property>
</configuration>

vim slaves
pc.apache

vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Host name of the server running the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>pc.apache</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Log retention time -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
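Once start-dfs.sh and start-yarn.sh have run, a quick HDFS smoke test confirms the single-node install works (a minimal sketch; the paths and file name are arbitrary):

jps                                   # expect NameNode, DataNode, ResourceManager, NodeManager
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put /etc/hosts /user/hadoop/hosts.txt
hdfs dfs -cat /user/hadoop/hosts.txt  # should print the local hosts file back from HDFS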
5. Install Hive
Completely uninstall MySQL:
yum remove mysql-community mysql-community-server mysql-community-libs mysql-community-common -y
yum -y remove mysql57-community-release-el7-10.noarch
rpm -qa | grep -i mysql
rpm -ev MySQL-client-5.5.60-1.el7.x86_64 --nodeps
find / -name mysql      # delete everything it finds
rpm -qa | grep mysql
rpm -ev mysql57-community-release-el7-8.noarch
Install MySQL:
wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum localinstall mysql57-community-release-el7-8.noarch.rpm
yum repolist enabled | grep "mysql.*-community.*"
yum install mysql-community-server
systemctl start mysqld
systemctl status mysqld
systemctl enable mysqld
systemctl daemon-reload
On a fresh install, fetch the temporary password and log in:
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p
Before logging in, it is easiest to relax the password policy so the password is simple to change: add to vim /etc/my.cnf
validate_password = off
systemctl restart mysqld    # restart the service
ALTER USER 'root'@'localhost' IDENTIFIED BY '111111';             # change the password
grant all privileges on *.* to 'root'@'%' identified by '111111';  # grant access from other hosts
flush privileges;
Configure the character set in /etc/my.cnf:
[mysqld]
character_set_server=utf8
init_connect='SET NAMES utf8'
create user 'hive'@'localhost' identified by 'hive';
create database hive;
alter database hive character set latin1;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
grant all on metastore.* to hive@'localhost' identified by 'hive';
grant all on metastore.* to hive@'%' identified by 'hive';
show grants for hive@'localhost';
flush privileges;
If you are reinstalling Hive, delete Hive's metadata from MySQL as follows:
drop database metastore;
select * from metastore.SDS;
select * from metastore.DBS;
delete from `metastore`.`TABLE_PARAMS`;
drop table `metastore`.`TABLE_PARAMS`;
delete from `metastore`.`TBLS`;
drop table `metastore`.`TBLS`;
delete from metastore.SDS;
delete from metastore.DBS;
drop table metastore.SDS;
drop table metastore.DBS;
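A quick check that the hive account works before wiring it into hive-site.xml (a sketch; assumes the hive/hive user and password created above):

mysql -uhive -phive -e "show databases;"   # should list the hive database without an access-denied error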
Download and extract Hive, then in its conf directory:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-log4j2.properties.template hive-log4j2.properties

vim hive-site.xml
<configuration>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://pc1.hadoop:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value><!-- the Spark SQL service uses port 10000, so use 10001 here to avoid a conflict -->
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>pc1.hadoop</value>
  </property>
</configuration>

vim hive-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HADOOP_HOME=/opt/software/hadoop-2.7.3/etc/hadoop
export HIVE_CONF_DIR=/opt/software/hive-2.1.1/conf

vim hive-log4j2.properties
hive.log.dir=/    # set this to a log directory
From the Maven repository, download mysql-connector-java-5.1.22-bin.jar and place it in the lib folder under the Hive directory.
In the Hive directory, initialize the metastore database. If it fails with "Host is not allowed to connect to this MySQL server", run in MySQL:
use mysql;
update user set host = '%' where user = 'root';
FLUSH PRIVILEGES;
Then initialize the schema:
schematool -dbType mysql -initSchema
Add HIVE_HOME to the environment variables. Running hive may then fail with: Relative path in absolute URI.
In hive-site.xml, find every occurrence of ${system:java.io.tmpdir} and replace it with ./hive/logs/iotemp, as in the sketch below.
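A hedged one-liner for that replacement (assumes you run it from the Hive conf directory and accept the relative path; back the file up first):

cp hive-site.xml hive-site.xml.bak
sed -i 's|\${system:java.io.tmpdir}|./hive/logs/iotemp|g' hive-site.xml
grep -c 'system:java.io.tmpdir' hive-site.xml   # should report 0 occurrences left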
Run hive again.
Start the Hive services:
hive --service metastore &
hive --service hiveserver2 &
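To verify HiveServer2 is listening, you can try connecting with beeline (a sketch; note the non-default port 10001 configured above and the hadoop user created earlier):

beeline -u jdbc:hive2://pc1.hadoop:10001 -n hadoop -e "show databases;"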
First-time Spark startup:
hadoop fs -put /data/software/spark-2.1.1/jars/* /user/spark/libs/
start-master.sh
start-slaves.sh
6. Install Spark
Step 1: download Scala 2.11.8 from https://www.scala-lang.org/download/2.11.8.html and add Scala to the environment variables.
Step 2: extract spark-2.2.0-bin-hadoop2.7.tgz and add Spark to the environment variables.
vim spark-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HADOOP_CONF_DIR=/opt/software/hadoop-2.7.3
export HIVE_CONF_DIR=/opt/software/hive-2.1.1
export SCALA_HOME=/opt/software/scala-2.11.8
export SPARK_WORK_MEMORY=1g
export MASTER=spark://pc.apache:7077
Because Spark SQL uses Hive as a data source, add the following to Hive's hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://pc.apache:9083</value>
</property>
Once edited, copy it into spark/conf:
cp hive-site.xml /opt/software/spark-2.2.0-bin-hadoop2.7/conf/
Copy the dependency jars:
cp $HIVE_HOME/lib/hive-hbase-handler-2.1.1.jar $SPARK_HOME/jars/
mkdir $SPARK_HOME/lib
cp $HIVE_HOME/lib/mysql-connector-java-5.1.34.jar $SPARK_HOME/lib/
cp $HIVE_HOME/lib/metrics-core-2.2.0.jar $SPARK_HOME/lib
cp $HBASE_HOME/lib/guava-12.0.1.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-common-1.2.5-tests.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-client-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-protocol-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-common-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-server-1.2.5.jar $SPARK_HOME/lib/
Add them to the classpath in vim $SPARK_HOME/conf/spark-env.sh:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/guava-12.0.1.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-client-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-common-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-common-1.2.5-tests.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-protocol-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-server-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/htrace-core-3.1.0-incubating.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/mysql-connector-java-5.1.34.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/metrics-core-2.2.0.jar
Start the Hive metastore, then test spark-sql:
nohup hive --service metastore >/opt/software/metastore.log 2>&1 &
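A minimal spark-sql smoke test against the Hive metastore (a sketch; assumes the metastore started above and the standalone master URL from spark-env.sh):

spark-sql --master spark://pc.apache:7077 -e "show databases;"   # should list the Hive databases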
Start Spark:
/opt/software/spark-2.2.0-bin-hadoop2.7/sbin/start-master.sh
/opt/software/spark-2.2.0-bin-hadoop2.7/sbin/start-slaves.sh
Web UI: http://192.168.1.186:8080/
If spark-sql fails on startup with: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient,
then in hive-site.xml change
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>
from true to false.
However, the Spark SQL service still would not start; running $SPARK_HOME/sbin/start-thriftserver.sh failed with: Could not create ServerSocket on address pc.apache/192.168.1.186:10001.
Use jps -ml to inspect the Java processes in detail.
The cause is that the HiveServer2 service and the Spark SQL thrift service cannot both bind the same port. Shutting down the Hive service made it work, but that is not ideal; a better option is probably to move one of the services to a port other than 10001, which still needs further testing.
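One way to keep both services running is to start the Spark thrift server on a different port (a hedged sketch; port 10002 is an arbitrary free port, not from the original post):

$SPARK_HOME/sbin/start-thriftserver.sh \
    --hiveconf hive.server2.thrift.port=10002 \
    --hiveconf hive.server2.thrift.bind.host=pc.apache
netstat -ntlp | grep -E '10001|10002'   # both HiveServer2 and the Spark thrift server should now be listening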
Finally, to wrap up the Spark installation, here is Spark on YARN.
Starting spark-shell --master yarn-client fails with: Error initializing SparkContext. Fix it in yarn-site.xml:
vim yarn-site.xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
Then start spark-shell --master yarn-client again and it works.
Summary of the installation workflow:
Role layout across the three nodes:
node1: nn1, dn1, rm1, nm1, zk1, hivserv
node2: nn2, dn2, rm2, nm2, zk2, hivemeta
node3: dn3, nm3, zk3, mysql, hivestat

On the primary node: start-dfs.sh
# On the primary node: yarn-daemon.sh start resourcemanager
On the primary node: start-yarn.sh
stop-yarn.sh
stop-dfs.sh
hive --service metastore > /home/hadoop/hive.meta &
hive --service hiveserver2 > /home/hadoop/hive.log &
# hadoop fs -mkdir -p /user/spark/libs/
# hadoop fs -put /data/software/spark-2.1.1/jars/* /user/spark/libs/
hadoop fs -mkdir -p /tmp/spark/logs/
start-master.sh
start-slaves.sh
zkCli.sh
rm -rf /data/software/spark-2.1.1/conf/
scp -r /data/software/spark-2.1.1/conf/ hadoop@app-002:/data/software/spark-2.1.1/
YARN job logs: /tmp/logs/hadoop/logs
Submit test jobs:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
spark-shell --master yarn --deploy-mode client
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1000m \
  --executor-memory 1000m \
  --executor-cores 1 \
  /data/software/spark-2.1.1/examples/jars/spark-examples_2.11-2.1.1.jar \
  3
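When a YARN job fails, log aggregation was enabled earlier, so the container logs can be pulled by application id (a sketch; the application id shown is hypothetical — substitute whatever the submit command printed):

yarn application -list -appStates ALL | tail -n 5          # find the application id
yarn logs -applicationId application_1234567890123_0001    # hypothetical id, replace with your own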
Some debugging tips:
debug: nohup java -jar sent-mail.jar > log.txt &
Check whether a port is in use: netstat -ntulp | grep 8020
Check which ports a service owns: ps -ef | grep mysqld
Clear old logs: rm -rf /data/software/hadoop-2.7.3/logs/*
Start each component individually to pinpoint where a bug comes from:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
jps -ml
kill -9
7. Install ZooKeeper + HBase
If the cluster already has a ZooKeeper ensemble, it is better to use that one; for single-machine testing, HBase's bundled ZooKeeper is fine.
vim hbase-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HBASE_MANAGES_ZK=true
export HADOOP_HOME=/opt/software/hadoop-2.7.3
export HBASE_CLASSPATH=/opt/software/hadoop-2.7.3/etc/hadoop
export HBASE_PID_DIR=/opt/software/hbase-1.2.5/pids

vim hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://pc.apache:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>pc.apache</value>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://pc.apache:60000</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/software/hbase-1.2.5/tmp</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/software/hbase-1.2.5/zooData</value>
</property>
/opt/software/hbase-1.2.5/bin/start-hbase.sh
status
http://192.168.1.186:16010/
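A quick smoke test from the HBase shell once the master web UI is up (a sketch; the table name is arbitrary):

hbase shell
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'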
Simplest HBase installation on Windows: download hbase-1.2.3 (http://apache.fayea.com/hbase/stable/). Requires JAVA_HOME and HADOOP_HOME.
1. Edit hbase-1.0.2\conf\hbase-env.cmd:
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_05
set HBASE_MANAGES_ZK=false
2. Edit hbase-1.0.2\conf\hbase-env.sh:
export HBASE_MANAGES_ZK=false
3. Edit hbase-1.0.2\conf\hbase-site.xml (change the paths to your own):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///E:/software/hbase-1.4.10/root</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>E:/software/hbase-1.4.10/tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>E:/software/hbase-1.4.10/zoo</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>
4. Go to the bin directory, run start-hbase.cmd, then open a command window in that directory and run: hbase shell
create 'test','cf'
scan 'test'
Reposted from: https://www.cnblogs.com/ruili07/p/10020165.html