Deploying and Using the Hive 1.2.2 Data Warehouse on a Hadoop 2.7.3 Cluster
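HBase is a distributed, column-oriented NoSQL database built on HDFS. It stores data in tables made up of rows and columns, with columns grouped into column families. HBase does not provide a SQL-like query language. To query it the way you would with SQL, you can use Phoenix, which translates SQL queries into HBase scans and the corresponding operations, or you can use Hive, the warehouse tool covered here, with HBase as Hive's storage.
Hive is a data warehouse that runs on top of Hadoop. It maps structured data files to database tables, provides a simple SQL-like query language called HQL, and converts the SQL statements into MapReduce jobs. This makes it convenient to query and analyze data with SQL, and it is best suited to data that does not change frequently. The storage underneath Hive can be HBase or files on HDFS.
The two are different technologies built on Hadoop and are often used together to cover different kinds of workloads: Hive for offline batch analysis and statistics, HBase for online queries.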
1. Install Hive from the binary package
Download URL: http://mirrors.shuosc.org/apache/hive/stable/apache-hive-1.2.2-bin.tar.gz
tar -zxf apache-hive-1.2.2-bin.tar.gz
Configure the environment variables
# vi /etc/profile
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
# source /etc/profile
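If /etc/profile is sourced more than once in the same session, a plain PATH append adds a duplicate entry each time. The following is a minimal sketch of an idempotent variant, assuming the same install path as above; it is an optional alternative, not what the original setup used.

```shell
# Append Hive's bin directory to PATH only if it is not already present,
# so that sourcing the profile repeatedly does not keep growing PATH.
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) ;;              # already on PATH, nothing to do
  *) PATH="$PATH:$HIVE_HOME/bin" ;;     # first time: append it
esac
export HIVE_HOME PATH
```

Wrapping PATH in colons before matching lets the pattern catch the entry whether it sits at the start, middle, or end of the list.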
2. Install MySQL to store Hive's metadata (because of resource constraints, MySQL was installed on a separate server here)
# yum install -y mariadb mariadb-server
# systemctl start mariadb
Create the Hive metastore database and the connection user in MySQL
mysql> create database hive;
mysql> grant all on *.* to 'hive'@'%' identified by 'hive';
mysql> flush privileges;
3. Configure Hive
cd /data/yunva/apache-hive-1.2.2-bin/conf
cp hive-default.xml.template hive-default.xml
Configure Hive's connection to MySQL
# vim hive-site.xml
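The contents of hive-site.xml are not shown in the source. As a rough sketch only, a metastore configuration for a MySQL backend usually sets the four standard javax.jdo.option.* connection properties. Here mysql_server_ip is a placeholder for the MySQL host, the hive/hive credentials match the user created in step 2, and CONF_DIR is a hypothetical variable that should point at $HIVE_HOME/conf on a real install.

```shell
# Sketch under assumptions: mysql_server_ip is a placeholder host name and
# CONF_DIR is a hypothetical variable (use $HIVE_HOME/conf on a real install).
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- JDBC URL of the metastore database created in step 2 -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql_server_ip:3306/hive</value>
  </property>
  <!-- Driver class shipped in the Connector/J 5.1 jar -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
EOF
```

Adjust the host, port, and credentials to your own environment before starting the metastore.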
4. Install the Java driver for MySQL
Download URL: https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
Extract the archive and copy mysql-connector-java-5.1.45-bin.jar into the /data/yunva/apache-hive-1.2.2-bin/lib directory
5. Start the Hive service
# hive --service metastore &
[root@test3 apache-hive-1.2.2-bin]# ps -ef|grep hive
root 4302 3176 99 14:09 pts/0 00:00:06 /usr/java/jdk1.8.0_65/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/yunva/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/yunva/hadoop-2.7.3 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /data/yunva/apache-hive-1.2.2-bin/lib/hive-service-1.2.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore
root 4415 3176 0 14:09 pts/0 00:00:00 grep hive
[root@test3 apache-hive-1.2.2-bin]# jps
15445 HRegionServer
4428 Jps
4302 RunJar # Hive runs as a process named RunJar
Client configuration (the client also needs a Hadoop environment)
scp -P 48490 -r apache-hive-1.2.2-bin 10.10.114.112:/data/yunva
Configure the environment variables:
vim /etc/profile
# hive client
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
# vi hive-site.xml (or keep the original configuration unchanged, in which case there are effectively two Hive servers)
<configuration>
<!-- connect to Hive through thrift -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive_server_ip:9083</value>
</property>
</configuration>
Quick test:
Running the hive command should drop you into the hive> prompt.
6. Common Hive SQL commands
6.1 First create a test database
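The original commands for this step are not shown in the source. A minimal sketch, assuming the database is named test (inferred from the later output, which refers to test.table2 and test.db), might look like this. HQL_FILE is a hypothetical script path, and the script is run only when the hive CLI is available.

```shell
# Sketch under assumptions: the database name "test" is inferred from later
# output in this article; HQL_FILE is a hypothetical script location.
HQL_FILE="${HQL_FILE:-./create_test_db.hql}"
cat > "$HQL_FILE" <<'EOF'
-- Create the database if it is missing, switch to it, and list its tables.
CREATE DATABASE IF NOT EXISTS test;
USE test;
SHOW TABLES;
EOF

# Run the script only where the hive CLI is installed; otherwise just keep it.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found on PATH; wrote $HQL_FILE only"
fi
```

The same statements can also be typed one at a time at the hive> prompt.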
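Create table tb1 and set the field delimiter to a tab. Hive's default field delimiter is Ctrl-A (\001), so without this clause a tab-separated file loads as NULL values.
hive> create table tb1(id int,name string) row format delimited fields terminated by '\t';
To create another table with the same structure as tb1: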
hive> create table table2 like tb1;
Check the table structure:
hive> describe table2;
OK
id int
name string
Time taken: 0.126 seconds, Fetched: 2 row(s)
6.2 Load data from a local file into a Hive table
First create the data file. The fields on each line must be separated by a tab:
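The original file-creation command is not shown in the source. One way to build the file with real tab characters is printf, sketched below. The four rows match the query output shown later in this article; DATA_FILE is a hypothetical variable, and the article itself keeps the file at /root/seasons.txt.

```shell
# Sketch: DATA_FILE is a hypothetical variable; the article uses /root/seasons.txt.
DATA_FILE="${DATA_FILE:-./seasons.txt}"

# printf reuses the format for each pair of arguments, giving one
# "id<TAB>name" line per season.
printf '%s\t%s\n' 1 spring 2 summer 3 autumn 4 winter > "$DATA_FILE"

# Sanity check: every line must have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$DATA_FILE" \
  && echo "all lines are tab-delimited"
```

Typing the file in an editor also works, but some editors silently expand tabs to spaces, which would again produce NULL values after loading.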
Then load the data:
hive> load data local inpath '/root/seasons.txt' overwrite into table tb1;
Check whether the load succeeded, for example with select * from tb1;
6.3 Load data from HDFS into a Hive table:
List the directories under the HDFS root
hadoop fs -ls /
Create the /test directory
hadoop fs -mkdir /test
Use put to upload the file into /test as siji.txt
hadoop fs -put /root/seasons.txt /test/siji.txt
View the contents of siji.txt
# hadoop fs -cat /test/siji.txt
17/12/06 14:54:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1 spring
2 summer
3 autumn
4 winter
hive> load data inpath '/test/siji.txt' overwrite into table table2;
Loading data to table test.table2
Table test.table2 stats: [numFiles=1, numRows=0, totalSize=36, rawDataSize=0]
OK
Time taken: 0.336 seconds
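Check whether the load succeeded. Note that without the local keyword, load data moves the source file into the table's warehouse directory, so /test/siji.txt no longer exists at its original HDFS path afterwards.
hive> select * from table2;
OK
1 spring
2 summer
3 autumn
4 winter
Time taken: 0.074 seconds, Fetched: 4 row(s)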
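6.4 The operations above cover basic tables. To improve processing performance, Hive provides a partitioning mechanism. The key points about partitioned tables:
1>. The partition columns are declared when the table is created
2>. A table can have one or more partitions, which split the data into blocks
3>. Partition columns appear as fields in the table schema, but their values are not stored in the data files; they are encoded in the partition directory names
Advantage of partitioned tables: rows are assigned to different partitions according to a condition, which narrows the range a query has to scan and improves retrieval speed and processing performance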
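To make the "partition values are not stored in the data" point concrete, here is a small local simulation of the directory layout Hive uses for a table partitioned by a date column. It is only an illustration on the local filesystem: WAREHOUSE, demo_table, and dt are hypothetical names, and a real table lives in HDFS under the Hive warehouse directory.

```shell
# Local stand-in for the warehouse directory; WAREHOUSE, demo_table and dt
# are hypothetical names used only for this illustration.
WAREHOUSE="${WAREHOUSE:-./warehouse_demo}"

for dt in 2017-12-06 2017-12-07; do
  # One directory per partition value, named column=value.
  mkdir -p "$WAREHOUSE/demo_table/dt=$dt"
  # The data file holds only the regular columns (id, name), never the date.
  printf '%s\t%s\n' 1 spring 2 summer 3 autumn 4 winter \
    > "$WAREHOUSE/demo_table/dt=$dt/seasons.txt"
done

# A query filtered on dt='2017-12-06' only needs to read this one directory.
cat "$WAREHOUSE/demo_table/dt=2017-12-06/seasons.txt"
```

Because the filter value maps directly to a directory name, Hive can skip every other partition directory without opening its files.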
6.5 Single-partition table:
Create the single-partition table tb2 (the HDFS table directory contains only one level of subdirectories):
hive> create table tb2(id int,name string) partitioned by (dt string) row format delimited fields terminated by '\t';
Note: dt is the name of the partition column.
Load data from a file into the partitioned table and specify the target partition (the table must already exist)
hive> load data local inpath '/root/seasons.txt' into table tb2 partition (dt='2017-12-06');
hive> load data local inpath '/root/seasons.txt' into table tb2 partition (dt='2017-12-07');
View the table data
hive> select * from tb2;
OK
1 spring 2017-12-06
2 summer 2017-12-06
3 autumn 2017-12-06
4 winter 2017-12-06
1 spring 2017-12-07
2 summer 2017-12-07
3 autumn 2017-12-07
4 winter 2017-12-07
Time taken: 0.086 seconds, Fetched: 8 row(s)
View the resulting table directory in the HDFS warehouse
[root@test4_haili_dev ~]# hadoop fs -ls -R /user/hive/warehouse/test.db/tb2
17/12/06 15:09:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxrwxrwx - root supergroup 0 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-06
-rwxrwxrwx 3 root supergroup 36 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-06/seasons.txt
drwxrwxrwx - root supergroup 0 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-07
-rwxrwxrwx 3 root supergroup 36 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-07/seasons.txt
The data loaded into tb2 has been placed in separate directories by date
6.6 Multi-partition table:
Create the multi-partition table table3 (the HDFS table directory contains first-level directories, each of which contains subdirectories)
hive> create table table3(id int,name string) partitioned by (dt string,location string) row format delimited fields terminated by '\t';
Load data from a file into the partitioned table and specify the target partition
hive> load data local inpath '/root/seasons.txt' into table table3 partition (dt='2017-12-06',location='guangzhou');
hive> load data local inpath '/root/seasons.txt' into table table3 partition (dt='2017-12-07',location='shenzhen');
View the table data
hive> select * from table3;
OK
1 spring 2017-12-06 guangzhou
2 summer 2017-12-06 guangzhou
3 autumn 2017-12-06 guangzhou
4 winter 2017-12-06 guangzhou
1 spring 2017-12-07 shenzhen
2 summer 2017-12-07 shenzhen
3 autumn 2017-12-07 shenzhen
4 winter 2017-12-07 shenzhen
View the resulting table directory in the HDFS warehouse
[root@test3 yunva]# hadoop fs -ls -R /user/hive/warehouse/test.db/table3
17/12/06 15:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxrwxrwx - root supergroup 0 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06
drwxrwxrwx - root supergroup 0 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06/location=guangzhou
-rwxrwxrwx 3 root supergroup 36 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06/location=guangzhou/seasons.txt
drwxrwxrwx - root supergroup 0 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07
drwxrwxrwx - root supergroup 0 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07/location=shenzhen
-rwxrwxrwx 3 root supergroup 36 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07/location=shenzhen/seasons.txt
Each first-level dt partition directory is further divided into location partition directories.
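Since each path segment of the form column=value is one partition level, the partition spec can be read straight off a warehouse path. The helper below is a small sketch for doing that in a script; partition_spec is a hypothetical function name, not a Hive or Hadoop command.

```shell
# partition_spec is a hypothetical helper, not part of Hive or Hadoop.
# It prints the column=value pairs encoded in a warehouse path, one per line.
partition_spec() {
  # Split the path on "/" and keep only the segments that contain "=".
  printf '%s\n' "$1" | tr '/' '\n' | grep '='
}

partition_spec /user/hive/warehouse/test.db/table3/dt=2017-12-06/location=guangzhou/seasons.txt
```

For the path above, the helper emits the dt and location pairs in the same order Hive reports them in show partitions.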
View the table's partition information
hive> show partitions table3;
OK
dt=2017-12-06/location=guangzhou
dt=2017-12-07/location=shenzhen
Time taken: 0.073 seconds, Fetched: 2 row(s)
Query data by partition
hive> select name from table3 where dt='2017-12-06';
OK
spring
summer
autumn
winter
Time taken: 0.312 seconds, Fetched: 4 row(s)
Rename a partition
hive> alter table table3 partition (dt='2017-12-06',location='guangzhou') rename to partition(dt='20171206',location='shanghai');
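Drop a partition (if the rename above was run first, the partition is now dt='20171206', location='shanghai', so use those values instead)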
hive> alter table table3 drop partition(dt='2017-12-06',location='guangzhou');
OK
Time taken: 0.113 seconds
The query no longer returns any rows
hive> select name from table3 where dt='2017-12-06';
OK
Time taken: 0.078 seconds
Search for tables by name pattern
hive> show tables 'tb*';
OK
tb1
tb2
Add a new column to a table
hive> alter table tb1 add columns (comment string);
OK
Time taken: 0.106 seconds
hive> describe tb1;
OK
id int
name string
comment string
Time taken: 0.079 seconds, Fetched: 3 row(s)
Rename a table
hive> alter table tb1 rename to new_tb1;
OK
Time taken: 0.095 seconds
hive> show tables;
OK
new_tb1
table2
table3
tb2
Drop a table
hive> drop table new_tb1;
OK
Time taken: 0.094 seconds
hive> show tables;
OK
table2
table3
tb2
Reposted from: https://www.cnblogs.com/reblue520/p/7993026.html