
Installing and Using Hive, and Accessing Hive from Java

Published: 2025/3/15


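Introduction to Hive

Overview

Hive was open-sourced by Facebook and donated to the Apache Software Foundation, where it is a top-level project (hive.apache.org). Hive is a data warehouse technology built on big-data infrastructure. Its main job is to translate the SQL a user writes into MapReduce code and submit the resulting jobs to the MR framework for execution. It can map structured data files to database tables and offers SQL-like queries over them.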

Summary

Hive is a data warehouse.
Hive is built on HDFS, so it can store massive amounts of data.
Hive lets programmers run distributed computations with SQL commands; the computation runs on YARN. (Hive converts the SQL into MR jobs.)

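Advantages:

Lower development effort: writing SQL is enough, so developers do not have to write MapReduce code or learn how to.

Disadvantages:

High latency (MapReduce itself is slow, and Hive adds time to translate, optimize, and submit the SQL as MapReduce jobs). Hive suits offline processing of big data (TB to PB scale, with statistics delivered roughly a day later).

Scenarios where Hive is a poor fit:

1: Small data volumes

2: Real-time computation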

Database
- Small data volume, high value per record
Data warehouse
- Large data volume, low value per record

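Hive Architecture

1. Overview

HDFS: stores the data files of the Hive warehouse.
YARN: runs the MR programs that Hive's HQL is translated into.
MetaStore: stores and manages the metadata that Hive maintains.
Hive: executes HQL by translating it into MapReduce programs, which compute statistics over the data files in the HDFS cluster.

2. Diagram (image not available)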

Installing Hive

# Steps
1. HDFS (Hadoop 2.9.2)
2. YARN (Hadoop 2.9.2)
3. MySQL (5.6)
4. Hive (1.2.1)

Give the virtual machine at least 1 GB of memory.

1. Install the MySQL database

See the MySQL installation guide.

2. Install Hadoop

    # Configure HDFS and YARN, then check that the daemons are running
    [root@hive40 ~]# jps
    1651 NameNode
    2356 NodeManager
    2533 Jps
    1815 DataNode
    2027 SecondaryNameNode
    2237 ResourceManager

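3. Install Hive

1 Upload the Hive installation package to the Linux machine

2 Unpack Hive

    [root@hadoop ~]# tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/installs
    # rename the unpacked directory inside /opt/installs
    [root@hadoop ~]# mv /opt/installs/apache-hive-1.2.1-bin /opt/installs/hive1.2.1

3 Configure environment variables (added to /etc/profile)

    export HIVE_HOME=/opt/installs/hive1.2.1
    export PATH=$PATH:$HIVE_HOME/bin

4 Reload the system configuration

    [root@hadoop ~]# source /etc/profile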

5 Configure Hive

hive-env.sh

Copy the template to create hive-env.sh:

    [root@hadoop10 conf]# cp hive-env.sh.template hive-env.sh

    # Hadoop installation directory
    HADOOP_HOME=/opt/installs/hadoop2.9.2/
    # Directory holding Hive's configuration files
    export HIVE_CONF_DIR=/opt/installs/hive1.2.1/conf/

hive-site.xml

Copy the template to create hive-site.xml, then replace its contents with the properties below:

    [root@hadoop10 conf]# cp hive-default.xml.template hive-site.xml

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://hadoop10:3306/hive</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>admins</value>
        </property>
    </configuration>

Log in to MySQL and create the hive database (from the command line)

    create database hive;

Copy the MySQL driver jar into Hive's lib directory.

4 Startup

1. Start Hadoop

    # Start HDFS
    start-dfs.sh
    # Start YARN
    start-yarn.sh

2. Start Hive locally

Initialize the metadata:

    schematool -dbType mysql -initSchema

This creates Hive's metadata tables in the hive database in MySQL.

3. Two ways to start Hive

    # Local mode [administrator mode]
    # Starts the Hive service and drops into the Hive client at the same time. Only local access is possible.
    [root@hadoop10 ~]# hive
    Logging initialized using configuration in jar:file:/opt/installs/hive1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
    hive>

    # Client operations in HQL (Hive Query Language)
    # 1. List databases
    hive> show databases;
    # 2. Create a database
    hive> create database baizhi;
    # 3. List databases again
    hive> show databases;
    # 4. Switch to the database
    hive> use baizhi;
    # 5. List all tables
    hive> show tables;
    # 6. Create a table
    hive> create table t_user(id string,name string,age int);
    # 7. Insert one row (runs as an MR job; not for real use, testing only)
    hive> insert into t_user values('1001','zhangsan',20);
    # 8. Show the table structure
    hive> desc t_user;
    # 9. Show the table's schema description (table metadata).
    #    The output shows that the table's data lives in HDFS.
    hive> show create table t_user;
    # 10. Show the database description
    hive> desc database baizhi;
    # 11. Show the current database
    hive> select current_database();
    # 12. Other SQL
    hive> select * from t_user;
    hive> select count(*) from t_user;        # Hive starts a MapReduce job
    hive> select * from t_user order by id;

3. Hive client and server

    # Start the Hive server, which allows remote connections.
    # Run in the foreground
    [root@hadoop10 ~]# hiveserver2
    # Run in the background
    [root@hadoop10 ~]# hiveserver2 &

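beeline client

    # Start the client
    [root@hadoop10 ~]# beeline
    beeline> !connect jdbc:hive2://hadoop10:10000
    # Press Enter, type the user name (the original uses the MySQL user name),
    # press Enter, then type the password (the original uses the MySQL password).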

DBeaver client (graphical interface)

    # 1: Unpack DBeaver
    # 2: Prepare the jars DBeaver needs to connect to Hive:
    #      hadoop-common-2.9.2
    #      hive-jdbc-1.2.1-standalone
    # 3: Start DBeaver

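JDBC

    # Add the dependencies
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.2</version>
    </dependency>

    # Access Hive through JDBC (the class name HiveJdbcDemo is only a wrapper for the original main method)
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.apache.log4j.BasicConfigurator;

    public class HiveJdbcDemo {
        public static void main(String[] args) throws Exception {
            BasicConfigurator.configure(); // enable logging
            // Load the Hive driver
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // t_user is the table created earlier in the baizhi database
            String sql = "select * from t_user";
            // Connect to the baizhi database; try-with-resources closes everything in reverse order
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hadoop10:10000/baizhi", "root", "admins");
                 PreparedStatement pstm = conn.prepareStatement(sql);
                 ResultSet rs = pstm.executeQuery()) {
                while (rs.next()) {
                    String id = rs.getString("id");
                    String name = rs.getString("name");
                    int age = rs.getInt("age");
                    System.out.println(id + ":" + name + ":" + age);
                }
            }
        }
    }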

4. Data types

Data types (primitive, array, map, struct)

primitive (primitive types):

Hive type	Bytes	Notes

TINYINT	1	integer, Java byte
SMALLINT	2	integer, Java short
INT	4	integer, Java int
BIGINT	8	integer, Java long
BOOLEAN		boolean
FLOAT	4	floating point
DOUBLE	8	floating point
STRING		string, no length limit
VARCHAR		string, e.g. varchar(20) holds at most 20 characters
CHAR		string, e.g. char(20) has a fixed length of 20
BINARY		binary type
TIMESTAMP		timestamp type
DATE		date type

array (array type): ARRAY <data_type>
    # Create a table; the general form of a column is: column_name array<element_type>
    create table t_tab(score array<float>);
map (key-value type): MAP <primitive_type, data_type>
    # Create a table
    create table t_tab(score map<string,float>);
struct (struct type): STRUCT <col_name:data_type, …>
    # Create a table; the general form of a column is: column_name struct<field:type,field:type>
    create table t_tab(info struct<name:string,age:int,sex:char(1)>);

Importing Data into Hive

1. Custom delimiters

# Delimiter design
Delimiter	Meaning	Clause

,	Separates the values of the columns.	fields
-	Separates the elements of an array, the members of a struct, and the key-value pairs of a map.	collection items
|	Separates the key from the value inside a map entry.	map keys
\n	Separates records; one record per line.	lines

    # Create the table
    create table t_person(
        id string,
        name string,
        salary double,
        birthday date,
        sex char(1),
        hobbies array<string>,
        cards map<string,string>,
        addr struct<city:string,zipCode:string>
    ) row format delimited
    fields terminated by ','             -- column separator
    collection items terminated by '-'   -- array elements, struct members, map entries
    map keys terminated by '|'           -- key/value separator inside a map entry
    lines terminated by '\n';            -- record separator

    # Test data
    1,張三,8000.0,2019-9-9,1,抽煙-喝酒-燙頭,123456|中國銀行-22334455|建設銀行,北京-10010
    2,李四,9000.0,2019-8-9,0,抽煙-喝酒-燙頭,123456|中國銀行-22334455|建設銀行,鄭州-45000
    3,王五,7000.0,2019-7-9,1,喝酒-燙頭,123456|中國銀行-22334455|建設銀行,北京-10010
    4,趙6,100.0,2019-10-9,0,抽煙-燙頭,123456|中國銀行-22334455|建設銀行,鄭州-45000
    5,于謙,1000.0,2019-10-9,0,抽煙-喝酒,123456|中國銀行-22334455|建設銀行,北京-10010
    6,郭德綱,1000.0,2019-10-9,1,抽煙-燙頭,123456|中國銀行-22334455|建設銀行,天津-20010

    # Import the data (run in the Hive command line)
    -- local means the path is on the local file system; without it the file is read from HDFS
    -- overwrite replaces the existing table data; it is optional
    load data [local] inpath '/opt/datas/person1.txt' [overwrite] into table t_person;
    # In essence this uploads the file into HDFS, where Hive manages it.
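To make the roles of the three delimiters concrete, here is a minimal Java sketch that splits one line the same way the `row format delimited` clauses for t_person describe. It is only an illustration, not Hive's SerDe code: the class and method names are hypothetical, and the sample row uses made-up ASCII stand-ins for the test data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics how the delimiters declared for t_person split one text line.
class DelimitedRowSketch {

    // collection items terminated by '-'
    static List<String> parseArray(String field) {
        return Arrays.asList(field.split("-"));
    }

    // Map entries are separated by '-', key and value by '|'
    static Map<String, String> parseMap(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : field.split("-")) {
            String[] kv = entry.split("\\|", 2); // '|' must be escaped in a Java regex
            result.put(kv[0], kv[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII stand-in for one row of the test data.
        String line = "1,zhangsan,8000.0,2019-9-9,1,smoking-drinking,123456|BOC-22334455|CCB,beijing-10010";
        String[] cols = line.split(",");                        // fields terminated by ','
        System.out.println("hobbies = " + parseArray(cols[5])); // array<string>
        System.out.println("cards   = " + parseMap(cols[6]));   // map<string,string>
        System.out.println("addr    = " + parseArray(cols[7])); // struct members share the '-' delimiter
    }
}
```

Because struct members and array elements share the same delimiter, a value that itself contains `-` or `,` would be split in the wrong place; the delimiters have to be characters that never occur in the data.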

2. JSON format

Adding the jar, importing the data, and creating the table are all done in beeline.

Data

    # 1. Create a JSON file locally
    {"id":1,"name":"zhangsan","sex":0,"birth":"1991-02-08"}
    {"id":2,"name":"lisi","sex":1,"birth":"1991-02-08"}

Add the jar that contains the format parser (client command)

    # Run in the Hive client (adds the jar to Hive's classpath temporarily, for the current connection only)
    add jar /opt/installs/hive1.2.1/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar

    # To add it permanently, at the Hive server level:
    # 1. Copy the jar that should be on Hive's classpath into the auxlib directory under the Hive installation.
    # 2. Restart hiveserver.

Create the table

    create table t_person2(
        id string,
        name string,
        sex char(1),
        birth date
    ) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';

Load the file (client command)

    # Note from the original: the imported JSON data cannot be viewed in DBeaver
    # (the table's data is in essence just that JSON file).
    load data local inpath '/opt/person.json' into table t_person2;

Query the data

    select * from t_person2;

3. Regular-expression format

Data: access.log

    INFO 192.168.1.1 2019-10-19 QQ com.baizhi.service.IUserService#login
    INFO 192.168.1.1 2019-10-19 QQ com.baizhi.service.IUserService#login
    ERROR 192.168.1.3 2019-10-19 QQ com.baizhi.service.IUserService#save
    WARN 192.168.1.2 2019-10-19 QQ com.baizhi.service.IUserService#login
    DEBUG 192.168.1.3 2019-10-19 QQ com.baizhi.service.IUserService#login
    ERROR 192.168.1.1 2019-10-19 QQ com.baizhi.service.IUserService#register

Create the table

    create table t_access(
        level string,
        ip string,
        log_time date,
        app string,
        service string,
        method string
    ) row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'  -- SerDe class that parses rows with a regular expression
    with serdeproperties("input.regex"="(.*)\\s(.*)\\s(.*)\\s(.*)\\s(.*)#(.*)");
    -- (.*) matches any run of characters; \\s matches one whitespace character.
    -- Each capture group becomes one column, in order.

Import the data

    load data local inpath '/opt/access.log' into table t_access;

Query the data

    select * from t_access;
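Before loading a log file, it can help to check the regular expression outside Hive. The sketch below applies the same pattern as the `input.regex` property to one line of access.log using `java.util.regex`; the class name is hypothetical, and it only shows how the six capture groups line up with the six columns of t_access, not how RegexSerDe is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: checks which capture group ends up in which column.
class AccessLogRegexSketch {

    // Same pattern as input.regex; "\\s" in Java source is the regex \s.
    static final Pattern LINE = Pattern.compile("(.*)\\s(.*)\\s(.*)\\s(.*)\\s(.*)#(.*)");

    // Returns the captured groups in column order, or an empty list if the whole line does not match.
    static List<String> parse(String line) {
        List<String> cols = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                cols.add(m.group(i));
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        String line = "ERROR 192.168.1.3 2019-10-19 QQ com.baizhi.service.IUserService#save";
        // Prints: [ERROR, 192.168.1.3, 2019-10-19, QQ, com.baizhi.service.IUserService, save]
        System.out.println(parse(line));
    }
}
```

The pattern works here because every line has exactly four whitespace characters and one `#`. A line with extra spaces would still match, but the greedy `(.*)` groups would shift the values into different columns.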

Advanced HQL

-- Execution order of SQL keywords
from > where > group by > having > select > order by > limit

Note: once a statement contains group by, the later clauses can only work with the grouping columns and the results of aggregate functions.


    # 0. Accessing fields of each complex type (array, map, struct)
    select name,salary,hobbies[1],cards['123456'],addr.city from t_person;
    # 1. Conditions: = != >= <=
    select * from t_person where addr.city='鄭州';
    # 2. and, or, between ... and
    select * from t_person where salary>5000 and array_contains(hobbies,'抽煙');
    # 3. order by [starts a MapReduce job to sort]
    select * from t_person order by salary desc;
    # 4. limit (no start offset in this Hive version)
    select * from t_person sort by salary desc limit 5;
    # 5. Removing duplicates
    select distinct addr.city from t_person;
    select distinct(addr.city) from t_person;

    # Table joins
    select ... from table1 t1 left join table2 t2 on <condition> where <condition> group by ... having ...
    1. Find pairs of people with different sex but the same salary.
    select t1.name,t1.sex,t1.salary,t2.name,t2.sex,t2.salary
    from t_person t1 join t_person t2 on t1.salary = t2.salary
    where t1.sex != t2.sex;
    2. Find pairs of people who share the same first hobby and come from different cities.
    select t1.name,t1.salary,t1.hobbies,t1.addr.city,t2.name,t2.salary,t2.hobbies,t2.addr.city
    from t_person t1 join t_person t2 on t1.hobbies[0]=t2.hobbies[0]
    where t1.addr.city != t2.addr.city;

    # Single-row functions
    -- List all functions in the Hive system
    show functions;
    1. array_contains(column, value)
    select name,hobbies from t_person where array_contains(hobbies,'喝酒');
    2. length(column)
    select length('123123');
    3. concat(column, column)
    select concat('123123','aaaa');
    4. to_date('1999-9-9')
    select to_date('1999-9-9');
    5. year(date), month(date)
    6. date_add(date, number)
    select name,date_add(birthday,-9) from t_person;

    # Aggregate functions: max, min, sum, avg, count, and so on
    select max(salary) from t_person where addr.city='北京';
    select count(id) from t_person;

    # Explode function (table-generating function)
    -- List every hobby, one row per array element
    select explode(hobbies) as hobby from t_person;

    # lateral view
    -- Joins the exploded result to the side of the source table as an extra column (similar to a table join).
    -- Syntax: from <table> lateral view explode(<array column>) <alias> as <column name>;
    -- Show id, name, and hobby, one row per hobby.
    select id,name,hobby from t_person lateral view explode(hobbies) t_hobby as hobby;

    # Grouping
    1. group by (average salary per city)
    select addr.city,avg(salary) from t_person group by addr.city;
    2. having (cities whose average salary exceeds 5000, with that average)
    select addr.city,avg(salary) from t_person group by addr.city having avg(salary)>5000;
    3. Count the people per hobby -- explode + lateral view
    select hobby,count(*) from t_person lateral view explode(hobbies) t_hobby as hobby group by hobby;
    4. Find the single most popular hobby (top 1)
    select hb,count(*) num
    from t_person lateral view explode(hobbies) h as hb
    group by hb
    order by num desc limit 1;

    # Subqueries
    -- List the distinct hobbies.
    select distinct t.hobby from (select explode(hobbies) as hobby from t_person) t;
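The combination of `lateral view explode(hobbies)` with `group by` and `count(*)` can be pictured as two nested loops: the outer loop visits each row, the inner loop emits one output row per array element, and the counter map plays the role of the grouping. The Java sketch below shows that idea on in-memory lists. The class name and the sample values are hypothetical, and this is a mental model rather than how Hive executes the query.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: in-memory equivalent of
//   select hobby, count(*) from t_person lateral view explode(hobbies) t_hobby as hobby group by hobby;
class ExplodeCountSketch {

    // Each inner list stands for the hobbies array of one row.
    static Map<String, Integer> countHobbies(List<List<String>> hobbiesPerRow) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> hobbies : hobbiesPerRow) {   // one input row
            for (String hobby : hobbies) {             // explode: one output row per element
                counts.merge(hobby, 1, Integer::sum);  // group by hobby, count(*)
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII stand-ins for the hobbies column.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("smoking", "drinking", "perm"),
                Arrays.asList("drinking", "perm"),
                Arrays.asList("smoking", "perm"));
        System.out.println(countHobbies(rows));
    }
}
```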

Converting Rows to Columns

    # Sample table and data
    -- Table (movie viewing log)
    create table t_visit_video (
        username string,
        video_name string,
        video_date date
    ) row format delimited fields terminated by ',';

    -- Data: Douban viewing log (stored by day, one log file per day)
    張三,大唐雙龍傳,2020-03-21
    李四,天下無賊,2020-03-21
    張三,神探狄仁杰,2020-03-21
    李四,霸王別姬,2020-03-21
    李四,霸王別姬,2020-03-21
    王五,機器人總動員,2020-03-21
    王五,放牛班的春天,2020-03-21
    王五,盜夢空間,2020-03-21

    # collect_list (aggregate function)
    Purpose: after grouping, collects the values of one column within each group into an array.
    Syntax: select collect_list(column) from <table> group by <grouping column>;
    select username,collect_list(video_name) from t_visit_video group by username;

    # collect_set (aggregate function)
    Purpose: same as collect_list, but duplicate values are removed.
    Syntax: select collect_set(column) from <table> group by <grouping column>;
    select username,collect_set(video_name) from t_visit_video group by username;

    # concat_ws (single-row function)
    Purpose: joins the elements of an array column into one string with the given separator.
    select id,name,concat_ws(',',hobbies) from t_person;

    -- Turn the t_visit_video data into one row per user:
    -- the movies each person watched on 2020-3-21.
    select username,concat_ws(',',collect_set(video_name)) from t_visit_video group by username;
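As a rough picture of what `concat_ws(',', collect_set(video_name)) ... group by username` computes, the Java sketch below groups rows by user, keeps each distinct title once, and joins the titles with a comma. The names are hypothetical and the data is an ASCII stand-in. One assumption to note: the sketch keeps titles in first-seen order so the output is predictable, whereas the order of the elements returned by Hive's collect_set should not be relied on.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: in-memory equivalent of
//   select username, concat_ws(',', collect_set(video_name)) from t_visit_video group by username;
class CollectSetSketch {

    // Each row is {username, video_name}.
    static Map<String, String> videosPerUser(String[][] rows) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            // group by username; the set drops duplicate titles (collect_set)
            grouped.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue())); // concat_ws(',', ...)
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ASCII stand-ins for the viewing log.
        String[][] rows = {
                {"zhangsan", "movieA"}, {"lisi", "movieB"}, {"zhangsan", "movieC"},
                {"lisi", "movieD"}, {"lisi", "movieD"}
        };
        System.out.println(videosPerUser(rows));
    }
}
```

Swapping the `LinkedHashSet` for an `ArrayList` would give the behaviour of collect_list, where the duplicate title is kept.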

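Global Sort and Local Sort

    # Global sort
    Syntax: select * from <table> order by <column> asc|desc;
    -- Sort by salary, descending
    select * from t_person order by salary desc;

    # Local sort (sort within partitions)
    Concept: several reduce tasks each sort their own share of the data (a pre-sort), so the output is ordered only within each reducer.
    Keyword: sort by
    By default there is only one reduce task, and therefore only one partition, so the result looks the same as a global sort.
    Syntax: select * from <table> distribute by <partition column> sort by <column> asc|desc;

    -- 1. Set the number of reducers
    set mapreduce.job.reduces = 3;
    -- Check the number of reducers
    set mapreduce.job.reduces;
    -- 2. Sort with sort by, and use distribute by to choose the partition column.
    --    (Note from the original: with distribute by, the select list has to be *.)
    select * from t_person distribute by addr.city sort by salary desc;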
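The difference between a global sort and `distribute by ... sort by` is easier to see with the data in hand. The sketch below sends each row to one of several "reducers" based on its city and then sorts every reducer's rows by salary, descending. All names are hypothetical, and the hash-modulo rule is an assumption made for illustration; Hive computes its own hash of the distribute-by key, so the actual reducer a city lands on will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: in-memory picture of
//   select * from t_person distribute by addr.city sort by salary desc;
class DistributeSortSketch {

    static class Person {
        final String city;
        final double salary;

        Person(String city, double salary) {
            this.city = city;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return city + ":" + salary;
        }
    }

    static List<List<Person>> distributeAndSort(List<Person> rows, int reducers) {
        List<List<Person>> partitions = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Person p : rows) {
            // distribute by city: rows with the same city always reach the same reducer
            // (assumed hash-modulo rule, not Hive's actual hash function)
            int reducer = Math.floorMod(p.city.hashCode(), reducers);
            partitions.get(reducer).add(p);
        }
        for (List<Person> partition : partitions) {
            // sort by salary desc: ordering holds inside one reducer only
            partition.sort(Comparator.comparingDouble((Person p) -> p.salary).reversed());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
                new Person("beijing", 8000), new Person("zhengzhou", 9000),
                new Person("beijing", 7000), new Person("tianjin", 1000));
        List<List<Person>> partitions = distributeAndSort(rows, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + " -> " + partitions.get(i));
        }
    }
}
```

Concatenating the reducer outputs does not give a globally sorted list, which is why `order by` is still needed when a total order matters.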

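Table Types in Hive

4.1 Managed tables

Tables managed entirely by Hive.

"Managed" refers to whether Hive owns the data. If a table is a managed table, dropping the table also deletes the data behind it. In production, to avoid losing data through an accidental drop, tables are therefore usually created as non-managed tables, that is, external tables.

Summary: for a managed table, Hive fully owns both the table structure and the table's data files in HDFS. Dropping a managed table also deletes the corresponding files in HDFS.

Disadvantage: the data is not safe.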

4.2 External tables

An external table maps existing HDFS data as a table, but dropping the table cannot delete that data.

The biggest difference from a managed table is that dropping an external table only removes the table's metadata from MySQL and leaves the data in HDFS untouched, so an external table can share data with third-party applications. To create one, add the keyword "external": create external xxx()…

    # Create an external table
    1. Prepare the data file personout.txt.
    2. Upload it to HDFS. The file must sit in a directory of its own; every data file in that directory is treated as table data.
    3. Create the table with create external ... location: at the end of the statement, use location to point to the HDFS directory that holds the data files.
    create external table t_personout(
        id int,
        name string,
        salary double,
        birthday date,
        sex char(1),
        hobbies array<string>,
        cards map<string,string>,
        addr struct<city:string,zipCode:string>
    ) row format delimited
    fields terminated by ','             -- column separator
    collection items terminated by '-'   -- array elements, struct members, map entries
    map keys terminated by '|'
    lines terminated by '\n'
    location '/file';
    4. Query the table data.

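4.3 Partitioned tables

A partitioned table stores its rows in separate partitions according to the value of one or more columns. With massive data this narrows the range a query has to scan and speeds it up.

Example: a movie table, a user table.

Partitioning scheme: by user region, by movie genre.

Usage: based on the actual business queries, choose the columns that appear in query conditions as partition columns. This shrinks the range MapReduce scans and makes the jobs more efficient.

Summary:

The data of the partitions in a table is managed per partition.

1: Data is deleted by partition. Dropping a partition also deletes its data (for an external table the partition is dropped, but the data files remain).

2: For queries and statistics, all partitions are managed together as one table.

select * from <table> where <condition on a partition column>;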

4.3.1 Creating a partitioned table

Source data files

    # File "bj.txt" (data for china, bj)
    1001,張三,1999-1-9,1000.0
    1002,李四,1999-2-9,2000.0
    1008,孫帥,1999-9-8,50000.0
    1010,王宇希,1999-10-9,10000.0
    1009,劉春陽,1999-9-9,10.0
    # File "tj.txt" (data for china, tj)
    1006,郭德綱,1999-6-9,6000.0
    1007,胡鑫喆,1999-7-9,7000.0

Create the table

    create external table t_user_part(
        id string,
        name string,
        birth date,
        salary double
    ) partitioned by (country string,city string)  -- partition columns: country and city
    row format delimited
    fields terminated by ','
    lines terminated by '\n';

Load data into partitions

    # Load the data for china and bj
    load data local inpath "/opt/bj.txt" into table t_user_part partition(country='china',city='bj');
    # Load the data for china and tj
    load data local inpath "/opt/tj.txt" into table t_user_part partition(country='china',city='tj');

Show partition information

    show partitions t_user_part;

Query by partition: the query condition only needs to include a partition column.

    select * from t_user_part where city = 'bj';

Drop a partition

For a managed table, the partition's data is deleted together with the partition.

For an external partitioned table, Hive stops managing the data after the drop, but the data files remain.

    alter table t_user_part drop partition(country='china',city='bj');

Add a partition (for reference)

    alter table t_user_part add partition(country='china',city='heb') location '/file/t_user_part/heb';

    # Table types, in short
    1. Managed table: both the table in Hive and the data files in HDFS are managed by Hive.
    2. External table (commonly used; keeps HDFS files safe): if the table is dropped in Hive, the data files in HDFS are kept.
    3. Partitioned table (important): the table is managed partition by partition.
       Benefit: if the where clause uses a partition column, Hive only reads the matching partitions and does not scan the others, which makes queries faster.
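Partition pruning follows from the way partitions are laid out: by default each partition is a sub-directory named `column=value` under the table's directory, which is also the form `show partitions` prints (for example `country=china/city=bj`). The Java sketch below builds such a path and then keeps only the partition directories that match a condition on a partition column. The class and method names are hypothetical; it models the idea of skipping directories, not Hive's planner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: partition directories and a simple form of partition pruning.
class PartitionPruningSketch {

    // Builds the relative directory of a partition, e.g. country=china/city=bj.
    static String partitionPath(Map<String, String> spec) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            parts.add(e.getKey() + "=" + e.getValue());
        }
        return String.join("/", parts);
    }

    // Keeps only the partition directories where column = value, as in: where city = 'bj'.
    static List<String> prune(List<String> partitionDirs, String column, String value) {
        String wanted = column + "=" + value;
        List<String> kept = new ArrayList<>();
        for (String dir : partitionDirs) {
            if (Arrays.asList(dir.split("/")).contains(wanted)) {
                kept.add(dir); // only these directories would be scanned
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> bj = new LinkedHashMap<>();
        bj.put("country", "china");
        bj.put("city", "bj");
        Map<String, String> tj = new LinkedHashMap<>();
        tj.put("country", "china");
        tj.put("city", "tj");

        List<String> dirs = Arrays.asList(partitionPath(bj), partitionPath(tj));
        System.out.println("all partitions: " + dirs);
        System.out.println("scanned for city = 'bj': " + prune(dirs, "city", "bj"));
    }
}
```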

Hive User-Defined Functions

Built-in functions

    # List Hive's built-in functions
    show functions;
    # Show the description of a function
    desc function max;

User-defined functions (UDF)

UDF stands for user-defined function.

A UDF operates on a single row and produces a single row as output. Most functions fall into this category (for example, math functions and string functions).

In short, a UDF returns one value for each input row: one in, one out.

    # 0. Add the Hive dependency
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>

    # 1. Define a class that extends UDF
    #    1. The class must extend UDF.
    #    2. The method must be named evaluate.
    package function;

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;

    @Description(
        name = "hello",
        // Non-ASCII text in the description may show up garbled later, so English is safer.
        value = "hello(str1,str2) - returns 'Hello, str1,str2, any beauties here?'"
    )
    public class HelloUDF extends UDF {
        // The method must be named evaluate
        public String evaluate(String s1, String s2) {
            return "Hello, " + s1 + "," + s2 + ", any beauties here?";
        }
    }

    # 2. Configure the Maven build and package the jar
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <build>
        <finalName>funcHello</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <includes>
                        <include>**/function/**</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>

    # Package
    mvn package

    # 3. Upload the jar to the Linux machine and register the function.
    # Run in the Hive command line
    add jar /opt/doc/funcHello.jar;      # added at the Hive session level
    delete jar /opt/doc/funcHello.jar;   # remember to delete the old jar before adding a rebuilt one
    create [temporary] function hello as "function.HelloUDF";   # temporary means session level
    # Remove the registered function
    drop [temporary] function hello;

    # 4. Inspect and use the function
    -- 1. Inspect the function
    desc function hello;
    desc function extended hello;
    -- 2. Use the function in a query
    select hello(userid,cityname) from logs;
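The rule that the method must be named evaluate makes more sense once you picture the caller looking the method up by name at run time. The sketch below is a deliberately simplified illustration of that idea using plain Java reflection. It does not depend on hive-exec, the stand-in class does not extend Hive's UDF, and the lookup is far simpler than Hive's real method resolution, which also matches argument types. All names here are hypothetical.

```java
import java.lang.reflect.Method;

// Illustration only: why a UDF-style class needs a method called "evaluate".
class EvaluateDispatchSketch {

    // Stand-in for a UDF class; it does NOT extend Hive's UDF so the sketch compiles on its own.
    public static class Hello {
        public String evaluate(String s1, String s2) {
            return "Hello, " + s1 + "," + s2;
        }
    }

    // Finds a public method named "evaluate" with the right number of parameters and calls it.
    static Object call(Object udf, Object... args) {
        for (Method m : udf.getClass().getMethods()) {
            if (m.getName().equals("evaluate") && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(udf, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("evaluate could not be invoked", e);
                }
            }
        }
        throw new IllegalArgumentException("no evaluate method taking " + args.length + " arguments");
    }

    public static void main(String[] args) {
        // Roughly what happens per row for: select hello(userid, cityname) from logs;
        System.out.println(call(new Hello(), "1001", "beijing"));
    }
}
```

A class whose method had any other name would fall through to the exception at the end of `call`, which is the simplified counterpart of Hive failing to find a matching method for the function.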

Working around the awkward pentaho dependency

    # Download
    https://public.nexus.pentaho.org/repository/proxied-pentaho-public-repos-group/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde-javadoc.jar
    # Put it in a local directory whose path contains only English (ASCII) characters
    D:\work\pentaho-aggdesigner-algorithm-5.1.5-jhyde-javadoc.jar
    # Install it into the local Maven repository
    D:\work> mvn install:install-file -DgroupId=org.pentaho -DartifactId=pentaho-aggdesigner-algorithm -Dversion=5.1.5-jhyde -Dpackaging=jar -Dfile=pentaho-aggdesigner-algorithm-5.1.5-jhyde-javadoc.jar
