Creating, Dropping, and Altering Tables in Hive (Managed, External, Partitioned, and Bucketed Tables; Data Types; Delimiters)

Published: 2024/7/5

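Creating tables

Basic statement format

CREATE [EXTERNAL] TABLE IF NOT EXISTS student -- creates a managed (internal) table by default; add EXTERNAL for an external table
(
  id INT COMMENT 'student ID', -- column name, column type, column description
  sname STRING COMMENT 'user name',
  age INT COMMENT 'age'
)
COMMENT 'records student IDs' -- table description
PARTITIONED BY (department STRING COMMENT 'partition by department') -- partition column; it must be a new column, not one of the table's own columns
CLUSTERED BY (id, age) -- bucketing columns; these must come from the table's own columns
SORTED BY (id ASC, age DESC) -- sort order inside each bucket (id ascending, age descending)
INTO 9 BUCKETS -- number of buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' -- tab as the column delimiter
COLLECTION ITEMS TERMINATED BY ',' -- delimiter between the elements of a collection
MAP KEYS TERMINATED BY ':' -- delimiter between a key and its value inside an element
LINES TERMINATED BY '\n' -- newline as the row delimiter
STORED AS TEXTFILE -- storage format of the table data; the default is TEXTFILE, other options include RCFILE (hybrid row-columnar format) and PARQUET (columnar format)
LOCATION '/user/student'; -- HDFS path where the table is stored; defaults to a path under the configured warehouse directory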

1. Managed (internal) table

CREATE TABLE IF NOT EXISTS student (id INT, sname STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

2. External table

CREATE EXTERNAL TABLE IF NOT EXISTS student (id INT, sname STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/user/student';

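Note:

After an external table is created, loading data into it from HDFS is really a move: the files are moved from their original path to the table's path. If the two paths differ, other teams' code that reads the original path can no longer find the data. So always specify LOCATION when creating an external table, and make it the same as the original path of the data. Otherwise you risk a serious mistake at work.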

3. Partitioned table

CREATE TABLE IF NOT EXISTS student (id INT, sname STRING, age INT) PARTITIONED BY (city STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

4. Bucketed table

CREATE TABLE IF NOT EXISTS student (id INT, sname STRING, age INT) CLUSTERED BY (id, age) SORTED BY (id ASC, age DESC) INTO 9 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
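To make the bucketing clause more concrete, the following Python sketch models how a row could be assigned to one of the 9 buckets: hash the CLUSTERED BY columns, then take the result modulo the bucket count. The function names and the hash scheme (an int hashing to its own value, columns combined with a factor of 31) are assumptions for illustration only; the hash Hive actually applies depends on the Hive version and the column types.

```python
NUM_BUCKETS = 9  # matches INTO 9 BUCKETS in the statement above


def bucket_hash(values):
    """Hypothetical hash over the CLUSTERED BY columns (assumed scheme, not Hive's exact code)."""
    h = 0
    for v in values:
        h = (31 * h + v) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def bucket_for(row_id, age, num_buckets=NUM_BUCKETS):
    """Return the bucket index in [0, num_buckets) for one row."""
    return (bucket_hash((row_id, age)) & 0x7FFFFFFF) % num_buckets


rows = [(1, 20), (2, 21), (3, 20), (10, 25)]
for row_id, age in rows:
    print(row_id, age, "-> bucket", bucket_for(row_id, age))
```

Rows with the same (id, age) pair always land in the same bucket, which is what makes bucketed sampling and bucketed joins possible.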

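5. Copying a table

CREATE TABLE IF NOT EXISTS student01 LIKE student;

This creates a table with the same structure as student but does not copy any data.
Whether the copy student01 is a managed or an external table has nothing to do with student; it depends only on whether the EXTERNAL keyword is specified when the copy is created.

6. Table created from a query

CREATE TABLE IF NOT EXISTS student01 AS SELECT id, sname FROM student;

The selected columns of student are queried and stored in the newly created table, which is why this is called a query table.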

Data types in detail

Simple types:

string
float
int (4-byte integer)
bigint (8-byte integer)
smallint (2-byte integer)
tinyint (1-byte integer)

Complex types:

1) array: stores multiple elements that all have the same data type, for example: beijing,shenzhen,wuhan

CREATE TABLE IF NOT EXISTS student (id INT, city ARRAY<STRING>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';

Elements are looked up by index:

SELECT city[0] FROM student; -- output: beijing
SELECT city FROM student; -- output: ["beijing","shenzhen","wuhan"]

2) map: key-value pairs, for example: Chinese:80,math:90

CREATE TABLE IF NOT EXISTS student (id INT, score MAP<STRING,INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';

Values are looked up by key:

SELECT score FROM student; -- output: {"Chinese":80,"math":90}
SELECT score['math'] FROM student; -- output: 90

3) struct: stores multiple elements whose types may differ, for example: mengfan,man,180

CREATE TABLE IF NOT EXISTS student (id INT, remarks STRUCT<name:STRING,sex:STRING,height:INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';

Individual fields are looked up with a dot:

SELECT remarks FROM student; -- output: {"name":"mengfan","sex":"man","height":180}
SELECT remarks.height FROM student; -- output: 180

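Delimiters in detail

1) Order of the column, collection-item, map-key, and line delimiters

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- tab as the column delimiter

COLLECTION ITEMS TERMINATED BY ',' -- delimiter between the elements of a collection

MAP KEYS TERMINATED BY ':' -- delimiter between a key and its value inside an element

LINES TERMINATED BY '\n' -- newline as the row delimiter

For example, take one row of data whose columns are separated by tabs: 0001	黃海霞,孟凡	Chinese:80,math:90

The tab separates two columns, the comma separates two elements of a collection, and the colon separates the key from the value inside an element.

The three delimiters must be declared from the outermost to the innermost, and this order cannot be changed; the line delimiter comes last.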
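The outer-to-inner order can be illustrated with a small Python sketch that splits a sample row the way the three delimiters nest: first by tab into columns, then by comma into elements, then by colon into key and value. This only models the idea. The function name parse_row and the three-column layout (id, names, scores) are assumptions for illustration, not Hive's own parser.

```python
def parse_row(line):
    """Split one text row using nested delimiters, outermost first."""
    # Column delimiter: tab
    id_field, names_field, scores_field = line.rstrip("\n").split("\t")
    # Collection items delimiter: comma
    names = names_field.split(",")
    scores = {}
    for item in scores_field.split(","):  # map entries are collection items too
        key, value = item.split(":")  # map keys delimiter: colon
        scores[key] = int(value)
    return id_field, names, scores


row = "0001\t黃海霞,孟凡\tChinese:80,math:90\n"
print(parse_row(row))
```

Splitting in the opposite order would break the row apart incorrectly, which is why each level needs its own distinct delimiter.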

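2) Multi-byte delimiters

By default Hive can only parse single-byte delimiters. Sample data: 0001::黃海霞::80

Method 1: change the SerDe library from the default LazySimpleSerDe to RegexSerDe

CREATE TABLE stu (id INT, sname STRING, score STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' -- SerDe class that parses rows with a regular expression
WITH SERDEPROPERTIES (
  'input.regex' = '(.*)::(.*)::(.*)', -- regular expression for the input
  'output.format.string' = '%1$s %2$s %3$s' -- format of the output
);

Note:

The expression needs one (.*) group per column, with the delimiter repeated between the groups: (.*)::(.*)::(.*)
If the delimiter is |, it must be escaped with \\: (.*)\\|\\|(.*)\\|\\|(.*)
The number of %N$s placeholders in the output format must match the number of (.*) groups in the input expression
(.*) matches any sequence of characters
$s is a placeholder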
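One way to check such an expression before putting it into a table definition is to try it against a sample line. The sketch below uses Python's re module, which handles these particular patterns the same way Java regular expressions do; the helper name split_row is hypothetical. The doubled backslash is only needed inside the HiveQL string literal, so the Python raw strings below use a single backslash to escape the pipe.

```python
import re


def split_row(pattern, line):
    """Return the captured groups if the whole line matches, otherwise None."""
    match = re.fullmatch(pattern, line)
    return match.groups() if match else None


# Two-character delimiter "::"
print(split_row(r"(.*)::(.*)::(.*)", "0001::黃海霞::80"))
# Delimiter "||" must be escaped because | is a regex metacharacter
print(split_row(r"(.*)\|\|(.*)\|\|(.*)", "0001||黃海霞||80"))
# A line with the wrong delimiter does not match at all
print(split_row(r"(.*)::(.*)::(.*)", "0001:黃海霞:80"))
```

A line that fails to match yields no groups here, so it is worth testing the pattern on a few real rows first.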

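Method 2: modify Hive's underlying source code. This has a wide impact, and files that use single-byte delimiters can easily break, so it is not recommended.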

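Viewing tables

SHOW TABLES; -- list the tables
SHOW TABLES IN hd_hive; -- list the tables in the hd_hive database
SHOW TABLES LIKE 's*'; -- list the tables whose names start with s
SHOW CREATE TABLE student; -- show the CREATE statement of the student table
SHOW PARTITIONS student; -- list the partitions of the student table
SHOW PARTITIONS student PARTITION (city='beijing'); -- show the beijing partition of the student table
DESC student; -- show the table's column information
DESC FORMATTED student; -- show the table details stored in the metastore (formatted)
DESC EXTENDED student; -- show the table details stored in the metastore (unformatted)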

Dropping tables

DROP TABLE IF EXISTS student; -- drop the table, including its structure and its data
TRUNCATE TABLE student; -- empty the table: keep the structure, delete the data

Managed table: on drop, the metadata is deleted and the underlying data on HDFS is deleted as well.
External table: on drop, the metadata is deleted but the underlying data on HDFS is kept.

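Altering tables

1. Renaming a table

ALTER TABLE student RENAME TO stu; -- rename the table from student to stu

2. Changing column information

ALTER TABLE student ADD COLUMNS (department STRING); -- add a column
ALTER TABLE student REPLACE COLUMNS (id INT, sname STRING); -- replace all columns of the table
ALTER TABLE student CHANGE sname tname STRING; -- rename a column; the type of the new column must be given as well
ALTER TABLE student CHANGE age age STRING; -- change a column type (restricted): int can be converted to string, but only narrower-to-wider conversions are allowed and they cannot be reversed; Hive 1.2.2, however, does not enforce this restriction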

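3. Changing partition information

ALTER TABLE student ADD IF NOT EXISTS PARTITION (city='shanghai'); -- add a partition
ALTER TABLE student ADD IF NOT EXISTS PARTITION (city='shenzhen') LOCATION '/user/student/shenzhen'; -- add a partition and specify its path
ALTER TABLE student DROP IF EXISTS PARTITION (city='beijing'); -- drop a partition

Modifying a partition:

ALTER TABLE student PARTITION (city='shenzhen') SET LOCATION '/user/shenzhen'; -- change the path of the partition
ALTER TABLE student PARTITION (city='shenzhen') ENABLE NO_DROP; -- protect the partition from being dropped (older Hive releases only; this protection was removed in Hive 2.0)
ALTER TABLE student PARTITION (city='shenzhen') ENABLE OFFLINE; -- protect the partition from being queried (older Hive releases only; removed in Hive 2.0)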

Other helper commands

SHOW CREATE TABLE student; -- show the CREATE statement, with the default clauses filled in automatically
