
Hive - Tables

Published: 2025/4/14 | Programming Q&A | 豆豆


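Hive has two kinds of tables: managed tables (also called internal tables) and external tables.

A managed (internal) table has the table type MANAGED_TABLE. Its data is stored under /user/hive/warehouse by default, although a different directory can be set with LOCATION. Dropping a managed table deletes both the table's data and its metadata.

An external table has the table type EXTERNAL_TABLE. When creating it, you can use LOCATION to specify the directory where its data lives. Dropping an external table deletes only the metadata and leaves the data files in place.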
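The difference between the two table types comes down to what DROP TABLE removes. The following toy Python model illustrates that rule and is not Hive code. One dict stands in for HDFS and another for the metastore. The class and method names (ToyWarehouse, create_table, drop_table) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Hypothetical stand-in for HDFS plus the metastore; not a Hive API."""
    # directory path -> names of the data files it holds
    fs: dict[str, list[str]] = field(default_factory=dict)
    # table name -> {"location": str, "external": bool}
    metastore: dict[str, dict] = field(default_factory=dict)

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        self.metastore[name] = {"location": location, "external": external}
        self.fs.setdefault(location, [])  # the directory may already hold data

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # metadata is removed for every table type
        if not meta["external"]:
            # managed (MANAGED_TABLE): the data directory is deleted too
            self.fs.pop(meta["location"], None)
        # external (EXTERNAL_TABLE): files under LOCATION are left untouched


wh = ToyWarehouse()
wh.create_table("emp", "/user/hive/warehouse/emp")
wh.create_table("emp_ext", "/opt/input/emp", external=True)
wh.fs["/user/hive/warehouse/emp"].append("emp.txt")
wh.fs["/opt/input/emp"].append("emp.txt")
wh.drop_table("emp")
wh.drop_table("emp_ext")
print(sorted(wh.fs))  # ['/opt/input/emp']
```

After both drops, only the external table's directory and its file survive.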

Example: creating an external table

create external table if not exists default.emp_ext(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by '\t'
location '/opt/input/emp';
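With row format delimited fields terminated by '\t', each line of the files under LOCATION is one row, and the tab-separated fields map to the columns in order. The sketch below approximates in Python how Hive's default text SerDe turns such a line into a row. It assumes the commonly documented LazySimpleSerDe behavior: missing trailing fields, the \N marker, and values that cannot be cast all read as NULL, and extra fields are ignored. It leaves out escaping and other details. parse_delimited and EMP_SCHEMA are hypothetical names.

```python
# Column names and types copied from the emp_ext DDL; Hive int/double map to int/float here.
EMP_SCHEMA = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("sal", float), ("comm", float), ("deptno", int),
]
HIVE_NULL = "\\N"  # default text representation of NULL


def parse_delimited(line: str, schema=EMP_SCHEMA, sep: str = "\t") -> dict:
    """Approximate how one line of a delimited text file becomes a row."""
    parts = line.rstrip("\n").split(sep)
    row = {}
    for i, (name, cast) in enumerate(schema):
        raw = parts[i] if i < len(parts) else None  # missing trailing field -> NULL
        if raw is None or raw == HIVE_NULL:
            row[name] = None
            continue
        try:
            row[name] = cast(raw)
        except ValueError:  # e.g. "" or "abc" in a numeric column -> NULL
            row[name] = None
    return row  # fields beyond the schema are simply ignored


print(parse_delimited("7369\tSMITH\tCLERK\t7902\t1980-12-17\t800.0\t\t20"))
```

In the sample line the comm field is empty, so it comes back as None, the sketch's stand-in for NULL.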

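A partitioned table maps each partition to its own directory on HDFS, and that directory holds all of the partition's data files. In other words, partitions in Hive are just subdirectories: a large dataset is split into smaller ones according to business needs.

When a query selects the partitions it needs through conditions on the partition column in the WHERE clause, Hive reads only those partitions' directories instead of scanning the whole table. This can make the query much faster.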

create external table if not exists default.emp_partition(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
partitioned by (month string)
row format delimited fields terminated by '\t';
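Partition pruning can be pictured as a filter over directory names. The following is a minimal Python sketch. It assumes the emp_partition data sits in the default warehouse layout, with one month=<value> subdirectory per partition. prune is a hypothetical helper. Real Hive works from the partition metadata in the metastore when it plans the query rather than by listing directories, so treat this only as a picture of the effect.

```python
import re

# Assumed layout: the last path component of each partition directory is <column>=<value>.
PARTITION_DIR = re.compile(r"^(?P<key>[^=/]+)=(?P<value>[^/]+)$")


def prune(partition_dirs: list[str], key: str, wanted: str) -> list[str]:
    """Keep only the directories whose key=value matches, like WHERE key = 'wanted'."""
    selected = []
    for d in partition_dirs:
        m = PARTITION_DIR.match(d.rstrip("/").rsplit("/", 1)[-1])
        if m and m.group("key") == key and m.group("value") == wanted:
            selected.append(d)  # only these directories would be read
    return selected


dirs = [f"/user/hive/warehouse/emp_partition/month={m}" for m in ("201709", "201710", "201711")]
print(prune(dirs, "month", "201710"))  # ['/user/hive/warehouse/emp_partition/month=201710']
```

A query filtered on month = '201710' therefore touches one directory out of three.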

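Notes on partitioned tables:

If partition directories and files are created directly on HDFS (for example with dfs -put) instead of through Hive, the metastore does not know about those partitions, and queries will not see their data. There are two ways to register them.

Repair the table: msck repair table table_name;

Or add each partition explicitly, which is easy to put in a shell script:

dfs -mkdir -p /user/hive/warehouse/dept_part/day=20171025;
dfs -put /opt/weblog/log.log /user/hive/warehouse/dept_part/day=20171025;
alter table dept_part add partition(day='20171025');

List the table's partitions: show partitions dept_part;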
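What msck repair table does can be approximated like this: look at the key=value subdirectories under the table's location and register any that the metastore is missing. The Python below is a toy version of that idea under those assumptions. msck_repair is a hypothetical name, a plain set stands in for the metastore, and the real command does more checking than this.

```python
def msck_repair(partition_dirs: list[str], registered: set[str]) -> list[str]:
    """Register key=value directories the 'metastore' set does not know yet.

    Returns the partition specs that were added, in sorted order.
    """
    added = []
    for d in sorted(partition_dirs):
        spec = d.rstrip("/").rsplit("/", 1)[-1]  # e.g. "day=20171025"
        if "=" in spec and spec not in registered:
            registered.add(spec)
            added.append(spec)
    return added


metastore = {"day=20171024"}  # partition already known to Hive
on_hdfs = [
    "/user/hive/warehouse/dept_part/day=20171024",
    "/user/hive/warehouse/dept_part/day=20171025",  # created with dfs -mkdir / dfs -put
]
print(msck_repair(on_hdfs, metastore))  # ['day=20171025']
print(sorted(metastore))                # ['day=20171024', 'day=20171025']
```

The final sorted list plays the role of show partitions dept_part. Running the repair a second time adds nothing.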

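Loading data into Hive tables

load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, ...)];

With LOCAL, filepath refers to a file on the local file system of the machine running the client, and the file is copied into the table. Without LOCAL, filepath is an HDFS path, and the file is moved into the table's directory.

With OVERWRITE, the existing data in the table (or in the target partition) is replaced. Without it, the new data is appended.

For a partitioned table, the target partition is given with partition (partcol1=val1, ...).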
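LOCAL and OVERWRITE change two separate things. LOCAL decides where the file comes from and whether it is copied or moved. OVERWRITE decides whether the table's existing files survive. Below is a toy Python model of those two rules, with dicts standing in for the local file system and HDFS. load_data is a hypothetical name, and file-name collisions in the target directory are not modeled.

```python
def load_data(local_fs: dict[str, list[str]], hdfs: dict[str, list[str]],
              src_dir: str, name: str, table_dir: str,
              *, local: bool = False, overwrite: bool = False) -> None:
    """Toy model of LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE.

    Each dict maps a directory to the file names it holds (hypothetical layout).
    """
    source = local_fs if local else hdfs
    if name not in source.get(src_dir, []):
        raise FileNotFoundError(f"{src_dir}/{name}")
    if overwrite:
        hdfs[table_dir] = []  # OVERWRITE: existing table/partition files are dropped
    hdfs.setdefault(table_dir, []).append(name)
    if not local:
        source[src_dir].remove(name)  # HDFS source file is moved, not copied
    # LOCAL: the client-side file is copied, so local_fs is unchanged


local_fs = {"/root": ["emp.txt"]}
hdfs = {"/root": ["emp.txt"], "/user/hive/warehouse/emp": ["old.txt"]}
load_data(local_fs, hdfs, "/root", "emp.txt", "/user/hive/warehouse/emp", local=True)
load_data(local_fs, hdfs, "/root", "emp.txt", "/user/hive/warehouse/emp", overwrite=True)
print(hdfs)  # {'/root': [], '/user/hive/warehouse/emp': ['emp.txt']}
```

The first call appends a copy of the local file. The second call moves the HDFS file and replaces everything that was in the table directory.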

1. Load a local file into a Hive table

load data local inpath '/root/emp.txt' into table default.emp;

2. Load an HDFS file into a Hive table

load data inpath '/root/emp.txt' into table default.emp;

3. Load data, overwriting the data already in the table

load data inpath '/root/emp.txt' overwrite into table default.emp;

4. Create a table, then load it with INSERT

create table default.emp_ci like emp;
insert into table default.emp_ci select * from default.emp;

5. Specify LOCATION when creating the table, so that the table reads the files already in that directory (as in the emp_ext example above)

Exporting Hive table data

insert overwrite local directory '/opt/datas/hive/hive_exp_emp'
row format delimited fields terminated by '\t'
select * from default.emp;

Alternatively, run the query from the shell and redirect its output to a file:

bin/hive -e "select * from default.emp;" > /opt/datas/hive/exp_res.txt
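With row format delimited fields terminated by '\t', the exported files hold one tab-separated line per row. Below is a small Python sketch of that text layout, assuming Hive's default NULL marker \N. to_delimited is a hypothetical helper. Without the ROW FORMAT clause, Hive uses its default field delimiter (the \001 control character) instead of a tab. The hive -e redirect writes results as the CLI displays them, so NULLs may appear differently in that file.

```python
def to_delimited(rows, sep: str = "\t", null: str = "\\N") -> str:
    """Render rows as delimited text: one line per row, NULL (None) written as \\N."""
    lines = [sep.join(null if v is None else str(v) for v in row) for row in rows]
    return "\n".join(lines) + ("\n" if lines else "")


print(to_delimited([(7369, "SMITH", None), (7499, "ALLEN", 300.0)]), end="")
```

Each tuple becomes one line, and the missing comm value in the first row is written as \N.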

Hive multi-table insert
Suppose the requirement is:
filter different subsets of rows out of t_4 and insert them into two other tables.

insert overwrite table t_4_st_lt_200 partition(day='1') select ip,url,staylong from t_4 where staylong<200;
insert overwrite table t_4_st_gt_200 partition(day='1') select ip,url,staylong from t_4 where staylong>200;

This approach has a drawback: the two filters run as two separate jobs, so two MapReduce jobs are launched and the same source table is read twice.
The multi-insert syntax avoids this and is more efficient, because the source table is read only once:

from t_4
insert overwrite table t_4_st_lt_200 partition(day='2') select ip,url,staylong where staylong<200
insert overwrite table t_4_st_gt_200 partition(day='2') select ip,url,staylong where staylong>200;
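The gain from multi-insert is that one scan of t_4 feeds every INSERT branch, and each branch applies its own WHERE condition to each row. Below is a minimal Python sketch of that single-pass pattern, using hypothetical names. Passing the rows as a one-shot iterator shows that nothing needs to read the source twice. With the predicates used above, a row whose staylong is exactly 200 lands in neither table.

```python
from typing import Callable, Iterable

Row = tuple[str, str, int]  # (ip, url, staylong), as selected from t_4


def multi_insert(source: Iterable[Row],
                 branches: dict[str, Callable[[Row], bool]]) -> dict[str, list[Row]]:
    """Scan the source once and route each row to every branch whose WHERE matches."""
    targets: dict[str, list[Row]] = {name: [] for name in branches}
    for row in source:  # the only pass over the source
        for name, where in branches.items():
            if where(row):
                targets[name].append(row)
    return targets


rows = [("1.1.1.1", "/a", 120), ("2.2.2.2", "/b", 350), ("3.3.3.3", "/c", 200)]
out = multi_insert(iter(rows), {
    "t_4_st_lt_200": lambda r: r[2] < 200,
    "t_4_st_gt_200": lambda r: r[2] > 200,
})
print(out["t_4_st_lt_200"])  # [('1.1.1.1', '/a', 120)]
print(out["t_4_st_gt_200"])  # [('2.2.2.2', '/b', 350)]
```

Running the two INSERT statements separately would correspond to calling the loop twice over the same data, which a one-shot iterator cannot support.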

Reposted from: https://www.cnblogs.com/RzCong/p/7732590.html
