Hive: Partition and Bucket Operations

Published: 2025/3/8

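One of the most common ideas in big data is divide and conquer: a large file is split into many small files, so that each operation only has to deal with a small one. Hive supports the same idea. A large dataset can be split by day or by hour into smaller sets of files, which are much easier to work with.

I. Partitioned Table Operations

A common partitioning rule in production: partition by day (one partition per day).

1. Syntax for creating a partitioned table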

create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

2. Create a table with multiple partition columns

create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

3. Load data into a partitioned table

load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');

4. Load data into a table with multiple partition columns

load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='01');

5. Query several partitions together with union all

select * from score where month = '201806' union all select * from score where month = '201805';

6. List the partitions

show partitions score;

7. Add one partition

alter table score add partition(month='201805');

8. Add several partitions at once

alter table score add partition(month='201804') partition(month = '201803');

Note: after a partition is added, a new folder for it appears under the table's directory in HDFS.
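The folder naming can be sketched in a few lines of Python. This is only an illustration of the `column=value` directory convention, one path level per partition column; the helper name `partition_path` and the warehouse location used here are assumptions, not something the article specifies.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory path for one partition.

    partition_spec is an ordered list of (column, value) pairs; each pair
    becomes one `column=value` path level, in declaration order.
    """
    levels = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([table_dir.rstrip("/")] + levels)


# The warehouse location below is an assumed default, used only for illustration.
print(partition_path("/user/hive/warehouse/score", [("month", "201805")]))
print(partition_path("/user/hive/warehouse/score2",
                     [("year", "2018"), ("month", "06"), ("day", "01")]))
```

For the single-column table score this gives one folder per month, while score2 gets nested year, month, and day folders.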

9. Drop a partition

alter table score drop partition(month = '201806');

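Important:
A partition column must not have the same name as any column already defined in the table!

Purpose:
Data is divided by partition, so a query that filters on the partition column does not have to scan unrelated data, which speeds it up.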
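Why this saves work can be shown with a rough Python sketch: given the partition folders of a table, a filter on the partition column selects which folders need to be read at all. The function name `prune_partitions` and the list of folders are hypothetical, and Hive's real planner is far more involved than this.

```python
def prune_partitions(partitions, wanted_months):
    """Keep only the partition directories whose month value is wanted."""
    kept = []
    for directory in partitions:
        # A directory name looks like "month=201806".
        column, _, value = directory.partition("=")
        if column == "month" and value in wanted_months:
            kept.append(directory)
    return kept


# Hypothetical partition folders of the score table.
all_partitions = ["month=201803", "month=201804", "month=201805", "month=201806"]

# Corresponds to: select * from score where month = '201806';
print(prune_partitions(all_partitions, {"201806"}))
```

Only one of the four folders is kept, so the other three months are never scanned.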

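II. Bucketed Table Operations

Bucketing adds an extra layer of structure on top of an existing table structure.

Rows are distributed into multiple buckets according to a chosen column. Put simply, the data is split by that column's value and written into multiple files.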

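1. Enable Hive's bucketing feature (this property applies to Hive 0.x and 1.x; it was removed in Hive 2.0, where bucketing is always enforced)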

set hive.enforce.bucketing=true;

2. Set the number of reducers

set mapreduce.job.reduces=3;

3. Create a bucketed table

create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';

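Loading data into a bucketed table: neither putting files with hdfs dfs -put nor using load data works properly for a bucketed table, so the data has to be loaded with insert overwrite.

Create an ordinary table, then use insert overwrite with a query to load the ordinary table's data into the bucketed table.

4. Create an ordinary table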

create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';

5. Load data into the ordinary table

load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;

6. Load data into the bucketed table with insert overwrite

insert overwrite table course select * from course_common cluster by(c_id);

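Important:
The bucketing column must be one of the table's own columns.

Bucketing logic:
Hive computes a hash of the bucketing column's value and takes the remainder of that hash divided by the number of buckets. The remainder is the number of the bucket the row goes into.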
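The hash-then-remainder rule can be sketched in Python as follows. This is a minimal illustration that assumes a Java-style 31-based string hash; the hash Hive actually applies depends on the Hive version and the column type, so the bucket numbers here are not guaranteed to match a real cluster. The function names and sample c_id values are made up for the example.

```python
def java_style_hash(text):
    """Illustrative 31-based string hash, wrapped to a signed 32-bit integer."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value, num_buckets):
    """Return the bucket index: hash of the value modulo the bucket count."""
    # Clearing the sign bit keeps the remainder from ever being negative.
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


# Hypothetical c_id values, distributed over 3 buckets as in the course table.
for c_id in ["01", "02", "03"]:
    print(c_id, "-> bucket", bucket_for(c_id, 3))
```

Rows with the same c_id always land in the same bucket, which is the property that makes bucketed tables useful for sampling and joins on that column.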
