Loading and Exporting Data in Hive

Published: 2024/7/5 by 豆豆

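Loading data into regular tables

1. Using load

load data [local] inpath '<source file path>' into table <target table name>;

Loading from HDFS: this essentially moves the file from its current path into the table's directory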

load data inpath '/user/student.txt' into table student;

Loading from the local filesystem: this essentially copies the local file to HDFS

load data local inpath '/user/student.txt' into table student;
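
A minimal sketch of the difference between the two forms, assuming the student table already exists and both paths are readable. The overwrite keyword is part of Hive's load syntax but is not used in the statements above, and the dfs line assumes the statements are run from the Hive CLI.

```sql
-- HDFS source: the file is moved into the table's directory,
-- so it no longer exists at the original path afterwards.
load data inpath '/user/student.txt' into table student;

-- student.txt should no longer be listed under /user/.
dfs -ls /user/;

-- Local source: the file is copied, so the local original stays in place.
-- overwrite replaces the table's existing data instead of appending to it.
load data local inpath '/user/student.txt' overwrite into table student;
```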

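2. Using insert

Insert a single row (single insert; Hive first builds a temporary table and then copies it into student, which is inefficient)

insert into table student values(00011,'黃海霞',18);

Insert multiple rows (single insert; the query result is loaded into student, which is inefficient)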

insert into table student select * from stu where age >=18;

Multi-insert (scans the source table only once and writes the results into several tables; efficient and commonly used)

from stu insert into table student01 select * where age >=18 insert into table student02 select * where age <18;

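Loading data into partitioned tables (common)

load is generally not used here, because it does not check whether the columns of the source data match those of the target table, so bad data gets in easily;
inserting row by row with insert is not recommended either;

Static partitioning, for when the partitions are few and can be enumerated:

1) Add the partitions manually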

ALTER TABLE student ADD if not exists PARTITION(city='beijing');

ALTER TABLE student ADD if not exists PARTITION(city='shanghai');

2) Insert the data

from stu insert into table student01 partition(city='beijing') select id, sname, age where city='beijing' insert into table student02 partition(city='shanghai') select id, sname, age where city='shanghai';
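
For reference, a sketch of how one of the target tables might be declared and how the result can be checked. The DDL is an assumption (the original does not show it); only the column names are taken from the insert statement above.

```sql
-- Hypothetical DDL: the partition column city is declared in
-- partitioned by, not in the regular column list.
create table if not exists student01 (id int, sname string, age int)
partitioned by (city string)
row format delimited fields terminated by ',';

-- After the insert has run, list the partitions that now exist.
show partitions student01;
```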

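Dynamic partitioning, for when there are many partitions, e.g. by date or age:

1) Switch the partition mode to nonstrict

-- Hive 2: the default mode is strict
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hive 1: enable dynamic partitioning first
set hive.exec.dynamic.partition=true;

-- Hive 1: then switch to nonstrict mode
set hive.exec.dynamic.partition.mode=nonstrict;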

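2) Insert the data

If city is the dynamic partition column, the select list must include city, and it must be the last column;
If there are two partition columns, city and age, write partition(city,age): city is the first level and age the second, and city, age must come last in the select list in that same order;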

from stu

insert into table student01 partition(city) select id, sname, age, city

insert into table student02 partition(city) select id, sname, age, city ;
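
The two-column case described above could look like the following sketch. The table student03 is hypothetical, and stu is assumed to have the columns id, sname, age and city, as in the statement above.

```sql
-- Hypothetical target table with two partition levels: city, then age.
create table if not exists student03 (id int, sname string)
partitioned by (city string, age int);

set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns come last in the select list,
-- in the same order as in the partition clause.
insert into table student03 partition(city, age)
select id, sname, city, age from stu;
```

Hive maps the trailing select columns to the partition columns by position, not by name, which is why the order matters.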

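Loading data into bucketed tables

load must not be used, because it only moves or copies files and does not distribute rows into buckets;

1) First create a bucketed table student;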

CREATE TABLE if not exists student(id int ,sname string ,age int, city string) clustered BY (age) sorted BY (city) INTO 3 buckets ROW FORMAT delimited FIELDS terminated BY ',';

2) Insert the data

insert into table student select * from stu;

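When the data is inserted, the number of reduce tasks follows the number of buckets: the default is 1, but because the table has 3 buckets, 3 reduce tasks actually run here;
The maximum number of reduce tasks is 1009 (the default of hive.exec.reducers.max);
Each reducer handles about 256 MB of input by default (hive.exec.reducers.bytes.per.reducer);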
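
A sketch of the insert together with a quick check of the result, assuming stu has the same four columns as student. The hive.enforce.bucketing setting applies to Hive 1.x only and is an addition to the steps above.

```sql
-- Hive 1.x only: make inserts honor the bucket definition.
-- The property was removed in Hive 2.0, where bucketing is always enforced.
set hive.enforce.bucketing=true;

insert into table student select * from stu;

-- Read back only the first of the 3 buckets (the table is bucketed on age).
select * from student tablesample(bucket 1 out of 3 on age);
```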

=================================================================

Exporting data

Single export

-- to HDFS
insert overwrite directory '/user/stu01.txt' select * from student where age >=18;

-- to the local filesystem
insert overwrite local directory '/user/stu01.txt' select * from student where age >=18;

Multiple exports

from student insert overwrite local directory '/user/stu01.txt' select * where age >=18 insert overwrite local directory '/user/stu02.txt' select * where age <18;

To export to HDFS instead, just drop local
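
The exports above use Hive's default field separator. A sketch of exporting comma-separated text instead, assuming a Hive version that accepts a row format clause in insert overwrite directory (0.11 or later for local directories):

```sql
-- Without a row format, exported fields are separated by Ctrl-A (\001).
-- Here the output is written as comma-separated text instead.
-- The path names a directory: Hive writes one or more files inside it
-- and overwrites whatever the directory contained before.
insert overwrite local directory '/user/stu01.txt'
row format delimited fields terminated by ','
select * from student where age >= 18;
```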
