Use Cases for Hive Bucket Tables
Source: https://www.cnblogs.com/duanxingxing/p/5156951.html
Preface
A bucket table hashes each row on a chosen column and stores the rows in different files according to the hash value.
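Conceptually, the bucket a row lands in is the hash of the bucketing column modulo the number of buckets. As a rough sketch (an approximation for illustration only: Hive's internal bucketing function additionally masks the sign bit of the hash, which the built-in hash() and pmod() UDFs below do not reproduce exactly for negative values), you can preview the assignment for the test table defined later in this article:

-- Preview which of 32 buckets each row would land in
-- (approximation; see the caveat above about negative hash codes).
SELECT id, pmod(hash(id), 32) AS bucket_no
FROM test;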
Use cases
When the data volume is large and a job needs to finish faster, running multiple map and reduce tasks in parallel is the only option.
But if the input is a single (small or unsplittable) file, Hive will typically launch only one map task.
A bucket table is a good choice here: by specifying a CLUSTERED BY column, the data is hashed and scattered across multiple smaller files.
create table test (id int, name string)
CLUSTERED BY (id) SORTED BY (name) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Before running the INSERT, don't forget to set:

set hive.enforce.bucketing = true;

This forces Hive to use as many reducers as there are buckets, so the output is written to multiple files.
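On older Hive versions that lack hive.enforce.bucketing, a manual equivalent is sketched below (an assumption to verify on your version: you must keep the reducer count in sync with the bucket count yourself):

-- Match the reducer count to the bucket count by hand, then
-- cluster rows on the bucketing column so each reducer
-- writes exactly one bucket file.
set mapred.reduce.tasks = 32;
INSERT OVERWRITE TABLE test
SELECT * FROM test09 CLUSTER BY id;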
hive> INSERT OVERWRITE TABLE test select * from test09;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 32
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201103070826_0018, Tracking URL = http://hadoop00:50030/jobdetails.jsp?jobid=job_201103070826_0018
Kill Command = /home/hjl/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hadoop00:9001 -kill job_201103070826_0018
2011-03-08 11:34:23,055 Stage-1 map = 0%, reduce = 0%
2011-03-08 11:34:27,084 Stage-1 map = 6%, reduce = 0%
*************************************************
Ended Job = job_201103070826_0018
Loading data to table test
5 Rows loaded to test
OK
Time taken: 175.036 seconds
The table's directory in HDFS (/ticketdev/test) now contains 32 files instead of one:
[hadoop@hadoop00 ~]$ hadoop fs -ls /ticketdev/test
Found 32 items
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000000_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000001_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000002_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000003_0
-rw-r--r--   3 ticketdev hadoop   8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000004_0
-rw-r--r--   3 ticketdev hadoop   9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000005_0
-rw-r--r--   3 ticketdev hadoop   8 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000006_0
-rw-r--r--   3 ticketdev hadoop   9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000007_0
-rw-r--r--   3 ticketdev hadoop   9 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000008_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000009_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000010_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000011_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000012_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:20 /ticketdev/test/attempt_201103070826_0018_r_000013_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000014_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000015_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000016_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000017_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000018_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000019_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000020_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000021_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000022_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000023_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000024_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000025_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000026_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000027_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000028_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000029_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000030_0
-rw-r--r--   3 ticketdev hadoop   0 2011-03-08 11:21 /ticketdev/test/attempt_201103070826_0018_r_000031_0
With the data scattered across these files, Hive can launch multiple MapReduce tasks in parallel.
When you run operations against this table, you will see the system start 32 map tasks.
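For example, a simple aggregation (a hypothetical query, purely to illustrate the parallelism) can now be served by up to 32 map tasks, one per bucket file:

-- Up to one map task per bucket file scans the table in parallel.
SELECT name, count(*) FROM test GROUP BY name;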
Summary

That covers the main use case for Hive bucket tables: splitting a table's data into multiple files by hashing a column, so that Hive can parallelize both the write (one reducer per bucket) and later reads (one map task per file). Hopefully this helps you solve similar problems.