Sorting and Aggregation in Hive

Published: 2024/4/14


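//The five clauses must appear in a strict order: where → group by → having → order by → limit
//Difference between where and having:
//where filters first and groups afterwards (it filters the raw rows), so it limits which rows the aggregate functions see; where itself cannot contain aggregate functions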
hive> select count(*),age from tea where id>18 group by age;

//having groups first and filters afterwards (it filters whole groups, so its condition refers to grouping columns or aggregates, such as the alias c defined in select)
hive> select age,count(*) c from tea group by age having c>2;
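
The difference between the two filters can be tried out without a Hive cluster. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive, the tea rows are invented for illustration, and the having condition is written as count(*) > 2 instead of the alias c to stay portable.

```python
import sqlite3

# Hypothetical (id, age) rows; the real tea table is not shown in the article.
rows = [(10, 20), (19, 20), (20, 20), (21, 20),
        (22, 30), (23, 30), (5, 30), (30, 40)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tea (id integer, age integer)")
conn.executemany("insert into tea values (?, ?)", rows)

# WHERE drops individual rows first; the survivors are then grouped and counted.
where_result = conn.execute(
    "select age, count(*) from tea where id > 18 group by age order by age"
).fetchall()

# HAVING groups every row first, then drops whole groups by their count.
having_result = conn.execute(
    "select age, count(*) from tea group by age having count(*) > 2 order by age"
).fetchall()

print(where_result)   # [(20, 3), (30, 2), (40, 1)]
print(having_result)  # [(20, 4), (30, 3)]
```

With where, the age 40 group survives with a count of 1 because only rows are filtered. With having, the same group disappears because the filter is applied to the group's count.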

//A column that is not listed after group by must not appear after select either (unless it is inside an aggregate function)
hive> select ip,sum(load) as c from logs group by ip sort by c desc limit 5;

//The distinct keyword returns unique values (here, each distinct combination of age and id is returned once)
hive> select distinct age,id from tea;
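
That distinct applies to the whole (age, id) combination, not to each column separately, can be checked with a small sketch. It again assumes Python's sqlite3 as a stand-in for Hive, with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tea (id integer, age integer)")
# Hypothetical rows: (id=1, age=20) appears twice; age 20 also occurs with id 2.
conn.executemany(
    "insert into tea values (?, ?)",
    [(1, 20), (1, 20), (2, 20), (3, 25)],
)

# Only the exact duplicate pair collapses; age 20 still appears twice
# because it is paired with two different ids.
distinct_pairs = conn.execute(
    "select distinct age, id from tea order by age, id"
).fetchall()

print(distinct_pairs)  # [(20, 1), (20, 2), (25, 3)]
```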

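//Older Hive versions (before 1.2.0) support only Union All, not Union
//Hive's Union All differs somewhat from standard SQL: the number of columns must match and the corresponding column names must match too, but the column types need not be identical (probably because of implicit conversion)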
select name,age from tea where id<80
union all
select name,age from stu where age>18;

Order By characteristics:

Sorts the data globally. Only one reducer task is used, so it is slow on large inputs.
Difference from order by in MySQL: in strict mode, limit must be specified, otherwise the query fails.

Use the command set hive.mapred.mode; to query the current mode.
Use the command set hive.mapred.mode=strict; to set the mode.

hive> select * from logs where date='2015-01-02' order by te;
FAILED: SemanticException 1:52 In strict mode,if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'te'

For a partitioned table, strict mode also requires the query to filter explicitly on the partition column:

hive> select * from logs order by te limit 5;
FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "logs" Table "logs"

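Sort By characteristics:

There can be multiple Reduce Tasks. The fields after DISTRIBUTE BY do not set how many there are; they only decide which reducer each row is sent to. Hive estimates the number from the input size unless it is set by hand: set mapred.reduce.tasks=4;
The data inside each Reduce Task is sorted, but there is no global order.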

set mapred.reduce.tasks = 2;
insert overwrite local directory '/root/hive/b'
select * from logs sort by te;

The query above saves its result to the local directory /root/hive/b, which ends up with 2 result files: 000000_0 and 000001_0. Each file is sorted by the te field.
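
The contrast between one globally sorted output and several locally sorted files can be sketched in plain Python. This is an illustration only: the te values are invented, and dealing rows out round-robin is an assumption made to keep the example deterministic, not how Hive actually assigns rows to reducers.

```python
# Hypothetical te values; the real logs table is not shown in the article.
te_values = [7, 3, 9, 1, 8, 2, 6, 4]
num_reducers = 2  # mirrors set mapred.reduce.tasks = 2

# Assumption: rows are dealt out round-robin to the reducers.
reducer_inputs = [te_values[i::num_reducers] for i in range(num_reducers)]

# SORT BY: each reducer sorts only the rows it received (one file per reducer).
sort_by_files = [sorted(part) for part in reducer_inputs]

# ORDER BY: a single reducer sorts every row.
order_by_output = sorted(te_values)

# Reading the SORT BY files one after another does not give a global order.
concatenated = [value for part in sort_by_files for value in part]

print(sort_by_files)    # [[6, 7, 8, 9], [1, 2, 3, 4]]
print(order_by_output)  # [1, 2, 3, 4, 6, 7, 8, 9]
print(concatenated)     # [6, 7, 8, 9, 1, 2, 3, 4]
```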

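Distribute by characteristics:

Splits the rows by the specified field(s) so that they go to different reducers, and therefore to different output files.
distribute by corresponds to the partitioner in MR; by default it is hash-based.
distribute by is usually combined with Sort by.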

set mapred.reduce.tasks = 2;
insert overwrite local directory '/root/hive/b'
select * from logs
distribute by date
sort by te;
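
What hash partitioning followed by a per-reducer sort does to the rows can be sketched as follows. This is a rough model, not Hive's implementation: the (date, te) rows are made up, reducer_for is a hypothetical helper, and CRC32 stands in for whatever hash function Hive really uses.

```python
import zlib

# Hypothetical (date, te) rows standing in for the logs table.
rows = [
    ("2015-01-01", 5), ("2015-01-02", 3), ("2015-01-01", 1),
    ("2015-01-03", 9), ("2015-01-02", 7), ("2015-01-03", 2),
]
num_reducers = 2  # mirrors set mapred.reduce.tasks = 2


def reducer_for(key: str) -> int:
    """Pick a reducer from the key's hash (CRC32 is a stand-in, not Hive's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# DISTRIBUTE BY date: all rows with the same date go to the same reducer.
partitions = [[] for _ in range(num_reducers)]
for date, te in rows:
    partitions[reducer_for(date)].append((date, te))

# SORT BY te: each reducer orders only its own rows by te.
outputs = [sorted(part, key=lambda row: row[1]) for part in partitions]

for index, part in enumerate(outputs):
    print(index, part)
```

Whichever reducer a given date hashes to, no date is split across two output files, and each file is ordered by te on its own.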

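Cluster By characteristics:

If Sort By and Distribute By use exactly the same columns, the two can be abbreviated to Cluster By, which specifies the columns for both at once.
Note that columns given to cluster by are always sorted in ascending order; asc and desc cannot be specified. It is typically used with bucketed tables.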

set mapred.reduce.tasks = 2;
insert overwrite local directory '/root/hive/b'
select * from logs
cluster by date;
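
Cluster by can be pictured as the same two steps applied to a single column. The sketch below rests on the same assumptions as before: invented rows, a hypothetical cluster_by helper, and CRC32 standing in for Hive's own hash.

```python
import zlib

# Hypothetical (date, te) rows standing in for the logs table.
rows = [
    ("2015-01-03", 9), ("2015-01-01", 5), ("2015-01-02", 3),
    ("2015-01-01", 1), ("2015-01-03", 2), ("2015-01-02", 7),
]


def cluster_by(rows, key_index, num_reducers):
    """Hash-partition rows on one column, then sort each partition ascending on it."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # CRC32 is a stand-in hash, not the one Hive uses.
        slot = zlib.crc32(str(row[key_index]).encode("utf-8")) % num_reducers
        partitions[slot].append(row)
    # The sort direction is fixed to ascending, as with cluster by.
    return [sorted(part, key=lambda row: row[key_index]) for part in partitions]


clustered = cluster_by(rows, key_index=0, num_reducers=2)

for index, part in enumerate(clustered):
    print(index, [date for date, _ in part])
```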

Reposted from: https://www.cnblogs.com/skyl/p/4736477.html
