
Hive: Lesser-Known but Frequently Used Query Functions (Part 2), with Examples (Column to Rows, Window Functions)

Published: 2024/2/28

Continuing from the previous post:

Hive: Lesser-Known but Frequently Used Query Functions (Part 1), with Examples

https://blog.csdn.net/qq_41946557/article/details/102904642


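Column to Rows

1. Function description

EXPLODE(col): splits a complex array or map stored in one Hive column into multiple rows.

LATERAL VIEW

Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias

Explanation: used together with a UDTF such as explode (often applied to the array that split returns). It expands one column of data into multiple rows, and the expanded rows can then be aggregated.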

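2. Data preparation

Sample data (movie and categories are separated by a tab; categories are separated by commas):

Person of Interest	suspense,action,sci-fi,drama
Lie to me	suspense,crime,action,psychological,drama
Wolf Warrior 2	war,action,disaster

Create the table and load the data:

create table movie_info(
    movie string,
    category array<string>
)
row format delimited fields terminated by "\t"
collection items terminated by ",";

load data local inpath "/root/movie" into table movie_info;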

Requirement: expand the category array of each movie so that every category gets its own row.

select
    movie,
    category_name
from movie_info
lateral view explode(category) table_tmp as category_name;

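To make the effect of LATERAL VIEW explode concrete, here is a minimal plain-Python sketch of what the query computes. It is not Hive code: the names `RAW_LINES`, `parse_row`, and `lateral_view_explode` are hypothetical, and the sample rows are rendered in English.

```python
# Plain-Python emulation of: lateral view explode(category) ... as category_name
# Assumption: rows use a tab between fields and commas between array items,
# mirroring the "row format delimited" clause of the movie_info table.
RAW_LINES = [
    "Person of Interest\tsuspense,action,sci-fi,drama",
    "Lie to me\tsuspense,crime,action,psychological,drama",
    "Wolf Warrior 2\twar,action,disaster",
]


def parse_row(line):
    """Split one raw line into (movie, [category, ...])."""
    movie, categories = line.split("\t")
    return movie, categories.split(",")


def lateral_view_explode(rows):
    """Yield one (movie, category_name) pair per array element."""
    for movie, categories in rows:
        for category_name in categories:
            yield movie, category_name


movie_info = [parse_row(line) for line in RAW_LINES]
exploded = list(lateral_view_explode(movie_info))

for movie, category_name in exploded:
    print(movie, category_name, sep="\t")
```

Each movie is repeated once per category, so the three input rows become twelve output rows. A row whose array is empty yields nothing from the generator, which matches how LATERAL VIEW behaves unless LATERAL VIEW OUTER is used.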


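Window Functions

1. Related functions

OVER(): specifies the data window that the analytic function works on; the window can change from row to row.

CURRENT ROW: the current row

n PRECEDING: the n rows before the current row

n FOLLOWING: the n rows after the current row

UNBOUNDED: the boundary of the partition. UNBOUNDED PRECEDING means starting from the first row; UNBOUNDED FOLLOWING means ending at the last row.

LAG(col,n): the value of col from the n-th row before the current row

LEAD(col,n): the value of col from the n-th row after the current row

NTILE(n): distributes the rows of an ordered partition into n groups. The groups are numbered starting from 1, and for each row NTILE returns the number of the group the row belongs to. Note: n must be of type int.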

2. Data preparation: name,orderdate,cost

jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94

Create the table and load the data:

create table business(
    name string,
    orderdate string,
    cost int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

load data local inpath "/root/business" into table business;

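Requirements:

  • Find the customers who made a purchase in April 2017, along with the total number of such customers
select name, count(*) over ()
from business
where substring(orderdate,1,7) = '2017-04'
group by name;

count(*) over () is evaluated after group by, so it counts the grouped rows, that is, the number of distinct customers.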

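The interaction between group by and an empty over () is easy to misread, so here is a small sketch that runs the same query shape against the sample data. It assumes SQLite (3.25 or later, via Python's sqlite3) as a stand-in for Hive, and uses substr instead of Hive's substring; the names `DATA`, `conn`, and `april` are hypothetical.

```python
import sqlite3

# Sample rows copied from the business data in this post.
DATA = """jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

# Assumption: an in-memory SQLite table stands in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("create table business(name text, orderdate text, cost integer)")
for line in DATA.splitlines():
    name, orderdate, cost = line.split(",")
    conn.execute("insert into business values (?, ?, ?)", (name, orderdate, int(cost)))

# The window function runs over the rows left after GROUP BY,
# so count(*) over () is the number of groups (distinct customers).
april = conn.execute(
    """
    select name, count(*) over () as buyers
    from business
    where substr(orderdate, 1, 7) = '2017-04'
    group by name
    order by name
    """
).fetchall()

for row in april:
    print(row)
```

Only jack and mart have orders dated 2017-04, so each of the two output rows carries a count of 2.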

  • Show each customer's purchase details together with the monthly purchase total
select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from business;

The same query written with distribute by:

select name,orderdate,cost,sum(cost) over(distribute by month(orderdate)) from business;

Result:

+-------+-------------+-------+---------------+--+
| name  |  orderdate  | cost  | sum_window_0  |
+-------+-------------+-------+---------------+--+
| jack  | 2017-01-01  | 10    | 205           |
| jack  | 2017-01-08  | 55    | 205           |
| tony  | 2017-01-07  | 50    | 205           |
| jack  | 2017-01-05  | 46    | 205           |
| tony  | 2017-01-04  | 29    | 205           |
| tony  | 2017-01-02  | 15    | 205           |
| jack  | 2017-02-03  | 23    | 23            |
| mart  | 2017-04-13  | 94    | 341           |
| jack  | 2017-04-06  | 42    | 341           |
| mart  | 2017-04-11  | 75    | 341           |
| mart  | 2017-04-09  | 68    | 341           |
| mart  | 2017-04-08  | 62    | 341           |
| neil  | 2017-05-10  | 12    | 12            |
| neil  | 2017-06-12  | 80    | 80            |
+-------+-------------+-------+---------------+--+


  • In the same scenario, accumulate cost by date

Step-by-step analysis:

0: jdbc:hive2://henu2:10000> select * from business order by orderdate;

Result sorted by date:

+----------------+---------------------+----------------+--+
| business.name  | business.orderdate  | business.cost  |
+----------------+---------------------+----------------+--+
| jack           | 2017-01-01          | 10             |
| tony           | 2017-01-02          | 15             |
| tony           | 2017-01-04          | 29             |
| jack           | 2017-01-05          | 46             |
| tony           | 2017-01-07          | 50             |
| jack           | 2017-01-08          | 55             |
| jack           | 2017-02-03          | 23             |
| jack           | 2017-04-06          | 42             |
| mart           | 2017-04-08          | 62             |
| mart           | 2017-04-09          | 68             |
| mart           | 2017-04-11          | 75             |
| mart           | 2017-04-13          | 94             |
| neil           | 2017-05-10          | 12             |
| neil           | 2017-06-12          | 80             |
+----------------+---------------------+----------------+--+

Running total of cost by date:

0: jdbc:hive2://henu2:10000> select *,sum(cost) over(sort by orderdate rows between UNBOUNDED PRECEDING and CURRENT ROW) from business;


Three-row window (the previous row, the current row, and the next row):

0: jdbc:hive2://henu2:10000> select *,sum(cost) over(sort by orderdate rows between 1 preceding and 1 following) from business;

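The two frames above (start of the data up to the current row, and one row on either side) can be checked with a small runnable sketch. It assumes SQLite (3.25 or later, via Python's sqlite3) in place of Hive and writes the ordering as order by rather than sort by; `DATA`, `conn`, and `totals` are hypothetical names.

```python
import sqlite3

# Sample rows copied from the business data in this post.
DATA = """jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

# Assumption: an in-memory SQLite table stands in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("create table business(name text, orderdate text, cost integer)")
for line in DATA.splitlines():
    name, orderdate, cost = line.split(",")
    conn.execute("insert into business values (?, ?, ?)", (name, orderdate, int(cost)))

totals = conn.execute(
    """
    select orderdate, cost,
           -- frame: first row through the current row (running total)
           sum(cost) over (order by orderdate
                           rows between unbounded preceding and current row) as running_total,
           -- frame: previous row, current row, next row
           sum(cost) over (order by orderdate
                           rows between 1 preceding and 1 following) as three_row_sum
    from business
    order by orderdate
    """
).fetchall()

for row in totals:
    print(row)
```

The last running_total equals the sum of every cost (661), while the first and last three_row_sum values cover only two rows because the frame is cut off at the edge of the data.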

More variations to try on your own:

select name, orderdate, cost,
    sum(cost) over() as sample1, -- sum over all rows
    sum(cost) over(partition by name) as sample2, -- partition by name, sum within each partition
    sum(cost) over(partition by name order by orderdate) as sample3, -- partition by name, running sum within each partition
    sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, -- same as sample3 here: from the start of the partition to the current row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, -- current row plus the previous row
    sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6, -- current row plus the previous and the next row
    sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7 -- current row plus all following rows
from business;


  • Find each customer's previous purchase date
0: jdbc:hive2://henu2:10000> select *,lag(orderdate,1) over(distribute by name sort by orderdate) from business;


Supplement (previous and next purchase date side by side):

select *,
    lag(orderdate,1) over(distribute by name sort by orderdate),
    lead(orderdate,1) over(distribute by name sort by orderdate)
from business;

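As a rough illustration of how lag and lead line up within each customer's orders, the sketch below runs the equivalent query on the sample data. It assumes SQLite (3.25 or later, via Python's sqlite3) instead of Hive, so distribute by / sort by are written as partition by / order by; `DATA`, `conn`, and `history` are hypothetical names.

```python
import sqlite3

# Sample rows copied from the business data in this post.
DATA = """jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

# Assumption: an in-memory SQLite table stands in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("create table business(name text, orderdate text, cost integer)")
for line in DATA.splitlines():
    name, orderdate, cost = line.split(",")
    conn.execute("insert into business values (?, ?, ?)", (name, orderdate, int(cost)))

history = conn.execute(
    """
    select name, orderdate,
           lag(orderdate, 1)  over (partition by name order by orderdate) as prev_date,
           lead(orderdate, 1) over (partition by name order by orderdate) as next_date
    from business
    order by name, orderdate
    """
).fetchall()

for row in history:
    print(row)
```

A customer's first order has no previous date and the last order has no next date, so those cells come back as NULL (None in Python).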

  • Query the orders in the earliest 20% by time

First, split the rows into 5 groups:

select *,ntile(5) over(sort by orderdate) from business;


Final statement:

select name,orderdate,cost
from (
    select name,orderdate,cost,ntile(5) over(sort by orderdate) gid
    from business
) t
where t.gid = 1;

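To see which rows land in the first of the five groups, here is a sketch of the same two steps on the sample data. It assumes SQLite (3.25 or later, via Python's sqlite3) rather than Hive and uses order by in place of sort by; `DATA`, `conn`, `sizes`, and `earliest` are hypothetical names. The group sizes shown are what SQLite produces, which hands the extra rows to the lowest-numbered groups.

```python
import sqlite3

# Sample rows copied from the business data in this post.
DATA = """jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94"""

# Assumption: an in-memory SQLite table stands in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("create table business(name text, orderdate text, cost integer)")
for line in DATA.splitlines():
    name, orderdate, cost = line.split(",")
    conn.execute("insert into business values (?, ?, ?)", (name, orderdate, int(cost)))

# Step 1: number of rows that ntile(5) puts into each group.
sizes = [
    count
    for _gid, count in conn.execute(
        """
        select gid, count(*)
        from (select ntile(5) over (order by orderdate) as gid from business) t
        group by gid
        order by gid
        """
    )
]

# Step 2: keep only group 1, i.e. the earliest fifth of the orders.
earliest = conn.execute(
    """
    select name, orderdate, cost
    from (
        select name, orderdate, cost,
               ntile(5) over (order by orderdate) as gid
        from business
    ) t
    where t.gid = 1
    order by orderdate
    """
).fetchall()

print(sizes)
for row in earliest:
    print(row)
```

With 14 rows and 5 groups the split cannot be even, so "the earliest 20%" here means the three earliest orders rather than exactly 2.8 rows.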
