
HiveQL: Queries

Published: 2024/7/5 · 豆豆


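Contents

- 1. select from
  - 1.1 Specifying columns with a regular expression
  - 1.2 Computing with column values
  - 1.3 Using functions
  - 1.4 limit: restricting the number of rows returned
  - 1.5 Aliases: as name
  - 1.6 case when then expressions
- 2. where clause
- 3. JOIN optimization
- 4. Sampling queries
- 5. union all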

Notes based on Programming Hive (《hive编程指南》).

1. select from

```
hive (default)> create table employees(
              > name string,
              > salary float,
              > subordinates array<string>,
              > deductions map<string, float>,
              > address struct<street:string, city:string, state:string, zip:int>)
              > partitioned by(country string, state string);

hive (default)> load data local inpath "/home/hadoop/workspace/employees.txt"
              > overwrite into table employees
              > partition(country='US', state='CA');
Loading data to table default.employees partition (country=US, state=CA)

hive (default)> select * from employees;
John Doe	100000.0	["Mary Smith","Todd Jones"]	{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}	{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}	US	CA
Mary Smith	80000.0	["Bill King"]	{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}	{"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}	US	CA
Todd Jones	70000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}	US	CA
Bill King	60000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}	US	CA
Boss Man	200000.0	["John Doe","Fred Finance"]	{"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05}	{"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}	US	CA
Fred Finance	150000.0	["Stacy Accountant"]	{"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05}	{"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}	US	CA
Stacy Accountant	60000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"300 Main St.","city":"Naperville","state":"IL","zip":60563}	US	CA
```

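You can select specific columns. The table can also be given an alias (here e), and columns can then be qualified with it. Both queries return the same result: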

```
hive (default)> select name, salary from employees;
hive (default)> select e.name, e.salary from employees e;
John Doe	100000.0
Mary Smith	80000.0
Todd Jones	70000.0
Bill King	60000.0
Boss Man	200000.0
Fred Finance	150000.0
Stacy Accountant	60000.0
```

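Extract an array element with [idx], where indexes start at 0. An index that does not exist returns NULL, and extracted strings are printed without quotes: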

```
hive (default)> select e.name, e.subordinates[0] from employees e;
John Doe	Mary Smith
Mary Smith	Bill King
Todd Jones	NULL
Bill King	NULL
Boss Man	John Doe
Fred Finance	Stacy Accountant
Stacy Accountant	NULL
```

Extract a map value by key with [key]:

```
hive (default)> select e.name, e.deductions['State Taxes'] from employees e;
John Doe	0.05
Mary Smith	0.05
Todd Jones	0.03
Bill King	0.03
Boss Man	0.07
Fred Finance	0.07
Stacy Accountant	0.03
```

Extract a struct field with dot notation (.):

```
hive (default)> select e.name, e.address.city from employees e;
John Doe	Chicago
Mary Smith	Chicago
Todd Jones	Oak Park
Bill King	Obscuria
Boss Man	Chicago
Fred Finance	Chicago
Stacy Accountant	Naperville
```
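The NULL behaviour described above is easy to check by requesting elements that do not exist. This is a minimal sketch against the same employees table. The index 5 and the key 'Unknown' are arbitrary illustrative values, not taken from the book. A missing map key is expected to return NULL, just like an out-of-range array index:

```sql
-- No employee has more than two subordinates, so index 5 is out of range.
-- 'Unknown' is an arbitrary key that no deductions map contains.
-- Both extra columns are expected to be NULL on every row.
select e.name, e.subordinates[5], e.deductions['Unknown'] from employees e;
```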

1.1 Specifying columns with a regular expression

```sql
select `price.*` from stocks;
```

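This selects every column whose name starts with price. The backticked text is interpreted as a Java regular expression.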
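Check this against your Hive version. Since Hive 0.13, backticks quote identifiers by default, so a backticked regex is treated as a literal column name unless quoted-identifier support is switched off. The sketch below assumes the stocks table from the book, with a symbol column and price_* columns. This post does not define that table:

```sql
-- Hive 0.13+ defaults hive.support.quoted.identifiers to 'column';
-- 'none' makes backticked names behave as regular expressions again.
set hive.support.quoted.identifiers=none;

-- A regex column spec mixed with an ordinary column
-- (symbol is assumed to exist in stocks, as in the book's schema).
select symbol, `price.*` from stocks;
```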

1.2 Computing with column values

Compute each employee's salary after federal taxes:

```
hive (default)> select upper(name), salary, deductions['Federal Taxes'],
              > round(salary*(1-deductions['Federal Taxes'])) from employees;
JOHN DOE	100000.0	0.2	80000.0
MARY SMITH	80000.0	0.2	64000.0
TODD JONES	70000.0	0.15	59500.0
BILL KING	60000.0	0.15	51000.0
BOSS MAN	200000.0	0.3	140000.0
FRED FINANCE	150000.0	0.3	105000.0
STACY ACCOUNTANT	60000.0	0.15	51000.0
```

1.3 Using functions

Aggregate functions:

```sql
select count(*), avg(salary) from employees;

-- Map-side aggregation: can improve aggregation performance but needs more memory
set hive.map.aggr=true;

-- distinct removes duplicate values
select distinct address.city from employees;
```
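Aggregate functions are usually combined with group by. This is a minimal sketch on the same employees table that groups by a struct field:

```sql
-- One row per city: head count and average salary.
select address.city, count(*), avg(salary)
from employees
group by address.city;
```

With the sample data, Chicago should report 4 employees (John Doe, Mary Smith, Boss Man, Fred Finance) with an average salary of 132500.0. Oak Park, Obscuria and Naperville each have 1 employee.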

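Table-generating functions turn a single column into multiple rows or columns. For example, explode emits each element of an array as its own row: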

```
hive (default)> select explode(subordinates) as sub from employees;
Mary Smith
Todd Jones
Bill King
John Doe
Fred Finance
Stacy Accountant
```
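A table-generating function cannot be mixed with other columns in the same select list, so a query such as select name, explode(subordinates) from employees is rejected. To keep each employee's name next to each subordinate, Hive provides lateral view. A sketch follows. The names sub_view and sub are arbitrary aliases:

```sql
-- One output row per (employee, subordinate) pair.
-- Employees with an empty subordinates array produce no rows here.
select name, sub
from employees
lateral view explode(subordinates) sub_view as sub;

-- lateral view outer (Hive 0.12+) keeps those employees, with sub = NULL.
select name, sub
from employees
lateral view outer explode(subordinates) sub_view as sub;
```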

Built-in functions
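The Hive CLI can list the built-in functions and show the documentation for any one of them:

```sql
-- List every available function (built-ins plus any registered UDFs).
show functions;

-- Short usage text for a single function.
describe function upper;

-- Longer description, often with examples.
describe function extended upper;
```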

1.4 limit: restricting the number of rows returned

limit n returns at most n rows.
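For example, here is how to peek at a couple of rows. Without an order by, which rows come back is not guaranteed:

```sql
-- Returns at most 2 rows, in no particular order.
select name, salary from employees limit 2;

-- With order by, this is a deterministic "top 2 salaries":
-- Boss Man (200000.0) and Fred Finance (150000.0) in the sample data.
select name, salary from employees order by salary desc limit 2;
```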

1.5 Aliases: as name
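A column alias gives a computed column a readable name. This sketch reuses the after-tax expression from 1.2. The alias names name_upper and net_salary are arbitrary:

```sql
-- "as" introduces the alias for each output column.
select upper(name) as name_upper,
       round(salary * (1 - deductions['Federal Taxes'])) as net_salary
from employees;
```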

1.6 case when then expressions

```
hive (default)> select name, salary,
              > case when salary < 50000 then 'low'
              > else 'high'
              > end as bracket from employees;
John Doe	100000.0	high
Mary Smith	80000.0	high
Todd Jones	70000.0	high
Bill King	60000.0	high
Boss Man	200000.0	high
Fred Finance	150000.0	high
Stacy Accountant	60000.0	high
```
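A case expression can have several when branches. They are checked from top to bottom, and the first match wins. The sketch below uses four salary brackets. The thresholds are arbitrary illustrative values:

```sql
select name, salary,
  case
    when salary < 50000.0 then 'low'
    when salary < 70000.0 then 'middle'   -- reached only if salary >= 50000
    when salary < 100000.0 then 'high'    -- reached only if salary >= 70000
    else 'very high'
  end as bracket
from employees;
```

With the sample data, Bill King and Stacy Accountant (60000.0) fall into middle. Mary Smith and Todd Jones fall into high. John Doe, Boss Man and Fred Finance fall into very high.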

2. where clause

Filter conditions can use like (SQL wildcards % and _) and rlike (Java regular expressions):

```
hive (default)> select name, address.street from employees where address.street like "%Ave.";
OK
John Doe	1 Michigan Ave.
Todd Jones	200 Chicago Ave.
hive (default)> select name, address.street from employees where address.street like "%Chi%";
OK
Todd Jones	200 Chicago Ave.
hive (default)> select name, address.street from employees where address.street rlike ".*(Chicago|Ontario).*";
OK
Mary Smith	100 Ontario St.
Todd Jones	200 Chicago Ave.
```
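Two details follow from these results. First, like is case-sensitive, which is why "%Chi%" does not match "1 Michigan Ave.". Lowercasing the column first makes the match case-insensitive. Second, rlike succeeds when the pattern is found anywhere in the string, so the surrounding .* is optional. A sketch:

```sql
-- Case-insensitive like: matches both "1 Michigan Ave." and "200 Chicago Ave."
select name, address.street from employees
where lower(address.street) like '%chi%';

-- Same rows as the ".*(Chicago|Ontario).*" query above.
select name, address.street from employees
where address.street rlike 'Chicago|Ontario';
```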

3. JOIN optimization

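When joining multiple tables, put the smaller tables on the left, earlier in the query. Hive assumes the last table in a join is the largest. It buffers the other tables and streams the last one through, so ordering tables from smallest to largest keeps memory use down.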
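The sketch below shows the ordering rule. It also shows the STREAMTABLE hint, which names the streamed table explicitly when reordering the query is inconvenient. The stocks and dividends tables and their columns follow the book's examples. They are assumptions here, since this post does not define them:

```sql
-- Assumed tables: dividends (small) and stocks (large), as in the book.
-- dividends first, stocks last: stocks is the streamed table.
select s.ymd, s.symbol, s.price_close, d.dividend
from dividends d join stocks s
  on d.ymd = s.ymd and d.symbol = s.symbol
where s.symbol = 'AAPL';

-- Same join with stocks written first; the hint tells Hive to stream s anyway.
select /*+ STREAMTABLE(s) */ s.ymd, s.symbol, s.price_close, d.dividend
from stocks s join dividends d
  on s.ymd = d.ymd and s.symbol = d.symbol
where s.symbol = 'AAPL';

-- Let Hive convert joins with a small enough side into map-side joins.
set hive.auto.convert.join=true;
```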

4. Sampling queries

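Bucket sampling: tablesample(bucket x out of y on col) splits the rows into y buckets by hashing col and returns bucket x. With rand(), rows are assigned to buckets randomly, so each run can return different rows: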

```
hive> select name from employees tablesample(bucket 3 out of 4 on rand());
John Doe
hive> select name from employees tablesample(bucket 3 out of 4 on rand());
Boss Man
Fred Finance
```

Bucketing on a column instead of rand() returns the same rows every time:

```
hive> select name from employees tablesample(bucket 3 out of 4 on name);
Mary Smith
Todd Jones
hive> select name from employees tablesample(bucket 3 out of 4 on name);
Mary Smith
Todd Jones
```
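Bucket sampling is cheapest when the table is physically bucketed on the sampling column. Hive can then read only the matching bucket instead of scanning the whole table. The sketch below uses a hypothetical employees_bucketed table. It assumes Hive 2.x, which enforces bucketing on insert. Older releases also needed set hive.enforce.bucketing=true before the insert:

```sql
-- Hypothetical table, bucketed on the same column used for sampling.
create table employees_bucketed (name string, salary float)
clustered by (name) into 4 buckets;

insert overwrite table employees_bucketed
select name, salary from employees;

-- "bucket 3 out of 4 on name" matches the table's bucketing,
-- so only the third bucket's data needs to be read.
select name from employees_bucketed tablesample(bucket 3 out of 4 on name);
```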

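Percentage sampling: tablesample(n percent) picks at least n percent of the input data size, which is not necessarily n percent of the rows: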

```
hive> select name from employees tablesample(70 percent);
John Doe
Mary Smith
Todd Jones
Bill King
Boss Man
```
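tablesample also accepts a row count. Per the Hive language manual, the count is applied to each input split, so the total can exceed n when the input has several splits. A sketch:

```sql
-- Takes up to 3 rows from each input split of employees.
select name from employees tablesample(3 rows);
```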

5. union all

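union all combines the results of several queries into one. Each query must return the same number of columns with matching types. Duplicate rows are kept: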

```
hive> select name from(
    > select e1.name from employees e1 where e1.name like "Mary%"
    > union all
    > select e2.name from employees e2 where e2.name like "Bill%"
    > ) name_tab
    > sort by name;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210411221203_b3dde291-8596-4b91-95e0-707eeaa873f6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-04-11 22:12:04,856 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1468526053_0003
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 31360 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
Bill King
Mary Smith
```
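For comparison, plain union (Hive 1.2.0 and later) removes duplicate rows, while union all keeps them. This sketch runs on the same employees data. The salary thresholds are arbitrary and were chosen so that one city appears twice:

```sql
-- union all: 4 rows (Chicago, Chicago, Obscuria, Naperville; order not guaranteed)
select city from (
  select e1.address.city as city from employees e1 where e1.salary > 100000
  union all
  select e2.address.city as city from employees e2 where e2.salary < 70000
) t;

-- union: 3 rows, because the duplicate Chicago is removed
select city from (
  select e1.address.city as city from employees e1 where e1.salary > 100000
  union
  select e2.address.city as city from employees e2 where e2.salary < 70000
) t;
```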
