Hive UDF: top 1 per group_Project Practice from 0 to 1, Hive (27): Data Warehouse Project (9), Building the DWS Layer
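1. Building the Data Warehouse - DWS Layer
1.1 Business Terms
1) User
Users are identified by device: in mobile analytics, each distinct device is treated as one distinct user. Android identifies a user by IMEI and iOS by OpenUDID, so each phone corresponds to one user.
2) New user
A user who goes online with the app for the first time. A user who opens an app for the first time is a new user; a device that uninstalls and then reinstalls the app is not counted as new again. New users are reported as daily, weekly, and monthly new users.
3) Active user
Any user who opens the app is an active user, regardless of how the app is then used. A device that opens the app several times in one day counts as one active user.
4) Weekly (monthly) active user
A user who launched the app within a given calendar week (month); multiple launches within that week (month) count as one active user.
5) Monthly activity rate
The ratio of monthly active users to the cumulative total number of users as of that month.
6) Silent user
A user who launched the app only once, on the day of installation (or the following day), and never launched it again. This metric reflects the quality of new users and how well the users match the app.
7) Version distribution
The number of new users, active users, and launches for each version on each day of the week. It helps compare app versions and understand user habits.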
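8) Returning users this week
Users who did not launch the app last week but launched it this week.
9) Users active for n consecutive weeks
Users who launched the app at least once in each of n consecutive weeks.
10) Loyal user
A user who has been active for 5 consecutive weeks or more.
11) Continuously active user
A user who has been active for 2 consecutive weeks or more.
12) Recently churned user
A user who has not launched the app for n consecutive weeks (2 <= n <= 4), and has not launched it in week n+1 either.
13) Retained user
New users from a given period who are still using the app after some time has passed are retained users; their share of the new users from that period is the retention rate.
For example, suppose there are 200 new users in May. Of these 200, 100 launched the app in June, 80 in July, and 50 in August. The retention rate of the May new users is then 50% after one month, 40% after two months, and 25% after three months.
14) User freshness
The ratio of new to existing users among the users who launch the app each day, that is, new users as a share of active users.
15) Single-session duration
The length of time the app is used per launch.
16) Daily usage duration
The total length of time the app is used within one day.
17) Launch counting rule
On iOS, sending the app to the background ends one independent launch. On Android, we specify that two launches less than 30 seconds apart count as one launch. If a user leaves the app to send or receive a text message or take a call and returns within 30 seconds, the two periods are a continuation rather than independent uses, so they count as one session, that is, one launch. Most of the industry uses 30 seconds as the threshold, but the interval can be customized.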
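The 30-second rule in item 17 can be sketched in HiveQL with a window function. This is a minimal illustration rather than the project's actual logic: the `foreground_events` rows and the `ts` column (epoch seconds at which the app came to the foreground) are made-up sample data, and the gap is measured between consecutive foreground events as a simplification.

```sql
-- Hypothetical sample data: one row per time the app comes to the foreground
with foreground_events as
(
    select 'm1' mid_id, 1000 ts
    union all
    select 'm1' mid_id, 1020 ts   -- 20 s after the previous event: same launch
    union all
    select 'm1' mid_id, 1100 ts   -- 80 s after the previous event: new launch
    union all
    select 'm2' mid_id, 2000 ts
)
select
    mid_id,
    -- count an event as a launch if it is the device's first event
    -- or if at least 30 seconds have passed since the previous one
    sum(if(prev_ts is null or ts - prev_ts >= 30, 1, 0)) start_count
from
(
    select
        mid_id,
        ts,
        lag(ts) over (partition by mid_id order by ts) prev_ts
    from foreground_events
) t
group by mid_id;
```

With these sample rows, m1 is counted with 2 launches and m2 with 1.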
1.2 System Functions
1.2.1 collect_set function
1) Create the source table
drop table if exists stud;
create table stud (name string, area string, course string, score int);
2) Insert data into the source table
insert into table stud values('zhang3','bj','math',88);
insert into table stud values('li4','bj','math',99);
insert into table stud values('wang5','sh','chinese',92);
insert into table stud values('zhao6','sh','chinese',54);
insert into table stud values('tian7','bj','chinese',91);
3) Query the data in the table
select * from stud;
stud.name stud.area stud.course stud.score
zhang3 bj math 88
li4 bj math 99
wang5 sh chinese 92
zhao6 sh chinese 54
tian7 bj chinese 91
4) Aggregate the values from different rows of the same group into one set
select course, collect_set(area), avg(score) from stud group by course;
chinese ["sh","bj"] 79.0
math ["bj"] 93.5
5) Use an index to pick one element of the set
select course, collect_set(area)[0], avg(score) from stud group by course;
chinese sh 79.0
math bj 93.5
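To turn the set into a single string, `collect_set` can be combined with `concat_ws`, which is how the DWS load statements in section 1.3 flatten several values per device into one column. A small sketch against the `stud` table above; the aliases `areas` and `student_count` are illustrative only.

```sql
select
    course,
    -- join the distinct areas of each course with '|'
    concat_ws('|', collect_set(area)) areas,
    count(*) student_count
from stud
group by course;
```

For the chinese course this returns both areas joined by `|`. The order of the elements inside a `collect_set` result is not guaranteed, so the joined string may come out as either `sh|bj` or `bj|sh`.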
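1.2.2 nvl function
1) Basic syntax
NVL(expression 1, expression 2)
If expression 1 is NULL, NVL returns the value of expression 2; otherwise it returns the value of expression 1. The purpose of the function is to convert a NULL into an actual value. The expressions can be of numeric, string, or date type, but expression 1 and expression 2 must have the same data type.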
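A brief sketch of typical `nvl` usage. The literal values and the `'gz'` area (which has no rows in the `stud` table from section 1.2.1) are chosen only for illustration.

```sql
-- First argument is NULL, so the second argument is returned
select nvl(null, 'unknown');   -- unknown

-- First argument is not NULL, so it is returned unchanged
select nvl('bj', 'unknown');   -- bj

-- sum() over zero rows is NULL; nvl turns it into 0.
-- The cast keeps both arguments the same type (sum of an int column is bigint).
select nvl(sum(score), cast(0 as bigint)) from stud where area = 'gz';   -- 0
```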
1.2.3 Date functions
1) date_format function (format a date according to a pattern)
hive (gmall)> select date_format('2020-03-10','yyyy-MM');
2020-03
2) date_add function (add or subtract days)
hive (gmall)> select date_add('2020-03-10',-1);
2020-03-09
hive (gmall)> select date_add('2020-03-10',1);
2020-03-11
3) next_day function
(1) Get the next Monday after the given day
hive (gmall)> select next_day('2020-03-12','MO');
2020-03-16
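Note: the second argument is the English name of a weekday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); the examples here use the abbreviation 'MO'.
(2) Get the Monday of the current week
hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
2020-03-09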
4) last_day function (last day of the month)
hive (gmall)> select last_day('2020-03-10');
2020-03-31
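The functions above are usually combined to derive week and month boundaries from a single date. A short sketch for 2020-03-12 (a Thursday); the column aliases are illustrative only.

```sql
select
    date_add(next_day('2020-03-12', 'MO'), -7) week_monday,  -- 2020-03-09
    date_add(next_day('2020-03-12', 'MO'), -1) week_sunday,  -- 2020-03-15
    date_format('2020-03-12', 'yyyy-MM')       month_key,    -- 2020-03
    last_day('2020-03-12')                     month_end;    -- 2020-03-31
```

Because `next_day` returns the first matching weekday strictly after the given date, subtracting 7 days also works when the input date is itself a Monday.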
1.3 DWS Layer (User Behavior)
1.3.1 Daily device behavior
Daily device behavior is aggregated mainly by device id.
1) Table DDL
drop table if exists dws_uv_detail_daycount;
create external table dws_uv_detail_daycount
(
`mid_id` string COMMENT 'unique device id',
`user_id` string COMMENT 'user id',
`version_code` string COMMENT 'app version code',
`version_name` string COMMENT 'app version name',
`lang` string COMMENT 'system language',
`source` string COMMENT 'channel id',
`os` string COMMENT 'Android OS version',
`area` string COMMENT 'region',
`model` string COMMENT 'phone model',
`brand` string COMMENT 'phone brand',
`sdk_version` string COMMENT 'sdkVersion',
`gmail` string COMMENT 'gmail',
`height_width` string COMMENT 'screen width and height',
`app_time` string COMMENT 'time the log was generated on the client',
`network` string COMMENT 'network mode',
`lng` string COMMENT 'longitude',
`lat` string COMMENT 'latitude',
`login_count` bigint COMMENT 'activity count'
)
partitioned by(dt string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_daycount';
2) Load data
insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang))lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
count(*) login_count
from dwd_start_log
where dt='2020-03-10'
group by mid_id;
3) Query the loaded result
select * from dws_uv_detail_daycount where dt='2020-03-10';
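Because the load statement groups by `mid_id`, each device should appear exactly once per partition. One possible sanity check, sketched here with illustrative aliases, compares the row count with the distinct device count; under the definition of an active user in section 1.1, that number is also the day's active device count.

```sql
select
    count(*)               row_count,      -- expected to equal device_count
    count(distinct mid_id) device_count,   -- daily active devices
    sum(login_count)       total_starts    -- total start-log rows for the day
from dws_uv_detail_daycount
where dt = '2020-03-10';
```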
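1.4 DWS Layer (Business)
The fields of a DWS-layer wide table look at the fact tables from the perspective of different dimensions, with the focus on the fact tables' measures.
1.4.1 Daily member behavior
1) Table DDL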
drop table if exists dws_user_action_daycount;
create external table dws_user_action_daycount
(
user_id string comment 'user id',
login_count bigint comment 'login count',
cart_count bigint comment 'add-to-cart count',
cart_amount double comment 'add-to-cart amount',
order_count bigint comment 'order count',
order_amount decimal(16,2) comment 'order amount',
payment_count bigint comment 'payment count',
payment_amount decimal(16,2) comment 'payment amount'
) COMMENT 'daily user behavior'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action_daycount/'
tblproperties ("parquet.compression"="lzo");
2) Load data
with
tmp_login as
(
select
user_id,
count(*) login_count
from dwd_start_log
where dt='2020-03-10'
and user_id is not null
group by user_id
),
tmp_cart as
(
select
user_id,
count(*) cart_count,
sum(cart_price*sku_num) cart_amount
from dwd_fact_cart_info
where dt='2020-03-10'
and user_id is not null
and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
group by user_id
),
tmp_order as
(
select
user_id,
count(*) order_count,
sum(final_total_amount) order_amount
from dwd_fact_order_info
where dt='2020-03-10'
group by user_id
) ,
tmp_payment as
(
select
user_id,
count(*) payment_count,
sum(payment_amount) payment_amount
from dwd_fact_payment_info
where dt='2020-03-10'
group by user_id
)
insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
select
user_actions.user_id,
sum(user_actions.login_count),
sum(user_actions.cart_count),
sum(user_actions.cart_amount),
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount)
from
(
select
user_id,
login_count,
0 cart_count,
0 cart_amount,
0 order_count,
0 order_amount,
0 payment_count,
0 payment_amount
from
tmp_login
union all
select
user_id,
0 login_count,
cart_count,
cart_amount,
0 order_count,
0 order_amount,
0 payment_count,
0 payment_amount
from
tmp_cart
union all
select
user_id,
0 login_count,
0 cart_count,
0 cart_amount,
order_count,
order_amount,
0 payment_count,
0 payment_amount
from tmp_order
union all
select
user_id,
0 login_count,
0 cart_count,
0 cart_amount,
0 order_count,
0 order_amount,
payment_count,
payment_amount
from tmp_payment
) user_actions
group by user_id;
3) Query the loaded result
select * from dws_user_action_daycount where dt='2020-03-10';
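The load statement above merges four per-user aggregates by padding the missing measures with 0, stacking the branches with `union all`, and summing per `user_id`. The following stripped-down sketch shows the same pattern on made-up inline rows, reduced to two measures; the users `u1` and `u2` and their counts are hypothetical.

```sql
select
    user_id,
    sum(login_count) login_count,
    sum(order_count) order_count
from
(
    -- branch 1: login measure only, order measure padded with 0
    select 'u1' user_id, 3 login_count, 0 order_count
    union all
    -- branch 2: order measure only, login measure padded with 0
    select 'u1' user_id, 0 login_count, 2 order_count
    union all
    select 'u2' user_id, 0 login_count, 1 order_count
) t
group by user_id;
```

Here u1 ends up with login_count 3 and order_count 2, while u2, who has no login row, still appears with login_count 0 and order_count 1. Every branch aliases every column, as the article's statements do, so the branches line up by name as well as by position.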
1.4.2 Daily product behavior
1) Table DDL
drop table if exists dws_sku_action_daycount;
create external table dws_sku_action_daycount
(
sku_id string comment 'sku_id',
order_count bigint comment 'times ordered',
order_num bigint comment 'units ordered',
order_amount decimal(16,2) comment 'amount ordered',
payment_count bigint comment 'times paid',
payment_num bigint comment 'units paid',
payment_amount decimal(16,2) comment 'amount paid',
refund_count bigint comment 'times refunded',
refund_num bigint comment 'units refunded',
refund_amount decimal(16,2) comment 'amount refunded',
cart_count bigint comment 'times added to cart',
cart_num bigint comment 'units added to cart',
favor_count bigint comment 'times favorited',
appraise_good_count bigint comment 'positive review count',
appraise_mid_count bigint comment 'neutral review count',
appraise_bad_count bigint comment 'negative review count',
appraise_default_count bigint comment 'default review count'
) COMMENT 'daily product behavior'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_sku_action_daycount/'
tblproperties ("parquet.compression"="lzo");
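2) Load data
Note: an order placed at 23:59 may be paid on the following day. The payment figures therefore need to take, from the order details, the orders whose payment time is today and whose order date is either yesterday or today.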
with
tmp_order as
(
select
sku_id,
count(*) order_count,
sum(sku_num) order_num,
sum(total_amount) order_amount
from dwd_fact_order_detail
where dt='2020-03-10'
group by sku_id
),
tmp_payment as
(
select
sku_id,
count(*) payment_count,
sum(sku_num) payment_num,
sum(total_amount) payment_amount
from dwd_fact_order_detail
where dt='2020-03-10'
and order_id in
(
select
id
from dwd_fact_order_info
where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
)
group by sku_id
),
tmp_refund as
(
select
sku_id,
count(*) refund_count,
sum(refund_num) refund_num,
sum(refund_amount) refund_amount
from dwd_fact_order_refund_info
where dt='2020-03-10'
group by sku_id
),
tmp_cart as
(
select
sku_id,
count(*) cart_count,
sum(sku_num) cart_num
from dwd_fact_cart_info
where dt='2020-03-10'
and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
group by sku_id
),
tmp_favor as
(
select
sku_id,
count(*) favor_count
from dwd_fact_favor_info
where dt='2020-03-10'
and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
group by sku_id
),
tmp_appraise as
(
select
sku_id,
sum(if(appraise='1201',1,0)) appraise_good_count,
sum(if(appraise='1202',1,0)) appraise_mid_count,
sum(if(appraise='1203',1,0)) appraise_bad_count,
sum(if(appraise='1204',1,0)) appraise_default_count
from dwd_fact_comment_info
where dt='2020-03-10'
group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
select
sku_id,
sum(order_count),
sum(order_num),
sum(order_amount),
sum(payment_count),
sum(payment_num),
sum(payment_amount),
sum(refund_count),
sum(refund_num),
sum(refund_amount),
sum(cart_count),
sum(cart_num),
sum(favor_count),
sum(appraise_good_count),
sum(appraise_mid_count),
sum(appraise_bad_count),
sum(appraise_default_count)
from
(
select
sku_id,
order_count,
order_num,
order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_order
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
payment_count,
payment_num,
payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_payment
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
refund_count,
refund_num,
refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_refund
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
cart_count,
cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_cart
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_favor
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
appraise_good_count,
appraise_mid_count,
appraise_bad_count,
appraise_default_count
from tmp_appraise
)tmp
group by sku_id;
3) Query the loaded result
select * from dws_sku_action_daycount where dt='2020-03-10';
1.4.5 Daily purchase behavior
1) Table DDL
drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount
(
user_id string comment 'user id',
sku_id string comment 'product id',
user_gender string comment 'user gender',
user_age string comment 'user age',
user_level string comment 'user level',
order_price decimal(10,2) comment 'product price',
sku_name string comment 'product name',
sku_tm_id string comment 'brand id',
sku_category3_id string comment 'level-3 category id',
sku_category2_id string comment 'level-2 category id',
sku_category1_id string comment 'level-1 category id',
sku_category3_name string comment 'level-3 category name',
sku_category2_name string comment 'level-2 category name',
sku_category1_name string comment 'level-1 category name',
spu_id string comment 'product spu',
sku_num int comment 'quantity purchased',
order_count bigint comment 'orders placed that day',
order_amount decimal(16,2) comment 'order amount that day'
) COMMENT 'daily purchase behavior'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
tblproperties ("parquet.compression"="lzo");
2) Load data
insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
select
op.user_id,
op.sku_id,
ui.gender,
months_between('2020-03-10', ui.birthday)/12 age,
ui.user_level,
si.price,
si.sku_name,
si.tm_id,
si.category3_id,
si.category2_id,
si.category1_id,
si.category3_name,
si.category2_name,
si.category1_name,
si.spu_id,
op.sku_num,
op.order_count,
op.order_amount
from
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(total_amount) order_amount
from dwd_fact_order_detail
where dt='2020-03-10'
group by user_id, sku_id
)op
join
(
select
*
from dwd_dim_user_info_his
where end_date='9999-99-99'
)ui on op.user_id = ui.id
join
(
select
*
from dwd_dim_sku_info
where dt='2020-03-10'
)si on op.sku_id = si.id;
3) Query the loaded result
select * from dws_sale_detail_daycount where dt='2020-03-10';
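The load statement stores `months_between(...)/12` directly in `user_age`, which is a fractional number of years. If whole years are wanted, one possible variant wraps the expression in `floor`. A sketch with a hypothetical birthday of 1990-05-20:

```sql
select
    -- fractional age in years, as computed by the load statement
    months_between('2020-03-10', '1990-05-20') / 12        age_fractional,
    -- whole years only
    floor(months_between('2020-03-10', '1990-05-20') / 12) age_years;   -- 29
```

For this birthday the fractional value is about 29.8, and `floor` gives 29, since the 30th birthday falls after 2020-03-10.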
1.5 DWS Layer Data Import Script
1) vim dwd_to_dws.sh
Put the following content in the script:
#!/bin/bash
APP=gmall
hive=/opt/modules/hive/bin/hive
# Use the date passed as the first argument if there is one; otherwise default to yesterday
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
select
mid_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_code)) version_code,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(lang))lang,
concat_ws('|', collect_set(source)) source,
concat_ws('|', collect_set(os)) os,
concat_ws('|', collect_set(area)) area,
concat_ws('|', collect_set(model)) model,
concat_ws('|', collect_set(brand)) brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(gmail)) gmail,
concat_ws('|', collect_set(height_width)) height_width,
concat_ws('|', collect_set(app_time)) app_time,
concat_ws('|', collect_set(network)) network,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
count(*) login_count
from ${APP}.dwd_start_log
where dt='$do_date'
group by mid_id;
with
tmp_login as
(
select
user_id,
count(*) login_count
from ${APP}.dwd_start_log
where dt='$do_date'
and user_id is not null
group by user_id
),
tmp_cart as
(
select
user_id,
count(*) cart_count,
sum(cart_price*sku_num) cart_amount
from ${APP}.dwd_fact_cart_info
where dt='$do_date'
and user_id is not null
and date_format(create_time,'yyyy-MM-dd')='$do_date'
group by user_id
),
tmp_order as
(
select
user_id,
count(*) order_count,
sum(final_total_amount) order_amount
from ${APP}.dwd_fact_order_info
where dt='$do_date'
group by user_id
) ,
tmp_payment as
(
select
user_id,
count(*) payment_count,
sum(payment_amount) payment_amount
from ${APP}.dwd_fact_payment_info
where dt='$do_date'
group by user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select
user_actions.user_id,
sum(user_actions.login_count),
sum(user_actions.cart_count),
sum(user_actions.cart_amount),
sum(user_actions.order_count),
sum(user_actions.order_amount),
sum(user_actions.payment_count),
sum(user_actions.payment_amount)
from
(
select
user_id,
login_count,
0 cart_count,
0 cart_amount,
0 order_count,
0 order_amount,
0 payment_count,
0 payment_amount
from
tmp_login
union all
select
user_id,
0 login_count,
cart_count,
cart_amount,
0 order_count,
0 order_amount,
0 payment_count,
0 payment_amount
from
tmp_cart
union all
select
user_id,
0 login_count,
0 cart_count,
0 cart_amount,
order_count,
order_amount,
0 payment_count,
0 payment_amount
from tmp_order
union all
select
user_id,
0 login_count,
0 cart_count,
0 cart_amount,
0 order_count,
0 order_amount,
payment_count,
payment_amount
from tmp_payment
) user_actions
group by user_id;
with
tmp_order as
(
select
sku_id,
count(*) order_count,
sum(sku_num) order_num,
sum(total_amount) order_amount
from ${APP}.dwd_fact_order_detail
where dt='$do_date'
group by sku_id
),
tmp_payment as
(
select
sku_id,
count(*) payment_count,
sum(sku_num) payment_num,
sum(total_amount) payment_amount
from ${APP}.dwd_fact_order_detail
where dt='$do_date'
and order_id in
(
select
id
from ${APP}.dwd_fact_order_info
where (dt='$do_date' or dt=date_add('$do_date',-1))
and date_format(payment_time,'yyyy-MM-dd')='$do_date'
)
group by sku_id
),
tmp_refund as
(
select
sku_id,
count(*) refund_count,
sum(refund_num) refund_num,
sum(refund_amount) refund_amount
from ${APP}.dwd_fact_order_refund_info
where dt='$do_date'
group by sku_id
),
tmp_cart as
(
select
sku_id,
count(*) cart_count,
sum(sku_num) cart_num
from ${APP}.dwd_fact_cart_info
where dt='$do_date'
and date_format(create_time,'yyyy-MM-dd')='$do_date'
group by sku_id
),
tmp_favor as
(
select
sku_id,
count(*) favor_count
from ${APP}.dwd_fact_favor_info
where dt='$do_date'
and date_format(create_time,'yyyy-MM-dd')='$do_date'
group by sku_id
),
tmp_appraise as
(
select
sku_id,
sum(if(appraise='1201',1,0)) appraise_good_count,
sum(if(appraise='1202',1,0)) appraise_mid_count,
sum(if(appraise='1203',1,0)) appraise_bad_count,
sum(if(appraise='1204',1,0)) appraise_default_count
from ${APP}.dwd_fact_comment_info
where dt='$do_date'
group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select
sku_id,
sum(order_count),
sum(order_num),
sum(order_amount),
sum(payment_count),
sum(payment_num),
sum(payment_amount),
sum(refund_count),
sum(refund_num),
sum(refund_amount),
sum(cart_count),
sum(cart_num),
sum(favor_count),
sum(appraise_good_count),
sum(appraise_mid_count),
sum(appraise_bad_count),
sum(appraise_default_count)
from
(
select
sku_id,
order_count,
order_num,
order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_order
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
payment_count,
payment_num,
payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_payment
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
refund_count,
refund_num,
refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_refund
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
cart_count,
cart_num,
0 favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_cart
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
favor_count,
0 appraise_good_count,
0 appraise_mid_count,
0 appraise_bad_count,
0 appraise_default_count
from tmp_favor
union all
select
sku_id,
0 order_count,
0 order_num,
0 order_amount,
0 payment_count,
0 payment_num,
0 payment_amount,
0 refund_count,
0 refund_num,
0 refund_amount,
0 cart_count,
0 cart_num,
0 favor_count,
appraise_good_count,
appraise_mid_count,
appraise_bad_count,
appraise_default_count
from tmp_appraise
)tmp
group by sku_id;
insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
select
op.user_id,
op.sku_id,
ui.gender,
months_between('$do_date', ui.birthday)/12 age,
ui.user_level,
si.price,
si.sku_name,
si.tm_id,
si.category3_id,
si.category2_id,
si.category1_id,
si.category3_name,
si.category2_name,
si.category1_name,
si.spu_id,
op.sku_num,
op.order_count,
op.order_amount
from
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(total_amount) order_amount
from ${APP}.dwd_fact_order_detail
where dt='$do_date'
group by user_id, sku_id
)op
join
(
select
*
from ${APP}.dwd_dim_user_info_his
where end_date='9999-99-99'
)ui on op.user_id = ui.id
join
(
select
*
from ${APP}.dwd_dim_sku_info
where dt='$do_date'
)si on op.sku_id = si.id;
"
$hive -e "$sql"
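2) Give the script execute permission (for example with chmod +x dwd_to_dws.sh)
3) Run the script to import the data, passing the date as the first argument (the queries below assume 2020-03-11)
4) Check the imported data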
select * from dws_uv_detail_daycount where dt='2020-03-11';
select * from dws_user_action_daycount where dt='2020-03-11';
select * from dws_sku_action_daycount where dt='2020-03-11';
select * from dws_sale_detail_daycount where dt='2020-03-11';