Hive UDF group-wise top 1: Hands-on project from 0 to 1 with Hive (27), data warehouse project (9), building the DWS layer

Published: 2025/3/15

I. Building the data warehouse: the DWS layer

1.1 Business terminology

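1) User

Users are identified by device. In mobile analytics, each distinct device counts as one distinct user. Android identifies a user by the IMEI number and iOS by the OpenUDID, so each phone is one user.

2) New user

A user who uses the app online for the first time. A user who opens an app for the first time is counted as a new user; a device that uninstalls and reinstalls the app is not counted as new again. New users are tracked as daily, weekly, and monthly new users.

3) Active user

Any user who opens the app is an active user, regardless of how the app is then used. A device that opens the app several times in one day is counted as one active user.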

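4) Weekly (monthly) active users

Users who launched the app during a given calendar week (month). Multiple launches within that week (month) count as one active user.

5) Monthly active rate

The ratio of monthly active users to the cumulative total of users up to that month.

6) Silent users

Users who launched the app only once, on the day of installation (or the next day), and never launched it again. This metric reflects the quality of new users and how well users match the app.

7) Version distribution

The number of new users, active users, and launches for each version on each day of the week. It helps compare app versions and understand user habits.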

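8) Returning users this week

Users who did not launch the app last week but launched it this week.

9) Users active for n consecutive weeks

Users who launched the app at least once in each of n consecutive weeks.

10) Loyal users

Users active for 5 consecutive weeks or more.

11) Continuously active users

Users active for 2 consecutive weeks or more.

12) Recently churned users

Users who have not launched the app for n consecutive weeks (2 <= n <= 4), and who also did not launch it in week n+1.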

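13) Retained users

New users from a given period who are still using the app after some time has passed are retained users. Their share of the new users from that period is the retention rate.

For example, suppose there were 200 new users in May. Of those 200, 100 launched the app in June, 80 in July, and 50 in August. The May cohort then has a one-month retention rate of 50%, a two-month retention rate of 40%, and a three-month retention rate of 25%.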
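The retention arithmetic in this example can be checked with a small query. This is only a sketch: the inline subquery and its column names (new_users, active_m1, active_m2, active_m3) are made up for illustration and simply hard-code the figures from the example.

```sql
-- Retention rate = retained users of the cohort / new users of the cohort.
-- The numbers are the May cohort from the example; the column names are illustrative.
select
    round(active_m1 * 100 / new_users, 1) as retention_1m_pct,  -- 50.0
    round(active_m2 * 100 / new_users, 1) as retention_2m_pct,  -- 40.0
    round(active_m3 * 100 / new_users, 1) as retention_3m_pct   -- 25.0
from (
    select 200 as new_users, 100 as active_m1, 80 as active_m2, 50 as active_m3
) cohort;
```

In Hive the `/` operator returns a double even for integer operands, so no explicit cast is needed here.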

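14) User freshness

The ratio of new to returning users among those who launch the app each day, that is, new users as a share of active users.

15) Session duration

The length of time of each launch session.

16) Daily usage duration

The total usage time accumulated over one day.

17) Launch counting standard

On iOS, each time the app is sent to the background counts as one independent launch. On Android, we define two launches less than 30 seconds apart as a single launch. If a user leaves the app to send a text message or take a call and returns within 30 seconds, the two sessions are a continuation rather than independent events, so they count as one usage, that is, one launch. Most of the industry uses 30 seconds as the threshold, but the interval can be customized.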
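The 30-second rule can be sketched with a window function. The table `app_foreground_log(mid_id string, ts bigint)` is hypothetical and does not appear in this project; `ts` is assumed to hold the epoch second at which the app came to the foreground.

```sql
-- Count launches per device: a foreground event starts a new launch only when it is
-- the device's first event or comes at least 30 seconds after the previous one.
-- app_foreground_log and its columns are assumptions made for this sketch.
select
    mid_id,
    sum(is_new_launch) as launch_count
from (
    select
        mid_id,
        if(prev_ts is null or ts - prev_ts >= 30, 1, 0) as is_new_launch
    from (
        select
            mid_id,
            ts,
            lag(ts) over (partition by mid_id order by ts) as prev_ts
        from app_foreground_log
    ) with_prev
) flagged
group by mid_id;
```

Changing the constant 30 is all it takes to apply a custom interval.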

1.2 System functions

1.2.1 The collect_set function

1) Create the source table

drop table if exists stud;

create table stud (name string, area string, course string, score int);

2) Insert data into the source table

insert into table stud values('zhang3','bj','math',88);

insert into table stud values('li4','bj','math',99);

insert into table stud values('wang5','sh','chinese',92);

insert into table stud values('zhao6','sh','chinese',54);

insert into table stud values('tian7','bj','chinese',91);

3) Query the data in the table

select * from stud;

stud.name stud.area stud.course stud.score

zhang3 bj math 88

li4 bj math 99

wang5 sh chinese 92

zhao6 sh chinese 54

tian7 bj chinese 91

4) Aggregate the values from different rows of the same group into a set

select course, collect_set(area), avg(score) from stud group by course;

chinese ["sh","bj"] 79.0

math ["bj"] 93.5

5) Use an index to pick a single element

select course, collect_set(area)[0], avg(score) from stud group by course;

chinese sh 79.0

math bj 93.5
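Indexing into `collect_set` picks whichever element happens to come first, and that order is not something to rely on. When the goal is a true top 1 per group, a window function is a common alternative. The sketch below is not part of the original walkthrough; it reuses the `stud` table created above.

```sql
-- Highest-scoring student per course, using row_number() instead of collect_set(...)[0].
select course, name, score
from (
    select
        course,
        name,
        score,
        row_number() over (partition by course order by score desc) as rn
    from stud
) ranked
where rn = 1;
-- With the five rows inserted above this returns:
-- chinese  wang5  92
-- math     li4    99
```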

1.2.2 The nvl function

1) Basic syntax

NVL(expr1, expr2)

If expr1 is null, NVL returns the value of expr2; otherwise it returns the value of expr1. The purpose of the function is to convert a null into an actual value. The expressions can be of numeric, string, or date type, but expr1 and expr2 must have the same data type.
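A minimal illustration of this behavior, using literals only. The `cast(null as string)` is there so that both arguments have the same type, as required above; `coalesce` is shown as the multi-argument relative of `nvl`.

```sql
select
    nvl(cast(null as string), 'unknown') as filled,  -- first argument is null, so 'unknown' is returned
    nvl('bj', 'unknown')                 as kept,    -- first argument is not null, so 'bj' is returned
    coalesce(cast(null as string), cast(null as string), 'unknown') as first_non_null;  -- 'unknown'
```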

1.2.3 Date functions

1) date_format (format a date according to a pattern)

hive (gmall)> select date_format('2020-03-10','yyyy-MM');

2020-03

2) date_add (add or subtract days)

hive (gmall)> select date_add('2020-03-10',-1);

2020-03-09

hive (gmall)> select date_add('2020-03-10',1);

2020-03-11

3) next_day

(1) Get the next Monday after the given day

hive (gmall)> select next_day('2020-03-12','MO');

2020-03-16

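Note: the second argument is an English day name, Monday through Sunday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); a two-letter or three-letter abbreviation such as 'MO' is also accepted.

(2) Get the Monday of the current week

hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);

2020-03-09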

4) last_day (get the last day of the month)

hive (gmall)> select last_day('2020-03-10');

2020-03-31
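These functions are often combined to get the boundaries of the week and month that contain a given date. The query below is a small sketch along those lines; `trunc` is a built-in Hive function that is not used elsewhere in this article.

```sql
-- Week and month boundaries for 2020-03-12 (a Thursday).
select
    date_add(next_day('2020-03-12', 'MO'), -7) as week_monday,  -- 2020-03-09
    date_add(next_day('2020-03-12', 'MO'), -1) as week_sunday,  -- 2020-03-15
    trunc('2020-03-12', 'MM')                  as month_first,  -- 2020-03-01
    last_day('2020-03-12')                     as month_last;   -- 2020-03-31
```

Because next_day always returns a date strictly after its argument, the week_monday expression also works when the input is itself a Monday: it returns that same day.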

1.3 DWS layer (user behavior)

1.3.1 Daily device behavior

Daily device behavior is aggregated mainly by device id.

1) Table creation statement

drop table if exists dws_uv_detail_daycount;

create external table dws_uv_detail_daycount

(

`mid_id` string COMMENT 'unique device id',

`user_id` string COMMENT 'user id',

`version_code` string COMMENT 'app version code',

`version_name` string COMMENT 'app version name',

`lang` string COMMENT 'system language',

`source` string COMMENT 'channel id',

`os` string COMMENT 'Android OS version',

`area` string COMMENT 'region',

`model` string COMMENT 'phone model',

`brand` string COMMENT 'phone brand',

`sdk_version` string COMMENT 'sdkVersion',

`gmail` string COMMENT 'gmail',

`height_width` string COMMENT 'screen height and width',

`app_time` string COMMENT 'time the log was generated on the client',

`network` string COMMENT 'network mode',

`lng` string COMMENT 'longitude',

`lat` string COMMENT 'latitude',

`login_count` bigint COMMENT 'number of times active'

)

partitioned by(dt string)

stored as parquet

location '/warehouse/gmall/dws/dws_uv_detail_daycount';

2) Load data

insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')

select

mid_id,

concat_ws('|', collect_set(user_id)) user_id,

concat_ws('|', collect_set(version_code)) version_code,

concat_ws('|', collect_set(version_name)) version_name,

concat_ws('|', collect_set(lang))lang,

concat_ws('|', collect_set(source)) source,

concat_ws('|', collect_set(os)) os,

concat_ws('|', collect_set(area)) area,

concat_ws('|', collect_set(model)) model,

concat_ws('|', collect_set(brand)) brand,

concat_ws('|', collect_set(sdk_version)) sdk_version,

concat_ws('|', collect_set(gmail)) gmail,

concat_ws('|', collect_set(height_width)) height_width,

concat_ws('|', collect_set(app_time)) app_time,

concat_ws('|', collect_set(network)) network,

concat_ws('|', collect_set(lng)) lng,

concat_ws('|', collect_set(lat)) lat,

count(*) login_count

from dwd_start_log

where dt='2020-03-10'

group by mid_id;

3) Query the loaded result

select * from dws_uv_detail_daycount where dt='2020-03-10';
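Because the load groups by mid_id, the table holds one row per device per day, which makes the active-user counts from section 1.1 straightforward. The two queries below are a sketch of how the table might be consumed; they are not part of the original load job.

```sql
-- Daily active devices: one row per mid_id in the partition.
select count(*) as day_active_devices
from dws_uv_detail_daycount
where dt = '2020-03-10';

-- Weekly active devices for the calendar week (Monday to Sunday) containing 2020-03-10.
-- A device active on several days of the week is counted once.
select count(distinct mid_id) as week_active_devices
from dws_uv_detail_daycount
where dt >= date_add(next_day('2020-03-10', 'MO'), -7)
  and dt <= date_add(next_day('2020-03-10', 'MO'), -1);
```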

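1.4 DWS layer (business)

The wide-table fields of the DWS layer look at the fact tables from the viewpoint of different dimensions, focusing on the measures of the fact tables.

1.4.1 Daily member behavior

1) Table creation statement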

drop table if exists dws_user_action_daycount;

create external table dws_user_action_daycount

(

user_id string comment 'user id',

login_count bigint comment 'login count',

cart_count bigint comment 'add-to-cart count',

cart_amount double comment 'add-to-cart amount',

order_count bigint comment 'order count',

order_amount decimal(16,2) comment 'order amount',

payment_count bigint comment 'payment count',

payment_amount decimal(16,2) comment 'payment amount'

) COMMENT 'daily user behavior'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_user_action_daycount/'

tblproperties ("parquet.compression"="lzo");

2) Load data

with

tmp_login as

(

select

user_id,

count(*) login_count

from dwd_start_log

where dt='2020-03-10'

and user_id is not null

group by user_id

),

tmp_cart as

(

select

user_id,

count(*) cart_count,

sum(cart_price*sku_num) cart_amount

from dwd_fact_cart_info

where dt='2020-03-10'

and user_id is not null

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by user_id

),

tmp_order as

(

select

user_id,

count(*) order_count,

sum(final_total_amount) order_amount

from dwd_fact_order_info

where dt='2020-03-10'

group by user_id

) ,

tmp_payment as

(

select

user_id,

count(*) payment_count,

sum(payment_amount) payment_amount

from dwd_fact_payment_info

where dt='2020-03-10'

group by user_id

)

insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')

select

user_actions.user_id,

sum(user_actions.login_count),

sum(user_actions.cart_count),

sum(user_actions.cart_amount),

sum(user_actions.order_count),

sum(user_actions.order_amount),

sum(user_actions.payment_count),

sum(user_actions.payment_amount)

from

(

select

user_id,

login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_login

union all

select

user_id,

0 login_count,

cart_count,

cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_cart

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

order_count,

order_amount,

0 payment_count,

0 payment_amount

from tmp_order

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

payment_count,

payment_amount

from tmp_payment

) user_actions

group by user_id;

3) Query the loaded result

select * from dws_user_action_daycount where dt='2020-03-10';
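The load above pads each source with zero-valued columns, stacks the sources with union all, and sums per user. For comparison only, here is a sketch of the same idea for just two of the sources using a full outer join; with four sources the join chain becomes harder to read, which is one plausible reason to prefer the union all form.

```sql
-- Alternative sketch (not the approach used above): merge two per-user aggregates
-- with a full outer join and replace missing counts with 0.
with
tmp_order as (
    select user_id, count(*) order_count
    from dwd_fact_order_info
    where dt = '2020-03-10'
    group by user_id
),
tmp_payment as (
    select user_id, count(*) payment_count
    from dwd_fact_payment_info
    where dt = '2020-03-10'
    group by user_id
)
select
    coalesce(o.user_id, p.user_id) as user_id,
    nvl(o.order_count, 0)          as order_count,
    nvl(p.payment_count, 0)        as payment_count
from tmp_order o
full outer join tmp_payment p on o.user_id = p.user_id;
```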

1.4.2 Daily product behavior

1) Table creation statement

drop table if exists dws_sku_action_daycount;

create external table dws_sku_action_daycount

(

sku_id string comment 'sku_id',

order_count bigint comment 'number of times ordered',

order_num bigint comment 'number of units ordered',

order_amount decimal(16,2) comment 'ordered amount',

payment_count bigint comment 'number of times paid',

payment_num bigint comment 'number of units paid',

payment_amount decimal(16,2) comment 'paid amount',

refund_count bigint comment 'number of times refunded',

refund_num bigint comment 'number of units refunded',

refund_amount decimal(16,2) comment 'refunded amount',

cart_count bigint comment 'number of times added to cart',

cart_num bigint comment 'number of units added to cart',

favor_count bigint comment 'number of times favorited',

appraise_good_count bigint comment 'number of positive reviews',

appraise_mid_count bigint comment 'number of neutral reviews',

appraise_bad_count bigint comment 'number of negative reviews',

appraise_default_count bigint comment 'number of default reviews'

) COMMENT 'daily product behavior'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_sku_action_daycount/'

tblproperties ("parquet.compression"="lzo");

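2) Load data

Note: an order placed at 23:59 can be paid on the following day. The payment figures therefore need the order details whose payment time is today and whose order time is either yesterday or today.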

with

tmp_order as

(

select

sku_id,

count(*) order_count,

sum(sku_num) order_num,

sum(total_amount) order_amount

from dwd_fact_order_detail

where dt='2020-03-10'

group by sku_id

),

tmp_payment as

(

select

sku_id,

count(*) payment_count,

sum(sku_num) payment_num,

sum(total_amount) payment_amount

from dwd_fact_order_detail

where dt='2020-03-10'

and order_id in

(

select

id

from dwd_fact_order_info

where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))

and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'

)

group by sku_id

),

tmp_refund as

(

select

sku_id,

count(*) refund_count,

sum(refund_num) refund_num,

sum(refund_amount) refund_amount

from dwd_fact_order_refund_info

where dt='2020-03-10'

group by sku_id

),

tmp_cart as

(

select

sku_id,

count(*) cart_count,

sum(sku_num) cart_num

from dwd_fact_cart_info

where dt='2020-03-10'

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by sku_id

),

tmp_favor as

(

select

sku_id,

count(*) favor_count

from dwd_fact_favor_info

where dt='2020-03-10'

and date_format(create_time,'yyyy-MM-dd')='2020-03-10'

group by sku_id

),

tmp_appraise as

(

select

sku_id,

sum(if(appraise='1201',1,0)) appraise_good_count,

sum(if(appraise='1202',1,0)) appraise_mid_count,

sum(if(appraise='1203',1,0)) appraise_bad_count,

sum(if(appraise='1204',1,0)) appraise_default_count

from dwd_fact_comment_info

where dt='2020-03-10'

group by sku_id

)

insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')

select

sku_id,

sum(order_count),

sum(order_num),

sum(order_amount),

sum(payment_count),

sum(payment_num),

sum(payment_amount),

sum(refund_count),

sum(refund_num),

sum(refund_amount),

sum(cart_count),

sum(cart_num),

sum(favor_count),

sum(appraise_good_count),

sum(appraise_mid_count),

sum(appraise_bad_count),

sum(appraise_default_count)

from

(

select

sku_id,

order_count,

order_num,

order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_order

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

payment_count,

payment_num,

payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_payment

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

refund_count,

refund_num,

refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_refund

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

cart_count,

cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_cart

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_favor

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

appraise_good_count,

appraise_mid_count,

appraise_bad_count,

appraise_default_count

from tmp_appraise

)tmp

group by sku_id;

3) Query the loaded result

select * from dws_sku_action_daycount where dt='2020-03-10';
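One thing worth checking in the tmp_payment step: the subquery on dwd_fact_order_info looks at both today's and yesterday's partitions, but the outer query reads only dt='2020-03-10' from dwd_fact_order_detail. If the detail table is partitioned by order creation date (an assumption, since the DWD layer definition is not shown here), the details of orders created yesterday and paid today would not be picked up. A possible variant that reads both detail partitions is sketched below.

```sql
-- Sketch of a payment aggregate that also covers orders created yesterday and paid today.
-- Assumes dwd_fact_order_detail is partitioned by order creation date and that
-- dwd_fact_order_info holds one row per order id across the two partitions.
select
    od.sku_id,
    count(*)             as payment_count,
    sum(od.sku_num)      as payment_num,
    sum(od.total_amount) as payment_amount
from dwd_fact_order_detail od
join (
    select id
    from dwd_fact_order_info
    where (dt = '2020-03-10' or dt = date_add('2020-03-10', -1))
      and date_format(payment_time, 'yyyy-MM-dd') = '2020-03-10'
) oi on od.order_id = oi.id
where od.dt = '2020-03-10' or od.dt = date_add('2020-03-10', -1)
group by od.sku_id;
```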

1.4.5 Daily purchase behavior

1) Table creation statement

drop table if exists dws_sale_detail_daycount;

create external table dws_sale_detail_daycount

(

user_id string comment 'user id',

sku_id string comment 'product id',

user_gender string comment 'user gender',

user_age string comment 'user age',

user_level string comment 'user level',

order_price decimal(10,2) comment 'product price',

sku_name string comment 'product name',

sku_tm_id string comment 'brand id',

sku_category3_id string comment 'level-3 category id',

sku_category2_id string comment 'level-2 category id',

sku_category1_id string comment 'level-1 category id',

sku_category3_name string comment 'level-3 category name',

sku_category2_name string comment 'level-2 category name',

sku_category1_name string comment 'level-1 category name',

spu_id string comment 'product spu',

sku_num int comment 'number of units purchased',

order_count bigint comment 'number of orders placed that day',

order_amount decimal(16,2) comment 'amount ordered that day'

) COMMENT 'daily purchase behavior'

PARTITIONED BY (`dt` string)

stored as parquet

location '/warehouse/gmall/dws/dws_sale_detail_daycount/'

tblproperties ("parquet.compression"="lzo");

2) Load data

insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')

select

op.user_id,

op.sku_id,

ui.gender,

months_between('2020-03-10', ui.birthday)/12 age,

ui.user_level,

si.price,

si.sku_name,

si.tm_id,

si.category3_id,

si.category2_id,

si.category1_id,

si.category3_name,

si.category2_name,

si.category1_name,

si.spu_id,

op.sku_num,

op.order_count,

op.order_amount

from

(

select

user_id,

sku_id,

sum(sku_num) sku_num,

count(*) order_count,

sum(total_amount) order_amount

from dwd_fact_order_detail

where dt='2020-03-10'

group by user_id, sku_id

)op

join

(

select

*

from dwd_dim_user_info_his

where end_date='9999-99-99'

)ui on op.user_id = ui.id

join

(

select

*

from dwd_dim_sku_info

where dt='2020-03-10'

)si on op.sku_id = si.id;

3) Query the loaded result

select * from dws_sale_detail_daycount where dt='2020-03-10';
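In the load above, `months_between(...)/12` produces a fractional number of years, which is then stored in the string column user_age. If whole years are wanted, wrapping the expression in `floor` is one option. The birthday below is a made-up value used only to show the difference.

```sql
-- Fractional versus whole-year age on 2020-03-10 for a hypothetical birthday of 1990-06-15.
select
    months_between('2020-03-10', '1990-06-15') / 12        as age_fractional,  -- a little under 29.74
    floor(months_between('2020-03-10', '1990-06-15') / 12) as age_years;       -- 29
```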

1.5 DWS layer data import script

1) Create the script: vim dwd_to_dws.sh

Put the following content in the script

#!/bin/bash

APP=gmall

hive=/opt/modules/hive/bin/hive

# Use the date passed as the first argument if there is one; otherwise use yesterday's date

if [ -n "$1" ] ;then

do_date=$1

else

do_date=`date -d "-1 day" +%F`

fi

sql="

insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')

select

mid_id,

concat_ws('|', collect_set(user_id)) user_id,

concat_ws('|', collect_set(version_code)) version_code,

concat_ws('|', collect_set(version_name)) version_name,

concat_ws('|', collect_set(lang))lang,

concat_ws('|', collect_set(source)) source,

concat_ws('|', collect_set(os)) os,

concat_ws('|', collect_set(area)) area,

concat_ws('|', collect_set(model)) model,

concat_ws('|', collect_set(brand)) brand,

concat_ws('|', collect_set(sdk_version)) sdk_version,

concat_ws('|', collect_set(gmail)) gmail,

concat_ws('|', collect_set(height_width)) height_width,

concat_ws('|', collect_set(app_time)) app_time,

concat_ws('|', collect_set(network)) network,

concat_ws('|', collect_set(lng)) lng,

concat_ws('|', collect_set(lat)) lat,

count(*) login_count

from ${APP}.dwd_start_log

where dt='$do_date'

group by mid_id;

with

tmp_login as

(

select

user_id,

count(*) login_count

from ${APP}.dwd_start_log

where dt='$do_date'

and user_id is not null

group by user_id

),

tmp_cart as

(

select

user_id,

count(*) cart_count,

sum(cart_price*sku_num) cart_amount

from ${APP}.dwd_fact_cart_info

where dt='$do_date'

and user_id is not null

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by user_id

),

tmp_order as

(

select

user_id,

count(*) order_count,

sum(final_total_amount) order_amount

from ${APP}.dwd_fact_order_info

where dt='$do_date'

group by user_id

) ,

tmp_payment as

(

select

user_id,

count(*) payment_count,

sum(payment_amount) payment_amount

from ${APP}.dwd_fact_payment_info

where dt='$do_date'

group by user_id

)

insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')

select

user_actions.user_id,

sum(user_actions.login_count),

sum(user_actions.cart_count),

sum(user_actions.cart_amount),

sum(user_actions.order_count),

sum(user_actions.order_amount),

sum(user_actions.payment_count),

sum(user_actions.payment_amount)

from

(

select

user_id,

login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_login

union all

select

user_id,

0 login_count,

cart_count,

cart_amount,

0 order_count,

0 order_amount,

0 payment_count,

0 payment_amount

from

tmp_cart

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

order_count,

order_amount,

0 payment_count,

0 payment_amount

from tmp_order

union all

select

user_id,

0 login_count,

0 cart_count,

0 cart_amount,

0 order_count,

0 order_amount,

payment_count,

payment_amount

from tmp_payment

) user_actions

group by user_id;

with

tmp_order as

(

select

sku_id,

count(*) order_count,

sum(sku_num) order_num,

sum(total_amount) order_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

group by sku_id

),

tmp_payment as

(

select

sku_id,

count(*) payment_count,

sum(sku_num) payment_num,

sum(total_amount) payment_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

and order_id in

(

select

id

from ${APP}.dwd_fact_order_info

where (dt='$do_date' or dt=date_add('$do_date',-1))

and date_format(payment_time,'yyyy-MM-dd')='$do_date'

)

group by sku_id

),

tmp_refund as

(

select

sku_id,

count(*) refund_count,

sum(refund_num) refund_num,

sum(refund_amount) refund_amount

from ${APP}.dwd_fact_order_refund_info

where dt='$do_date'

group by sku_id

),

tmp_cart as

(

select

sku_id,

count(*) cart_count,

sum(sku_num) cart_num

from ${APP}.dwd_fact_cart_info

where dt='$do_date'

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by sku_id

),

tmp_favor as

(

select

sku_id,

count(*) favor_count

from ${APP}.dwd_fact_favor_info

where dt='$do_date'

and date_format(create_time,'yyyy-MM-dd')='$do_date'

group by sku_id

),

tmp_appraise as

(

select

sku_id,

sum(if(appraise='1201',1,0)) appraise_good_count,

sum(if(appraise='1202',1,0)) appraise_mid_count,

sum(if(appraise='1203',1,0)) appraise_bad_count,

sum(if(appraise='1204',1,0)) appraise_default_count

from ${APP}.dwd_fact_comment_info

where dt='$do_date'

group by sku_id

)

insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')

select

sku_id,

sum(order_count),

sum(order_num),

sum(order_amount),

sum(payment_count),

sum(payment_num),

sum(payment_amount),

sum(refund_count),

sum(refund_num),

sum(refund_amount),

sum(cart_count),

sum(cart_num),

sum(favor_count),

sum(appraise_good_count),

sum(appraise_mid_count),

sum(appraise_bad_count),

sum(appraise_default_count)

from

(

select

sku_id,

order_count,

order_num,

order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_order

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

payment_count,

payment_num,

payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_payment

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

refund_count,

refund_num,

refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_refund

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

cart_count,

cart_num,

0 favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_cart

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

favor_count,

0 appraise_good_count,

0 appraise_mid_count,

0 appraise_bad_count,

0 appraise_default_count

from tmp_favor

union all

select

sku_id,

0 order_count,

0 order_num,

0 order_amount,

0 payment_count,

0 payment_num,

0 payment_amount,

0 refund_count,

0 refund_num,

0 refund_amount,

0 cart_count,

0 cart_num,

0 favor_count,

appraise_good_count,

appraise_mid_count,

appraise_bad_count,

appraise_default_count

from tmp_appraise

)tmp

group by sku_id;

insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')

select

op.user_id,

op.sku_id,

ui.gender,

months_between('$do_date', ui.birthday)/12 age,

ui.user_level,

si.price,

si.sku_name,

si.tm_id,

si.category3_id,

si.category2_id,

si.category1_id,

si.category3_name,

si.category2_name,

si.category1_name,

si.spu_id,

op.sku_num,

op.order_count,

op.order_amount

from

(

select

user_id,

sku_id,

sum(sku_num) sku_num,

count(*) order_count,

sum(total_amount) order_amount

from ${APP}.dwd_fact_order_detail

where dt='$do_date'

group by user_id, sku_id

)op

join

(

select

*

from ${APP}.dwd_dim_user_info_his

where end_date='9999-99-99'

)ui on op.user_id = ui.id

join

(

select

*

from ${APP}.dwd_dim_sku_info

where dt='$do_date'

)si on op.sku_id = si.id;

"

$hive -e "$sql"
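The script relies on the shell to substitute $do_date and ${APP} inside the double-quoted SQL string. Hive's own variable substitution is an alternative way to pass the date. The file name below is hypothetical and the query is only a small stand-in for the full load.

```sql
-- dws_check.sql (hypothetical file name)
-- Run with: hive --hivevar do_date=2020-03-11 -f dws_check.sql
-- ${hivevar:do_date} is replaced by Hive before the statement is executed.
select count(*) as device_rows
from gmall.dws_uv_detail_daycount
where dt = '${hivevar:do_date}';
```

Keeping the SQL in its own file this way avoids having to think about shell quoting inside the statements.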

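2) Make the script executable, for example with chmod +x dwd_to_dws.sh

3) Run the script to load the data, passing the target date as its first argument (the queries below use 2020-03-11); with no argument it processes the previous day

4) Check the loaded data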

select * from dws_uv_detail_daycount where dt='2020-03-11';

select * from dws_user_action_daycount where dt='2020-03-11';

select * from dws_sku_action_daycount where dt='2020-03-11';

select * from dws_sale_detail_daycount where dt='2020-03-11';
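Instead of scanning full result sets, a quick row count per table is often enough to confirm that each partition was written. A sketch:

```sql
-- Row counts of the four DWS partitions loaded by the script for 2020-03-11.
select 'dws_uv_detail_daycount' as table_name, count(*) as row_count
from dws_uv_detail_daycount where dt = '2020-03-11'
union all
select 'dws_user_action_daycount' as table_name, count(*) as row_count
from dws_user_action_daycount where dt = '2020-03-11'
union all
select 'dws_sku_action_daycount' as table_name, count(*) as row_count
from dws_sku_action_daycount where dt = '2020-03-11'
union all
select 'dws_sale_detail_daycount' as table_name, count(*) as row_count
from dws_sale_detail_daycount where dt = '2020-03-11';
```

A count of 0 for a table usually means the corresponding DWD partition was empty or the date argument did not match.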
