2021-7-4
Business Data Warehouse
- Install Sqoop
- Scheduled Sqoop import script (sqoop_import.sh)
- Import data into HDFS
- ODS layer
- Create order table (ods_order_info)
- Create order detail table (ods_order_detail)
- Create SKU table (ods_sku_info)
- Create user table (ods_user_info)
- Create level-1 category table (ods_base_category1)
- Create level-2 category table (ods_base_category2)
- Create level-3 category table (ods_base_category3)
- Create payment flow table (ods_payment_info)
- ODS-layer data load script (ods_db.sh)
- DWD layer
- Create order table (dwd_order_info)
- Create order detail table (dwd_order_detail)
- Create user table (dwd_user_info)
- Create payment flow table (dwd_payment_info)
- Create SKU table with category columns (dwd_sku_info)
- DWD-layer data load script (dwd_db.sh)
- DWS layer
- Create user-action wide table (dws_user_action)
- Load data into the user-action wide table
- User-action wide-table load script (dws_db_wide.sh)
- User purchase detail wide table (dws_sale_detail_daycount)
- 1. Create table
- Load data
- Script (dws_sale.sh)
- New paying users (dws_pay_user_detail)
- Create table
- Load data
- Script (dws_pay_user_detail.sh)
- ADS layer
- Daily GMV (ads_gmv_sum_day)
- Ratio of new users to daily active users (ads_user_convert_day)
- User-behavior funnel analysis (ads_user_action_convert_day)
- Brand repurchase rate
- Create table
- Load data
- Script (ads_sale.sh)
- Top-10 repurchase-rate products per user level (ads_ul_rep_ratio)
- Create table (ads_ul_rep_ratio)
- Load data
- Script (ads_ul_rep_ratio.sh)
- New paying user count (ads_pay_user_count)
- Create table
- Load data
- Script (ads_pay_user_count.sh)
- New paying user ratio (ads_pay_user_ratio)
- Create table
- Load data
- Script (ads_pay_user_ratio.sh)
- Each user's most recent purchase time (ads_user_last_pay)
- Create table
- Load data
- Daily top-10 products by sales (ads_goods_order_count_day)
- Create table
- Load data
- Script (ads_goods_order_count_day.sh)
- Monthly order-to-payment rate (ads_order2pay_mn)
- Create table
- Load data
- Script (ads_order2pay_mn.sh)
- Azkaban
- Install Azkaban
- Generate the keystore
- Configuration files
- Web server configuration
- Web server user configuration
- Executor server configuration
- Start the executor server
- Start the web server
- Install Presto
- Install Druid
- Install HBase
- Configuration files
- Install Kylin
- Using Kylin
- Install Zeppelin
Install Sqoop
1. Installing Sqoop
1. Sqoop is usually installed on the same machine as Hive.
import: loads data from MySQL into HDFS, and from HDFS into Hive.
Strictly speaking, Sqoop does not have to live on the Hive cluster; any machine with a Hadoop client is enough.
Copy hive/lib/mysql-connector-java-5.1.27-bin.jar into the sqoop/lib directory.
2. Verify:
sqoop list-databases --connect jdbc:mysql://192.168.1.103:3306/ --username root --password 123456
2. Generating test data
1. Before running the third script (the one that creates functions), raise MySQL's trust level for function creation:
SHOW VARIABLES LIKE '%log_bin_trust_function_creators%';
SET GLOBAL log_bin_trust_function_creators=1;
2. Generate the data:
CALL init_data('2021-06-09',1000,200,300,TRUE);
3. NULL handling: if a source column is NULL, it should stay NULL after import and export
Currently: a NULL mysql column (e.g. operate_time) arrives in HDFS as the string 'null'.
Desired: a NULL mysql column arrives in HDFS as NULL.
This takes extra arguments on the sqoop import and export commands:
(1) By default, sqoop import turns a MySQL NULL into the string 'null'.
(2) Hive represents NULL as \N.
(3) To map MySQL NULLs to a value of your choice on import, use --null-string and --null-non-string.
--null-string: the replacement written to Hive when a MySQL string column is NULL.
--null-string a: if a MySQL string column (varchar, char) is NULL, write a on import.
--null-non-string: the replacement written to Hive when a MySQL non-string column is NULL.
--null-non-string b: if a MySQL non-string column is NULL, write b on import.
(4) To map a chosen value back to a MySQL NULL on export, use --input-null-string and --input-null-non-string.
--input-null-string a: when exporting from Hive to MySQL, a Hive string column holding a is written as NULL.
--input-null-non-string b: when exporting from Hive to MySQL, a Hive non-string column holding b is written as NULL.
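For example, to keep Hive's \N convention end to end, an import can pass both flags. A minimal sketch built on the connection settings above (the table and date are placeholders, not from the original notes):

sqoop import \
--connect jdbc:mysql://192.168.1.103:3306/gmall \
--username root --password 123456 \
--table user_info \
--target-dir /origin_data/gmall/db/user_info/2021-06-09 \
--fields-terminated-by "\t" \
--null-string '\\N' \
--null-non-string '\\N'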
Scheduled Sqoop import script (sqoop_import.sh)
#!/bin/bash

db_date=$2
echo $db_date
db_name=gmall

import_data() {
/opt/module/sqoop/bin/sqoop import \
--connect jdbc:mysql://192.168.1.103:3306/$db_name \
--username root \
--password 123456 \
--target-dir /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and $CONDITIONS;'
}

import_sku_info(){
import_data "sku_info" "select
id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, create_time
from sku_info where 1=1"
}

import_user_info(){
import_data "user_info" "select
id, name, birthday, gender, email, user_level, create_time
from user_info where 1=1"
}

import_base_category1(){
import_data "base_category1" "select
id, name
from base_category1 where 1=1"
}

import_base_category2(){
import_data "base_category2" "select
id, name, category1_id
from base_category2 where 1=1"
}

import_base_category3(){
import_data "base_category3" "select
id, name, category2_id
from base_category3 where 1=1"
}

import_order_detail(){
import_data "order_detail" "select
od.id, order_id, user_id, sku_id, sku_name, order_price, sku_num, o.create_time
from order_info o, order_detail od
where o.id=od.order_id
and DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'"
}

import_payment_info(){
import_data "payment_info" "select
id, out_trade_no, order_id, user_id, alipay_trade_no, total_amount, subject, payment_type, payment_time
from payment_info
where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}

import_order_info(){
import_data "order_info" "select
id, total_amount, order_status, user_id, payment_way, out_trade_no, create_time, operate_time
from order_info
where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'
or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}

case $1 in
"base_category1")
import_base_category1
;;
"base_category2")
import_base_category2
;;
"base_category3")
import_base_category3
;;
"order_info")
import_order_info
;;
"order_detail")
import_order_detail
;;
"sku_info")
import_sku_info
;;
"user_info")
import_user_info
;;
"payment_info")
import_payment_info
;;
"all")
import_base_category1
import_base_category2
import_base_category3
import_order_info
import_order_detail
import_sku_info
import_user_info
import_payment_info
;;
esac

Import data into HDFS
Run: sqoop_import.sh all 2021-06-10
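The "scheduled" part can be a plain cron entry. A hypothetical crontab line that imports yesterday's data every night at 00:30 (the script path is illustrative; note that % must be escaped in crontab):

30 0 * * * /home/atguigu/bin/sqoop_import.sh all $(date -d '-1 day' +\%F) >> /tmp/sqoop_import.log 2>&1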
ODS layer
Create order table (ods_order_info)
hive (gmall)>
drop table if exists ods_order_info;
create external table ods_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
) COMMENT 'order table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_info/';

Create order detail table (ods_order_detail)
hive (gmall)>
drop table if exists ods_order_detail;
create external table ods_order_detail(
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'sku id',
    `sku_name` string COMMENT 'sku name',
    `order_price` string COMMENT 'unit price',
    `sku_num` string COMMENT 'quantity',
    `create_time` string COMMENT 'creation time'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_detail/';

Create SKU table (ods_sku_info)
hive (gmall)>
drop table if exists ods_sku_info;
create external table ods_sku_info(
    `id` string COMMENT 'sku id',
    `spu_id` string COMMENT 'spu id',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'sku name',
    `sku_desc` string COMMENT 'sku description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'category id',
    `create_time` string COMMENT 'creation time'
) COMMENT 'sku table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_sku_info/';

Create user table (ods_user_info)
hive (gmall)>
drop table if exists ods_user_info;
create external table ods_user_info(
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
) COMMENT 'user info'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_user_info/';

Create level-1 category table (ods_base_category1)
hive (gmall)>
drop table if exists ods_base_category1;
create external table ods_base_category1(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name'
) COMMENT 'level-1 category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category1/';

Create level-2 category table (ods_base_category2)
hive (gmall)>
drop table if exists ods_base_category2;
create external table ods_base_category2(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category1_id` string COMMENT 'level-1 category id'
) COMMENT 'level-2 category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category2/';

Create level-3 category table (ods_base_category3)
hive (gmall)>
drop table if exists ods_base_category3;
create external table ods_base_category3(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category2_id` string COMMENT 'level-2 category id'
) COMMENT 'level-3 category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category3/';

Create payment flow table (ods_payment_info)
hive (gmall)>
drop table if exists ods_payment_info;
create external table ods_payment_info(
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
) COMMENT 'payment flow table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_payment_info/';

ODS-layer data load script (ods_db.sh)
#!/bin/bash

APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date');
"
$hive -e "$sql"

DWD layer
Create order table (dwd_order_info)
hive (gmall)>
drop table if exists dwd_order_info;
create external table dwd_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status 1 2 3 4 5',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
) PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");

Create order detail table (dwd_order_detail)
hive (gmall)>
drop table if exists dwd_order_detail;
create external table dwd_order_detail(
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'sku id',
    `sku_name` string COMMENT 'sku name',
    `order_price` string COMMENT 'unit price',
    `sku_num` string COMMENT 'quantity',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_detail/'
tblproperties ("parquet.compression"="snappy");

Create user table (dwd_user_info)
hive (gmall)>
drop table if exists dwd_user_info;
create external table dwd_user_info(
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy");

Create payment flow table (dwd_payment_info)
hive (gmall)>
drop table if exists dwd_payment_info;
create external table dwd_payment_info(
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
) PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy");

Create SKU table with category columns (dwd_sku_info)
hive (gmall)>
drop table if exists dwd_sku_info;
create external table dwd_sku_info(
    `id` string COMMENT 'sku id',
    `spu_id` string COMMENT 'spu id',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'sku name',
    `sku_desc` string COMMENT 'sku description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'level-3 category id',
    `category2_id` string COMMENT 'level-2 category id',
    `category1_id` string COMMENT 'level-1 category id',
    `category3_name` string COMMENT 'level-3 category name',
    `category2_name` string COMMENT 'level-2 category name',
    `category1_name` string COMMENT 'level-1 category name',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_sku_info/'
tblproperties ("parquet.compression"="snappy");

DWD-layer data load script (dwd_db.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_sku_info partition(dt)
select
    sku.id, sku.spu_id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.tm_id,
    sku.category3_id, c2.id category2_id, c1.id category1_id,
    c3.name category3_name, c2.name category2_name, c1.name category1_name,
    sku.create_time, sku.dt
from "$APP".ods_sku_info sku
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
where sku.dt='$do_date' and c2.dt='$do_date' and c3.dt='$do_date' and c1.dt='$do_date'
and sku.id is not null;
"
$hive -e "$sql"

DWS layer
Create user-action wide table (dws_user_action)
hive (gmall)>
drop table if exists dws_user_action;
create external table dws_user_action (
    user_id string comment 'user id',
    order_count bigint comment 'order count',
    order_amount decimal(16,2) comment 'order amount',
    payment_count bigint comment 'payment count',
    payment_amount decimal(16,2) comment 'payment amount',
    comment_count bigint comment 'comment count'
) COMMENT 'daily user-action wide table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action/';

Load data into the user-action wide table
hive (gmall)>
with
tmp_order as (
    select user_id, count(*) order_count, sum(oi.total_amount) order_amount
    from dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='2019-02-10'
    group by user_id
),
tmp_payment as (
    select user_id, sum(pi.total_amount) payment_amount, count(*) payment_count
    from dwd_payment_info pi
    where date_format(pi.payment_time,'yyyy-MM-dd')='2019-02-10'
    group by user_id
),
tmp_comment as (
    select user_id, count(*) comment_count
    from dwd_comment_log c
    where date_format(c.dt,'yyyy-MM-dd')='2019-02-10'
    group by user_id
)
insert overwrite table dws_user_action partition(dt='2019-02-10')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count)
from (
    select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order
    union all
    select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment
    union all
    select user_id, 0, 0, 0, 0, comment_count from tmp_comment
) user_actions
group by user_id;

User-action wide-table load script (dws_db_wide.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
with
tmp_order as (
    select user_id, sum(oi.total_amount) order_amount, count(*) order_count
    from "$APP".dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_payment as (
    select user_id, sum(pi.total_amount) payment_amount, count(*) payment_count
    from "$APP".dwd_payment_info pi
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_comment as (
    select user_id, count(*) comment_count
    from "$APP".dwd_comment_log c
    where date_format(c.dt,'yyyy-MM-dd')='$do_date'
    group by user_id
)
insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count)
from (
    select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order
    union all
    select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment
    union all
    select user_id, 0, 0, 0, 0, comment_count from tmp_comment
) user_actions
group by user_id;
"
$hive -e "$sql"

User purchase detail wide table (dws_sale_detail_daycount)
1. Create table
hive (gmall)>
drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount (
    user_id string comment 'user id',
    sku_id string comment 'sku id',
    user_gender string comment 'user gender',
    user_age string comment 'user age',
    user_level string comment 'user level',
    order_price decimal(10,2) comment 'sku price',
    sku_name string comment 'sku name',
    sku_tm_id string comment 'brand id',
    sku_category3_id string comment 'level-3 category id',
    sku_category2_id string comment 'level-2 category id',
    sku_category1_id string comment 'level-1 category id',
    sku_category3_name string comment 'level-3 category name',
    sku_category2_name string comment 'level-2 category name',
    sku_category1_name string comment 'level-1 category name',
    spu_id string comment 'spu id',
    sku_num int comment 'quantity purchased',
    order_count string comment 'orders that day',
    order_amount string comment 'order amount that day'
) COMMENT 'user purchase detail table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_sale_detail_daycount/'
tblproperties ("parquet.compression"="snappy");

Load data
hive (gmall)>
with
tmp_detail as (
    select
        user_id,
        sku_id,
        sum(sku_num) sku_num,
        count(*) order_count,
        sum(od.order_price*sku_num) order_amount
    from dwd_order_detail od
    where od.dt='2019-02-10'
    group by user_id, sku_id
)
insert overwrite table dws_sale_detail_daycount partition(dt='2019-02-10')
select
    tmp_detail.user_id,
    tmp_detail.sku_id,
    u.gender,
    months_between('2019-02-10', u.birthday)/12 age,
    u.user_level,
    price,
    sku_name,
    tm_id,
    category3_id,
    category2_id,
    category1_id,
    category3_name,
    category2_name,
    category1_name,
    spu_id,
    tmp_detail.sku_num,
    tmp_detail.order_count,
    tmp_detail.order_amount
from tmp_detail
left join dwd_user_info u on tmp_detail.user_id=u.id and u.dt='2019-02-10'
left join dwd_sku_info s on tmp_detail.sku_id=s.id and s.dt='2019-02-10';

Script (dws_sale.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

with
tmp_detail as (
    select
        user_id,
        sku_id,
        sum(sku_num) sku_num,
        count(*) order_count,
        sum(od.order_price*sku_num) order_amount
    from "$APP".dwd_order_detail od
    where od.dt='$do_date'
    group by user_id, sku_id
)
insert overwrite table "$APP".dws_sale_detail_daycount partition(dt='$do_date')
select
    tmp_detail.user_id,
    tmp_detail.sku_id,
    u.gender,
    months_between('$do_date', u.birthday)/12 age,
    u.user_level,
    price,
    sku_name,
    tm_id,
    category3_id,
    category2_id,
    category1_id,
    category3_name,
    category2_name,
    category1_name,
    spu_id,
    tmp_detail.sku_num,
    tmp_detail.order_count,
    tmp_detail.order_amount
from tmp_detail
left join "$APP".dwd_user_info u on tmp_detail.user_id=u.id and u.dt='$do_date'
left join "$APP".dwd_sku_info s on tmp_detail.sku_id=s.id and s.dt='$do_date';
"
$hive -e "$sql"

New paying users (dws_pay_user_detail)
Create table
drop table if exists dws_pay_user_detail;
create external table dws_pay_user_detail(
    `user_id` string comment 'paying user id',
    `name` string comment 'paying user name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level'
) COMMENT 'paying user table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_pay_user_detail/';

Load data
insert overwrite table dws_pay_user_detail partition(dt='2019-10-03')
select
    ua.user_id,
    ui.name,
    ui.birthday,
    ui.gender,
    ui.email,
    ui.user_level
from (
    select user_id from dws_user_action where dt='2019-10-03'
) ua
join (
    select * from dwd_user_info where dt='2019-10-03' -- the user table is a daily full load
) ui on ua.user_id=ui.id
left join dws_pay_user_detail ud on ua.user_id=ud.user_id
where ud.user_id is null; -- keep only users not already recorded as paying

Script (dws_pay_user_detail.sh)
#!/bin/bash

db=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

if [[ -n $1 ]]; then
    do_date=$1
else
    do_date=`date -d '-1 day' +%F`
fi

sql="
use gmall;
insert overwrite table dws_pay_user_detail partition(dt='$do_date')
select
    ua.user_id,
    ui.name,
    ui.birthday,
    ui.gender,
    ui.email,
    ui.user_level
from (
    select user_id from dws_user_action where dt='$do_date'
) ua
join (
    select * from dwd_user_info where dt='$do_date'
) ui on ua.user_id=ui.id
left join dws_pay_user_detail ud on ua.user_id=ud.user_id
where ud.user_id is null;
"
$hive -e "$sql"

ADS layer
Daily GMV (ads_gmv_sum_day)
hive (gmall)>
drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
    `dt` string COMMENT 'stat date',
    `gmv_count` bigint COMMENT 'GMV order count that day',
    `gmv_amount` decimal(16,2) COMMENT 'total GMV order amount that day',
    `gmv_payment` decimal(16,2) COMMENT 'payment amount that day'
) COMMENT 'GMV'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_gmv_sum_day/';

Load data
hive (gmall)>
insert into table ads_gmv_sum_day
select
    '2019-02-10' dt,
    sum(order_count) gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount
from dws_user_action
where dt='2019-02-10'
group by dt;

Load script (ads_db_gmv.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
insert into table "$APP".ads_gmv_sum_day
select
    '$do_date' dt,
    sum(order_count) gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount
from "$APP".dws_user_action
where dt='$do_date'
group by dt;
"
$hive -e "$sql"

Ratio of new users to daily active users (ads_user_convert_day)
Create table
hive (gmall)>
drop table if exists ads_user_convert_day;
create external table ads_user_convert_day(
    `dt` string COMMENT 'stat date',
    `uv_m_count` bigint COMMENT 'active devices that day',
    `new_m_count` bigint COMMENT 'new devices that day',
    `new_m_ratio` decimal(10,2) COMMENT 'new devices as a share of DAU'
) COMMENT 'conversion rate'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_convert_day/';

Load data
hive (gmall)>
insert into table ads_user_convert_day
select
    '2019-02-10',
    sum(uc.dc) sum_dc,
    sum(uc.nmc) sum_nmc,
    sum(uc.nmc)/sum(uc.dc)*100 new_m_ratio
from (
    select day_count dc, 0 nmc
    from ads_uv_count
    where dt='2019-02-10'
    union all
    select 0 dc, new_mid_count nmc
    from ads_new_mid_count
    where create_date='2019-02-10'
) uc;

User-behavior funnel analysis (ads_user_action_convert_day)
Create table
hive (gmall)>
drop table if exists ads_user_action_convert_day;
create external table ads_user_action_convert_day(
    `dt` string COMMENT 'stat date',
    `total_visitor_m_count` bigint COMMENT 'total visitors',
    `order_u_count` bigint COMMENT 'users who ordered',
    `visitor2order_convert_ratio` decimal(10,2) COMMENT 'visit-to-order conversion rate',
    `payment_u_count` bigint COMMENT 'users who paid',
    `order2payment_convert_ratio` decimal(10,2) COMMENT 'order-to-payment conversion rate'
) COMMENT 'user-behavior funnel analysis'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_user_action_convert_day/';

Load data
hive (gmall)>
insert into table ads_user_action_convert_day
select
    '2021-06-09',
    uv.day_count,
    ua.order_count,
    cast(ua.order_count/uv.day_count as decimal(10,2)) visitor2order_convert_ratio,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from (
    select
        dt,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from dws_user_action
    where dt='2021-06-09'
    group by dt
) ua
join ads_uv_count uv on uv.dt=ua.dt;

Brand repurchase rate
Create table
hive (gmall)>
drop table if exists ads_sale_tm_category1_stat_mn;
create external table ads_sale_tm_category1_stat_mn (
    tm_id string comment 'brand id',
    category1_id string comment 'level-1 category id',
    category1_name string comment 'level-1 category name',
    buycount bigint comment 'buyers',
    buy_twice_last bigint comment 'buyers with 2+ purchases',
    buy_twice_last_ratio decimal(10,2) comment 'single repurchase rate',
    buy_3times_last bigint comment 'buyers with 3+ purchases',
    buy_3times_last_ratio decimal(10,2) comment 'multiple repurchase rate',
    stat_mn string comment 'stat month',
    stat_date string comment 'stat date'
) COMMENT 'repurchase-rate stats'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';

Load data
hive (gmall)>
insert into table ads_sale_tm_category1_stat_mn
select
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
    date_format('2019-02-10','yyyy-MM') stat_mn,
    '2019-02-10' stat_date
from (
    select
        user_id,
        sd.sku_tm_id,
        sd.sku_category1_id,
        sd.sku_category1_name,
        sum(order_count) order_count
    from dws_sale_detail_daycount sd
    where date_format(dt,'yyyy-MM')=date_format('2019-02-10','yyyy-MM')
    group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;

Script (ads_sale.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table "$APP".ads_sale_tm_category1_stat_mn
select
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
    date_format('$do_date','yyyy-MM') stat_mn,
    '$do_date' stat_date
from (
    select
        user_id,
        od.sku_tm_id,
        od.sku_category1_id,
        od.sku_category1_name,
        sum(order_count) order_count
    from "$APP".dws_sale_detail_daycount od
    where date_format(dt,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    group by user_id, od.sku_tm_id, od.sku_category1_id, od.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
"
$hive -e "$sql"

Top-10 repurchase-rate products per user level (ads_ul_rep_ratio)
Note: the DWS source is the user purchase detail wide table (dws_sale_detail_daycount).
Create table (ads_ul_rep_ratio)
drop table if exists ads_ul_rep_ratio;
create table ads_ul_rep_ratio(
    user_level string comment 'user level',
    sku_id string comment 'sku id',
    buy_count bigint comment 'total buyers',
    buy_twice_count bigint comment 'buyers with 2+ purchases',
    buy_twice_rate decimal(10,2) comment 'second-purchase rate',
    rank string comment 'rank',
    state_date string comment 'stat date'
) COMMENT 'repurchase-rate stats'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_ul_rep_ratio/';

Load data
with
tmp_count as (
    select -- per level, per user, per sku: order count
        user_level,
        user_id,
        sku_id,
        sum(order_count) order_count
    from dws_sale_detail_daycount
    where dt<='2019-10-05'
    group by user_level, user_id, sku_id
)
insert overwrite table ads_ul_rep_ratio
select *
from (
    select
        user_level,
        sku_id,
        sum(if(order_count>=1,1,0)) buy_count,
        sum(if(order_count>=2,1,0)) buy_twice_count,
        sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 buy_twice_rate,
        row_number() over(partition by user_level
            order by sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0)) desc) rn,
        '2019-10-05'
    from tmp_count
    group by user_level, sku_id
) t1
where rn<=10;

Script (ads_ul_rep_ratio.sh)
#!/bin/bash

db=gmall
hive=/opt/module/hive-1.2.1/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

if [[ -n $1 ]]; then
    do_date=$1
else
    do_date=`date -d '-1 day' +%F`
fi

sql="
use gmall;
with
tmp_count as (
    select -- per level, per user, per sku: order count
        user_level,
        user_id,
        sku_id,
        sum(order_count) order_count
    from dws_sale_detail_daycount
    where dt<='$do_date'
    group by user_level, user_id, sku_id
)
insert overwrite table ads_ul_rep_ratio
select *
from (
    select
        user_level,
        sku_id,
        sum(if(order_count>=1,1,0)) buy_count,
        sum(if(order_count>=2,1,0)) buy_twice_count,
        sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 buy_twice_rate,
        row_number() over(partition by user_level
            order by sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0)) desc) rn,
        '$do_date'
    from tmp_count
    group by user_level, sku_id
) t1
where rn<=10;
"
$hive -e "$sql"

New paying user count (ads_pay_user_count)
Create table
drop table if exists ads_pay_user_count;
create external table ads_pay_user_count(
    dt string COMMENT 'stat date',
    pay_count bigint COMMENT 'paying user count'
) COMMENT 'paying user count table'
stored as parquet
location '/warehouse/gmall/ads/ads_pay_user_count/';

Load data
insert into table ads_pay_user_count
select
    '2019-02-10',
    count(*) pay_count
from dws_pay_user_detail
where dt='2019-02-10';

Script (ads_pay_user_count.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "=== processing date $do_date ==="
sql="
insert into table "$APP".ads_pay_user_count
select
    '$do_date',
    count(*) pay_count
from "$APP".dws_pay_user_detail
where dt='$do_date';
"
$hive -e "$sql"

New paying user ratio (ads_pay_user_ratio)
Note: the sources are the paying-user count table and the new-user count table.
Create table
drop table if exists ads_pay_user_ratio;
create external table ads_pay_user_ratio (
    dt string comment 'stat date',
    pay_count bigint comment 'total paying users',
    user_count bigint comment 'total users',
    pay_count_ratio decimal(10,2) COMMENT 'paying user ratio'
) COMMENT 'paying user ratio table'
stored as parquet
location '/warehouse/gmall/ads/ads_pay_user_ratio';

Load data
insert into table ads_pay_user_ratio
select
    '2019-02-10' dt,
    pay_count,
    new_mid_count,
    pay_count/new_mid_count*100 pay_count_ratio
from (
    select '2019-02-10' dt, pay_count
    from ads_pay_user_count
) pay_user
join (
    select '2019-02-10' dt, sum(new_mid_count) new_mid_count
    from ads_new_mid_count
    where create_date<='2019-02-10'
) user_total_count
on pay_user.dt=user_total_count.dt;

Script (ads_pay_user_ratio.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "=== processing date $do_date ==="
sql="
insert into table "$APP".ads_pay_user_ratio
select
    '$do_date' dt,
    pay_count,
    new_mid_count,
    pay_count/new_mid_count*100 pay_count_ratio
from (
    select '$do_date' dt, pay_count
    from "$APP".ads_pay_user_count
) pay_user
join (
    select '$do_date' dt, sum(new_mid_count) new_mid_count
    from "$APP".ads_new_mid_count
    where create_date<='$do_date'
) user_total_count
on pay_user.dt=user_total_count.dt;
"
$hive -e "$sql"

Each user's most recent purchase time (ads_user_last_pay)
Note: the DWS source is the user-action wide table.
Create table
drop table if exists ads_user_last_pay;
create external table ads_user_last_pay(
    user_id string comment 'user id',
    pay_date string comment 'most recent purchase time'
) COMMENT 'most recent purchase time per user'
stored as parquet
location '/warehouse/gmall/ads/ads_user_last_pay/';

Load data
Initialize:
insert into table ads_user_last_pay
select
    user_id,
    '2019-02-10'
from dws_user_action
where dt='2019-02-10'
and payment_amount>0;

Load subsequent days:
insert overwrite table ads_user_last_pay
select
    if(du.user_id is null, au.user_id, du.user_id),
    if(du.user_id is null, au.pay_date, '2019-02-11')
from ads_user_last_pay au
full join (
    select user_id
    from dws_user_action
    where dt='2019-02-11'
    and payment_amount>0
) du on au.user_id=du.user_id;

Script (ads_user_last_pay.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "=== processing date $do_date ==="
sql="
insert overwrite table "$APP".ads_user_last_pay
select
    if(du.user_id is null, au.user_id, du.user_id),
    if(du.user_id is null, au.pay_date, '$do_date')
from "$APP".ads_user_last_pay au
full join (
    select user_id
    from "$APP".dws_user_action
    where dt='$do_date'
    and payment_amount>0
) du on au.user_id=du.user_id;
"
$hive -e "$sql"

Daily top-10 products by sales (ads_goods_order_count_day)
Note: the DWS source is the user purchase detail wide table.
Create table
drop table if exists ads_goods_order_count_day;
create external table ads_goods_order_count_day(
    dt string comment 'stat date',
    sku_id string comment 'sku id',
    order_count bigint comment 'order count'
) COMMENT 'top-10 ordered products'
stored as parquet
location '/warehouse/gmall/ads/ads_goods_order_count_day/';

Load data
insert into table ads_goods_order_count_day
select
    '2019-02-10',
    sku_id,
    sum(order_count) order_total_count
from dws_sale_detail_daycount
where dt='2019-02-10'
group by sku_id
order by order_total_count desc
limit 10;

Script (ads_goods_order_count_day.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "=== processing date $do_date ==="
sql="
insert into table "$APP".ads_goods_order_count_day
select
    '$do_date',
    sku_id,
    sum(order_count) order_total_count
from "$APP".dws_sale_detail_daycount
where dt='$do_date'
group by sku_id
order by order_total_count desc
limit 10;
"
$hive -e "$sql"

Monthly order-to-payment rate (ads_order2pay_mn)
Note: the DWS source is the user-action wide table.
Create table
drop table if exists ads_order2pay_mn;
create external table ads_order2pay_mn (
    `dt` string COMMENT 'stat date',
    `order_u_count` bigint COMMENT 'users who ordered',
    `payment_u_count` bigint COMMENT 'users who paid',
    `order2payment_convert_ratio` decimal(10,2) COMMENT 'order-to-payment conversion rate'
) COMMENT 'monthly order-to-payment rate'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_order2pay_mn/';

Load data
insert into table ads_order2pay_mn
select
    '2019-02-10',
    ua.order_count,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from (
    select
        dt,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from dws_user_action
    where date_format(dt,'yyyy-MM')=date_format('2019-02-10','yyyy-MM')
    group by dt
) ua;

Script (ads_order2pay_mn.sh)
#!/bin/bash

# Variables for easy editing
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the date passed in; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo "=== processing date $do_date ==="
sql="
insert into table "$APP".ads_order2pay_mn
select
    '$do_date',
    ua.order_count,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
from (
    select
        dt,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from "$APP".dws_user_action
    where date_format(dt,'yyyy-MM')=date_format('$do_date','yyyy-MM')
    group by dt
) ua;
"
$hive -e "$sql"

Azkaban
a) azkaban-web-server-2.5.0.tar.gz
b) azkaban-executor-server-2.5.0.tar.gz
c) azkaban-sql-script-2.5.0.tar.gz
d) mysql-libs.zip
Install Azkaban
mkdir azkaban
tar -zxvf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
tar -zxvf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
tar -zxvf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
mv azkaban-web-2.5.0/ server
mv azkaban-executor-2.5.0/ executor
Enter MySQL, create the azkaban database, and import the extracted SQL script into it.
mysql> create database azkaban;
mysql> use azkaban;
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
Note: source takes a .sql file and runs the SQL statements in it as a batch.
Generate the keystore
keytool is the Java certificate management tool; it lets you manage key pairs and the associated certificates.
-keystore: name and location of the keystore file to use
-genkey: generate a key pair (by default into ".keystore" in the user's home directory)
-alias: alias for the generated keystore entry (defaults to mykey if omitted)
-keyalg: key algorithm, RSA or DSA (default DSA)
1) Generate the keystore and set its password and identifying information:
[ azkaban]$ keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:
Re-enter new password:
What is your first and last name? [Unknown]:
What is the name of your organizational unit? [Unknown]:
What is the name of your organization? [Unknown]:
What is the name of your City or Locality? [Unknown]:
What is the name of your State or Province? [Unknown]:
What is the two-letter country code for this unit? [Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct? [no]: y
Enter key password for <jetty> (RETURN if same as keystore password):
Re-enter new password:

Note:
The keystore password must be at least 6 characters: digits, letters, or a mix.
Ideally make the <jetty> key password the same as the keystore password, so it is easy to remember.
2) Copy the keystore into the Azkaban web server root directory:
mv keystore /opt/module/azkaban/server/
Configuration files
Web server configuration
1) Go to the conf directory under the Azkaban web server install directory and open azkaban.properties:
pwd
/opt/module/azkaban/server/conf
vim azkaban.properties
2) Edit azkaban.properties as follows:
#Azkaban Personalization Settings
# Server UI name, shown at the top of the page
azkaban.name=Test
# Description
azkaban.label=My Local Azkaban
# UI color
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
# Directory where the web server keeps its web files
web.resource.dir=/opt/module/azkaban/server/web/
# Default timezone, changed to Asia/Shanghai (default is America)
default.timezone.id=Asia/Shanghai

#Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
# User permission manager XML (absolute path)
user.manager.xml.file=/opt/module/azkaban/server/conf/azkaban-users.xml

#Loader for projects
# Location of the global properties file (absolute path)
executor.global.properties=/opt/module/azkaban/executor/conf/global.properties
azkaban.project.dir=projects

# Database type
database.type=mysql
# Port
mysql.port=3306
# Database host
mysql.host=hadoop102
# Database name
mysql.database=azkaban
# Database user
mysql.user=root
# Database password
mysql.password=123456
# Max connections
mysql.numconnections=100

# Velocity dev mode
velocity.dev.mode=false

# Azkaban Jetty server properties.
# Max threads
jetty.maxThreads=25
# Jetty SSL port
jetty.ssl.port=8443
# Jetty port
jetty.port=8081
# Keystore file (absolute path)
jetty.keystore=/opt/module/azkaban/server/keystore
# Keystore password
jetty.password=000000
# Key password, same as the keystore password
jetty.keypassword=000000
# Truststore file (absolute path)
jetty.truststore=/opt/module/azkaban/server/keystore
# Truststore password
jetty.trustpassword=000000

# Azkaban Executor settings
executor.port=12321

# mail settings
mail.sender=
mail.host=
job.failure.email=
job.success.email=

lockdown.create.projects=false
cache.directory=cache

Web server user configuration
In the conf directory of the Azkaban web server install, edit azkaban-users.xml as follows to add an admin user.
conf$ vim azkaban-users.xml
<azkaban-users>
    <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
    <user username="metrics" password="metrics" roles="metrics"/>
    <user username="admin" password="admin" roles="admin" />
    <role name="admin" permissions="ADMIN" />
    <role name="metrics" permissions="METRICS"/>
</azkaban-users>

Executor server configuration
1) Go to the executor server's conf directory and open azkaban.properties:
pwd
/opt/module/azkaban/executor/conf
vim azkaban.properties
2) Edit azkaban.properties as follows:
#Azkaban
# Timezone
default.timezone.id=Asia/Shanghai

# Azkaban JobTypes Plugins
# Location of the jobtype plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes

#Loader for projects
executor.global.properties=/opt/module/azkaban/executor/conf/global.properties
azkaban.project.dir=projects

database.type=mysql
mysql.port=3306
mysql.host=hadoop102
mysql.database=azkaban
mysql.user=root
mysql.password=000000
mysql.numconnections=100

# Azkaban Executor settings
# Max threads
executor.maxThreads=50
# Port (if changed, keep it consistent with the web server config)
executor.port=12321
# Flow threads
executor.flow.threads=30

Start the executor server
From the executor server directory, run the start command:
pwd
/opt/module/azkaban/executor
bin/azkaban-executor-start.sh
Start the web server
From the Azkaban web server directory, run the start command:
[atguigu@hadoop102 server]$ pwd
/opt/module/azkaban/server
bin/azkaban-web-start.sh
Note:
Start the executor first, then the web server; otherwise the web server can fail to start because it cannot find an executor.
Check the processes with jps:
[atguigu@hadoop102 server]$ jps
3601 AzkabanExecutorServer
5880 Jps
3661 AzkabanWebServer
Once both are up, open https://<server IP>:8443 in a browser (Chrome recommended) to reach the Azkaban UI.
Log in with the username and password just added in azkaban-users.xml and click Login.
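Workflows are uploaded as a zip of .job files. A minimal two-step sketch of how the warehouse scripts above could be chained (the file names and script paths here are illustrative, not from the original notes):

# import.job -- hypothetical first step: pull MySQL data to HDFS
type=command
command=sh /home/atguigu/bin/sqoop_import.sh all 2021-06-10

# ods.job -- hypothetical second step: load the ODS layer after the import finishes
type=command
dependencies=import
command=sh /home/atguigu/bin/ods_db.sh 2021-06-10

Zip both files, create a project in the web UI, upload the zip, and run Execute Flow.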
Install Presto
[root@hadoop103 soft]# tar -zxvf presto-server-0.196.tar.gz -C ../module
[root@hadoop103 soft]# mv presto-cli-0.196-executable.jar presto-cli
[root@hadoop103 soft]# mv presto-cli /opt/module/
[root@hadoop103 module]# mv presto-server-0.196 presto
Enter /opt/module/presto and create the data and etc directories:
[root@hadoop103 presto]# mkdir data
[root@hadoop103 presto]# mkdir etc
Configuration: add a jvm.config file under /opt/module/presto/etc:
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Presto supports multiple data sources, which it calls catalogs. Here we configure a Hive catalog:
[root@hadoop103 etc]# mkdir catalog
[root@hadoop103 catalog]# vim hive.properties
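The notes do not show the file contents; a typical hive-hadoop2 catalog definition looks like this (the metastore URI is an assumption, based on the metastore being started on hadoop103 below):

# hive.properties -- assumed contents
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop103:9083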
xsync presto
After distributing, go to /opt/module/presto/etc on each of hadoop102, hadoop103, and hadoop104 and configure node.properties; node.id must be unique per node.
[root@hadoop102 etc]#vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffe
node.data-dir=/opt/module/presto/data
[root@hadoop103 etc]#vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/opt/module/presto/data
[root@hadoop104 etc]#vim node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffffd
node.data-dir=/opt/module/presto/data
Presto consists of one coordinator node and several worker nodes. Here the coordinator runs on hadoop103 and the workers on hadoop102 and hadoop104 (the prompts below show which host each file is edited on).
(1) Configure the coordinator node on hadoop103
[root@hadoop103 etc]$ vim config.properties
Add the following:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery-server.enabled=true
discovery.uri=http://hadoop103:8881

(2) Configure the worker nodes on hadoop102 and hadoop104
[root@hadoop102 etc]$ vim config.properties
Add the following:
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop103:8881

[root@hadoop104 etc]$ vim config.properties
Add the following:
coordinator=false
http-server.http.port=8881
query.max-memory=50GB
discovery.uri=http://hadoop103:8881

On hadoop103, from /opt/module/hive, start the Hive metastore as root:
[root@hadoop103 hive]$nohup bin/hive --service metastore >/dev/null 2>&1 &
Start Presto in the foreground, with logs printed to the console:
[root@hadoop102 presto]$ bin/launcher run
[root@hadoop103 presto]$ bin/launcher run
[root@hadoop104 presto]$ bin/launcher run
Start Presto in the background:
[root@hadoop102 presto]$ bin/launcher start
[root@hadoop103 presto]$ bin/launcher start
[root@hadoop104 presto]$ bin/launcher start
Logs are under /opt/module/presto/data/var/log
Enter /opt/module/presto/plugin/hive-hadoop2 and copy in the LZO jar:
[root@hadoop103 hive-hadoop2]# cp /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar ./
xsync hadoop-lzo-0.4.20.jar
Connect to Presto:
java -jar presto-cli --server hadoop103:8881 --catalog hive --schema default
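Once connected, ordinary SQL runs against the Hive tables built above. For example (a sketch, assuming the gmall warehouse from the earlier sections):

select dt, gmv_count, gmv_amount from hive.gmall.ads_gmv_sum_day limit 10;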
Install Druid
Extract the archive.
Enter /opt/module/imply-2.7.10/conf/druid/_common
vim common.runtime.properties
Set druid.zk.service.host=hadoop102:2181,hadoop103:2181,hadoop104:2181
vim /opt/module/imply/conf/supervise/quickstart.conf
Change:
:verify bin/verify-java
#:verify bin/verify-default-ports
#:verify bin/verify-version-check
:kill-timeout 10
#!p10 zk bin/run-zk conf-quickstart
(1) Start Zookeeper
zk start
(2) Start Kafka
kf start
(3) Start Imply
bin/supervise -c conf/supervise/quickstart.conf
Web UI:
hadoop103:9095
Install HBase
tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module/
Rename:
mv hbase-1.3.1 hbase
Configure environment variables:
vim /etc/profile
export HBASE_HOME=/opt/module/hbase
export PATH=$PATH:$HBASE_HOME/bin
Configuration files
Enter /opt/module/hbase/conf
vim hbase-env.sh
Comment out the following lines:
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

Then uncomment the following line and change true to false:
export HBASE_MANAGES_ZK=false
vim hbase-site.xml
Add the following:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop102:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- New in 0.98; earlier versions had no .port property and defaulted to 60000 -->
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop102,hadoop103,hadoop104</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/module/zookeeper-3.4.10/datas</value>
    </property>
</configuration>

vim regionservers
Add:
hadoop102
hadoop103
hadoop104

Start HBase
[hbase]$ bin/start-hbase.sh
To stop the service:
[hbase]$ bin/stop-hbase.sh
Once started, the HBase admin page is available at host:port, for example:
http://hadoop103:16010
Install Kylin
tar -zxvf apache-kylin-2.5.1-bin-hbase1x.tar.gz -C /opt/module/
mv apache-kylin-2.5.1 kylin
Start:
bin/kylin.sh start
Kylin UI:
http://hadoop103:7070/kylin
Using Kylin
1. Start from the snowflake or constellation model built earlier.
2. Build one or more cubes on that model.
3. Query them with plain SQL.
Detailed steps
1. Create a project.
2. Select the project and load tables into it (from Hive):
    Load one fact table: dwd_payment_info
    Load two dimension tables: dwd_order_info, dwd_user_info
3. Build the model (star or snowflake).
4. Create the cube.
    Note: when joining, the fact table and dimension tables must not produce duplicate keys.
    For dimension tables synced as daily full loads, join against only the latest day's data, in one of two ways:
    (1) Query the latest partition into a temporary table and use it as the join dimension table.
    (2) Create a view over the latest partition and use the view as a virtual table (recommended):
CREATE VIEW dwd_user_info_view as SELECT * from dwd_user_info WHERE dt='2021-06-10';
CREATE VIEW dwd_order_info_view as SELECT * from dwd_order_info WHERE dt='2021-06-10';

5. Query
Only SELECT statements are supported, with aggregate functions and GROUP BY.
In joins, the fact table must be on the left (see the sketch below).
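For instance, against the model above, a query Kylin can answer from the cube might look like this (a sketch; the fact table dwd_payment_info sits on the left of the join):

select u.user_level, sum(p.total_amount) payment_amount
from dwd_payment_info p
join dwd_user_info_view u on p.user_id = u.id
group by u.user_level;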
6. Notes on choosing a slowly changing table as the dimension table:
    dwd_order_info: daily full load; dimension ids must not repeat, so only the latest partition can be used.
    With daily incremental sync, the latest partition holds only that day's data, not the history.
    dwd_order_info_his: tracks every state change, so it contains the complete history.
Install Zeppelin
No configuration needed.
Visit http://hadoop103:8080/#/