

UserBehavior: Field Analysis of the Alibaba Taobao User Behavior Dataset


1) Create the directory /data/userbehavior in HDFS and upload the UserBehavior.csv file to it. (5 points)

[root@gree139 exam]# hdfs dfs -mkdir -p /data/userbehavior
[root@gree139 exam]# hdfs dfs -put ./UserBehavior.csv /data/userbehavior
[root@gree139 exam]# hdfs dfs -ls /data/userbehavior

2) Use an HDFS command to find out how many lines of data the file contains. (5 points)

[root@gree139 exam]# hdfs dfs -cat /data/userbehavior/UserBehavior.csv | wc -l

561294
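The same count can also be done from spark-shell; a minimal sketch, assuming the file sits at the HDFS path created above:

// spark-shell: count lines of the uploaded CSV (should match the wc -l result above)
val lineCount = sc.textFile("/data/userbehavior/UserBehavior.csv").count()
println(lineCount)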

To connect a client to Hive, start hiveserver2 and the metastore service first:

[root@gree139 hive110]# nohup ./bin/hive --service hiveserver2 &

[root@gree139 hive110]# nohup ./bin/hive --service metastore &

1) Create the database exam in Hive. (5 points)

hive> create database exam;

2) Create the external table userbehavior in the exam database and map the HDFS data into it. (5 points)

use exam;
create external table if not exists userbehavior(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time bigint
)
row format delimited fields terminated by ','
stored as textfile
location '/data/userbehavior';
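As a quick sanity check that the mapping works, the table can be queried from spark-shell; a minimal sketch, assuming Spark is built with Hive support and points at the same metastore:

// spark-shell: verify the external table sees the CSV data
spark.sql("select * from exam.userbehavior limit 5").show()
spark.sql("select count(*) from exam.userbehavior").show()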

3) Create the namespace exam in HBase, and within it create the table userbehavior with a single column family info. (5 points)

If the table already exists from a previous run, disable and drop it first:

    hbase(main):004:0> disable 'exam:userbehavior'

    0 row(s) in 2.3720 seconds

    hbase(main):005:0> drop 'exam:userbehavior'

    0 row(s) in 1.2500 seconds

    hbase(main):007:0> create_namespace 'exam'

    hbase(main):007:0> create 'exam:userbehavior','info'

    hbase(main):009:0> count 'exam:userbehavior'

4) Create the external table userbehavior_hbase in Hive and map it to the HBase table (5 points), then load the data into HBase (5 points).

create external table if not exists userbehavior_hbase(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time bigint
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,info:item_id,info:category_id,info:behavior_type,info:time")
tblproperties ("hbase.table.name"="exam:userbehavior");

    insert into userbehavior_hbase select * from exam.userbehavior;

    hbase(main):015:0> scan 'exam:userbehavior'

hbase(main):013:0> get 'exam:userbehavior','108982','info'


5) In the exam database, create the internal partitioned table userbehavior_partitioned (partitioned by date), and, by querying the userbehavior table, format the timestamp as year-month-day hour:minute:second (yyyy-MM-dd HH:mm:ss) and insert the data into userbehavior_partitioned. (15 points)

drop table if exists userbehavior_partitioned;

create table userbehavior_partitioned(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time string
)
partitioned by (dt string)
stored as orc;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into userbehavior_partitioned partition (dt)
select user_id, item_id, category_id, behavior_type,
       from_unixtime(time, 'yyyy-MM-dd HH:mm:ss') as time,
       from_unixtime(time, 'yyyy-MM-dd') as dt
from userbehavior;

show partitions userbehavior_partitioned;
select * from userbehavior_partitioned;
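One detail worth noting: Hive's from_unixtime uses SimpleDateFormat-style patterns, where lowercase yyyy is the calendar year while uppercase YYYY is the week-based year, which differs around New Year. A minimal Scala sketch illustrating the difference (plain JVM code, not tied to the tables above):

import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

// 2017-12-31 (a Sunday) falls into week 1 of week-year 2018 under US week rules,
// so the "YYYY" pattern reports 2018 while "yyyy" reports the calendar year 2017.
val utc = TimeZone.getTimeZone("UTC")
val parser = new SimpleDateFormat("yyyy-MM-dd", Locale.US); parser.setTimeZone(utc)
val d = parser.parse("2017-12-31")

val calYear  = new SimpleDateFormat("yyyy-MM-dd", Locale.US); calYear.setTimeZone(utc)
val weekYear = new SimpleDateFormat("YYYY-MM-dd", Locale.US); weekYear.setTimeZone(utc)
println(calYear.format(d))   // 2017-12-31
println(weekYear.format(d))  // 2018-12-31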

3. User behavior analysis (20 points)

Using Spark, load the UserBehavior.csv file from HDFS and complete the following analyses with RDDs.

    scala> val fileRdd = sc.textFile("/data/userbehavior/")

scala> val userbehaviorRdd = fileRdd.map(x=>x.split(",")).filter(x=>x.length==5)
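A quick peek at the parsed records can confirm the split and filter worked; a minimal check using the RDD defined above:

// print a few parsed rows (each row is an Array of 5 fields)
userbehaviorRdd.take(3).foreach(arr => println(arr.mkString(",")))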


1) Compute the UV (how many distinct users visited Taobao). (10 points)

    scala> userbehaviorRdd.map(x=>x(0)).distinct().count

    res3: Long = 5458

    scala> userbehaviorRdd.groupBy(x=>x(0)).count

    res5: Long = 5458
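For comparison, the same UV figure can be obtained with the DataFrame API; a minimal sketch, assuming the CSV has no header row, so the first column gets the default name _c0:

// read the CSV as a DataFrame and count distinct values of the first column (user_id)
val df = spark.read.csv("/data/userbehavior/UserBehavior.csv")
df.select("_c0").distinct().count()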

2) Count the totals for each behavior type: click (pv), favorite (fav), add to cart (cart), and buy. (10 points)

    scala> userbehaviorRdd.map(x=>(x(3),1)).reduceByKey(_+_).collect.foreach(println)

    (cart,30888)

    (buy,11508)

    (pv,503881)

    (fav,15017)

    scala> userbehaviorRdd.map(x=>(x(3),1)).groupByKey().map(x=>(x._1,x._2.toList.size)).collect.foreach(println)

    (cart,30888)

    (buy,11508)

    (pv,503881)

    (fav,15017)

    4.找出有價值的用戶(30 分)

1) Use Spark SQL to compute each user's most recent purchase time. Taking 2017-12-03 as the current date and a one-month window, compute the number of days since each user's most recent activity. The 0-30 day range is split into 5 buckets: 0-6, 7-12, 13-18, 19-24, and 25-30 days, scored 4, 3, 2, 1, and 0 respectively. (15 points)

Hive:

select t.user_id,
       (case when t.diff between 0 and 6 then 4
             when t.diff between 7 and 12 then 3
             when t.diff between 13 and 18 then 2
             when t.diff between 19 and 24 then 1
             when t.diff between 25 and 30 then 0
             else null end) level
from (select user_id, datediff('2017-12-03', max(dt)) diff, max(dt) maxnum
      from exam.userbehavior_partitioned group by user_id) t;

SparkSQL:

scala> spark.sql("select t.user_id, (case when t.diff between 0 and 6 then 4 when t.diff between 7 and 12 then 3 when t.diff between 13 and 18 then 2 when t.diff between 19 and 24 then 1 when t.diff between 25 and 30 then 0 else null end) level from (select user_id, datediff('2017-12-03', max(dt)) diff, max(dt) maxnum from exam.userbehavior_partitioned group by user_id) t").show()

scala> spark.sql("""
     | select t.user_id,
     |        (case when t.diff between 0 and 6 then 4
     |              when t.diff between 7 and 12 then 3
     |              when t.diff between 13 and 18 then 2
     |              when t.diff between 19 and 24 then 1
     |              when t.diff between 25 and 30 then 0
     |              else null end) level
     | from (select user_id, datediff('2017-12-03', max(dt)) diff, max(dt) maxnum
     |       from exam.userbehavior_partitioned group by user_id) t
     | """).show()
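The same recency scoring can also be written with the DataFrame API instead of a SQL string; a minimal sketch, assuming the exam.userbehavior_partitioned table above is visible to Spark:

import org.apache.spark.sql.functions._

// days since each user's latest partition date, bucketed into levels 4..0
val recency = spark.table("exam.userbehavior_partitioned")
  .groupBy("user_id")
  .agg(datediff(lit("2017-12-03"), max(col("dt"))).as("diff"))
  .withColumn("level",
    when(col("diff").between(0, 6), 4)
      .when(col("diff").between(7, 12), 3)
      .when(col("diff").between(13, 18), 2)
      .when(col("diff").between(19, 24), 1)
      .when(col("diff").between(25, 30), 0))

recency.show()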


2) Use Spark SQL to compute each user's purchase frequency. Taking 2017-12-03 as the current date and a one-month window, count each user's purchases. Purchase counts range from 1 to 161 and are split into 5 buckets: 1-32, 33-64, 65-96, 97-128, and 129-161 purchases, scored 0, 1, 2, 3, and 4 respectively. (15 points)

Hive:

select t.user_id,
       (case when t.num between 129 and 161 then 4
             when t.num between 97 and 128 then 3
             when t.num between 65 and 96 then 2
             when t.num between 33 and 64 then 1
             when t.num between 1 and 32 then 0
             else null end) level
from (select user_id, count(user_id) num
      from exam.userbehavior_partitioned
      where behavior_type="buy"
        and dt between '2017-11-03' and '2017-12-03'
      group by user_id) t;

SparkSQL:

scala> spark.sql("""
     | select t.user_id,
     |        (case when t.num between 129 and 161 then 4
     |              when t.num between 97 and 128 then 3
     |              when t.num between 65 and 96 then 2
     |              when t.num between 33 and 64 then 1
     |              when t.num between 1 and 32 then 0
     |              else null end) level
     | from (select user_id, count(user_id) num
     |       from exam.userbehavior_partitioned
     |       where behavior_type="buy"
     |         and dt between '2017-11-03' and '2017-12-03'
     |       group by user_id) t
     | """).show()

View the users by purchase-count level (levels 1-4):

select t2.user_id, t2.level
from (select t.user_id,
             (case when t.num between 129 and 161 then 4
                   when t.num between 97 and 128 then 3
                   when t.num between 65 and 96 then 2
                   when t.num between 33 and 64 then 1
                   when t.num between 1 and 32 then 0
                   else null end) level
      from (select user_id, count(user_id) num
            from exam.userbehavior_partitioned
            where behavior_type="buy"
              and dt between '2017-11-03' and '2017-12-03'
            group by user_id) t) t2
where t2.level in (1,2,3,4);
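The frequency scoring and the level filter can likewise be expressed with the DataFrame API; a minimal sketch under the same assumption that exam.userbehavior_partitioned is visible to Spark:

import org.apache.spark.sql.functions._

// purchase count per user within the window, bucketed into levels 0..4
val frequency = spark.table("exam.userbehavior_partitioned")
  .where(col("behavior_type") === "buy" && col("dt").between("2017-11-03", "2017-12-03"))
  .groupBy("user_id")
  .agg(count("user_id").as("num"))
  .withColumn("level",
    when(col("num").between(129, 161), 4)
      .when(col("num").between(97, 128), 3)
      .when(col("num").between(65, 96), 2)
      .when(col("num").between(33, 64), 1)
      .when(col("num").between(1, 32), 0))

// keep only users with level 1-4, mirroring the query above
frequency.where(col("level").isin(1, 2, 3, 4)).show()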

Consolidated Hive script for the steps above:

use exam;

create external table if not exists userbehavior(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time bigint
)
row format delimited fields terminated by ','
stored as textfile
location '/data/userbehavior';

create external table if not exists userbehavior_hbase(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time bigint
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,info:item_id,info:category_id,info:behavior_type,info:time")
tblproperties ("hbase.table.name"="exam:userbehavior");

select * from exam.userbehavior;
insert into userbehavior_hbase select * from userbehavior;
select count(*) from userbehavior_hbase;

create table userbehavior_partitioned(
  user_id int,
  item_id int,
  category_id int,
  behavior_type string,
  time string
)
partitioned by (dt string)
stored as orc;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into userbehavior_partitioned partition (dt)
select user_id, item_id, category_id, behavior_type,
       from_unixtime(time, 'yyyy-MM-dd HH:mm:ss') as time,
       from_unixtime(time, 'yyyy-MM-dd') as dt
from userbehavior;

show partitions userbehavior_partitioned;
select * from userbehavior_partitioned;

select t.user_id,
       (case when t.diff between 0 and 6 then 4
             when t.diff between 7 and 12 then 3
             when t.diff between 13 and 18 then 2
             when t.diff between 19 and 24 then 1
             when t.diff between 25 and 30 then 0
             else null end) level
from (select user_id, datediff('2017-12-03', max(dt)) diff
      from exam.userbehavior_partitioned group by user_id) t;

select datediff('2017-12-03', '2017-12-10');

select t2.user_id, t2.level
from (select t.user_id,
             (case when t.num between 129 and 161 then 4
                   when t.num between 97 and 128 then 3
                   when t.num between 65 and 96 then 2
                   when t.num between 33 and 64 then 1
                   when t.num between 1 and 32 then 0
                   else null end) level
      from (select user_id, count(user_id) num
            from exam.userbehavior_partitioned
            where behavior_type="buy"
              and dt between '2017-11-03' and '2017-12-03'
            group by user_id) t) t2
where t2.level in (1,2,3,4);

