
Spark MaprLab-Auction Data分析

Published: 2023/12/10

I. Environment Setup

1. Install Hadoop

http://my.oschina.net/u/204498/blog/519789

2. Install Spark


3. Start Hadoop

4. Start Spark
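The original post gives no commands for steps 3 and 4. As a rough sketch (the paths below are my assumptions, based on the Spark directory used later in this post, not something the original author specified):

```
# Hypothetical paths — adjust HADOOP_HOME and the Spark directory to your install.
# Start HDFS and YARN (Hadoop 2.x layout):
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Start a standalone Spark master and its workers:
/home/hadoop/spark-1.5.1-bin-hadoop2.6/sbin/start-all.sh
```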

II. Auction Data Analysis

1. Data preparation

Download the dataset DEV360DATA.zip from the MapR website and upload it to the server.

```
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ cd test-data/
[hadoop@hftclclw0001 test-data]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6/test-data/DEV360Data
[hadoop@hftclclw0001 DEV360Data]$ ll
total 337940
-rwxr-xr-x 1 hadoop root    575014 Jun 24 16:18 auctiondata.csv    => the data used below
-rw-r--r-- 1 hadoop root  57772855 Aug 18 20:11 sfpd.csv
-rwxrwxrwx 1 hadoop root 287692676 Jul 26 20:39 sfpd.json
[hadoop@hftclclw0001 DEV360Data]$ more auctiondata.csv
8213034705,95,2.927373,jake7870,0,95,117.5,xbox,3
8213034705,115,2.943484,davidbresler2,1,95,117.5,xbox,3
8213034705,100,2.951285,gladimacowgirl,58,95,117.5,xbox,3
8213034705,117.5,2.998947,daysrus,10,95,117.5,xbox,3
8213060420,2,0.065266,donnie4814,5,1,120,xbox,3
8213060420,15.25,0.123218,myreeceyboy,52,1,120,xbox,3
...
...

# The columns are:
# auctionid,bid,bidtime,bidder,bidrate,openbid,price,itemtype,daystolive

# Upload the data to HDFS
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -mkdir -p /spark/exer/mapr
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -put auctiondata.csv /spark/exer/mapr
[hadoop@hftclclw0001 DEV360Data]$ hdfs dfs -ls /spark/exer/mapr
Found 1 items
-rw-r--r--   2 hadoop supergroup     575014 2015-10-29 06:17 /spark/exer/mapr/auctiondata.csv
```

2. Run spark-shell (I used Scala) and work through the following tasks.

Tasks:

a.How many items were sold?

b.How many bids per item type?

c.How many different kinds of item type?

d.What was the minimum number of bids?

e.What was the maximum number of bids?

f.What was the average number of bids?

```
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ pwd
/home/hadoop/spark-1.5.1-bin-hadoop2.6
[hadoop@hftclclw0001 spark-1.5.1-bin-hadoop2.6]$ ./bin/spark-shell
...

// First, load the data from HDFS into an RDD
scala> val originalRDD = sc.textFile("/spark/exer/mapr/auctiondata.csv")
...

// originalRDD has type RDD[String]: think of it as an array of lines, Array[String]
scala> originalRDD
res26: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

// Split each line on "," using map
scala> val auctionRDD = originalRDD.map(_.split(","))

// auctionRDD has type RDD[Array[String]]: each element is itself an array,
// so think of it as Array[Array[String]]
scala> auctionRDD
res17: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[5] at map at <console>:23
```

a.How many items were sold?

==> val count = auctionRDD.map(bid => bid(0)).distinct().count()

Deduplicate on auctionid: split each record on ",", take the first field, deduplicate, then count.

```
// Take the first column (auctionid), again with map.
// Since auctionRDD behaves like Array[Array[String]], the function passed to map
// receives an Array[String]; auctionid is its first element, arr(0) — note that
// Scala indexes with (), not [].
scala> val auctionidRDD = auctionRDD.map(_(0))
...

// auctionidRDD has type RDD[String]: think of it as the Array[String] of all auctionids
scala> auctionidRDD
res27: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at map at <console>:26

// Deduplicate auctionidRDD
scala> val auctionidDistinctRDD = auctionidRDD.distinct()

// Count
scala> auctionidDistinctRDD.count()
...
```

b.How many bids per item type?

===> auctionRDD.map(bid => (bid(7),1)).reduceByKey((x,y) => x + y).collect()

```
// Map each row to (itemtype, 1): itemtype is the eighth column, bid(7).
// The result can be viewed as an array of (String, Int) pairs.
scala> auctionRDD.map(bid => (bid(7), 1))
res30: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[26] at map at <console>:26
...

// reduceByKey reduces all the values that share a key. For a given key:
// (xbox,1)(xbox,1)(xbox,1)...(xbox,1) ==> reduceByKey ==> (xbox, (..(((1 + 1) + 1) + ... + 1))
scala> auctionRDD.map(bid => (bid(7), 1)).reduceByKey((x, y) => x + y)
// The type is still (String, Int) pairs: String is the itemtype, and Int is now
// the total bid count for that itemtype
res31: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[28] at reduceByKey at <console>:26

// collect() brings the result back to the driver as a local Array
scala> auctionRDD.map(bid => (bid(7), 1)).reduceByKey((x, y) => x + y).collect()
res32: Array[(String, Int)] = Array((palm,5917), (cartier,1953), (xbox,2784))
```
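The original post stops after task b. A sketch of the remaining tasks c–f, assuming the same auctionRDD built above (this is my addition, not part of the original post):

```
// c. How many different kinds of item type?
// bid(7) is the itemtype column; deduplicate it and count.
val itemTypeCount = auctionRDD.map(bid => bid(7)).distinct().count()

// d/e/f. Min / max / average number of bids per auction:
// first count the bids for each auctionid — same pattern as task b, keyed on bid(0).
val bidsPerAuction = auctionRDD.map(bid => (bid(0), 1)).reduceByKey((x, y) => x + y)
val counts = bidsPerAuction.map(pair => pair._2)

val minBids = counts.min()                    // d. minimum
val maxBids = counts.max()                    // e. maximum
val avgBids = counts.sum() / counts.count()   // f. average (sum() returns a Double)
```

min(), max(), and sum() are standard RDD actions (sum() via the implicit conversion for numeric RDDs), so no extra imports are needed in spark-shell.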


Reposted from: https://my.oschina.net/u/204498/blog/523576
