日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

Spark:如何替换sc.parallelize(List(item1,item2)).collect().foreach(row={})为并行?

發布時間:2025/3/15 编程问答 41 豆豆
生活随笔 收集整理的這篇文章主要介紹了 Spark:如何替换sc.parallelize(List(item1,item2)).collect().foreach(row={})为并行? 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

代碼場景:

1)設定的幾種數據場景,遍歷所有場景:依次統計滿足每種場景條件下的數據,并把統計結果存入hive;

2)已有代碼如下:

case class IndoorOTTCalibrateBuildingVecotrLegend(oid: Int, minHeight: Int, maxHeight: Int, minGridIDCount: Int, maxGridIDCount: Int, heightType: Int) extends Serializable // 實例化建筑物區間段:按照柵格的個數(面積)、樓的高度(商場等場景)來劃分場景val buildingHeightLegends = List(IndoorOTTCalibrateBuildingVecotrLegend(1, 1, 30, 1, 21, BuildingCalibrateHeightType.HeightType1.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(2, 1, 30, 21, 45, BuildingCalibrateHeightType.HeightType2.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(3, 1, 30, 45, 100, BuildingCalibrateHeightType.HeightType3.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(4, 30, 50, 1, 21, BuildingCalibrateHeightType.HeightType4.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(5, 30, 50, 21, 45, BuildingCalibrateHeightType.HeightType5.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(6, 30, 50, 45, 100, BuildingCalibrateHeightType.HeightType6.toString.toInt),IndoorOTTCalibrateBuildingVecotrLegend(7, 50, 5000, 1, 100, BuildingCalibrateHeightType.HeightType7.toString.toInt))spark.sparkContext.parallelize(buildingHeightLegends).collect().foreach(buildingHeightLegend => {generateSampleBySenceType(spark, p_city, p_hour_start, p_hour_end, p_fpb_day, p_day_sample, linkLossCalibrateParameter, buildingHeightLegend)})

備注:

在generateSampleBySenceType()函數內部包含有:

spark.sql(s"""
|xxx |where t10.heihgt>=${buildingHieghtLegend.MinHeight} and t10.height<${buildingHieghtLegend.MaxHeight} |and t10.gridcount<=${buildingHieghtLegend.MinGridIDCount} and t10.gridcount>${buildingHieghtLegend.MaxGridIDCount}
|""".stripMargin)

如果把代碼修改:

val buildingHeightLegends_df = spark.sqlContext.createDataFrame(buildingHeightLegends)buildingHeightLegends_df.createOrReplaceTempView("temp_buildingheightlegends")sql(s"""|select * from temp_buildingheightlegends""".stripMargin).repartition(buildingHeightLegends.length).foreachPartition(rows => {for (row <- rows) {val buildingHeightLegend = new IndoorOTTCalibrateBuildingVecotrLegend(row.getAs[Int]("oid"),row.getAs[Int]("minheight"),row.getAs[Int]("maxheight"),row.getAs[Int]("mingrididcount"),row.getAs[Int]("maxgrididcount"),row.getAs[Int]("heighttype"))generateSampleBySenceType(spark, p_city, p_hour_start, p_hour_end, p_fpb_day, p_day_sample, linkLossCalibrateParameter, buildingHeightLegend)}})

則會提示:generateSampleBySenceType()內部sql代碼位置拋出SparkSession為NULL的異常。

修改方案:

把buildingHeightLegends注冊為臨時表temp_buildingHeightLegends,去掉外層的foreach,之后在generateSampleBySenceType()內部把temp_buildingHeightLegends與其他結果集合進行cross join:

測試代碼如下:

-- 場景表 CREATE TABLE [dbo].[test_senceitems]([sencetype] [int] NULL,[minheight] [int] NULL,[maxheight] [int] NULL,[mingridcount] [int] NULL,[maxgridcount] [int] NULL ) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (1, 1, 30, 1, 21) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (2, 1, 30, 21, 45) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (3, 1, 30, 45, 100) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (4, 30, 50, 1, 21) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (5, 30, 50, 21, 45) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (6, 30, 50, 45, 100) INSERT [dbo].[test_senceitems] ([sencetype], [minheight], [maxheight], [mingridcount], [maxgridcount]) VALUES (7, 50, 5000, 1, 100)-- 業務過濾統計表 CREATE TABLE [dbo].[test_grid]([gridid] [nvarchar](50) NULL,[height] [int] NULL,[gridcount] [int] NULL ) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g1', 8, 23) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g2', 3, 87) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g3', 4, 34) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g4', 30, 54) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g5', 32, 32) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g6', 32, 20) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g7', 120, 34) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g8', 89, 54) INSERT [dbo].[test_grid] ([gridid], [height], [gridcount]) VALUES (N'g9', 9, 16)

替換generateSampleBySenceType()內部sql(s"""|""".stripMargin)代碼類似如下:

select t10.*,t11.* from test_grid t10 cross join test_senceitems t11 where t10.height>=t11.minheight and t10.height<t11.maxheight and t10.gridcount>=t11.mingridcount and t10.gridcount<t11.maxgridcount

?

轉載于:https://www.cnblogs.com/yy3b2007com/p/8505152.html

總結

以上是生活随笔為你收集整理的Spark:如何替换sc.parallelize(List(item1,item2)).collect().foreach(row={})为并行?的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。