

Spark on k8s: Speeding Up Shuffle by Setting SPARK_LOCAL_DIRS via hostPath

Published: 2025/1/21

Background

spark.local.dir / SPARK_LOCAL_DIRS is where Spark writes shuffle-stage temporary files and persisted RDD blocks. It accepts a comma-separated list of paths, so different directories can be mapped to different physical disks. On Spark on YARN this setting is overridden by the cluster's LOCAL_DIRS configuration. On Spark on k8s it defaults to the location backed by an emptyDir volume, which can loosely be thought of as the /tmp directory that java.io.tmpdir points to.
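Since the property is just a comma-joined list of directories, the value for a multi-disk host can be built mechanically. A minimal sketch (the helper name and paths are illustrative, not from the article):

```shell
# Join N per-disk directories into one spark.local.dir value.
build_local_dirs() {
  # $1 = path prefix, $2 = number of disks
  _i=0
  _out=''
  while [ "$_i" -lt "$2" ]; do
    _out="${_out:+$_out,}$1/$_i"   # append with a comma after the first entry
    _i=$((_i + 1))
  done
  printf '%s\n' "$_out"
}

build_local_dirs /mnt/dfs 3   # → /mnt/dfs/0,/mnt/dfs/1,/mnt/dfs/2
```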

Configuration

spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.path=<mount path>
spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].options.path=<host path>

[VolumeType] - volume type: hostPath, emptyDir, …
[VolumeName] - volume name
<mount path> - directory inside the pod
<host path> - directory on the host machine
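Putting the template together, a minimal single-disk variant might look like the following spark-submit fragment (the API server address is a placeholder; the /opt/dfs/0 and /mnt/dfs/0 paths follow the article's layout):

```shell
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.mount.path=/opt/dfs/0 \
  --conf spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.options.path=/mnt/dfs/0 \
  ...
```

Because the volume name starts with the spark-local-dir- prefix, Spark mounts it and uses the mount path as a local directory automatically. The sections below scale this out to 12 disks.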

Driver

spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.mount.path=/opt/dfs/0
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path=/opt/dfs/1
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.mount.path=/opt/dfs/2
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-3.mount.path=/opt/dfs/3
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-4.mount.path=/opt/dfs/4
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-5.mount.path=/opt/dfs/5
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-6.mount.path=/opt/dfs/6
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-7.mount.path=/opt/dfs/7
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-8.mount.path=/opt/dfs/8
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-9.mount.path=/opt/dfs/9
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-10.mount.path=/opt/dfs/10
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-11.mount.path=/opt/dfs/11
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.options.path=/mnt/dfs/0
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path=/mnt/dfs/1
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.options.path=/mnt/dfs/2
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-3.options.path=/mnt/dfs/3
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-4.options.path=/mnt/dfs/4
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-5.options.path=/mnt/dfs/5
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-6.options.path=/mnt/dfs/6
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-7.options.path=/mnt/dfs/7
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-8.options.path=/mnt/dfs/8
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-9.options.path=/mnt/dfs/9
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-10.options.path=/mnt/dfs/10
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-11.options.path=/mnt/dfs/11

Executor

spark.kubernetes.executor.volumes.hostPath.spark-local-dir-0.mount.path=/opt/dfs/0
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/opt/dfs/1
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/opt/dfs/2
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-3.mount.path=/opt/dfs/3
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-4.mount.path=/opt/dfs/4
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-5.mount.path=/opt/dfs/5
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-6.mount.path=/opt/dfs/6
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-7.mount.path=/opt/dfs/7
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-8.mount.path=/opt/dfs/8
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-9.mount.path=/opt/dfs/9
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-10.mount.path=/opt/dfs/10
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-11.mount.path=/opt/dfs/11
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-0.options.path=/mnt/dfs/0
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/mnt/dfs/1
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/mnt/dfs/2
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-3.options.path=/mnt/dfs/3
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-4.options.path=/mnt/dfs/4
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-5.options.path=/mnt/dfs/5
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-6.options.path=/mnt/dfs/6
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-7.options.path=/mnt/dfs/7
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-8.options.path=/mnt/dfs/8
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-9.options.path=/mnt/dfs/9
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-10.options.path=/mnt/dfs/10
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-11.options.path=/mnt/dfs/11
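The 48 driver and executor properties above are fully regular, so a small script can generate them instead of maintaining them by hand. A sketch (not from the article; the /opt/dfs/N in-pod and /mnt/dfs/N host paths follow the article's layout):

```shell
# Emit the 48 repetitive --conf flags for 12 disks, for both the
# driver and executor pods, ready to splice into a spark-submit line.
gen_local_dir_confs() {
  for role in driver executor; do
    i=0
    while [ "$i" -le 11 ]; do
      printf -- '--conf spark.kubernetes.%s.volumes.hostPath.spark-local-dir-%s.mount.path=/opt/dfs/%s\n' "$role" "$i" "$i"
      printf -- '--conf spark.kubernetes.%s.volumes.hostPath.spark-local-dir-%s.options.path=/mnt/dfs/%s\n' "$role" "$i" "$i"
      i=$((i + 1))
    done
  done
}

gen_local_dir_confs   # 2 roles x 12 disks x 2 properties = 48 lines
</imports> — usage: spark-submit $(gen_local_dir_confs) ...
```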

Performance Testing

With the default emptyDir, Spark on k8s shows a large performance gap versus Spark on YARN with the same resources, and the gap grows more pronounced as the data volume increases.

Host metrics showed that the sda disk was saturated while the other 11 disks sat idle. This also kept the host's CPU utilization from climbing.

Re-running TeraSort with the configuration above, Spark on k8s performance improves dramatically, especially at large data volumes.

Host metrics now show all 12 disks being utilized: shuffle throughput improves, and CPU utilization takes a qualitative leap compared with before.

Reference:
https://www.jianshu.com/p/c6903a7a2c67
