Spark on k8s: Speeding Up Shuffle by Setting SPARK_LOCAL_DIRS via hostPath
Introduction
spark.local.dir / SPARK_LOCAL_DIRS is where Spark writes shuffle-stage temporary files, disk-persisted RDD blocks, and other local data. It accepts a comma-separated list of paths, so each path can be mapped to a different physical disk. On Spark on YARN, this setting is overridden by the YARN cluster's LOCAL_DIRS configuration. On Spark on k8s, it defaults to the location backed by an emptyDir volume, which can loosely be thought of as the /tmp directory that java.io.tmpdir points to.
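To see why multiple local dirs matter, here is a simplified Python sketch of how Spark spreads shuffle files across the configured directories. Spark's actual logic lives in Scala (DiskBlockManager picks a directory from a non-negative hash of the file name); the md5-based hash below is an illustrative stand-in, not Spark's real hash function.

```python
import hashlib

def pick_local_dir(local_dirs, filename):
    """Pick a directory for a shuffle file by hashing its name.

    Illustrative only: Spark's DiskBlockManager does the equivalent in Scala,
    taking a non-negative hash of the file name modulo the number of local
    dirs, so shuffle files spread roughly evenly across all configured disks.
    """
    h = int(hashlib.md5(filename.encode("utf-8")).hexdigest(), 16)
    return local_dirs[h % len(local_dirs)]

dirs = [f"/opt/dfs/{i}" for i in range(12)]
print(pick_local_dir(dirs, "shuffle_0_0_0.data"))
```

With only one directory configured (the emptyDir default), every shuffle file lands on the same disk; with twelve, the hash distributes the I/O.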
Configuration
The configuration pattern is:

```
spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].mount.path=<mount path>
spark.kubernetes.driver.volumes.[VolumeType].spark-local-dir-[VolumeName].options.path=<host path>
```

- [VolumeType] - hostPath, emptyDir, …
- [VolumeName] - an arbitrary volume name
- <mount path> - the directory inside the pod
- <host path> - the directory on the host machine
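Writing out one mount.path/options.path pair per disk is repetitive, so a small script can generate the --conf flags instead of hand-typing them. This sketch assumes the /mnt/dfs/N host disks and /opt/dfs/N pod mount points used in the blocks below:

```python
def local_dir_confs(role, num_disks, mount_prefix="/opt/dfs", host_prefix="/mnt/dfs"):
    """Generate the hostPath volume conf pairs for a driver or executor.

    role is "driver" or "executor"; each disk i yields a mount.path line
    (directory inside the pod) and an options.path line (directory on the host).
    """
    confs = []
    for i in range(num_disks):
        base = f"spark.kubernetes.{role}.volumes.hostPath.spark-local-dir-{i}"
        confs.append(f"{base}.mount.path={mount_prefix}/{i}")
        confs.append(f"{base}.options.path={host_prefix}/{i}")
    return confs

# Print flags ready to paste into a spark-submit command line.
for conf in local_dir_confs("executor", 12):
    print(f"--conf {conf} \\")
```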
Driver
```
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.mount.path=/opt/dfs/0
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path=/opt/dfs/1
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.mount.path=/opt/dfs/2
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-3.mount.path=/opt/dfs/3
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-4.mount.path=/opt/dfs/4
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-5.mount.path=/opt/dfs/5
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-6.mount.path=/opt/dfs/6
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-7.mount.path=/opt/dfs/7
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-8.mount.path=/opt/dfs/8
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-9.mount.path=/opt/dfs/9
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-10.mount.path=/opt/dfs/10
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-11.mount.path=/opt/dfs/11
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-0.options.path=/mnt/dfs/0
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path=/mnt/dfs/1
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-2.options.path=/mnt/dfs/2
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-3.options.path=/mnt/dfs/3
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-4.options.path=/mnt/dfs/4
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-5.options.path=/mnt/dfs/5
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-6.options.path=/mnt/dfs/6
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-7.options.path=/mnt/dfs/7
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-8.options.path=/mnt/dfs/8
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-9.options.path=/mnt/dfs/9
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-10.options.path=/mnt/dfs/10
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-11.options.path=/mnt/dfs/11
```

Executor
```
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-0.mount.path=/opt/dfs/0
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/opt/dfs/1
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/opt/dfs/2
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-3.mount.path=/opt/dfs/3
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-4.mount.path=/opt/dfs/4
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-5.mount.path=/opt/dfs/5
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-6.mount.path=/opt/dfs/6
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-7.mount.path=/opt/dfs/7
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-8.mount.path=/opt/dfs/8
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-9.mount.path=/opt/dfs/9
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-10.mount.path=/opt/dfs/10
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-11.mount.path=/opt/dfs/11
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-0.options.path=/mnt/dfs/0
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/mnt/dfs/1
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/mnt/dfs/2
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-3.options.path=/mnt/dfs/3
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-4.options.path=/mnt/dfs/4
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-5.options.path=/mnt/dfs/5
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-6.options.path=/mnt/dfs/6
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-7.options.path=/mnt/dfs/7
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-8.options.path=/mnt/dfs/8
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-9.options.path=/mnt/dfs/9
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-10.options.path=/mnt/dfs/10
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-11.options.path=/mnt/dfs/11
```

Performance Testing
With the default emptyDir, Spark on k8s shows a large performance gap compared with Spark on YARN at the same resource configuration, and the problem becomes more pronounced as the data volume grows.
Host-level metrics showed that the sda disk was saturated while the other 11 disks sat idle, which in turn kept the host's CPU utilization low.
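That kind of single-disk saturation can be spotted without a metrics stack by sampling /proc/diskstats twice. Below is a minimal sketch; it assumes the standard Linux /proc/diskstats layout, where field 13 is the cumulative milliseconds the disk spent doing I/O (the basis of iostat's %util):

```python
def disk_util(stats_before, stats_after, interval_s):
    """Approximate per-disk utilization (%) from two /proc/diskstats snapshots.

    Field 13 of each line is the cumulative milliseconds the device spent
    doing I/O; its delta over a wall-clock interval approximates %util.
    """
    def parse(text):
        busy_ms = {}
        for line in text.splitlines():
            fields = line.split()
            if len(fields) >= 13:
                busy_ms[fields[2]] = int(fields[12])  # ms spent doing I/O
        return busy_ms

    before, after = parse(stats_before), parse(stats_after)
    return {disk: 100.0 * (after[disk] - before[disk]) / (interval_s * 1000.0)
            for disk in after if disk in before}

# On a Linux host, sample twice a few seconds apart:
#   s1 = open("/proc/diskstats").read()
#   time.sleep(5)
#   s2 = open("/proc/diskstats").read()
#   print(disk_util(s1, s2, 5.0))  # with emptyDir: sda near 100%, others idle
```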
Re-running TeraSort with the configuration above, Spark on k8s performance improved dramatically, especially at large data volumes.
Host metrics now showed all 12 disks being used: Spark's shuffle throughput improved, and CPU utilization rose sharply compared with before.
Reference:
https://www.jianshu.com/p/c6903a7a2c67