Three Ways to Set Partitions: coalesce, repartition, and partitionBy
coalesce [ˌkəʊəˈles]: changes the number of partitions of an RDD
Here result is an existing RDD with 3 partitions. Its elements already carry a "partition:N content:TomXX" label, so in the output below the inner label is the original partition and the outer label is the partition after coalesce.

    import scala.collection.mutable.ListBuffer

    /*
     * false: no shuffle
     * true: shuffle
     * If the requested number of partitions is greater than the current number,
     * the flag must be true; otherwise the partition count stays unchanged.
     * Increasing the partition count spreads the data of the original partitions
     * across the requested number of partitions.
     */
    val coalesceRdd = result.coalesce(6, true)
    // Tag every element with the index of the partition it now lives in.
    val results = coalesceRdd.mapPartitionsWithIndex((index, x) => {
      val list = ListBuffer[String]()
      while (x.hasNext) {
        list += "partition:" + index + " content:[" + x.next + "]"
      }
      list.iterator
    })
    println("Number of partitions:" + results.partitions.size)
    val resultArr = results.collect()
    for (x <- resultArr) {
      println(x)
    }

Result:

    Number of partitions:6
    partition:0 content:[partition:1 content:Tom07]
    partition:0 content:[partition:2 content:Tom10]
    partition:1 content:[partition:0 content:Tom01]
    partition:1 content:[partition:1 content:Tom08]
    partition:1 content:[partition:2 content:Tom11]
    partition:2 content:[partition:0 content:Tom02]
    partition:2 content:[partition:2 content:Tom12]
    partition:3 content:[partition:0 content:Tom03]
    partition:4 content:[partition:0 content:Tom04]
    partition:4 content:[partition:1 content:Tom05]
    partition:5 content:[partition:1 content:Tom06]
    partition:5 content:[partition:2 content:Tom09]

The result of val coalesceRdd = result.coalesce(6, false) is:

    Number of partitions:3
    partition:0 content:[partition:0 content:Tom01]
    partition:0 content:[partition:0 content:Tom02]
    partition:0 content:[partition:0 content:Tom03]
    partition:0 content:[partition:0 content:Tom04]
    partition:1 content:[partition:1 content:Tom05]
    partition:1 content:[partition:1 content:Tom06]
    partition:1 content:[partition:1 content:Tom07]
    partition:1 content:[partition:1 content:Tom08]
    partition:2 content:[partition:2 content:Tom09]
    partition:2 content:[partition:2 content:Tom10]
    partition:2 content:[partition:2 content:Tom11]
    partition:2 content:[partition:2 content:Tom12]

The result of val coalesceRdd = result.coalesce(2, false) is:

    Number of partitions:2
    partition:0 content:[partition:0 content:Tom01]
    partition:0 content:[partition:0 content:Tom02]
    partition:0 content:[partition:0 content:Tom03]
    partition:0 content:[partition:0 content:Tom04]
    partition:1 content:[partition:1 content:Tom05]
    partition:1 content:[partition:1 content:Tom06]
    partition:1 content:[partition:1 content:Tom07]
    partition:1 content:[partition:1 content:Tom08]
    partition:1 content:[partition:2 content:Tom09]
    partition:1 content:[partition:2 content:Tom10]
    partition:1 content:[partition:2 content:Tom11]
    partition:1 content:[partition:2 content:Tom12]

The result of val coalesceRdd = result.coalesce(2, true) is:

    Number of partitions:2
    partition:0 content:[partition:0 content:Tom01]
    partition:0 content:[partition:0 content:Tom03]
    partition:0 content:[partition:1 content:Tom05]
    partition:0 content:[partition:1 content:Tom07]
    partition:0 content:[partition:2 content:Tom09]
    partition:0 content:[partition:2 content:Tom11]
    partition:1 content:[partition:0 content:Tom02]
    partition:1 content:[partition:0 content:Tom04]
    partition:1 content:[partition:1 content:Tom06]
    partition:1 content:[partition:1 content:Tom08]
    partition:1 content:[partition:2 content:Tom10]
    partition:1 content:[partition:2 content:Tom12]

[Figure: detailed diagram of the coalesce cases]
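The no-shuffle listings can be reproduced with a small model. The sketch below is plain Java, not Spark code. It assumes the parent partitions have no locality preferences, in which case (as far as I understand Spark's coalescer) the parents are split into contiguous ranges by position. The class name CoalesceModel and the method group are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of coalesce(n, false). This is NOT Spark code; it assumes
// no locality preferences, so parents are grouped purely by position.
class CoalesceModel {

    // For each new partition, returns the indices of the original partitions merged into it.
    static List<List<Integer>> group(int parentCount, int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested must be positive");
        }
        List<List<Integer>> groups = new ArrayList<>();
        if (requested >= parentCount) {
            // Without a shuffle the partition count cannot grow: every parent stays alone.
            for (int p = 0; p < parentCount; p++) {
                groups.add(List.of(p));
            }
            return groups;
        }
        for (int i = 0; i < requested; i++) {
            // Contiguous range of parent indices assigned to new partition i.
            int start = (int) ((long) i * parentCount / requested);
            int end = (int) ((long) (i + 1) * parentCount / requested);
            List<Integer> merged = new ArrayList<>();
            for (int p = start; p < end; p++) {
                merged.add(p);
            }
            groups.add(merged);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group(3, 6)); // [[0], [1], [2]]  -> still 3 partitions
        System.out.println(group(3, 2)); // [[0], [1, 2]]    -> parents 1 and 2 merged
    }
}
```

The second line matches the coalesce(2, false) listing: new partition 0 holds only the elements of original partition 0, while new partition 1 holds the elements of original partitions 1 and 2. Nothing moves between parents, which is why no shuffle is needed.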
repartition: changes the number of partitions of an RDD
repartition(int n) = coalesce(int n, true), so it always shuffles and can both increase and decrease the partition count.
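The shuffle listings for coalesce(6, true) and coalesce(2, true) look consistent with a round-robin distribution: the elements of each original partition go to consecutive target partitions, wrapping around at the end. The sketch below models only that pattern in plain Java. It is not Spark code, and firstTarget is a hypothetical parameter standing in for the start position, which Spark chooses on its own for each original partition.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how a shuffle spreads ONE original partition over the targets.
// NOT Spark code; firstTarget is an assumed input, read off the article's listings.
class ShuffleModel {

    // Returns the target partition of each element, in element order.
    static List<Integer> assign(int elementCount, int firstTarget, int numTargets) {
        List<Integer> targets = new ArrayList<>();
        for (int k = 0; k < elementCount; k++) {
            // Consecutive elements go to consecutive targets, wrapping around.
            targets.add((firstTarget + k) % numTargets);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Original partition 1 holds Tom05..Tom08; with 6 targets Tom05 landed in partition 4.
        System.out.println(assign(4, 4, 6)); // [4, 5, 0, 1]
        // Original partition 0 holds Tom01..Tom04; with 2 targets Tom01 landed in partition 0.
        System.out.println(assign(4, 0, 2)); // [0, 1, 0, 1]
    }
}
```

Both printed lists agree with the listings above: Tom05 to Tom08 end up in partitions 4, 5, 0, 1, and Tom01 to Tom04 alternate between partitions 0 and 1.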
partitionBy: changes the number of partitions of a key-value RDD through a custom partitioner
    import org.apache.spark.Partitioner;
    import org.apache.spark.api.java.JavaPairRDD;

    JavaPairRDD<Integer, String> partitionByRDD = nameRDD.partitionBy(new Partitioner() {
        private static final long serialVersionUID = 1L;

        // Number of partitions: 2
        @Override
        public int numPartitions() {
            return 2;
        }

        // Partitioning logic: even keys go to partition 0, odd keys to partition 1
        @Override
        public int getPartition(Object obj) {
            int i = (int) obj;
            if (i % 2 == 0) {
                return 0;
            } else {
                return 1;
            }
        }
    });
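To see what this partitioner does to a set of keys, the same even/odd rule can be applied outside Spark. The following is a minimal plain-Java sketch. The class EvenOddPartitionDemo and the sample keys 1 to 6 are made up for illustration and do not come from the article's nameRDD.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Applies the article's even/odd partitioning rule to plain integers (no Spark involved).
class EvenOddPartitionDemo {

    // Same rule as getPartition above: even keys -> 0, odd keys -> 1.
    static int getPartition(int key) {
        return key % 2 == 0 ? 0 : 1;
    }

    // Groups the keys by the partition index the rule assigns to them.
    static Map<Integer, List<Integer>> partition(List<Integer> keys) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (int key : keys) {
            result.computeIfAbsent(getPartition(key), p -> new ArrayList<>()).add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys.
        System.out.println(partition(List.of(1, 2, 3, 4, 5, 6))); // {0=[2, 4, 6], 1=[1, 3, 5]}
    }
}
```

Unlike coalesce and repartition, where the caller only chooses the partition count, a custom partitioner decides exactly which partition every key lands in.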