Spark-Spark setMaster WordCount Demo


Spark setMaster source code

/**
 * The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to
 * run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
 */
def setMaster(master: String): SparkConf = {
  set("spark.master", master)
}

In other words: the master URL to connect to, such as "local" to run locally with a single thread, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
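For reference, here is a minimal sketch of how setMaster is typically called. The object name and the choice of "local[4]" are illustrative assumptions, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}

object SetMasterDemo {
  def main(args: Array[String]): Unit = {
    // Pick one master URL:
    //   "local"               - run locally with one thread
    //   "local[4]"            - run locally with 4 cores
    //   "spark://master:7077" - run on a Spark standalone cluster
    val conf = new SparkConf()
      .setAppName("SetMasterDemo")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    println(sc.master) // echoes the configured master URL
    sc.stop()
  }
}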


package cn.rzlee.spark.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// A Scala object is effectively static
object ScalaWordCount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and set the application name
    val conf = new SparkConf().setAppName("wordCountApp")
    // Create the Spark execution entry point
    val sc = new SparkContext(conf)
    // Specify where to read the data from to create the RDD (resilient distributed dataset)
    val lines: RDD[String] = sc.textFile("")
    // Split and flatten
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Pair each word with a 1
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))
    // Aggregate by key: identical keys collapse and their values are summed
    val reduced: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // Sort by count, descending
    val sorted = reduced.sortBy(_._2, false)
    // Save the result to HDFS
    sorted.saveAsTextFile("")
    // Release resources
    sc.stop()
  }
}
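The input and output paths in the listing above are left blank in the original. As a minimal runnable sketch (the inline sample data, object name, and "local[*]" master are assumptions added for illustration), the same pipeline can be exercised without any file paths:

import org.apache.spark.{SparkConf, SparkContext}

object ScalaWordCountLocal {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordCountApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // parallelize stands in for textFile so no input file is needed
    val lines = sc.parallelize(Seq("spark hadoop spark", "hadoop spark"))
    val sorted = lines
      .flatMap(_.split(" "))           // split and flatten
      .map((_, 1))                     // pair each word with a 1
      .reduceByKey(_ + _)              // sum the counts per word
      .sortBy(_._2, ascending = false) // descending by count
    // collect + println stands in for saveAsTextFile for a quick local check
    sorted.collect().foreach(println)
    sc.stop()
  }
}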


WordCount based on a sorting mechanism

Java version:

package cn.rzlee.spark.core;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

import java.util.Arrays;

/**
 * @Author ^_^
 * @Create 2018/11/3
 */
public class SortWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SortWordCount").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Create the lines RDD
        JavaRDD<String> lines = sc.textFile("C:\\Users\\txdyl\\Desktop\\log\\in\\data.txt", 1);
        // Perform the word count
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) throws Exception {
                return Arrays.asList(s.split("\t"));
            }
        });
        JavaPairRDD<String, Integer> pair = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<>(s, 1);
            }
        });
        JavaPairRDD<String, Integer> wordCounts = pair.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });
        // Invert the key-value mapping to (count, word)
        JavaPairRDD<Integer, String> countWords = wordCounts.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> t) throws Exception {
                return new Tuple2<>(t._2, t._1);
            }
        });
        // Sort by key (the count), descending
        JavaPairRDD<Integer, String> sortedCountWords = countWords.sortByKey(false);
        // Invert the key-value mapping back to (word, count)
        JavaPairRDD<String, Integer> sortedWordCounts = sortedCountWords.mapToPair(new PairFunction<Tuple2<Integer, String>, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(Tuple2<Integer, String> t) throws Exception {
                return new Tuple2<>(t._2, t._1);
            }
        });
        // Print the results
        sortedWordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> t) throws Exception {
                System.out.println(t._1 + " appears " + t._2 + " times.");
            }
        });
        // Close the JavaSparkContext
        sc.close();
    }
}

Scala version:

package cn.rzlee.spark.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SortWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName).setMaster("local")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("C:\\Users\\txdyl\\Desktop\\log\\in\\data.txt", 1)
    // Split each line on tabs and flatten into words
    val words: RDD[String] = lines.flatMap(line => line.split("\t"))
    // Pair each word with a 1
    val pairs: RDD[(String, Int)] = words.map(word => (word, 1))
    // Sum the counts for each word
    val wordCounts: RDD[(String, Int)] = pairs.reduceByKey(_ + _)
    // Invert to (count, word) so the count becomes the sort key
    val countWords: RDD[(Int, String)] = wordCounts.map(wordCount => (wordCount._2, wordCount._1))
    // Sort by key (the count), descending
    val sortedCountWords = countWords.sortByKey(false)
    // Invert back to (word, count)
    val sortedWordCounts: RDD[(String, Int)] = sortedCountWords.map(sortedCountWord => (sortedCountWord._2, sortedCountWord._1))
    sortedWordCounts.foreach(sortedWordCount => {
      println(sortedWordCount._1 + " appears " + sortedWordCount._2 + " times.")
    })
    sc.stop()
  }
}
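Both sorted versions use the invert → sortByKey(false) → invert pattern because sortByKey only orders by key, while the first demo's sortBy(_._2, false) collapses those three steps into one. A minimal sketch of the equivalence (the object name and inline data are assumptions for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object SortEquivalence {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SortEquivalence").setMaster("local"))
    val wordCounts = sc.parallelize(Seq(("spark", 3), ("hadoop", 2), ("hive", 1)))
    // Three-step pattern: swap to (count, word), sort by key, swap back
    val viaInvert = wordCounts.map(_.swap).sortByKey(ascending = false).map(_.swap)
    // One-step alternative: sort directly on the count field
    val viaSortBy = wordCounts.sortBy(_._2, ascending = false)
    // With distinct counts both orderings agree; prints true
    println(viaInvert.collect().toSeq == viaSortBy.collect().toSeq)
    sc.stop()
  }
}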


Reposted from: https://www.cnblogs.com/RzCong/p/9563509.html
