

Hadoop MapReduce: WordCount Word-Frequency Example

Published: 2025/3/17

Contents

  • WordCount Example
    • Requirement
    • Environment Setup
    • Local Test
  • Submitting to the Cluster
    • Cluster Test
  • Source Code
    • 1. The WordCountMapper class
    • 2. The WordCountReducer class
    • 3. The WordCountDriver class

WordCount Example

Requirement

: 統(tǒng)計(jì)一堆文件中單詞出現(xiàn)的個(gè)數(shù)。

1. Input data
hello hello
hi hi
haha
map
reduce

2. Expected output
hello 2
hi 2
haha 1
map 1
reduce 1

Analysis: following the MapReduce programming model, write a Mapper, a Reducer, and a Driver.

3. Mapper
1) Convert the text the MapTask hands in to a String:
hello hello

2) Split the line into words on spaces:
hello
hello

3) Emit each word as <word, 1>:
hello, 1
hello, 1

4. Reducer
1) Sum up the counts for each key:
hello, 1
hello, 1
2) Emit the total count for that key:
hello, 2

5. Driver
1) Get the configuration and a Job instance;
2) Set the local path of this program's jar;
3) Attach the Mapper and Reducer classes;
4) Set the KV types of the Mapper output;
5) Set the KV types of the final output;
6) Set the directory of the job's input files;
7) Set the directory of the job's output;
8) Submit the job.
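The Mapper → shuffle → Reducer flow described above can be simulated in plain Java with no Hadoop dependency. This is a minimal sketch to illustrate the data flow only; the class name LocalWordCount and its method are illustrative, not part of the actual job:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // Simulates map (one call per line), shuffle (group by word),
    // and reduce (sum per key) on in-memory lines.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {               // map phase: one line at a time
            for (String word : line.split(" ")) { // emit <word, 1> per token
                counts.merge(word, 1, Integer::sum); // shuffle+reduce: sum per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] input = {"hello hello", "hi hi", "haha", "map", "reduce"};
        System.out.println(count(input));
        // {haha=1, hello=2, hi=2, map=1, reduce=1}
    }
}
```

Running this on the sample input reproduces the expected output from the Requirement section, which is a quick sanity check before wiring the same logic into the Hadoop classes below.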

Environment Setup

1. Create a Maven project named MapReduceDemo;
2. Add the following dependencies to pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.2.2</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12 -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>

3. In the project's src/main/resources directory, create a file named "log4j.properties" with the following content:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

4. Create the package com.xiaobai.mapreduce.wordcount,
then write the Mapper, Reducer, and Driver classes in it.

Local Test

Driver source excerpt:

// 6. Set the input and output paths
FileInputFormat.setInputPaths(job, new Path("/Users/jane/Desktop/test/"));
FileOutputFormat.setOutputPath(job, new Path("/Users/jane/Desktop/hadoop/output"));

Create a hello.xml file in the "/Users/jane/Desktop/test/" directory with the following content. (Note: the output directory must not already exist when the job runs, or Hadoop will refuse to start the job.)

Output:

Submitting to the Cluster

Cluster Test

1. To build the jar with Maven, add the following build plugins to pom.xml:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

2. Run the Maven package step to build the jar.


3. Start the cluster with the startup script:

[xiaobai@hadoop102 ~]$ myhadoop.sh start

4. Check the running processes to make sure the cluster started correctly:

[xiaobai@hadoop102 ~]$ jpsall

5. Copy the jar to the desktop, rename it wc.jar, and upload it to /opt/module/hadoop-3.2.2:

6. Right-click WordCountDriver → Copy/Paste Special → Copy Reference to get the fully qualified class name:
com.xiaobai.mapreduce.wordcount.WordCountDriver

7. In the /opt/module/hadoop-3.2.2 directory, create WordSum.txt with the following content:

[xiaobai@hadoop102 hadoop-3.2.2]$ vim WordSum.txt

8. As shown in the figure, switch to the hdfs user and create an input directory:

[xiaobai@hadoop102 hadoop-3.2.2]$ hdfs dfs -mkdir /input

9. As shown in the figure, upload the local WordSum.txt file to HDFS:

[xiaobai@hadoop102 hadoop-3.2.2]$ hdfs dfs -put /opt/module/hadoop-3.2.2/WordSum.txt /input

10. As shown in the figure, run wc.jar:

[xiaobai@hadoop102 hadoop-3.2.2]$ hadoop jar wc.jar com.xiaobai.mapreduce.wordcount.WordCountDriver /input /output


Tip: the empty string gets a count of 1 because I typed an extra line without writing any content; splitting an empty line on spaces still yields one empty token.

Source Code

1. The WordCountMapper class

package com.xiaobai.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/*
 * KEYIN:    input key type of the map phase (LongWritable, the byte offset of the line)
 * VALUEIN:  input value type of the map phase (Text, one line of input)
 * KEYOUT:   output key type of the map phase (Text, a word)
 * VALUEOUT: output value type of the map phase (IntWritable, the count 1)
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outK = new Text();
    private IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Get one line, e.g. "hello hello"
        String line = value.toString();

        // 2. Split it on spaces: "hello", "hello"
        String[] words = line.split(" ");

        // 3. Emit <word, 1> for each word
        for (String s : words) {
            outK.set(s);
            context.write(outK, outV);
        }
    }
}

2. The WordCountReducer class

package com.xiaobai.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/*
 * KEYIN:    input key type of the reduce phase (Text)
 * VALUEIN:  input value type of the reduce phase (IntWritable)
 * KEYOUT:   output key type of the reduce phase (Text)
 * VALUEOUT: output value type of the reduce phase (IntWritable)
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        // e.g. key "hello" arrives with values (1, 1); accumulate them
        for (IntWritable value : values) {
            sum += value.get();
        }
        outV.set(sum);
        // Emit <word, total>
        context.write(key, outV);
    }
}
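One caveat worth knowing about the reducer loop above: Hadoop reuses a single IntWritable instance across iterations of values, mutating it in place, which is why the code copies the primitive out with value.get() instead of keeping references to the objects. The pitfall can be illustrated with plain Java; Holder and ReuseDemo below are illustrative stand-ins, not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    // Mutable holder standing in for a reused IntWritable instance.
    static class Holder {
        int v;
        Holder(int v) { this.v = v; }
    }

    // Sums a stream of values two ways: keeping references to the shared
    // holder (wrong) vs copying the primitive out on each iteration (right).
    static int[] sumBothWays(int[] values) {
        Holder shared = new Holder(0);
        List<Holder> byReference = new ArrayList<>();
        List<Integer> byValue = new ArrayList<>();
        for (int x : values) {
            shared.v = x;            // the framework mutates one object in place
            byReference.add(shared); // wrong: every element points at the last value
            byValue.add(shared.v);   // right: copying preserves each value
        }
        int refSum = 0, valSum = 0;
        for (Holder h : byReference) refSum += h.v;
        for (int v : byValue) valSum += v;
        return new int[]{refSum, valSum};
    }

    public static void main(String[] args) {
        int[] sums = sumBothWays(new int[]{1, 2, 3});
        System.out.println("byReference=" + sums[0] + " byValue=" + sums[1]);
        // byReference=9 byValue=6
    }
}
```

With input (1, 2, 3) the reference-keeping list sums to 9 (three views of the final value 3) while the copied values sum correctly to 6, which is why the reducer extracts the int before accumulating.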

3. The WordCountDriver class

package com.xiaobai.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar path
        job.setJarByClass(WordCountDriver.class);

        // 3. Attach the mapper and reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // 4. Set the KV types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the KV types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
