
Code Implementation: Counting Word Occurrences with MapReduce


Requirement

Count the number of occurrences of each word in the following txt file (located under /Users/lizhengi/test/input/):

hadoop take spring spark hadoop hdfs mapreduce take tomcat tomcat kafka kafka flume flume hive

Implementation

1. Create a new Maven project; the pom.xml dependencies are as follows:

<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><groupId>com.lizhengi</groupId><artifactId>Hadoop-API</artifactId><version>1.0-SNAPSHOT</version><dependencies><dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>RELEASE</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-core</artifactId><version>2.8.2</version></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-common</artifactId><version>3.2.1</version></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-client</artifactId><version>3.2.1</version></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-hdfs</artifactId><version>3.2.1</version></dependency></dependencies></project>

2. Under the src/main/resources directory, create a new file named "log4j.properties" with the following contents:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
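Note that this file uses Log4j 1.x syntax; it is read by the Log4j 1.x runtime that the Hadoop artifacts bring in transitively, not by the log4j-core 2.8.2 dependency above, which is Log4j 2 and ignores this format. A minimal sketch to confirm logging is wired up (class name is hypothetical, not part of the project):

package com.lizhengi.wordcount;

import org.apache.log4j.Logger;

// Illustrative only: verify that log4j.properties is picked up
// from src/main/resources when run with the Hadoop dependencies
// on the classpath.
public class LoggingCheck {
    private static final Logger LOG = Logger.getLogger(LoggingCheck.class);

    public static void main(String[] args) {
        LOG.info("stdout appender works");     // should appear on the console
        LOG.debug("not shown at INFO level");  // filtered by rootLogger=INFO
    }
}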

3. Write the Mapper class, WcMapper:

package com.lizhengi.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @author lizhengi
 * @create 2020-07-20
 */
public class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Take the incoming line and convert it to a String
        String line = value.toString();
        // 2. Split the line on the delimiter into an array of words
        String[] words = line.split(" ");
        // 3. For every word in the array, emit the pair <word, 1>
        for (String word : words) {
            // Use the MapReduce context to send the mapper's output
            // downstream as input to the reduce phase
            k.set(word);
            context.write(k, v);
        }
    }
}
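One caveat with line.split(" "): consecutive spaces produce empty strings, which would then be counted as "words". The canonical Hadoop WordCount example tokenizes with java.util.StringTokenizer instead, which skips runs of whitespace. A variant mapper along those lines (class name hypothetical, not part of the original article):

package com.lizhengi.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical variant: same contract as WcMapper, but tokenizes with
// StringTokenizer so runs of whitespace never yield an empty "word".
public class WcTokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text k = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            k.set(itr.nextToken());
            context.write(k, one);
        }
    }
}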

4. Write the Reducer class, WcReducer:

package com.lizhengi.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * @author lizhengi
 * @create 2020-07-20
 */
public class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1. Reset the counter for this key
        sum = 0;
        // 2. Iterate over the grouped values, accumulating each 1
        //    into the word's total count
        for (IntWritable count : values) {
            sum += count.get();
        }
        // 3. Emit the final result
        v.set(sum);
        context.write(key, v);
    }
}
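To make the reduce step concrete: between map and reduce, the framework groups the emitted pairs by key, so this reducer is called once per distinct word with an iterable of 1s. A plain-Java sketch (illustrative only, no Hadoop involved) of that group-then-sum behavior on part of the sample input:

import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulate shuffle (group by word) + reduce (sum the 1s).
public class ShuffleReduceSketch {
    public static void main(String[] args) {
        String line = "hadoop take spring spark hadoop hdfs mapreduce take";
        Map<String, Integer> counts = new TreeMap<>(); // keys sorted, like MR output
        for (String word : line.split(" ")) {
            counts.merge(word, 1, Integer::sum);       // the reduce-side accumulation
        }
        counts.forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}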

5. Write the driver class, WcDriver:

package com.lizhengi.wordcount;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author lizhengi
 * @create 2020-07-20
 */
public class WcDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the configuration and set up the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2. Set the jar load path
        job.setJarByClass(WcDriver.class);

        // 3. Set the mapper and reducer classes
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);

        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, "/Users/lizhengi/test/input");
        FileOutputFormat.setOutputPath(job, new Path("/Users/lizhengi/test/output"));

        // 7. Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
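The paths above are hard-coded for a local run. A common variant, sketched below under the assumption that WcMapper and WcReducer are as defined earlier (the class name WcDriverArgs is hypothetical), reads the paths from the command line and registers WcReducer as a combiner. That reuse is safe for word count because summation is associative and commutative, and it shrinks the data shuffled to the reducers:

package com.lizhengi.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical variant of WcDriver: paths come from args, and the reducer
// also runs map-side as a combiner to cut shuffle traffic.
public class WcDriverArgs {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WcDriverArgs.class);
        job.setMapperClass(WcMapper.class);
        job.setCombinerClass(WcReducer.class); // safe: the sum is associative and commutative
        job.setReducerClass(WcReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}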

Result

[root@carlota1] ls /Users/lizhengi/test/output/   # two new files appear
_SUCCESS  part-r-00000
[root@carlota1 output] cat part-r-00000
flume	2
hadoop	2
hdfs	1
hive	1
kafka	2
mapreduce	1
spark	1
spring	1
take	2
tomcat	2
