

Getting Started with Hadoop (23): A MapReduce Program for Finding the Maximum Count


1. Introduction

Count which word appears the most times across a set of input files and write the result to a file on HDFS.


2. Example

(1) Problem description
Given three files, each containing a number of words separated by whitespace, find the word that appears most often.

Sample input:

1) file1:

MapReduce is simple

2) file2:

MapReduce is powerful is simple

3) file3:

Hello MapReduce bye MapReduce

Expected output:

MapReduce    4

(2) Problem analysis
To "find the word with the largest count", the only information we need to track is the word itself and its frequency.


(3) Implementation steps

1) Map phase

The default TextInputFormat class first processes the input files, handing the map method each line's byte offset as the key and the line's text as the value. The map method then parses each <key, value> pair into the two pieces of information this job needs, the word and a count, emitting one <word, 1> pair per token.
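As a worked illustration (based on the sample files above, not actual program output), the single line in file2 would make the map method emit:

<MapReduce, 1>, <is, 1>, <powerful, 1>, <is, 1>, <simple, 1>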

2) Combine phase
After the map method runs, the Combine phase adds up the values that share the same key, giving each word's partial count within a single map task's output; this output becomes the input to the Reduce phase.
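Continuing the illustration, and assuming each sample file ends up in its own input split, the partial counts shipped to the Reduce phase would look roughly like this:

file1: <MapReduce, 1>, <is, 1>, <simple, 1>
file2: <MapReduce, 1>, <is, 2>, <powerful, 1>, <simple, 1>
file3: <MapReduce, 2>, <Hello, 1>, <bye, 1>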

3) Reduce phase
After the two phases above, the Reduce phase sums the values for each key to get every word's total count, keeps track of the largest total seen so far, and outputs only the word with the maximum frequency.
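With the partial counts above, the reducer sums each word's values (MapReduce: 1 + 1 + 2 = 4, is: 1 + 2 = 3, simple: 1 + 1 = 2, and 1 each for powerful, Hello and bye), remembers only the running maximum across calls to reduce(), and finally writes <MapReduce, 4> from cleanup().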


(4) Implementation

package com.mk.mapreduce;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

public class MaxWord {

    // Map phase: emit a <word, 1> pair for every token in the line.
    public static class MaxWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final Text newKey = new Text();
        private final IntWritable newValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            if (StringUtils.isBlank(value.toString())) {
                System.out.println("blank line");
                return;
            }
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                newKey.set(word);
                context.write(newKey, newValue);
            }
        }
    }

    // Combine phase: sum the 1s for each word within a single map task's output.
    public static class MaxWordCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable newValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            newValue.set(count);
            context.write(key, newValue);
        }
    }

    // Reduce phase: sum each word's partial counts, remember the running maximum,
    // and emit only the most frequent word from cleanup().
    public static class MaxWordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private String word = null;
        private int count = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int c = 0;
            for (IntWritable v : values) {
                c += v.get();
            }
            if (word == null || count < c) {
                word = key.toString();
                count = c;
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            if (word != null) {
                context.write(new Text(word), new IntWritable(count));
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String uri = "hdfs://192.168.150.128:9000";
        String input = "/maxWord/input";
        String output = "/maxWord/output";

        Configuration conf = new Configuration();
        if (System.getProperty("os.name").toLowerCase().contains("win"))
            conf.set("mapreduce.app-submission.cross-platform", "true");

        // Remove any previous output directory so the job does not fail on startup.
        FileSystem fileSystem = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(output);
        fileSystem.delete(path, true);

        Job job = new Job(conf, "MaxWord");
        job.setJar("./out/artifacts/hadoop_test_jar/hadoop-test.jar");
        job.setJarByClass(MaxWord.class);
        job.setMapperClass(MaxWordMapper.class);
        job.setCombinerClass(MaxWordCombiner.class);
        job.setReducerClass(MaxWordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPaths(job, uri + input);
        FileOutputFormat.setOutputPath(job, new Path(uri + output));

        boolean ret = job.waitForCompletion(true);
        System.out.println(job.getJobName() + "-----" + ret);
    }
}
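Note that tracking the maximum in the reducer's instance fields and emitting it in cleanup() only yields a single global answer when the job runs with one reduce task, which is Hadoop's default. If the reducer count were ever raised, each reducer would emit its own local maximum, so it is safest to pin the count explicitly. A minimal addition to the job setup above (a suggested safeguard, not part of the original code):

// Ensure a single reducer so cleanup() emits exactly one global maximum.
job.setNumReduceTasks(1);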


