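Getting Started with Hadoop (22): A MapReduce Program for Computing an Average
I. Introduction
Computing an average is one of the most common operations in statistics. This article uses MapReduce to compute the average of the values in a very large data set.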
II. Example
(1) Problem description
Three files are given, each storing a number of values. Compute the average of all the values across the three files.
Sample input:
1) file1:
2) file2:
3) file3:
Expected output:
14.952380952380953
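(2) Problem analysis
When averaging a very large data set, the data cannot all be loaded into memory at once. The computation has to proceed much like an external sort: load one portion of the data and accumulate its sum and its count, then load the next portion and keep accumulating, and finally divide the total sum by the total count to obtain the average.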
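The chunk-by-chunk idea can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: the class name ChunkedAverage, the Totals helper, and the in-memory int[] chunks are hypothetical stand-ins for portions of data read from disk.

```java
import java.util.Arrays;
import java.util.List;

public class ChunkedAverage {

    /** Running totals: the only state kept in memory between chunks. */
    static final class Totals {
        long sum = 0;
        long count = 0;

        /** Folds one chunk into the running sum and count. */
        void addChunk(int[] chunk) {
            for (int v : chunk) {
                sum += v;
                count++;
            }
        }

        /** Divides once at the end; an empty input yields 0.0. */
        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Processes the chunks one at a time and returns the overall average. */
    public static double averageOf(List<int[]> chunks) {
        Totals totals = new Totals();
        for (int[] chunk : chunks) {
            totals.addChunk(chunk);
        }
        return totals.average();
    }

    public static void main(String[] args) {
        // Hypothetical chunks standing in for portions of a large file.
        List<int[]> chunks = Arrays.asList(
                new int[]{1, 2, 3}, new int[]{4, 5}, new int[]{9});
        System.out.println(averageOf(chunks)); // 24 / 6, prints 4.0
    }
}
```

Only two numbers survive from one chunk to the next, so memory use does not grow with the size of the input.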
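(3) Implementation steps
1) Map phase
The default TextInputFormat class processes the input files and yields, for each line of text, its byte offset and its content. The map method parses each input <key,value> pair to extract the numeric value, and the mapper accumulates the sum and the count for its own input split.
2) Reduce phase
After the map phase, the reducer receives the partial sum and count from every mapper, adds them up into a total sum and a total count, and finally divides the two to obtain the average.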
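The two phases can be simulated in plain Java to see why each mapper has to pass along both a sum and a count. This is a rough sketch, not Hadoop code: the names PartialAverageSketch, Partial, mapSplit, and reduce are hypothetical, and each split is just a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialAverageSketch {

    /** What one mapper would emit for its split: a partial sum and a count. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Map side: parse every non-blank line of one split and accumulate. */
    static Partial mapSplit(List<String> lines) {
        long sum = 0;
        long count = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            sum += Integer.parseInt(trimmed);
            count++;
        }
        return new Partial(sum, count);
    }

    /** Reduce side: add up all partial results, then divide once. */
    static double reduce(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        List<Partial> partials = new ArrayList<>();
        partials.add(mapSplit(Arrays.asList("10", "20", "", "30"))); // sum 60, count 3
        partials.add(mapSplit(Arrays.asList("40")));                 // sum 40, count 1
        System.out.println(reduce(partials)); // 100 / 4, prints 25.0
    }
}
```

Averaging the per-split averages instead (20 and 40 here) would give 30, which is wrong because the splits have different sizes. Carrying the count alongside the sum avoids that.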
(4) Key code
package com.mk.mapreduce;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.net.URI;

public class AvgValue {

    // Accumulates the sum and count of one input split and emits a single
    // (partial sum, count) pair when the split is finished.
    public static class AvgValueMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

        // long accumulators so that large inputs do not overflow an int
        private long sumValue = 0;
        private long count = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (StringUtils.isBlank(value.toString())) {
                System.out.println("blank line");
                return;
            }
            long v = Long.parseLong(value.toString().trim());
            sumValue = sumValue + v;
            count++;
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(sumValue), new LongWritable(count));
        }
    }

    // Adds up the partial sums and counts from all mappers and writes the average.
    public static class AvgValueReducer extends Reducer<LongWritable, LongWritable, DoubleWritable, NullWritable> {

        private long sumValue = 0;
        private long count = 0;

        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Mappers that produced the same partial sum share one key,
            // so the key must be added once per value, not once per call.
            long s = key.get();
            for (LongWritable v : values) {
                sumValue = sumValue + s;
                count = count + v.get();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            double avg = sumValue;
            if (count != 0) {
                avg = sumValue * 1.0 / count;
            }
            context.write(new DoubleWritable(avg), NullWritable.get());
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String uri = "hdfs://192.168.150.128:9000";
        String input = "/avgValue/input";
        String output = "/avgValue/output";

        Configuration conf = new Configuration();
        // Needed when submitting from Windows to a Linux cluster
        if (System.getProperty("os.name").toLowerCase().contains("win"))
            conf.set("mapreduce.app-submission.cross-platform", "true");

        // Delete the output directory left over from a previous run
        FileSystem fileSystem = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(output);
        fileSystem.delete(path, true);

        Job job = Job.getInstance(conf, "AvgValue");
        job.setJar("./out/artifacts/hadoop_test_jar/hadoop-test.jar");
        job.setJarByClass(AvgValue.class);
        job.setMapperClass(AvgValueMapper.class);
        job.setReducerClass(AvgValueReducer.class);
        // A single reducer, so that exactly one global average is written
        job.setNumReduceTasks(1);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(DoubleWritable.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPaths(job, uri + input);
        FileOutputFormat.setOutputPath(job, new Path(uri + output));

        boolean ret = job.waitForCompletion(true);
        System.out.println(job.getJobName() + "-----" + ret);
    }
}
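One subtlety of using the partial sum as the map output key: the shuffle groups pairs by key, so two mappers that happen to produce the same partial sum arrive in a single reduce call with two count values. The sketch below simulates that grouping with a TreeMap, which is a stand-in for the shuffle and not a Hadoop API. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyGroupingSketch {

    /** Simulates the shuffle: groups (sum, count) pairs by their sum key. */
    static Map<Long, List<Long>> shuffle(long[][] pairs) {
        Map<Long, List<Long>> grouped = new TreeMap<>();
        for (long[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    /** Reduce side: the key is added once for every count value in its group. */
    static double average(Map<Long, List<Long>> grouped) {
        long sum = 0;
        long count = 0;
        for (Map.Entry<Long, List<Long>> e : grouped.entrySet()) {
            for (long c : e.getValue()) {
                sum += e.getKey(); // once per value, not once per key
                count += c;
            }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        // Two hypothetical mappers both report a partial sum of 60.
        long[][] pairs = {{60, 3}, {60, 2}, {30, 1}};
        System.out.println(average(shuffle(pairs))); // 150 / 6, prints 25.0
    }
}
```

If the key 60 were added only once for its group, the total would be 90 instead of 150 and the result would drop to 15.0. An alternative design is to emit a constant key and carry both the sum and the count in the value.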