
Hadoop Study Notes (14): Getting Started with MapReduce Development (2) - MapReduce API Overview and MapReduce Examples


4. MapReduce API Overview

  • A MapReduce program generally consists of a Mapper, a Reducer, and a main function.
  • The Mapper typically performs the key/value mapping step.
  • The Reducer typically performs the key/value aggregation step.
  • The main function assembles the Mapper, the Reducer, and the required configuration.
  • More advanced programs also set input/output file formats and configure a Combiner or Partitioner to optimize the job.

4.1 MapReduce Program Module: Main Function
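The original notes illustrate this module with a screenshot. As a stand-in, here is a minimal sketch of a typical driver in the standard WordCount shape (the names ExampleJob, ExampleMapper, and ExampleReducer are placeholders, not from the original):

    // Assembles the job: mapper, reducer, key/value types, and I/O paths.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example job");  // job name is a placeholder
        job.setJarByClass(ExampleJob.class);             // hypothetical driver class
        job.setMapperClass(ExampleMapper.class);
        job.setReducerClass(ExampleReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }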

4.2 MapReduce Program Module: Mapper

  • org.apache.hadoop.mapreduce.Mapper
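A Mapper subclass overrides map(), which is called once per input record; its four type parameters are <KEYIN, VALUEIN, KEYOUT, VALUEOUT>. A minimal sketch in the WordCount style (ExampleMapper is a placeholder name):

    public static class ExampleMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            word.set(value.toString());  // map each input line to the pair (line, 1)
            context.write(word, one);
        }
    }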

4.3 MapReduce Program Module: Reducer

  • org.apache.hadoop.mapreduce.Reducer
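A Reducer subclass overrides reduce(), which receives one key together with all of the values grouped under it. A minimal sketch that sums IntWritable counts (ExampleReducer is a placeholder name):

    public static class ExampleReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();        // aggregate all values for this key
            }
            result.set(sum);
            context.write(key, result);  // emit (key, total)
        }
    }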

5. MapReduce Examples

5.1 Workflow (Mapper, Reducer, Main, Package and Run)

  • Start from the WordCount program and modify the Mapper;
  • copy the Reducer program unchanged;
  • copy the main function and make the corresponding changes;
  • compile and package the code;
  • upload the jar;
  • upload the data;
  • run the program;
  • check the results.
5.2 Example 1: Counting Accesses by Date

    1. Start from the WordCount program and modify the Mapper.
    (Create a new Java class and copy the code from steps 1-3 below into it.)

    public static class SpiltMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // value: email_address | date
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] data = value.toString().split("\\|", -1);  // split the record on "|"
            word.set(data[1]);                                  // data[1] is the date
            context.write(word, one);                           // emit (date, 1)
        }
    }
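    For instance, given a hypothetical input line aaa@example.com|2021-05-01, the split on "|" yields the address and the date, and the mapper emits (2021-05-01, 1).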

    2. Copy the Reducer program unchanged:

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();        // add up the 1s emitted for this key
            }
            result.set(sum);
            context.write(key, result);  // emit (date, total count)
        }
    }

    3. Copy the main function and make the corresponding changes:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(CountByDate.class);   // our main class is CountByDate
        job.setMapperClass(SpiltMapper.class);  // the Mapper is the modified SpiltMapper
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    4. Compile and package the jar. (The original notes show the IDE build steps and a build error with its fix only as screenshots; they are omitted here.)
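    A minimal sketch of packaging from the command line instead, assuming the Hadoop client is installed and the source file sits in the current directory (the jar name countbydate.jar is an assumption, not from the original):

    # Compile against the Hadoop classpath, then bundle the classes into a jar.
    javac -classpath "$(hadoop classpath)" -d classes CountByDate.java
    jar -cvf countbydate.jar -C classes .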

    5/6. Upload the jar and the data.
    The email_log_with_date.txt data set is available at: https://pan.baidu.com/s/1HfwHCfmvVdQpuL-MPtpAng (extraction code: cgnb)

    Upload the data to HDFS (make sure HDFS is running), then confirm the upload in the NameNode web UI (browse to master:50070); a sketch of the shell steps follows.
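    A sketch of the upload, assuming the data file is in the current directory and lands in the user's HDFS home directory (paths are illustrative):

    # Push the input file to HDFS and verify it arrived.
    hdfs dfs -put email_log_with_date.txt .
    hdfs dfs -ls .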

    7. Run the program (make sure YARN is running). Once the job is submitted, its progress can be followed in the ResourceManager web UI at master:8088; a sketch of the submission follows.
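    A sketch of the submission, assuming the jar built above and that CountByDate is in the default package (the output directory name is illustrative and must not exist yet):

    yarn jar countbydate.jar CountByDate email_log_with_date.txt countbydate_output00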


    8. Check the results in the NameNode web UI at master:50070, or read the output directly from HDFS as sketched below.
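    A sketch of reading the output from the command line, assuming the output path used above (part-r-00000 is the default name of the first reducer's output file):

    hdfs dfs -cat countbydate_output00/part-r-00000 | head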


5.3 Example 2: Sorting Users by Access Count

    The two jobs below each contain their Mapper, Reducer, and main program.
    SortByCountFirst (first job: count accesses per user)

    package demo;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    import java.io.IOException;

    public class SortByCountFirst {
        // 1. Modified Mapper: emit (email_address, 1) for every access record
        public static class SpiltMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // value: email_address | date
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] data = value.toString().split("\\|", -1);
                word.set(data[0]);         // data[0] is the email address
                context.write(word, one);  // emit (address, 1)
            }
        }

        // 2. Reducer copied unchanged: sum the 1s per address
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // 3. Main function copied and adjusted
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: demo.SortByCountFirst <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "sort by count first");
            job.setJarByClass(SortByCountFirst.class);  // the main class
            job.setMapperClass(SpiltMapper.class);      // the modified SpiltMapper
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
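    With the default TextOutputFormat, this first job writes one email_address \t count line per user, which is exactly the tab-separated input that SortByCountSecond's mapper expects.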

    SortByCountSecond (second job: reorder by count)

    package demo;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    import java.io.IOException;

    public class SortByCountSecond {
        // 1. Modified Mapper: invert (address \t count) into (count, address)
        public static class SpiltMapper
                extends Mapper<Object, Text, IntWritable, Text> {
            private IntWritable count = new IntWritable(1);
            private Text word = new Text();

            // value: email_address \t count
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] data = value.toString().split("\t", -1);
                word.set(data[0]);
                count.set(Integer.parseInt(data[1]));
                context.write(count, word);  // the count is the key, so the shuffle sorts by it
            }
        }

        // 2. Modified Reducer: swap (count, address) back to (address, count)
        public static class ReverseReducer
                extends Reducer<IntWritable, Text, Text, IntWritable> {
            public void reduce(IntWritable key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                for (Text val : values) {
                    context.write(val, key);
                }
            }
        }

        // 3. Main function copied and adjusted
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: demo.SortByCountSecond <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "sort by count second");
            job.setJarByClass(SortByCountSecond.class);
            job.setMapperClass(SpiltMapper.class);
            // job.setCombinerClass(IntSumReducer.class);  // no combiner: this job only reorders pairs
            job.setReducerClass(ReverseReducer.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
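    Note that IntWritable keys sort in ascending order by default, so the busiest users come last. If descending order is wanted, a custom sort comparator can be registered; a hedged sketch not present in the original notes (register it in main with job.setSortComparatorClass(DescendingIntWritableComparator.class)):

    // Hypothetical addition: invert the default IntWritable ordering.
    // Requires org.apache.hadoop.io.WritableComparator and WritableComparable.
    public static class DescendingIntWritableComparator extends WritableComparator {
        protected DescendingIntWritableComparator() {
            super(IntWritable.class, true);  // true: instantiate keys for comparison
        }

        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            return -a.compareTo(b);          // negate to sort from largest to smallest
        }
    }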

    Then package the jar and upload it as before.

    yarn jar sortbycount.jar demo.SortByCountFirst -Dmapreduce.job.queuename=prod email_log_with_date.txt sortbycountfirst_output00
    yarn jar sortbycount.jar demo.SortByCountSecond -Dmapreduce.job.queuename=prod sortbycountfirst_output00 sortbycountsecond_output00
