
MR Case Study: CombineFileInputFormat

Published: 2025/4/14 44 豆豆

CombineFileInputFormat is an abstract class. Hadoop provides two concrete implementations: CombineTextInputFormat and CombineSequenceFileInputFormat.

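This case study made three things clear to me (for background, see "Explained: MR multiple-path input" and "Explained: the CombineFileInputFormat class"):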

  • For a single input path:
    // Input format: CombineTextInputFormat, a concrete CombineFileInputFormat
    job.setInputFormatClass(CombineTextInputFormat.class);
    // Maximum split size: 60 MB
    CombineTextInputFormat.setMaxInputSplitSize(job, 60 * 1024 * 1024L);
    // Input path
    CombineTextInputFormat.addInputPath(job, new Path(args[0]));
  • For multiple input paths, approach ①:
    // Input format: CombineTextInputFormat, a concrete CombineFileInputFormat
    job.setInputFormatClass(CombineTextInputFormat.class);
    // Maximum split size: 60 MB
    CombineTextInputFormat.setMaxInputSplitSize(job, 60 * 1024 * 1024L);
    // Input paths (two of them)
    CombineTextInputFormat.addInputPath(job, new Path(args[0]));
    CombineTextInputFormat.addInputPath(job, new Path(args[1]));
  • For multiple input paths, approach ②:
    // Maximum split size: 60 MB
    CombineTextInputFormat.setMaxInputSplitSize(job, 60 * 1024 * 1024L);
    // Input paths, each registered together with its input format
    // (MultipleInputs sets the job's input format to DelegatingInputFormat)
    MultipleInputs.addInputPath(job, new Path(args[0]), CombineTextInputFormat.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), CombineTextInputFormat.class);
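In all three setups, setMaxInputSplitSize(job, 60*1024*1024L) sets the target size of each combined split, which is what lets many small files share one map task. As a rough illustration only, here is a plain-Java sketch with no Hadoop dependency that packs file sizes into splits. It assumes, as a simplification, that a split is closed as soon as its accumulated size reaches the maximum; it ignores node/rack locality, minimum split sizes and the chunking of files larger than the maximum, all of which the real CombineFileInputFormat deals with. The class name CombinePackingSketch, the method pack and the ten 10 MB input files are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of packing small files into combined splits.
 * Not Hadoop code: no block locality, no minimum split sizes, no chunking of
 * files larger than maxSplitSize.
 */
public class CombinePackingSketch {

    /** Packs file sizes (bytes) in order; a split is closed once it reaches maxSplitSize. */
    public static List<List<Long>> pack(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            current.add(size);
            currentSize += size;
            if (currentSize >= maxSplitSize) {
                // Threshold reached: close this split and start a new one
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        if (!current.isEmpty()) {
            // Whatever is left over becomes one last, smaller split
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024L;
        long maxSplitSize = 60 * mb; // same value as setMaxInputSplitSize(job, 60*1024*1024L)
        long[] files = new long[10]; // hypothetical input: ten 10 MB files
        Arrays.fill(files, 10 * mb);
        List<List<Long>> splits = pack(files, maxSplitSize);
        System.out.println(splits.size() + " splits instead of " + files.length);
    }
}
```

With these inputs the sketch prints `2 splits instead of 10`: the first six files fill one split and the remaining four form a leftover split, whereas one split per file would have meant ten map tasks.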

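Looking closely, you will also notice how the two multi-path approaches ① and ② differ (verified):

  • Approach ①: all inputs are first pooled to get the total input size, which is then divided by the split size to give the total number of map tasks.
  • Approach ②: the number of map tasks is first computed separately for each MultipleInputs path, and the counts for the two paths are then added together.
  • Complete code: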

    package test0820;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.VLongWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount0826 {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);
            job.setJarByClass(WordCount0826.class);

            job.setMapperClass(IIMapper.class);
            job.setReducerClass(IIReducer.class);
            job.setNumReduceTasks(5);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(VLongWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(VLongWritable.class);

            // CombineFileInputFormat
            // Approach ① would also need: job.setInputFormatClass(CombineTextInputFormat.class);
            // Maximum split size: 60 MB
            CombineTextInputFormat.setMaxInputSplitSize(job, 60 * 1024 * 1024L);

            // Approach ① (commented out): add both paths to CombineTextInputFormat
            // CombineTextInputFormat.addInputPath(job, new Path(args[0]));
            // CombineTextInputFormat.addInputPath(job, new Path(args[1]));
            // Approach ② (active): register each path through MultipleInputs
            MultipleInputs.addInputPath(job, new Path(args[0]), CombineTextInputFormat.class);
            MultipleInputs.addInputPath(job, new Path(args[1]), CombineTextInputFormat.class);

            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Map: emit (word, 1) for every space-separated word in the line
        public static class IIMapper extends Mapper<LongWritable, Text, Text, VLongWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] splited = value.toString().split(" ");
                for (String word : splited) {
                    if (word.isEmpty()) {
                        continue; // consecutive spaces produce empty tokens; skip them
                    }
                    context.write(new Text(word), new VLongWritable(1L));
                }
            }
        }

        // Reduce: sum the counts for each word
        public static class IIReducer extends Reducer<Text, VLongWritable, Text, VLongWritable> {
            @Override
            protected void reduce(Text key, Iterable<VLongWritable> v2s, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (VLongWritable vl : v2s) {
                    sum += vl.get();
                }
                context.write(key, new VLongWritable(sum));
            }
        }
    }
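To make the difference between approaches ① and ② concrete, here is a back-of-the-envelope sketch in plain Java (no Hadoop dependency). It encodes only the two counting rules as described above: ① divides the pooled total by the split size, ② rounds up per path and then sums. The class MapCountSketch, its methods and the 90 MB input sizes are hypothetical, and the actual split count produced by CombineFileInputFormat also depends on how blocks are grouped by node and rack, so treat these numbers as estimates rather than what a cluster will report.

```java
/**
 * Hypothetical estimate of map-task counts for the two multi-path approaches,
 * following the counting rules described in the text. Not Hadoop code.
 */
public class MapCountSketch {

    /** ceil(a / b) for non-negative a and positive b, using integer arithmetic only. */
    public static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Approach 1: pool all inputs, then divide the total size by the split size. */
    public static long pooledCount(long[] pathSizes, long splitSize) {
        long total = 0;
        for (long size : pathSizes) {
            total += size;
        }
        return ceilDiv(total, splitSize);
    }

    /** Approach 2: count map tasks per MultipleInputs path, then add the counts. */
    public static long perPathCount(long[] pathSizes, long splitSize) {
        long sum = 0;
        for (long size : pathSizes) {
            sum += ceilDiv(size, splitSize);
        }
        return sum;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024L;
        long splitSize = 60 * mb;
        long[] pathSizes = {90 * mb, 90 * mb}; // hypothetical sizes of args[0] and args[1]
        System.out.println("approach 1: " + pooledCount(pathSizes, splitSize));  // 180 MB / 60 MB = 3
        System.out.println("approach 2: " + perPathCount(pathSizes, splitSize)); // 2 + 2 = 4
    }
}
```

With two 90 MB inputs, pooling gives 3 map tasks, while counting per path rounds each 90 MB up to 2 splits and yields 4: the per-path rule can only produce the same count or a higher one, because every path pays for its own partial last split.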


Reposted from: https://www.cnblogs.com/skyl/p/4761662.html
