
The First MapReduce Program

Published: 2023/11/27 by 豆豆

Counting the frequency of each word in a file

The wordcount class is the driver: it configures the job, registering wordmap as the Mapper and wordreduce as the Reducer, and then submits it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class wordcount {

    /**
     * Driver: configures and submits the word-count job.
     * @param args args[0] = input path, args[1] = output path (must not already exist)
     */
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <input path> <output path>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job(Configuration, String) constructor
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(wordcount.class);

        job.setMapperClass(wordmap.class);
        job.setReducerClass(wordreduce.class);

        // Read the input line by line; write one "key<TAB>value" line per output record
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Final output types; they also cover the map output, since wordmap emits <Text, IntWritable>
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Wait for the job and turn success or failure into the process exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
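To make the driver's input/output contract concrete, here is a local, Hadoop-free sketch of what the configured job produces: it reads the input file line by line, counts whitespace-separated words, and writes one word<TAB>count line per word to part-r-00000 inside the output directory, refusing to run if that directory already exists. It assumes a single reduce task (the Hadoop default), which is why the output comes out sorted by key. The class LocalWordCount and its run method are hypothetical names for illustration; a real job splits, shuffles, and sorts the data across tasks rather than in one TreeMap.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for the wordcount job; no Hadoop classes are used.
final class LocalWordCount {

    /** Counts words in input, writes outputDir/part-r-00000, returns the number of distinct words. */
    static int run(Path input, Path outputDir) throws IOException {
        // Mirrors FileOutputFormat, which refuses to write into an existing output directory
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
        // TreeMap keeps keys sorted; for ASCII words this matches the order a single reducer sees
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum); // map (<word, 1>) and reduce (sum) in one step
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // TextOutputFormat separates key and value with a tab by default
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.createDirectories(outputDir);
        Files.write(outputDir.resolve("part-r-00000"), lines, StandardCharsets.UTF_8);
        return counts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: LocalWordCount <input file> <output dir>");
            System.exit(2);
        }
        int distinct = run(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(distinct + " distinct words written to " + args[1]);
    }
}
```

For an input file containing the lines "hello world" and "hello  hadoop", part-r-00000 would hold hadoop 1, hello 2, world 1 (tab-separated, in that order), and a second run against the same output directory fails, just as a Hadoop job does when its output path already exists.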

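The input to wordmap is a <key, value> pair: with TextInputFormat, the key is the byte offset at which the current line starts in the file (not its line number), and the value is the text of that line. The mapper splits the line into words and, for each word, emits a pair of the form <word, 1>, where word is the word itself and 1 records a single occurrence.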

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordmap extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    // Reused for every output record; the framework serializes the pair when context.write is called
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key: byte offset of the line in the file; value: the line itself
        String line = value.toString();
        // Split on runs of whitespace so repeated spaces or tabs do not produce empty "words"
        for (String token : line.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a line starting with whitespace yields an empty first token
            }
            word.set(token);
            context.write(word, one);
        }
    }
}
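The following plain-Java sketch (not Hadoop code) shows what the <key, value> records handed to wordmap look like, and why the split pattern matters. It assumes UTF-8 text with "\n" line endings and the whole file in one input split; the class MapInputSketch and its Record type are hypothetical. It also compares split(" "), which yields an empty token wherever two spaces are adjacent, with split("\\s+"), which treats any run of whitespace as one separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of TextInputFormat-style records: key = byte offset where the
// line starts, value = the line without its "\n". Assumes UTF-8 and "\n" endings.
final class MapInputSketch {

    /** One mapper input record. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "<" + offset + ", \"" + line + "\">";
        }
    }

    static List<Record> records(String text) {
        List<Record> out = new ArrayList<>();
        if (text.isEmpty()) {
            return out; // an empty file yields no records
        }
        // limit -1 keeps a trailing empty element, dropped below if the text ends with "\n"
        String[] lines = text.split("\n", -1);
        int n = text.endsWith("\n") ? lines.length - 1 : lines.length;
        long offset = 0;
        for (int i = 0; i < n; i++) {
            out.add(new Record(offset, lines[i]));
            // The next line starts after this line's UTF-8 bytes plus the 1-byte "\n"
            offset += lines[i].getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Record r : records("hello world\nhello  hadoop\n")) {
            System.out.println(r
                    + "  single-space: " + Arrays.toString(r.line.split(" "))
                    + "  whitespace-run: " + Arrays.toString(r.line.split("\\s+")));
        }
    }
}
```

Running main prints two records: the second one has key 12 (the 11 bytes of "hello world" plus the newline), not 2, and its single-space split is [hello, , hadoop] with an empty token, while the whitespace-run split is [hello, hadoop].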

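wordreduce receives data of the form <word, {1, 1, 1, 1, ...}>: a word together with all the values the mappers emitted for it, where each 1 stands for one occurrence. For each such group the reducer adds up the values and writes the result directly as <word, sum>.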

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordreduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused for every output record
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up every count emitted for this word
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
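The step between the two classes, where the framework turns the mapper's <word, 1> pairs into the <word, {1, 1, ...}> groups that wordreduce receives, can be sketched in memory like this. It is a simplification under stated assumptions: one reducer, keys sorted with String ordering, and everything in a single TreeMap; the class ShuffleReduceSketch and its group and reduce methods are hypothetical names, not Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of shuffle/sort followed by the reduce step.
final class ShuffleReduceSketch {

    /** Shuffle/sort: collect all values per key, with keys kept in sorted order. */
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce: same logic as wordreduce.reduce(), summing every value for one key. */
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Map output for the line "hello world hello hadoop": one <word, 1> per word
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "hello world hello hadoop".split(" ")) {
            mapOutput.add(new SimpleEntry<>(w, 1));
        }
        for (Map.Entry<String, List<Integer>> g : group(mapOutput).entrySet()) {
            System.out.println("<" + g.getKey() + ", " + g.getValue() + "> -> <"
                    + g.getKey() + ", " + reduce(g.getValue()) + ">");
        }
    }
}
```

Running main prints:

<hadoop, [1]> -> <hadoop, 1>
<hello, [1, 1]> -> <hello, 2>
<world, [1]> -> <world, 1>

Summing the values, rather than counting how many there are, keeps the reduce step correct even if a value arrives that is larger than 1.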

Reposted from: https://www.cnblogs.com/k-yang/p/5595334.html