
Hadoop Example Analysis: WordCount Word Counting

Published: 2023/12/15


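WordCount word-count analysis

I recently read some online material about Hadoop, including a word-count example. Combining those sources with my own understanding, this post briefly walks through how the job executes.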

MyMapper.java

package com.mpred;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key: byte offset of the line; value: the text of the line
        String val = value.toString();
        String[] str = val.split(" ");
        for (String s : str) {
            // emit <word, 1> for every token on the line
            context.write(new Text(s), new IntWritable(1));
        }
    }
}

MyReducer.java

package com.mpred;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    /* (non-Javadoc)
     * @see org.apache.hadoop.mapreduce.Reducer#reduce(java.lang.Object, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // add up all the 1s collected for this word
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // emit <word, total count>
        context.write(key, new IntWritable(sum));
    }
}

WordCount.java

package com.mpred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration(); // load the configuration files
        // Create a job for the JobTracker to run.
        // (On Hadoop 2+ this constructor is deprecated in favor of Job.getInstance(conf).)
        Job job = new Job(conf);

        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.0.9:9000/hello.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.0.9:9000/wordcount"));

        // types emitted by the mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // types emitted by the reducer (the final output)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Input file

hello you
hello me
follow me
follow you

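Brief analysis of the execution flow:

1. Map task processing

a) Read the input file and parse it into key/value pairs. Each line of the input becomes one key/value pair, and the map function is called once per pair.

b) The map function holds your own logic: it processes the input key and value and emits new key/value pairs.

c) The emitted key/value pairs are partitioned.

d) Within each partition, the data is sorted and grouped by key. Values that share the same key are collected into one set.

2. Reduce task processing

a) The outputs of the map tasks are copied over the network, partition by partition, to the corresponding reduce nodes.

b) The outputs from the multiple map tasks are merged and sorted. The reduce function then applies your own logic to each input key and its values, emitting new key/value pairs.

c) The reduce output is saved to files.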
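To make the partitioning in steps 1c and 2a more concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It applies the same hash-modulo formula that Hadoop's default HashPartitioner uses, but on String keys rather than Text, so the concrete partition numbers will not match a real job. The class name PartitionSketch and the choice of two reduce tasks are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {

    // Hash-modulo formula, as in Hadoop's default HashPartitioner.
    // Assumption: String.hashCode() stands in for Text.hashCode() here,
    // so the partition numbers differ from what a real job would compute.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups the map output keys by the partition (reduce task) they would go to.
    public static Map<Integer, List<String>> assign(String[] keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String k : keys) {
            int p = partitionFor(k, numReduceTasks);
            byPartition.computeIfAbsent(p, x -> new ArrayList<>()).add(k);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        String[] mapOutputKeys = {"hello", "you", "hello", "me"};
        // Two reduce tasks is an assumed value, chosen only for illustration.
        System.out.println(assign(mapOutputKeys, 2));
    }
}
```

The point to notice is that every occurrence of the same key lands in the same partition, which is what lets a single reduce call see all the values for a word. With only one reduce task, every key falls into partition 0.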

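Analysis of the sample code above:

Analysis of the input file (the walkthrough follows only the first two lines):

hello you    // key is 0, value is "hello you"
hello me     // key is 10, value is "hello me"

Note: the key passed to the map function is the byte offset of the line within the file, and the value is the text of that line.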
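The offsets 0 and 10 can be checked by hand: "hello you" is 9 bytes plus one newline byte, so the second line starts at byte 10. The small sketch below does the same arithmetic in plain Java. It assumes UTF-8 text with a single '\n' after every line, and the class name LineOffsets is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {

    // Returns the starting byte offset of each line, assuming UTF-8 text
    // and exactly one '\n' terminator after every line.
    public static List<Long> offsets(String[] lines) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            result.add(offset);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"hello you", "hello me"};
        System.out.println(offsets(lines)); // prints [0, 10]
    }
}
```

If the file used Windows line endings ("\r\n"), each later offset would be larger by one byte per preceding line.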

Analysis of the map function:

    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String val = value.toString();
        String[] str = val.split(" ");
        for (String s : str) {
            context.write(new Text(s), new IntWritable(1));
        }
    }

The key is the offset and the value is the content of one line. The split method (on a space) separates the line into words, and the loop then emits one pair per word: <hello,1>,<you,1>,<hello,1>,<me,1>

After sorting: <hello,1>,<hello,1>,<me,1>,<you,1>

After grouping: <hello,{1,1}>,<me,{1}>,<you,{1}>
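The sort-then-group step is done by the framework, not by user code, but its effect is easy to imitate. The following is a rough in-memory sketch, not how Hadoop implements the shuffle: a TreeMap keeps the keys sorted and collects the values per key. The class name ShuffleSketch is hypothetical, and String ordering stands in for the byte-wise ordering of Text, which gives the same result for plain ASCII words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Emits one (word, 1) pair per token, then sorts by key and groups the values,
    // imitating what happens between the map and reduce functions.
    public static Map<String, List<Integer>> sortAndGroup(String[] lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] lines = {"hello you", "hello me"};
        System.out.println(sortAndGroup(lines)); // prints {hello=[1, 1], me=[1], you=[1]}
    }
}
```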

Analysis of the reduce function

    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }

The key is a word and values is the collection of counts emitted for that word. The reduce function is therefore called 3 times here, once per distinct key. Each call sums its values collection and emits the result.

The data after reduce is <hello,2>,<me,1>,<you,1>

At this point the map and reduce functions have finished, and the data is written to the output file.
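To see why reduce runs three times, the sketch below applies a stand-in for the reduce logic to the grouped data, once per key and in key order. It is plain Java with no Hadoop types. The names ReduceSketch and runReduce are hypothetical, and Integer replaces IntWritable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {

    // Stand-in for MyReducer.reduce: sums the values collected for one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Calls reduce once per distinct key, visiting the keys in sorted order.
    public static Map<String, Integer> runReduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e
                : new TreeMap<String, List<Integer>>(grouped).entrySet()) {
            out.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("hello", Arrays.asList(1, 1));
        grouped.put("me", Arrays.asList(1));
        grouped.put("you", Arrays.asList(1));
        System.out.println(runReduce(grouped)); // prints {hello=2, me=1, you=1}
    }
}
```

In the real job the default text output format writes each pair as one line, with the key and the value separated by a tab, into a part file under the output directory.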
