
Word Frequency Counting with MapReduce

Published: 2024/7/19

1.1 Preparing the files
Create a local directory and two text files, then type some words into the files; they will be the input for the word count.

cd /usr/local/hadoop
mkdir WordFile
cd WordFile
touch wordfile1.txt
touch wordfile2.txt
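The files can also be created programmatically. A minimal sketch (the sample word lists here are made up for illustration; type in whatever words you want to count):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MakeWordFiles {
    // Creates the input directory and writes two sample text files into it.
    public static List<Path> write(Path dir) throws IOException {
        Files.createDirectories(dir);
        Path f1 = dir.resolve("wordfile1.txt");
        Path f2 = dir.resolve("wordfile2.txt");
        // Hypothetical sample content, one sentence per line.
        Files.writeString(f1, "I love Spark\nI love Hadoop\n");
        Files.writeString(f2, "Hadoop is good\nSpark is fast\n");
        return List.of(f1, f2);
    }

    public static void main(String[] args) throws IOException {
        for (Path p : write(Path.of("WordFile"))) {
            System.out.println(p);
        }
    }
}
```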

1.2 Create an HDFS directory (it lives in HDFS, so it is not visible in the local file system) and upload the two local text files into it with the following commands.

cd /usr/local/hadoop
./bin/hdfs dfs -mkdir wordfileinput
./bin/hdfs dfs -put ./WordFile/wordfile1.txt wordfileinput
./bin/hdfs dfs -put ./WordFile/wordfile2.txt wordfileinput

1.3 Make sure no output directory already exists in HDFS: the output directory must be deleted before every run of the word count job. Run the command below; note that /user/hadoop/ is the HDFS user directory, not a local directory.

./bin/hdfs dfs -rm -r /user/hadoop/output

1.4 Writing the code in Eclipse
Create a Java project named MapReduceWordCount, then right-click the project name to import the required JAR files.


1.5 Click Add External JARs, navigate to /usr/local/hadoop/share/hadoop, and import the following.

  • hadoop-common-3.1.3.jar and hadoop-nfs-3.1.3.jar from the "/usr/local/hadoop/share/hadoop/common" directory;
  • all JAR files in the "/usr/local/hadoop/share/hadoop/common/lib" directory;
  • all JAR files in the "/usr/local/hadoop/share/hadoop/mapreduce" directory, excluding the jdiff, lib, lib-examples, and sources subdirectories;
  • all JAR files in the "/usr/local/hadoop/share/hadoop/mapreduce/lib" directory.

1.6 Create the class WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner, pre-aggregating counts on the map side.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: splits each input line into whitespace-delimited tokens
    // and emits (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word and emits (word, total).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
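The tokenize-then-sum logic of the mapper and reducer can be exercised without a cluster. A minimal pure-Java sketch of the same logic (LocalWordCount is a name made up for this illustration; it is not part of the job above):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Mirrors the job in memory: tokenize each line like the mapper,
    // then sum the 1s per word like the reducer.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // same tokenizer as the mapper
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum); // same summing as the reducer
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("I love Spark", "I love Hadoop"));
        // prints {Hadoop=1, I=2, Spark=1, love=2}
    }
}
```

Note that, like StringTokenizer in the real mapper, this treats "Hadoop" and "Hadoop," as different words; the real job has the same behavior.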

1.7 Compiling and packaging the program
Package the program into the /usr/local/hadoop/myapp directory:

cd /usr/local/hadoop
mkdir myapp
  • run the program once with Run As (this creates the launch configuration needed for the export);
  • right-click the project name -> Export -> Java -> Runnable JAR file.

  • "Launch configuration" selects the main class to run when the exported JAR is launched: pick the entry "WordCount-MapReduceWordCount" configured in the previous step from the drop-down list. "Export destination" sets the directory and file name the JAR is saved to. Click Finish; a few informational dialogs will appear along the way, just click OK through them.

1.8 Running the program
Start Hadoop, then submit the JAR:

cd /usr/local/hadoop
./sbin/start-dfs.sh
./bin/hadoop jar ./myapp/WordCount.jar wordfileinput output

1.9 Viewing the results

cd /usr/local/hadoop
./bin/hdfs dfs -cat output/*
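Each line printed by the command above has the form "word<TAB>count", the default TextOutputFormat encoding of a key/value pair. A minimal sketch of parsing one such line, should you want to post-process the results (WordCountOutputLine is a hypothetical helper, not part of the job):

```java
import java.util.AbstractMap;
import java.util.Map;

public class WordCountOutputLine {
    // Splits a "key<TAB>value" reducer output line into (word, count).
    public static Map.Entry<String, Integer> parse(String line) {
        int tab = line.indexOf('\t');
        return new AbstractMap.SimpleEntry<>(
                line.substring(0, tab),
                Integer.parseInt(line.substring(tab + 1)));
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = parse("Hadoop\t2");
        System.out.println(e.getKey() + " appears " + e.getValue() + " times");
        // prints: Hadoop appears 2 times
    }
}
```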

1.10 Browsing the HDFS file system
Enter the /usr/local/hadoop/bin directory and run the relevant commands, for example:

./hadoop fs -ls

1.11 Source document
http://dblab.xmu.edu.cn/blog/2481-2/#more-2481
