
MapReduce Counter Experiment

Published 2023/12/29 by 豆豆


I. Objectives

Learn the basic techniques of MapReduce programming through hands-on practice;

Learn to solve common data-processing problems with MapReduce by writing a counter program.

II. Environment

Operating system: Linux (CentOS 6.5 recommended);
Hadoop version: 2.9.2;
JDK version: 1.8 or later;
Java IDE: Eclipse.

III. Requirements

Understand the MapReduce programming model, write a counter program with MapReduce, run it, and analyze how it executes.

IV. Background

1. What is a MapReduce counter?

A counter records the progress and status of a job; you can think of it as a lightweight form of logging. You can place a counter at any point in your program to record how the data or the progress changes.

2. What can MapReduce counters do?

MapReduce counters give you a window onto the details of a MapReduce job while it runs. They are very useful for performance tuning: most evaluations of MapReduce performance optimizations are based on the values these counters report.

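3. Built-in counters

MapReduce ships with many built-in counters, which report, for example, the number of bytes read and written, the number of bytes and records input to and output from the map side, and the number of bytes and records input to and output from the reduce side. Knowing what they mean makes a job's results easier to interpret. For now you only need a general picture of them: each counter is identified by a counter group name (groupName) and a counter name (counterName), and that pair is what you look up whenever you use a counter later.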
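As a quick reference, the groupName/counterName pairs for a few commonly used built-in counters can be kept in a small Java table. This is a hand-written sketch covering only a handful of Hadoop 2.x counters, where each group name is the fully qualified name of one of Hadoop's counter enums; the enum BuiltinCounterRef is a hypothetical helper, not part of Hadoop, so check the counter listing of a real job for the complete set.

```java
import java.util.Optional;

// Hypothetical reference table, not a Hadoop class: a few built-in counters
// and the groupName each one is looked up under (Hadoop 2.x names).
enum BuiltinCounterRef {
    // Per-task record and byte counters
    MAP_INPUT_RECORDS("org.apache.hadoop.mapreduce.TaskCounter"),
    MAP_OUTPUT_RECORDS("org.apache.hadoop.mapreduce.TaskCounter"),
    MAP_OUTPUT_BYTES("org.apache.hadoop.mapreduce.TaskCounter"),
    REDUCE_INPUT_RECORDS("org.apache.hadoop.mapreduce.TaskCounter"),
    REDUCE_OUTPUT_RECORDS("org.apache.hadoop.mapreduce.TaskCounter"),
    // Bytes read by the input format and written by the output format
    BYTES_READ("org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter"),
    BYTES_WRITTEN("org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter"),
    // Job-level counters
    TOTAL_LAUNCHED_MAPS("org.apache.hadoop.mapreduce.JobCounter"),
    TOTAL_LAUNCHED_REDUCES("org.apache.hadoop.mapreduce.JobCounter");

    private final String group;

    BuiltinCounterRef(String group) {
        this.group = group;
    }

    // The groupName to pass to findCounter(groupName, counterName)
    String groupName() {
        return group;
    }

    // The counterName is simply the constant's name, e.g. "MAP_INPUT_RECORDS"
    String counterName() {
        return name();
    }

    // Find the group of a counter by name; empty if it is not in this table
    static Optional<String> groupOf(String counterName) {
        for (BuiltinCounterRef ref : values()) {
            if (ref.counterName().equals(counterName)) {
                return Optional.of(ref.group);
            }
        }
        return Optional.empty();
    }
}

class BuiltinCounterRefDemo {
    public static void main(String[] args) {
        // Print each pair as "groupName / counterName"
        for (BuiltinCounterRef ref : BuiltinCounterRef.values()) {
            System.out.println(ref.groupName() + " / " + ref.counterName());
        }
    }
}
```

For example, groupOf("TOTAL_LAUNCHED_REDUCES") returns the JobCounter group, which is the pair needed to read that counter by name.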

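4. Using counters

(1) Defining a counter

Declaring a counter with an enum:

// Declare your own enum first, e.g. enum LOG_PROCESSOR_COUNTER { BAD_RECORDS_LONG }

Counter counter = context.getCounter(LOG_PROCESSOR_COUNTER.BAD_RECORDS_LONG);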

Declaring a counter with your own string names:

// Choose the groupName and counterName yourself, e.g. group "ErrorCounter", counter "toolong"

Counter counter = context.getCounter("ErrorCounter", "toolong");

(2) Assigning a value to a counter

Initializing a counter:

counter.setValue(0L); // setValue(long value): set the initial value

Incrementing a counter:

counter.increment(1L); // increment(long incr): add incr to the count
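The two lookup styles and the two update methods can be illustrated with a minimal in-memory model in plain Java. This is a sketch, not Hadoop code: ToyCounter, ToyCounterRegistry and the enum LineError are hypothetical names. It assumes, as Hadoop does for enum counters, that an enum constant belongs to the group named after its enum class (getDeclaringClass().getName()) and to the counter named after the constant (name()), so the enum form and the string form reach the same counter, and that a counter that does not exist yet starts at 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Counter: a single long value.
class ToyCounter {
    private long value;

    // Overwrites the current value (e.g. to set an initial value)
    void setValue(long value) {
        this.value = value;
    }

    // Adds incr to the current value
    void increment(long incr) {
        this.value += incr;
    }

    long getValue() {
        return value;
    }
}

// Hypothetical stand-in for the lookups done by context.getCounter(...).
class ToyCounterRegistry {
    // groupName -> (counterName -> counter)
    private final Map<String, Map<String, ToyCounter>> groups = new LinkedHashMap<>();

    // String form: the caller picks groupName and counterName;
    // a counter that does not exist yet is created with value 0
    ToyCounter getCounter(String groupName, String counterName) {
        return groups.computeIfAbsent(groupName, g -> new LinkedHashMap<>())
                .computeIfAbsent(counterName, c -> new ToyCounter());
    }

    // Enum form: group = the enum's class name, counter = the constant's name
    ToyCounter getCounter(Enum<?> key) {
        return getCounter(key.getDeclaringClass().getName(), key.name());
    }
}

// Hypothetical user-defined counter enum
enum LineError { TOO_LONG, TOO_SHORT }

class ToyCounterDemo {
    public static void main(String[] args) {
        ToyCounterRegistry registry = new ToyCounterRegistry();
        registry.getCounter(LineError.TOO_LONG).setValue(10); // set an initial value
        registry.getCounter(LineError.TOO_LONG).increment(1); // then add 1
        // The string form with the same names reaches the same counter
        long v = registry.getCounter(LineError.class.getName(), "TOO_LONG").getValue();
        System.out.println(v); // prints 11
    }
}
```

In a real mapper, increment is the usual choice, because each call adds to whatever the counter already holds, while setValue discards it.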

(3) Reading counter values

Getting the value of an enum counter:

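// Imports shared by the snippets below; each snippet runs in a driver method declared "throws Exception"
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "MyCounter");
// ... set the mapper and the input/output paths here ...
job.waitForCompletion(true); // counter values are final only after the job completes
Counters counters = job.getCounters();
// LOG_PROCESSOR_COUNTER is a user-defined enum, e.g. enum LOG_PROCESSOR_COUNTER { BAD_RECORDS_LONG }
Counter counter = counters.findCounter(LOG_PROCESSOR_COUNTER.BAD_RECORDS_LONG);
long value = counter.getValue();

Getting the value of a custom counter:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
// Assuming the groupName is ErrorCounter and the counterName is toolong
Counter counter = counters.findCounter("ErrorCounter", "toolong");
long value = counter.getValue();

Getting the value of a built-in counter:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
Counter counter = counters.findCounter("org.apache.hadoop.mapreduce.JobCounter", "TOTAL_LAUNCHED_REDUCES");
long value = counter.getValue();

(4) Iterating over all counters

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "MyCounter");
job.waitForCompletion(true); // without this the job never runs and there are no counters to read
Counters counters = job.getCounters();
for (CounterGroup group : counters) {
    for (Counter counter : group) {
        System.out.println(counter.getDisplayName() + ": " + counter.getName() + ": " + counter.getValue());
    }
}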

(5) Custom counters

MapReduce lets user programs define their own counters, whose values can be incremented in a mapper or a reducer. A set of counters is defined by a Java enum type, which groups related counters together. A job may define any number of enum types, each with any number of fields. The name of the enum type is the group name, and the enum's fields are the counter names. Counters are global: the MapReduce framework aggregates them across all maps and reduces and produces a final total at the end of the job.
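The global aggregation described above can be pictured as merging per-task counter maps: each task keeps its own values, and the job total for each (group, counter) pair is their sum. The sketch below models only that summing step in plain Java; the class name CounterAggregation, the "groupName:counterName" key format and the task values are hypothetical, and it ignores details of the real framework such as retried task attempts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CounterAggregation {
    // Sums per-task counter values into job-level totals.
    // For brevity a counter is keyed as "groupName:counterName".
    static Map<String, Long> aggregate(List<Map<String, Long>> perTask) {
        Map<String, Long> total = new TreeMap<>(); // sorted for stable printing
        for (Map<String, Long> task : perTask) {
            for (Map.Entry<String, Long> e : task.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical values reported by two map tasks
        Map<String, Long> task0 = new HashMap<>();
        task0.put("ErrorCounter:toolong", 1L);
        Map<String, Long> task1 = new HashMap<>();
        task1.put("ErrorCounter:toolong", 2L);
        task1.put("ErrorCounter:tooshort", 1L);
        System.out.println(aggregate(Arrays.asList(task0, task1)));
        // prints {ErrorCounter:toolong=3, ErrorCounter:tooshort=1}
    }
}
```

This is also why a task should use increment rather than setValue: a task only sees its own share of the count, and the framework combines the shares afterwards.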

V. Procedure

1. Analysis and design

In this experiment you implement a counter that counts invalid input records. Suppose a file whose well-formed records have 3 fields separated by "\t" contains 2 malformed records: one with only 2 fields and one with 4 fields. Its contents are:

jim	1	28
kate	0	26
tom	1
lily	0	29	22

Write code that counts the malformed records, that is, those that do not have exactly 3 fields. A record with more than 3 fields counts as too long, and a record with fewer than 3 fields counts as too short.
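Before writing the MapReduce job, the classification rule can be checked locally in plain Java. The sketch below applies the same test the mapper needs (split each line on "\t" and compare the field count with 3) to the sample data and tallies the results in an EnumMap. FieldCountError and FieldCountCheck are hypothetical names chosen for illustration; in the real job each tally becomes a context.getCounter(...).increment(1) call.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical counter names for the two kinds of malformed records
enum FieldCountError { TOO_LONG, TOO_SHORT }

class FieldCountCheck {
    static final int EXPECTED_FIELDS = 3;

    // Counts lines with more (TOO_LONG) or fewer (TOO_SHORT) than 3 tab-separated fields
    static Map<FieldCountError, Long> check(List<String> lines) {
        Map<FieldCountError, Long> counts = new EnumMap<>(FieldCountError.class);
        for (FieldCountError e : FieldCountError.values()) {
            counts.put(e, 0L);
        }
        for (String line : lines) {
            int n = line.split("\t").length;
            if (n > EXPECTED_FIELDS) {
                counts.merge(FieldCountError.TOO_LONG, 1L, Long::sum);
            } else if (n < EXPECTED_FIELDS) {
                counts.merge(FieldCountError.TOO_SHORT, 1L, Long::sum);
            }
            // Exactly 3 fields: a valid record, nothing to count
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample file from the text, one record per line
        List<String> sample = Arrays.asList(
                "jim\t1\t28",
                "kate\t0\t26",
                "tom\t1",
                "lily\t0\t29\t22");
        System.out.println(check(sample)); // prints {TOO_LONG=1, TOO_SHORT=1}
    }
}
```

Note that String.split drops trailing empty strings, so a line ending in a tab is counted by its non-empty leading fields only.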

2. Write the program code:

package mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Counters {

    public static class MyCounterMap extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // A well-formed record has exactly 3 tab-separated fields
            String[] fields = value.toString().split("\t");
            if (fields.length > 3) {
                // "ErrorCounter" is the group name, "toolong" the counter name
                Counter ct = context.getCounter("ErrorCounter", "toolong");
                ct.increment(1); // add one to the counter
            } else if (fields.length < 3) {
                Counter ct = context.getCounter("ErrorCounter", "tooshort");
                ct.increment(1);
            }
            // Records with exactly 3 fields are valid and are not counted.
            // The mapper writes no output; only the counters matter.
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Counters <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Counter");
        job.setJarByClass(Counters.class);
        job.setMapperClass(MyCounterMap.class);
        job.setNumReduceTasks(0); // map-only job: there is nothing to reduce
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // waitForCompletion(true) prints progress and, at the end, all counters,
        // including the ErrorCounter group
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

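3. Package and submit

Use Eclipse to package the code into a JAR, selecting mr.Counters as the main class. Assuming the resulting file is named Counters.jar and the main class Counters is in package mr, submit the application to the Hadoop cluster with the following command: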

[root@master hadoop]# bin/hadoop jar Counters.jar mr.Counters /usr/counters/in/counters.txt /usr/counters/out

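Here "hadoop" is the command and "jar" is its subcommand, followed by the packaged JAR file and the main class. "/usr/counters/in/counters.txt" is the location of the input file in HDFS (upload it first if it is not there), and "/usr/counters/out" is the output location in HDFS. The output directory must not exist before the job runs, otherwise the job fails at submission.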

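4. Running the experiment:

Step 1: From the Hadoop installation directory, start the cluster with sbin/start-all.sh.

Step 2: Upload the data file to HDFS.

HDFS does not yet have a /usr/counters/in/ directory, so first create the input path, then upload the data file /root/data/7/counters.txt into it:

[root@master hadoop]# bin/hadoop fs -mkdir -p /usr/counters/in
[root@master hadoop]# bin/hadoop fs -put /root/data/7/counters.txt /usr/counters/in/

Step 3: Write the counter program.

Step 4: Package the program.

Step 5: Run the program.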
