

Some Notes on MultipleOutputFormat

Published: 2024/6/14

The version used here is Hadoop 0.19.2. Reportedly MultipleOutputFormat does not work well from 0.20 onward; I have not verified whether that is true.

For the API, see:

http://hadoop.apache.org/common/docs/r0.19.2/api/

Honestly, though, the API docs alone can be confusing: what does each function actually affect?

protected K generateActualKey(K key, V value)
    Generate the actual key from the given key/value.
protected V generateActualValue(K key, V value)
    Generate the actual value from the given key and value.
protected String generateFileNameForKeyValue(K key, V value, String name)
    Generate the file output file name based on the given key and the leaf file name.
protected String generateLeafFileName(String name)
    Generate the leaf name for the output file name.
protected abstract RecordWriter<K,V> getBaseRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)

protected String getInputFileBasedOutputFileName(JobConf job, String name)
    Generate the outfile name based on a given name and the input file name.
RecordWriter<K,V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3)
    Create a composite record writer that can write key/value data to different output files.

Here is a brief walk through the call sequence.

In ReduceTask.java:

public void run(JobConf job, final TaskUmbilicalProtocol umbilical) throws IOException
{
    ..........

    // getOutputName returns "part-" + NUMBER_FORMAT.format(partition),
    // i.e. it produces a file name such as part-00000 based on the task id
    String finalName = getOutputName(getPartition());

    FileSystem fs = FileSystem.get(job);

    // finalName = part-00000
    final RecordWriter out = job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);

    .............
}
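To make the naming step concrete, here is a minimal standalone sketch of how a name like part-00000 can be built from a partition number. It does not use Hadoop: the class name PartNameSketch is hypothetical, and the five-digit zero padding is an assumption inferred from the part-00000 example in the comment above rather than copied from the Hadoop source.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical stand-in for the naming logic described above; not Hadoop code.
class PartNameSketch {
    // Assumption: five digits, zero padded, no grouping separators,
    // inferred from the "part-00000" example.
    private static final NumberFormat NUMBER_FORMAT = NumberFormat.getInstance(Locale.ROOT);
    static {
        NUMBER_FORMAT.setMinimumIntegerDigits(5);
        NUMBER_FORMAT.setGroupingUsed(false);
    }

    // Mirrors the expression quoted in the comment: "part-" + NUMBER_FORMAT.format(partition)
    static String getOutputName(int partition) {
        return "part-" + NUMBER_FORMAT.format(partition);
    }

    public static void main(String[] args) {
        System.out.println(getOutputName(0));   // prints part-00000
        System.out.println(getOutputName(12));  // prints part-00012
    }
}
```

This string is the name argument that getRecordWriter receives, so every hook in MultipleOutputFormat starts from it.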

In MultipleOutputFormat.java, note the order in which these functions are called:

public RecordWriter<K, V> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable arg3) throws IOException
{
    final FileSystem myFS = fs;
    // the file name can be hard-coded here
    final String myName = generateLeafFileName(name);
    final JobConf myJob = job;
    final Progressable myProgressable = arg3;

    return new RecordWriter<K, V>() {
        // a cache storing the record writers for different output files.
        TreeMap<String, RecordWriter<K, V>> recordWriters = new TreeMap<String, RecordWriter<K, V>>();

        public void write(K key, V value) throws IOException
        {
            // get the file name based on the key;
            // this is the function to override when the key should decide the file name
            String keyBasedPath = generateFileNameForKeyValue(key, value, myName);

            // get the file name based on the input file name;
            // this is the function to override when the JobConf settings should decide the name.
            // finalPath is the final file name.
            String finalPath = getInputFileBasedOutputFileName(myJob, keyBasedPath);

            // get the actual key and value
            K actualKey = generateActualKey(key, value);
            V actualValue = generateActualValue(key, value);

            RecordWriter<K, V> rw = this.recordWriters.get(finalPath);
            if (rw == null)
            {
                // if we don't have the record writer yet for the final path,
                // create one and add it to the cache
                rw = getBaseRecordWriter(myFS, myJob, finalPath, myProgressable); // must be implemented by the subclass
                this.recordWriters.put(finalPath, rw);
            }
            rw.write(actualKey, actualValue);
        };
        .......
    };
}

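Of the overridable functions above, the default implementations other than getInputFileBasedOutputFileName basically just return their input unchanged: generateLeafFileName and generateFileNameForKeyValue return name, while generateActualKey and generateActualValue return the key and the value. getBaseRecordWriter is abstract and has to be implemented by the subclass.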
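The call order can be tried out without a Hadoop cluster. The sketch below is a simplified, self-contained model of the pattern and not the Hadoop class itself: MultiOutputSketch and ByKeySketch are hypothetical names, keys and values are plain strings, an in-memory list stands in for each per-file RecordWriter, and the getInputFileBasedOutputFileName step is left out. It only illustrates that the leaf name is computed once, the per-record hooks run on every write, and one "writer" is cached per final path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of the hook order; hypothetical, not the Hadoop API.
class MultiOutputSketch {
    // Stand-in for the cache of per-file record writers: path -> written lines.
    final Map<String, List<String>> files = new TreeMap<>();

    // Default hooks return their input unchanged, as described above.
    protected String generateLeafFileName(String name) { return name; }
    protected String generateFileNameForKeyValue(String key, String value, String name) { return name; }
    protected String generateActualKey(String key, String value) { return key; }
    protected String generateActualValue(String key, String value) { return value; }

    void writeAll(String defaultName, String[][] records) {
        // Called once, like the top of getRecordWriter.
        String leaf = generateLeafFileName(defaultName);
        for (String[] r : records) {
            // Called per record, like write(key, value).
            String path = generateFileNameForKeyValue(r[0], r[1], leaf);
            String k = generateActualKey(r[0], r[1]);
            String v = generateActualValue(r[0], r[1]);
            // Create the per-path "writer" on first use, then reuse it.
            files.computeIfAbsent(path, p -> new ArrayList<>()).add(k + "\t" + v);
        }
    }

    public static void main(String[] args) {
        MultiOutputSketch out = new ByKeySketch();
        out.writeAll("part-00000", new String[][] {{"a", "1"}, {"b", "2"}, {"a", "3"}});
        System.out.println(out.files.keySet()); // prints [a/part-00000, b/part-00000]
    }
}

// Routes each record into a directory named after its key.
class ByKeySketch extends MultiOutputSketch {
    @Override
    protected String generateFileNameForKeyValue(String key, String value, String name) {
        return key + "/" + name;
    }
}
```

Overriding only generateFileNameForKeyValue is enough to split the output by key here; in this model the other hooks keep their pass-through defaults.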

Reposted from: https://www.cnblogs.com/xuxm2007/archive/2012/02/23/2365332.html
