Renmin University Cloud Computing Programming Online Judge: Solution Report for Problems 1004-1007

Published: 2025/6/15


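Because I wrote up seven problems in one go, the article became too long, so I split it into two parts to make it easier to read.

Picking up from the previous article, let's continue our MapReduce programming journey.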

1004: Problem

Single Table Join

Description

The input file contains a child-parent table. Write a program that takes this file as input and outputs the grandchild-grandparent table implied by the child-parent table.

Input

A single file containing the child-parent table.

Output

A single file containing the grandchild-grandparent relations derived from the child-parent table.

Sample input

child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma

Sample output

grandchild  grandparent
Jone        Alice
Jone        Jesse
Tom         Alice
Tom         Jesse
Jone        Mary
Jone        Ben
Tom         Mary
Tom         Ben
Mark        Jesse
Mark        Alice
Philip      Jesse
Philip      Alice

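1004: Approach

A single-table join: this one is fairly interesting. Of course, it may just be my skill level that made my solution rather complicated.

First, I defined a custom data type, TextPair. I won't go into custom data types here; you can search online or read Hadoop: The Definitive Guide, both of which explain them.

Next: as the input shows, children and parents are written in the same file, and what we want is the grandchild-grandparent relation, so a name from the parent column can also appear in the child column. To tell the two roles apart, each name is tagged with the help of the custom data type.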
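The tagging idea can be tried without a cluster. The following is a minimal plain-Java sketch, not the submitted solution: the class name SingleTableJoinSketch and the method grandparents are hypothetical, and two in-memory maps stand in for the shuffle that groups records by name. Every row is recorded twice, once under the child and once under the parent, and anyone who has both children and parents links two generations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SingleTableJoinSketch {

    /** Returns "grandchild grandparent" strings derived from {child, parent} rows. */
    static List<String> grandparents(String[][] rows) {
        // Stand-ins for the shuffle: relatives grouped by the person they belong to.
        Map<String, List<String>> childrenOf = new TreeMap<>();
        Map<String, List<String>> parentsOf = new TreeMap<>();
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            // Keyed by the child: the value is one of its parents (tag 1 in the job).
            parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
            // Keyed by the parent: the value is one of its children (tag 0 in the job).
            childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : childrenOf.entrySet()) {
            List<String> parents = parentsOf.get(entry.getKey());
            if (parents == null) {
                continue; // this person has children but no recorded parents
            }
            // Cross product: every child of this person with every parent of this person.
            for (String grandchild : entry.getValue()) {
                for (String grandparent : parents) {
                    result.add(grandchild + " " + grandparent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Tom", "Lucy"}, {"Lucy", "Mary"}, {"Lucy", "Ben"}, {"Jone", "Lucy"}
        };
        for (String pair : grandparents(rows)) {
            System.out.println(pair);
        }
    }
}
```

In the sample rows only Lucy has both children (Tom, Jone) and parents (Mary, Ben), so she is the only name that produces output pairs.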

Here is the code, with detailed comments:

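import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    public static class wordcountMapper
            extends Mapper<LongWritable, Text, TextPair, TextPair> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String key1 = "";
            String value1 = "";
            StringTokenizer itr = new StringTokenizer(value.toString());
            // Take the child and the parent from the input line.
            if (itr.hasMoreElements()) {
                key1 = itr.nextToken();
            }
            if (itr.hasMoreElements()) {
                value1 = itr.nextToken();
            }
            // Blank or incomplete lines carry no relation, so they are skipped.
            if (value1.isEmpty()) {
                return;
            }
            // The custom data type is used for both key and value.
            // Tag 0 marks a child, tag 1 marks a parent.
            // Each row is written twice, once keyed by the child and once keyed by the
            // parent, so that the reducer can pair children with grandparents.
            context.write(new TextPair(key1, 0), new TextPair(value1, 1));
            context.write(new TextPair(value1, 1), new TextPair(key1, 0));
        }
    }

    public static class wordcountReduce
            extends Reducer<TextPair, TextPair, Text, Text> {

        @Override
        public void reduce(TextPair key, Iterable<TextPair> values, Context context)
                throws IOException, InterruptedException {
            // Two lists hold the children and the parents of the person in the key.
            List<String> child = new ArrayList<String>();
            List<String> parent = new ArrayList<String>();
            for (TextPair str : values) {
                // The 0/1 tag tells directly whether a value is a child or a parent.
                // All values under one key belong to the same person: tag 0 values are
                // that person's children and tag 1 values are that person's parents,
                // which together give the grandchildren and the grandparents.
                if (str.getSecond().get() == 0) {
                    child.add(str.getFirst().toString());
                } else {
                    parent.add(str.getFirst().toString());
                }
            }
            if (child.size() != 0 && parent.size() != 0) {
                // One child may have several grandparents, hence the nested loop
                // with the children in the outer loop.
                for (int i = 0; i < child.size(); i++) {
                    for (int j = 0; j < parent.size(); j++) {
                        context.write(new Text(child.get(i)), new Text(parent.get(j)));
                    }
                }
            }
        }
    }

    // The custom data type: a name plus its 0/1 role tag.
    public static class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private IntWritable second;

        public TextPair() {
            set(new Text(), new IntWritable());
        }

        public TextPair(String first, int second) {
            set(new Text(first), new IntWritable(second));
        }

        public TextPair(Text first, IntWritable second) {
            set(first, second);
        }

        public void set(Text first, IntWritable second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public IntWritable getSecond() {
            return second;
        }

        @Override
        public String toString() {
            return first.toString();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int compareTo(TextPair tp) {
            // Note: only first takes part in sorting; the 0/1 tag is ignored.
            return first.compareTo(tp.first);
        }

        // hashCode and equals follow compareTo and look at the name only, so that
        // the default HashPartitioner sends equal names to the same reducer even
        // when the job runs with more than one reducer.
        @Override
        public int hashCode() {
            return first.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TextPair && first.equals(((TextPair) o).first);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On Hadoop 2 and later, Job.getInstance(conf, "SingleJoin") is preferred
        // over this deprecated constructor.
        Job job = new Job(conf, "SingleJoin");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setMapOutputValueClass(TextPair.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}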

1005: Problem

Multi-table Join

Description

There are two input files. One, named factory, contains a table of factory names and their address IDs; the other, named address, contains a table of address names and their IDs. Write a program that outputs each factory name together with the name of its address.

Input

Two files: the first lists factory names with their address IDs, the second lists address names with their IDs.

Output

A single file containing each factory name and the name of its address.

Sample input

input:
factory:
factoryname addressID
Beijing Red Star 1
Shenzhen Thunder 3
Guangzhou Honda 2
Beijing Rising 1
Guangzhou Development Bank 2
Tencent 3
Bank of Beijing 1
address:
addressID addressname
1 Beijing
2 Guangzhou
3 Shenzhen
4 Xian

Sample output

output:
factoryname addressname
Bank of Beijing Beijing
Beijing Red Star Beijing
Beijing Rising Beijing
Guangzhou Development Bank Guangzhou
Guangzhou Honda Guangzhou
Shenzhen Thunder Shenzhen
Tencent Shenzhen

1005: Approach

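The idea is much the same as for 1004; if you can solve 1004, then 1005 is no trouble at all.

We reuse the custom data type TextPair from 1004. The records we read fall into two kinds (factory rows and address rows), so TextPair is used to tell them apart.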
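Telling the two kinds of rows apart can be sketched outside Hadoop. This is an illustrative alternative to splitting token by token, under the assumption that a factory row ends with a numeric ID and an address row starts with one; JoinLineParserSketch and parse are hypothetical names, and the tag values 1 (factory) and 0 (address) follow the convention used in this article.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JoinLineParserSketch {
    // Factory row: a name (which may contain spaces) followed by a numeric ID.
    private static final Pattern FACTORY = Pattern.compile("^(.*\\S)\\s+(\\d+)$");
    // Address row: a numeric ID followed by a name.
    private static final Pattern ADDRESS = Pattern.compile("^(\\d+)\\s+(\\S.*)$");

    /**
     * Returns {id, name, tag}, where tag is "1" for a factory row and "0" for an
     * address row, or null for lines that are neither (headers, labels, blanks).
     */
    static String[] parse(String line) {
        String s = line.trim();
        Matcher address = ADDRESS.matcher(s);
        if (address.matches()) {
            return new String[] {address.group(1), address.group(2), "0"};
        }
        Matcher factory = FACTORY.matcher(s);
        if (factory.matches()) {
            return new String[] {factory.group(2), factory.group(1), "1"};
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Guangzhou Development Bank 2")));
        System.out.println(Arrays.toString(parse("1 Beijing")));
        System.out.println(Arrays.toString(parse("factoryname addressID")));
    }
}
```

Once every row is reduced to an ID, a name and a tag, grouping by ID puts a region's address next to all of its factories, which is exactly what the reducer needs.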

On to the code again, with detailed comments:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    public static class wordcountMapper
            extends Mapper<LongWritable, Text, Text, TextPair> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // This case is a little special: factory names contain spaces,
            // so the line has to be split with care.
            String str = "";
            String id = "";
            String value1 = "";
            // Split the line into tokens.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreElements()) {
                str = itr.nextToken();
                // A token that is not a number is part of a name.
                // "[0-9]+" recognises IDs with more than one digit as well.
                if (!str.matches("[0-9]+")) {
                    value1 += str;   // a name may span several tokens
                    value1 += " ";
                } else {
                    id = str;        // the numeric token is the ID
                    // A name seen before the ID means a factory row,
                    // which is now complete: factory gets tag 1.
                    if (!value1.isEmpty()) {
                        context.write(new Text(id), new TextPair(value1.trim(), 1));
                        return;
                    }
                }
            }
            // No return so far, so this is an address row: address gets tag 0.
            context.write(new Text(id), new TextPair(value1.trim(), 0));
        }
    }

    public static class wordcountReduce
            extends Reducer<Text, TextPair, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<TextPair> values, Context context)
                throws IOException, InterruptedException {
            // Again two lists hold the two kinds of values.
            List<String> factor = new ArrayList<String>();
            List<String> address = new ArrayList<String>();
            for (TextPair str : values) {
                if (str.getSecond().get() == 1) {
                    // 1: factory
                    factor.add(str.getFirst().toString());
                } else {
                    // 0: address
                    address.add(str.getFirst().toString());
                }
            }
            // One address may have several factories, so address is the outer loop.
            if (factor.size() != 0 && address.size() != 0) {
                for (int i = 0; i < address.size(); i++) {
                    for (int j = 0; j < factor.size(); j++) {
                        context.write(new Text(factor.get(j)), new Text(address.get(i)));
                    }
                }
            }
        }
    }

    // The custom data type, same as in 1004: a name plus its 0/1 tag.
    public static class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private IntWritable second;

        public TextPair() {
            set(new Text(), new IntWritable());
        }

        public TextPair(String first, int second) {
            set(new Text(first), new IntWritable(second));
        }

        public TextPair(Text first, IntWritable second) {
            set(first, second);
        }

        public void set(Text first, IntWritable second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public IntWritable getSecond() {
            return second;
        }

        @Override
        public String toString() {
            return first.toString();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int compareTo(TextPair tp) {
            return first.compareTo(tp.first);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On Hadoop 2 and later, Job.getInstance(conf, "MultiTableJoin") is preferred
        // over this deprecated constructor.
        Job job = new Job(conf, "MultiTableJoin");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TextPair.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

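1006: Problem

Sum

Description

The input is a set of text files. Each file contains many lines, and each line is a digit string representing a very large number. Note that the low-order digits come at the start of the string and the high-order digits at the end. Write a program that computes and outputs the sum of all the numbers in the input files.

Input

The input consists of many files; each file has many lines, and each line is a digit string representing one number.

Output

The output is a single file. The first number on its first line is the line label, and the second number is the sum of all the numbers in the input files.

Sample input

input:
file1:
1235546665312
112344569882
326434546462
21346546846
file2:
3654354655
3215456463
21235465463
321265465
65465463
32
file3:
31654
654564564
3541231564
351646846
3164646
3163

Sample output

output:
1 8685932816082

Notes:
1. There is only one output file.
2. The first line of the output file consists of the line label "1" and the sum of all the numbers.
3. Every number is a positive integer or zero. Numbers can be more than 50 digits long, so the usual primitive data types cannot hold them.
4. The low-order digits are on the left of the digit string and the high-order digits on the right. For example, the first line of the first sample input file represents the number 2135666455321.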

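1006: Approach

Problem 1006 comes down to two things: first, big-number addition; second, bringing all the data together in one place.

The first is routine, so I won't dwell on it. For the second, since we need one overall total at the end, all keys must fall into a single group. We could define a custom grouping (see Hadoop: The Definitive Guide; I will write about it later if there is a need), but here I used a simple trick. (If a simple approach works, use the simple approach.)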
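As a rough illustration of low-digit-first addition, here is a small standalone sketch. The names LittleEndianSumSketch, add and addWithBigInteger are made up for this example, and the inputs are assumed to be non-empty strings of decimal digits. The second method is a different technique, java.math.BigInteger on the reversed strings, included only as a cross-check.

```java
import java.math.BigInteger;

final class LittleEndianSumSketch {

    /** Adds two decimal strings whose least significant digit comes first. */
    static String add(String a, String b) {
        StringBuilder out = new StringBuilder();
        int carry = 0;
        // Walk both strings from the low digit; keep going while a carry remains.
        for (int i = 0; i < a.length() || i < b.length() || carry != 0; i++) {
            int da = i < a.length() ? a.charAt(i) - '0' : 0;
            int db = i < b.length() ? b.charAt(i) - '0' : 0;
            int sum = da + db + carry;
            out.append(sum % 10); // digit for this position
            carry = sum / 10;     // carried into the next, higher position
        }
        return out.toString();
    }

    /** Cross-check: reverse into normal order, add with BigInteger, reverse back. */
    static String addWithBigInteger(String a, String b) {
        BigInteger x = new BigInteger(new StringBuilder(a).reverse().toString());
        BigInteger y = new BigInteger(new StringBuilder(b).reverse().toString());
        return new StringBuilder(x.add(y).toString()).reverse().toString();
    }

    public static void main(String[] args) {
        // 99 + 1 = 100, written low digit first.
        System.out.println(add("99", "1"));               // prints 001
        System.out.println(addWithBigInteger("99", "1")); // prints 001
    }
}
```

The case 99 + 1 is worth testing in any implementation, because the carry has to travel past the end of the shorter operand and then past the end of the longer one.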

Let me explain alongside the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    public static class wordcountMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Note the key: this is the simple trick. Every record gets the same key,
            // so the reduce phase sees all the data in a single group.
            context.write(new LongWritable(1), value);
        }
    }

    public static class wordcountReduce
            extends Reducer<LongWritable, Text, LongWritable, Text> {

        @Override
        public void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String tem = "0"; // a big number, so it is kept in a String
            for (Text str : values) {
                String digits = str.toString().trim();
                if (digits.isEmpty()) {
                    continue; // blank lines contribute nothing
                }
                // Add the next big number with the Sum() big-number addition function.
                tem = Sum(tem, digits);
            }
            context.write(key, new Text(tem));
        }
    }

    // Big-number addition on digit strings that store the low digit first.
    // Both inputs are assumed to contain decimal digits only. The carry (jin) is
    // propagated through the longer operand as well, so a 9 followed by a carry
    // rolls over correctly instead of producing a two-digit "10".
    public static String Sum(String a, String b) {
        StringBuilder c = new StringBuilder();
        int jin = 0; // carry
        for (int i = 0; i < a.length() || i < b.length() || jin != 0; i++) {
            int a_first = i < a.length() ? a.charAt(i) - '0' : 0;
            int b_first = i < b.length() ? b.charAt(i) - '0' : 0;
            int temp = a_first + b_first + jin;
            jin = temp / 10;
            c.append(temp % 10);
        }
        return c.toString();
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On Hadoop 2 and later, Job.getInstance(conf, "Sum") is preferred
        // over this deprecated constructor.
        Job job = new Job(conf, "Sum");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

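1007: Problem

WordCount Plus

Description

The WordCount example reads text files and counts how many times each word occurs. Now there is a WordCount 2.0: in this version you must handle input files that contain characters such as "/.',"{}[]:;" and so on. When splitting words, "declare," should become "declare", "Hello!" should become "Hello", and "can't" should stay "can't".

Input

A text file containing many words.

Output

A text file in which each line contains a word and the number of times it occurs across all input files. The words in the output file are sorted in dictionary order.

Sample input

input1:
hello world, bye world.
input2:
hello hadoop, bye hadoop!

Sample output

bye 2
hadoop 2
hello 2
world 2

1007: Approach

Problem 1007 is mainly about filtering characters, which can be done with a regular expression. Nothing difficult here.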
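The filtering step can be tried on its own in plain Java. This is a minimal sketch with hypothetical names (WordSplitSketch, words); it assumes the intended pattern is "everything except word characters and the apostrophe", and that words are lowercased before counting. Note that \w also matches digits and the underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class WordSplitSketch {
    // Any character that is neither a word character (\w) nor an apostrophe.
    private static final String NON_WORD = "[^\\w']";

    /** Lowercases the line, blanks out punctuation, and splits on whitespace. */
    static List<String> words(String line) {
        String cleaned = line.toLowerCase().replaceAll(NON_WORD, " ");
        List<String> result = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(cleaned);
        while (itr.hasMoreTokens()) {
            result.add(itr.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("declare, Hello! can't")); // prints [declare, hello, can't]
    }
}
```

Replacing the unwanted characters with a space, rather than deleting them, keeps words apart when punctuation is the only thing between them, as in "world,bye".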

Again, let's go through it alongside the code:

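import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {

    public static class wordcountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);

        // Regular expression for the characters to filter out: everything except
        // word characters (letters, digits, underscore) and the apostrophe.
        private String pattern = "[^\\w']";

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().toLowerCase();
            // Replace the characters to be filtered with spaces.
            line = line.replaceAll(pattern, " ");
            // Split into words.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreElements()) {
                context.write(new Text(itr.nextToken()), one);
            }
        }
    }

    public static class wordcountReduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // This part is simple and the same as in WordCount, so no more comments.
            int sum = 0;
            for (IntWritable str : values) {
                sum += str.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On Hadoop 2 and later, Job.getInstance(conf, "Plus") is preferred
        // over this deprecated constructor.
        Job job = new Job(conf, "Plus");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}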

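That's finally everything. What I have written here is just my own approach; if anyone has better ideas, please share them so we can all enjoy them. All the programs above were accepted when submitted.

Of course, I can't rule out oversights or mistakes in my programs (incomplete test data could let them through). If you can point any out, I would be very grateful.

One last note: I pulled the programs straight from the site's submission archive, so the formatting isn't pretty. My apologies.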
