
Hadoop fault-tolerance test

Published: 2024/1/23

In short, the experiment went like this:

1. I put a 600 MB file; it was split into 9 blocks, each with 3 replicas (27 block replicas in total), spread across 4 datanodes.

2. I shut down two datanodes, leaving many blocks with only one live replica. Because the replicas were well scattered, every block still had at least one live copy, so the file could be retrieved intact (integrity is verified by checksums).

3. The namenode quickly re-replicated the blocks that were down to a single replica, so fsck soon reported "Target Replicas is 3 but found 2 replica(s)" for each block.

4. I shut down one more datanode. The blocks had been distributed so evenly that even with a single datanode left, every block still had a live replica and the file stayed readable.

5. I deleted one block file from the sole surviving datanode, and the namenode reported the file as corrupt. (I kept hoping the cluster would enter safemode, but "-safemode get" always returned OFF.)

6. Then I started another datanode; within 30 seconds the missing block came back from it and every block was quickly brought back up to 2 replicas.

Fault tolerance proved very reliable. With at least three racks, the data would be extremely resilient. My trust in Hadoop just leveled up!


First, a quick look at some basic characteristics of HDFS.

HDFS design assumptions and goals

Hardware failure is the norm, so redundancy is required.
Streaming data access: data is read in bulk rather than randomly; Hadoop is built for data analysis, not transaction processing.
Large data sets.
Simple coherency model: to keep the system simple, files are write-once, read-many; once a file has been written and closed, it cannot be modified.
Computation is scheduled on nodes close to the data ("move computation to the data").
HDFS architecture

NameNode
DataNode
Edit log (transaction log)
Image file (fsimage)
SecondaryNameNode

Namenode

Manages the filesystem namespace.
Records the location and replica information of each file's blocks on the datanodes.
Coordinates client access to files.
Records changes to the namespace or to its properties.
The namenode records HDFS metadata changes in the edit log, and stores the filesystem namespace (file-to-block mappings, file attributes, and so on) in the image file.
Datanode

Manages storage on the physical node where it runs.
Write once, read many (no in-place updates).
Files are made up of blocks; the typical block size is 64 MB.
Blocks are spread across the nodes as evenly as possible.
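The block arithmetic is easy to check against this experiment: with a 64 MB block size, the 597,639,882-byte test file used later in this post splits into 8 full blocks plus one short tail block.

```python
# Sketch: carving a file into fixed-size blocks, using the exact sizes
# reported by fsck later in this post.
BLOCK_SIZE = 64 * 1024 * 1024   # 67108864 bytes
FILE_SIZE = 597639882           # the ~600 MB test file

def split_into_blocks(size, block_size=BLOCK_SIZE):
    """Return the length of each block; the last one may be short."""
    full, rest = divmod(size, block_size)
    return [block_size] * full + ([rest] if rest else [])

blocks = split_into_blocks(FILE_SIZE)
print(len(blocks))   # 9 blocks
print(blocks[-1])    # 60768970 bytes, the short tail block
```

Both numbers match the fsck report below: 9 blocks, with the last one at 60768970 bytes.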
Read path

A client wants to read a file stored in HDFS.
It first asks the namenode for the list of block locations that make up the file.
From that list it learns which datanodes hold each block.
It then reads the data directly from those datanodes.
The namenode does not take part in the actual data transfer.
How HDFS stays reliable

Replica redundancy
Rack awareness
Heartbeats
Safe mode
Per-block checksums to verify file integrity
Trash (recycle bin)
Metadata protection
Snapshots

I tried out replica redundancy, heartbeats, safe mode, and the trash; the experiment below covers replica redundancy.

Environment:

Namenode/master/jobtracker: h1 / 192.168.221.130
SecondaryNameNode: h1s / 192.168.221.131
Four datanodes: h1s, h2-h4 (IPs .131, .142-.144)

To make sure the file does not fit in a single block, we prepare a fairly large (600 MB) file so that it spreads across several datanodes, then stop some of them and see what happens.

First put the file (for convenience, consider appending hadoop/bin to your $PATH):

hadoop fs -put ~/Documents/IMMAUSWX201304

When the put finishes, check how the blocks are laid out, either on the web UI or with the fsck command on the namenode:

bin/hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations > ~/hadoopfiles/log1.txt

The output below shows that the 600 MB file was split into 9 blocks of up to 64 MB, spread fairly evenly across all 4 datanodes:

/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  OK
0. blk_-4541681964616523124_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
7. blk_226084029380457017_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010]

Status: HEALTHY
 Total size:    597639882 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):      9 (avg. block size 66404431 B)
 Minimally replicated blocks:   9 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
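The per-block lines in the fsck report above have a regular shape, so they are easy to tally programmatically. A small sketch, assuming the 0.20-era line format shown above ("N. blk_... len=... repl=R [host, ...]"):

```python
# Sketch: parse fsck's per-block lines and collect length, replica count,
# and hosting datanodes for each block.
import re

LINE = re.compile(r"\d+\.\s+(blk_-?\d+_\d+)\s+len=(\d+)\s+repl=(\d+)\s+\[(.*)\]")

def parse_fsck(text):
    blocks = {}
    for m in LINE.finditer(text):
        blk, length, repl, hosts = m.groups()
        blocks[blk] = {
            "len": int(length),
            "repl": int(repl),
            "hosts": [h.strip() for h in hosts.split(",")],
        }
    return blocks

sample = ("0. blk_-4541681964616523124_1011 len=67108864 repl=3 "
          "[192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010]")
info = parse_fsck(sample)["blk_-4541681964616523124_1011"]
print(info["repl"], len(info["hosts"]))  # 3 3
```

Running this over the full report makes it easy to spot blocks whose replica count has dropped below the target, which is what we watch for in the rest of the experiment.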

All four datanodes (h1s, h2, h3, h4) held replicas. I went to h2 (142) and h3 (143) and stopped their datanode processes, then ran a get from h4, and the file came back with the correct size. As the listing above shows, even with those two nodes dead, every block still has at least one live source, so the retrieved data is intact. Hadoop really is impressive here: the load balancing is good, the data holds up, and the fault tolerance works.


I checked again, intending to test safemode, but each time I refreshed a moment later, the blocks that had only one live replica had already been re-replicated so that every block had two again!

hadoop_admin@h1:~/hadoop-0.20.2$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):
Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s).
0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]

I decided to stop one more datanode, and then waited quite a while without the namenode noticing it was dead. That is the heartbeat mechanism: every datanode sends a heartbeat to the namenode every 3 seconds to show it is alive, and only after the namenode has gone a long time without one (roughly 5-10 minutes, depending on configuration) does it declare the node dead and start copying its blocks elsewhere to restore enough replicas for fault tolerance. Printing the report again shows that, with only one live datanode left, every block now has exactly one replica:
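The "5-10 minutes" above follows from how Hadoop of this era is commonly described as deriving the dead-node timeout from two settings: twice the recheck interval plus ten heartbeat intervals. A sketch, treating the default values (recheck 5 minutes, heartbeat 3 seconds) as assumptions:

```python
# Sketch of the commonly cited dead-node timeout formula for this era of
# Hadoop: 2 * heartbeat.recheck.interval + 10 * heartbeat interval.
# Defaults assumed: recheck = 300000 ms (5 min), heartbeat = 3 s.
def dead_node_timeout_ms(recheck_ms=300_000, heartbeat_s=3):
    return 2 * recheck_ms + 10 * heartbeat_s * 1000

timeout = dead_node_timeout_ms()
print(timeout / 60000)  # 10.5 minutes
```

With the assumed defaults that comes to 10.5 minutes, consistent with the long wait observed here before the namenode reacted.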

hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s).
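The re-replication behavior observed in this experiment can be sketched as a simple planning loop. This is a toy sketch, not Hadoop's actual replication scheduler: for every block whose live replica count is below the target (capped at the number of live nodes), pick a live source holding it and a live destination that does not.

```python
# Toy sketch of the namenode's re-replication decision: copy each
# under-replicated block from a live holder to a live non-holder.
def plan_rereplication(blocks, live_nodes, target=3):
    """blocks: block_id -> set of nodes currently holding it."""
    plan = []
    for blk, holders in sorted(blocks.items()):
        live = holders & live_nodes
        if not live:
            continue  # no live source: the block is missing, not fixable here
        needed = min(target, len(live_nodes)) - len(live)
        candidates = sorted(n for n in live_nodes if n not in live)
        src = sorted(live)[0]
        for dst in candidates[:max(needed, 0)]:
            plan.append((blk, src, dst))
    return plan

blocks = {"blk_1": {"h4"}, "blk_2": {"h4", "h1s"}}
plan = plan_rereplication(blocks, live_nodes={"h4", "h1s"})
print(plan)  # [('blk_1', 'h4', 'h1s')]
```

The cap at the number of live nodes is why fsck stayed HEALTHY earlier with only two replicas per block: with fewer live nodes than the replication factor, there is simply nowhere to put a third copy.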

Now I move one block file off this sole surviving datanode so that the file becomes corrupt; I want to see whether it recovers once another datanode is restarted.

hadoop_admin@h4:/hadoop_run/data/current$ mv blk_4347039731705448097_1011* ~/Documents/

To avoid waiting the roughly 8 minutes before the datanode's next block report, I set dfs.blockreport.intervalMsec on h4 to 30000, then stopped and restarted the datanode. fsck now detects the damage:
hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations

/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s).

/user/hadoop_admin/in/bigfile/USWX201304: CORRUPT block blk_4347039731705448097
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s).
MISSING 1 blocks of total size 67108864 B
0. blk_-4541681964616523124_1011 len=67108864 repl=1 [192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 MISSING!
2. blk_-4962604929782655181_1011 len=67108864 repl=1 [192.168.221.144:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=1 [192.168.221.144:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=1 [192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=1 [192.168.221.144:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=1 [192.168.221.144:50010]
7. blk_226084029380457017_1011 len=67108864 repl=1 [192.168.221.144:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=1 [192.168.221.144:50010]

Status: CORRUPT
 Total size:    597639882 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):      9 (avg. block size 66404431 B)
  ********************************
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         67108864 B
  CORRUPT BLOCKS:       1
  ********************************
 Minimally replicated blocks:   8 (88.888885 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       8 (88.888885 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     0.8888889
 Corrupt blocks:                1
 Missing replicas:              16 (200.0 %)
 Number of data-nodes:          1
 Number of racks:               1


The filesystem under path '/user/hadoop_admin/in/bigfile' is CORRUPT
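Corruption like this is caught through checksums: HDFS stores a CRC checksum for each small chunk of a block alongside the block data, and a reader that finds a mismatch treats that replica as corrupt. A toy sketch in that spirit (this is not the real on-disk .meta format; the 512-byte chunk size is an assumption):

```python
# Sketch of checksum-based corruption detection: one CRC32 per 512-byte
# chunk of a block, recomputed and compared on read.
import zlib

CHUNK = 512

def checksums(data, chunk=CHUNK):
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

block = b"x" * 1500
sums = checksums(block)  # stored alongside the block when it is written

# Flip one byte; the affected chunk's CRC no longer matches.
tampered = block[:600] + b"!" + block[601:]
ok = checksums(tampered) == sums
print(ok)  # False
```

A single flipped byte only invalidates its own chunk, so the reader can pinpoint where the damage is; once a replica is flagged corrupt, the namenode re-replicates from a good copy, which is exactly what the next step demonstrates.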

Now I start one datanode, h1s (131), and in less than 30 seconds Hadoop revives the file at full HP: every block has two replicas again.
hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s).
0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]

The missing block was successfully copied from 131 back to 144 (h4).

Conclusion: Hadoop's fault tolerance is remarkably solid. I'm now a firm believer.

One more thing not shown in the pasted output: the h4 datanode still had a number of stale bad blocks left over from an earlier reformat, and when the same file was put again, Hadoop deleted all of those leftover block files. So it does clean up invalid bad blocks on its own.

