Building a Consistent Hashing Ring, Part 2

Published: 2023/12/29

****************** Please credit the source when reposting! **********

Last updated: 2011-08-22 22:47:38


Part 1 built a prototype of the consistent hashing ring.

One problem remains, though. 100 nodes are mapped to 1000 virtual nodes, and the mapping between virtual nodes and nodes changes whenever nodes are added or removed. What happens when the cluster grows from 100 nodes to 1001?

The newly added nodes crowd out the existing ones! The cause is that the 1000 virtual nodes are fixed and never change. Adding another 1000 virtual nodes, then remapping virtual nodes to real nodes, then moving the data looks like a return to the old approach of ring2_0.py.
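The limit of a fixed virtual node count can be seen with a few lines of arithmetic. This is only a sketch in Python 3 (the listings in this article are Python 2); the helper name `vnodes_per_node` is hypothetical, and it assumes virtual nodes are handed out round-robin with `vnode % node_count`, the same pattern the listings use for their initial mapping.

```python
VNODE_COUNT = 1000  # fixed number of virtual nodes, as in the example above


def vnodes_per_node(node_count, vnode_count=VNODE_COUNT):
    """Count how many virtual nodes each real node owns under round-robin assignment."""
    counts = [0] * node_count
    for vnode in range(vnode_count):
        counts[vnode % node_count] += 1
    return counts


before = vnodes_per_node(100)
after = vnodes_per_node(1001)
print(min(before), max(before))  # 10 10: every node owns 10 virtual nodes
print(after.count(0))            # 1: one node owns no virtual node at all
```

With more real nodes than virtual nodes, at least one node cannot receive any data, no matter how the assignment is arranged.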

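So here we make some changes to get closer to a real-world setup.

First, a vnode will be called a partition from now on. Because the number of partitions rarely changes, it has to be chosen with the expected scale of the system in mind. If the cluster will never exceed 6000 nodes, the number of virtual nodes can be set to 100 times the number of real nodes. Each virtual node then carries roughly 1% of a node's data, so adjusting one virtual node affects at most about 1% of that data.

On a computer, choosing powers of 2 has some advantages; for example, division becomes a simple bit shift. So in ring3_0.py the node count is 65536 (2^16) and the partition count is 8388608 (2^23).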
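The shift trick can be checked directly. The sketch below is written for Python 3 and reuses the constant names from ring3_0.py; the two function names are hypothetical and exist only for this comparison. It takes the first 4 bytes of the MD5 digest as a 32-bit integer and shows that shifting right by `PARTITION_SHIFT` gives the same partition as dividing by `2 ** PARTITION_SHIFT`.

```python
from hashlib import md5
from struct import unpack_from

PARTITION_POWER = 23
PARTITION_SHIFT = 32 - PARTITION_POWER


def hash32(data_id):
    """First 4 bytes of the MD5 digest as a big-endian unsigned 32-bit integer."""
    return unpack_from('>I', md5(str(data_id).encode('utf-8')).digest())[0]


def partition_by_shift(data_id):
    # Keep only the top PARTITION_POWER bits of the 32-bit hash.
    return hash32(data_id) >> PARTITION_SHIFT


def partition_by_division(data_id):
    # Same result, computed with integer division instead of a shift.
    return hash32(data_id) // (2 ** PARTITION_SHIFT)


same = all(partition_by_shift(i) == partition_by_division(i) for i in range(1000))
print(same)  # True
```

Because the partition count is a power of 2, every 32-bit hash falls into exactly one of the 2^23 partitions without any modulo bias.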


ring3_0.py

from array import array
from hashlib import md5
from struct import unpack_from
from time import time

PARTITION_POWER = 23
PARTITION_SHIFT = 32 - PARTITION_POWER
NODE_COUNT = 65536
DATA_ID_COUNT = 100000000

begin = time()
# part2node[partition id] -> node id, stored as unsigned shorts ('H')
part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)
# node_counts[node id] -> number of data ids mapped to that node
node_counts = [0] * NODE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    # Top PARTITION_POWER bits of the first 4 bytes of the MD5 digest
    part = unpack_from('>I', md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_id = part2node[part]
    node_counts[node_id] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d Most data ids on one node, %.02f%% over' % (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d Least data ids on one node, %.02f%% under' % (min_count, under)
print '%d seconds pass ...' % (time() - begin)

Result:

1525: Desired data ids per node
1683 Most data ids on one node, 10.36% over
1360 Least data ids on one node, 10.82% under
234 seconds pass ...

Notes:

The code is fairly simple: part2node maps each partition to a node, and node_counts records how many data ids are mapped to each node.

gholt makes a point of the overhead: storing a 16-bit node id takes 2 bytes, so a little over 6 million partitions use only about 12MB of memory. gholt also mentions the roughly 10% fluctuation in the ring3_0.py results. The main cause is that the data space (10^8 ids) is too small relative to the number of partitions (about 8*10^6). He tried a larger data space, for example 10^9 data ids against 8*10^6 partitions, and the run took 6 hours. -_-!
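The two figures in that paragraph can be reproduced with simple arithmetic. This is a Python 3 sketch, not part of the original listings; `mapping_bytes` and `ids_per_partition` are hypothetical helper names, and the byte counts in the comments assume `array('H')` uses 2 bytes per item, which is the case on common platforms.

```python
from array import array

# Size in bytes of one unsigned short entry in the partition-to-node table.
NODE_ID_BYTES = array('H').itemsize


def mapping_bytes(partition_count):
    """Memory taken by the raw partition-to-node table."""
    return partition_count * NODE_ID_BYTES


def ids_per_partition(data_id_count, partition_count):
    """Average number of data ids that land in one partition."""
    return data_id_count / partition_count


print(mapping_bytes(6000000))                          # 12000000 (about 12 MB)
print(mapping_bytes(2 ** 23))                          # 16777216 (16 MiB)
print(round(ids_per_partition(10 ** 8, 2 ** 23), 1))   # 11.9
print(round(ids_per_partition(10 ** 9, 2 ** 23), 1))   # 119.2
```

With only about 12 data ids per partition on average, random variation between partitions is large relative to the mean, which fits the fluctuation seen in the ring3_0.py output; ten times more data ids gives each partition roughly 119 ids.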


ring3_0.py is already getting close to a real-world consistent hashing ring; ring4_0.py adds replicas.


ring4_0.py

from array import array
from struct import unpack_from
from hashlib import md5
from time import time

REPLICAS = 3
PARTITION_POWER = 16
PARTITION_SHIFT = 32 - PARTITION_POWER
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 256
DATA_ID_COUNT = 10000000

begin = time()
part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)  #(3)
node_counts = [0] * NODE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    part = unpack_from('>I', md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_ids = [part2node[part]]  #(1)
    node_counts[node_ids[0]] += 1
    for replica in xrange(1, REPLICAS):
        # Walk forward (wrapping around) until the node is not already used
        while part2node[part] in node_ids:  #(2)
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
        node_counts[node_ids[-1]] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT * REPLICAS
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % (min_count, under)
print '%d seconds pass ...' % (time() - begin)

Result:

117186: Desired data ids per node
118133: Most data ids on one node, 0.81% over
116093: Least data ids on one node, 0.93% under
52 seconds pass ...

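Notes:

#(1) node_ids records the ids of the nodes that hold the 3 replicas. part2node[part] looks up the node id for a given partition id.

#(2) The enclosing for loop runs once for each of the two remaining replicas. The while loop walks forward through consecutive partitions until it reaches one whose node is not yet in node_ids. node_ids and node_counts are then updated accordingly.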
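The replica walk is easier to follow on a tiny ring. The sketch below is Python 3 and shrinks the constants to 16 partitions and 4 nodes purely for illustration; `replica_nodes` is a hypothetical helper that wraps the same loop as #(1) and #(2) in a function.

```python
from array import array

REPLICAS = 3
PARTITION_POWER = 4                      # tiny ring: 16 partitions
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 4

# Same regular initialization as #(3): partition p belongs to node p % NODE_COUNT.
part2node = array('H', (part % NODE_COUNT for part in range(2 ** PARTITION_POWER)))


def replica_nodes(part):
    """Return the node ids holding the replicas of data that hashes to `part`."""
    node_ids = [part2node[part]]
    for _ in range(1, REPLICAS):
        # Step to the next partition, wrapping at the end of the ring,
        # until its node has not been used for this data yet.
        while part2node[part] in node_ids:
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
    return node_ids


print(replica_nodes(0))   # [0, 1, 2]
print(replica_nodes(15))  # [3, 0, 1]
```

The second call shows the wrap-around: the walk starts at the last partition and continues from partition 0.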

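ring4_0.py looks pretty good, with about 1% fluctuation. But two problems remain:

1. #(2) assigns consecutive partitions, and the initialization at #(3) follows a regular pattern. In some situations this makes part of the data behave badly. For example, suppose that data is mapped to partition N on node x and node x goes down frequently; then it is always the data on partition N that has to be re-replicated. The fix is fairly easy: shuffle part2node randomly at initialization so that partition N and node x are no longer tied together in any fixed way.

2. Data safety. Suppose the partitions holding all replicas sit in the same rack. If the rack loses power, every replica becomes unavailable. A mechanism for isolating failures is therefore needed, which is why the concept of a zone is introduced.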
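The fix for the first problem can be sketched in a few lines of Python 3. The constants are reduced for illustration, and the fixed seed is an assumption made only so that the sketch is repeatable; a real ring would not need one. The point it shows is that shuffling changes which partition sits next to which, while each node still owns the same number of partitions.

```python
from array import array
from collections import Counter
from random import Random

PARTITION_POWER = 8   # 256 partitions, reduced for illustration
NODE_COUNT = 16

# Regular initialization: neighbouring partitions belong to neighbouring nodes.
part2node = array('H', (part % NODE_COUNT for part in range(2 ** PARTITION_POWER)))
counts_before = Counter(part2node)

# Shuffle in place; the seed is hypothetical and only makes the run repeatable.
Random(42).shuffle(part2node)
counts_after = Counter(part2node)

print(counts_before == counts_after)   # True: the balance per node is unchanged
print(list(part2node[:8]))             # order no longer follows part % NODE_COUNT
```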


ring4_1.py

from array import array
from struct import unpack_from
from hashlib import md5
from time import time
from random import shuffle

REPLICAS = 3
PARTITION_POWER = 16
PARTITION_SHIFT = 32 - PARTITION_POWER
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 256
ZONE_COUNT = 16
DATA_ID_COUNT = 10000000

begin = time()
# node2zone[node id] -> zone id, assigned round-robin
node2zone = []
while len(node2zone) < NODE_COUNT:  #(1)
    zone = 0
    while zone < ZONE_COUNT and len(node2zone) < NODE_COUNT:
        node2zone.append(zone)
        zone += 1
part2node = array('H')
for part in xrange(2 ** PARTITION_POWER):
    part2node.append(part % NODE_COUNT)
shuffle(part2node)  #(2)
node_counts = [0] * NODE_COUNT
zone_counts = [0] * ZONE_COUNT
for data_id in xrange(DATA_ID_COUNT):
    data_id = str(data_id)
    part = unpack_from('>I', md5(data_id).digest())[0] >> PARTITION_SHIFT
    node_ids = [part2node[part]]
    zones = [node2zone[node_ids[0]]]
    node_counts[node_ids[0]] += 1
    zone_counts[zones[0]] += 1
    for replica in xrange(1, REPLICAS):  #(3)
        while part2node[part] in node_ids and \
                node2zone[part2node[part]] in zones:
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
        zones.append(node2zone[node_ids[-1]])
        node_counts[node_ids[-1]] += 1
        zone_counts[zones[-1]] += 1

desired_count = DATA_ID_COUNT / NODE_COUNT * REPLICAS
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % (min_count, under)

desired_count = DATA_ID_COUNT / ZONE_COUNT * REPLICAS
print '%d: Desired data ids per zone' % desired_count
max_count = max(zone_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids in one zone, %.02f%% over' % (max_count, over)
min_count = min(zone_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids in one zone, %.02f%% under' % (min_count, under)
print '%d seconds pass ...' % (time() - begin)

Result:

117186: Desired data ids per node
118619: Most data ids on one node, 1.22% over
115810: Least data ids on one node, 1.17% under
1875000: Desired data ids per zone
1879143: Most data ids in one zone, 0.22% over
1871851: Least data ids in one zone, 0.17% under
69 seconds pass ...

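Notes:

#(1) Initializes node2zone. Zones are introduced to solve problem 2 of ring4_0.py.

#(2) Shuffles part2node to randomize its order, which solves problem 1 of ring4_0.py. This step takes a fair amount of time.

#(3) Probes partitions one by one for a suitable position: a replica must not land on a node that is already used, nor in a zone that is already used.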
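The probing rule at #(3) can be sketched on a small ring in Python 3. One observation guides the sketch: every node already in node_ids has its zone in zones, so "node already used" implies "zone already used". Testing the zone alone is therefore enough to satisfy both requirements, whereas combining the two tests with `and` behaves like the node test by itself. The constants, the seed, and the helper name `place_replicas` below are assumptions for illustration, not the original listing.

```python
from array import array
from random import Random

REPLICAS = 3
PARTITION_POWER = 6                      # 64 partitions, reduced for illustration
PARTITION_MAX = 2 ** PARTITION_POWER - 1
NODE_COUNT = 8
ZONE_COUNT = 4

# Round-robin zone assignment, equivalent to the loop at #(1).
node2zone = [node % ZONE_COUNT for node in range(NODE_COUNT)]
part2node = array('H', (part % NODE_COUNT for part in range(2 ** PARTITION_POWER)))
Random(0).shuffle(part2node)             # hypothetical seed, for repeatability only


def place_replicas(part):
    """Return (node_ids, zones) for data that hashes to `part`."""
    node_ids = [part2node[part]]
    zones = [node2zone[node_ids[0]]]
    for _ in range(1, REPLICAS):
        # Skip partitions whose node lives in a zone that is already used;
        # a distinct zone also guarantees a distinct node.
        while node2zone[part2node[part]] in zones:
            part += 1
            if part > PARTITION_MAX:
                part = 0
        node_ids.append(part2node[part])
        zones.append(node2zone[node_ids[-1]])
    return node_ids, zones


node_ids, zones = place_replicas(0)
print(len(set(node_ids)), len(set(zones)))  # 3 3
```

The walk terminates as long as ZONE_COUNT is at least REPLICAS and every zone owns at least one partition.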



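At this point the ring has all its basic features: consistent hashing ring, replicas, zones. ring4_2.py gives an implementation similar to Figure 2.1. Of course, it is only one implementation model. Swift once used this model but has since abandoned it. It is included here for background only; readers who are not interested can skip to the next part. Part 3 will wrap the features above into a class, add weight, and run some simple tests.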


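ring4_2.py embodies an anchor approach: a ring of node anchors is maintained, and every piece of data searches the anchor ring to find the storage node that matches it.

The anchor ring consists of hash2index and index2node in ring4_2.py. Each data id looks up its index at #(2) and then uses index2node to find the corresponding node. The binary search and hash computation in the loop at #(1) are both fairly expensive. Leaving the overhead aside, this implementation does not distribute data evenly. To make the distribution more even, each node has to maintain anchors, and the anchor ring has to be walked repeatedly to find a suitable position. On top of that, managing replicas becomes very cumbersome in this design.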
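The lookup on an anchor ring can be sketched as follows in Python 3. This is an illustration under stated assumptions rather than the ring4_2.py listing: it hashes the string "node-vnode" so that each anchor of a node lands at a different point on the ring, it builds the sorted ring with `sorted` instead of repeated `insert` calls, and the names `hash32` and `node_for` are hypothetical.

```python
from bisect import bisect_left
from hashlib import md5
from struct import unpack_from

NODE_COUNT = 8      # reduced for illustration
VNODE_COUNT = 100   # anchors per node


def hash32(key):
    """First 4 bytes of the MD5 digest as a big-endian unsigned 32-bit integer."""
    return unpack_from('>I', md5(key.encode('utf-8')).digest())[0]


# One (hash, node) anchor per node and vnode, sorted by hash to form the ring.
anchors = sorted((hash32('%d-%d' % (node, vnode)), node)
                 for node in range(NODE_COUNT)
                 for vnode in range(VNODE_COUNT))
hash2index = [hsh for hsh, _ in anchors]
index2node = [node for _, node in anchors]


def node_for(data_id):
    """Find the first anchor at or after the data hash, wrapping to index 0."""
    index = bisect_left(hash2index, hash32(str(data_id)))
    if index >= len(hash2index):
        index = 0
    return index2node[index]


owners = {node_for(data_id) for data_id in range(10000)}
print(sorted(owners))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Every lookup costs one hash plus one binary search over all anchors, which is the time-for-space trade described above.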

Space versus time is the classic trade-off in computing. ring4_1.py trades space for time and simplifies the mapping; ring4_2.py trades time for space, and although the idea is simple, the results are unsatisfactory.


ring4_2.py

from bisect import bisect_left
from hashlib import md5
from struct import unpack_from
from time import time

REPLICAS = 3
NODE_COUNT = 256
ZONE_COUNT = 16
DATA_ID_COUNT = 10000000
VNODE_COUNT = 100

begin = time()
node2zone = []
while len(node2zone) < NODE_COUNT:
    zone = 0
    while zone < ZONE_COUNT and len(node2zone) < NODE_COUNT:
        node2zone.append(zone)
        zone += 1
# hash2index: sorted anchor hashes; index2node: node id of the anchor at each index
hash2index = []
index2node = []
for node in xrange(NODE_COUNT):
    for vnode in xrange(VNODE_COUNT):
        hsh = unpack_from('>I', md5(str(node)).digest())[0]
        index = bisect_left(hash2index, hsh)
        if index > len(hash2index):
            index = 0
        hash2index.insert(index, hsh)
        index2node.insert(index, node)
node_counts = [0] * NODE_COUNT
zone_counts = [0] * ZONE_COUNT
for data_id in xrange(DATA_ID_COUNT):  #(1)
    data_id = str(data_id)
    hsh = unpack_from('>I', md5(data_id).digest())[0]
    index = bisect_left(hash2index, hsh)  #(2)
    if index >= len(hash2index):
        index = 0
    node_ids = [index2node[index]]
    zones = [node2zone[node_ids[0]]]
    node_counts[node_ids[0]] += 1
    zone_counts[zones[0]] += 1
    for replica in xrange(1, REPLICAS):
        while index2node[index] in node_ids and \
                node2zone[index2node[index]] in zones:
            index += 1
            if index >= len(hash2index):
                index = 0
        node_ids.append(index2node[index])
        zones.append(node2zone[node_ids[-1]])
        node_counts[node_ids[-1]] += 1
        zone_counts[zones[-1]] += 1
desired_count = DATA_ID_COUNT / NODE_COUNT * REPLICAS
print '%d: Desired data ids per node' % desired_count
max_count = max(node_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one node, %.02f%% over' % (max_count, over)
min_count = min(node_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one node, %.02f%% under' % (min_count, under)

desired_count = DATA_ID_COUNT / ZONE_COUNT * REPLICAS
print '%d: Desired data ids per zone' % desired_count
max_count = max(zone_counts)
over = 100.0 * (max_count - desired_count) / desired_count
print '%d: Most data ids on one zone, %.02f%% over' % (max_count, over)
min_count = min(zone_counts)
under = 100.0 * (desired_count - min_count) / desired_count
print '%d: Least data ids on one zone, %.02f%% under' % (min_count, under)

print '%d seconds pass ...' % (time() - begin)

Result:

117186: Desired data ids per node
351282: Most data ids on one node, 199.76% over
15965: Least data ids on one node, 86.38% under
1875000: Desired data ids per zone
2248496: Most data ids on one zone, 19.92% over
1378013: Least data ids on one zone, 26.51% under
990 seconds pass ...

