A Study of Citus Shard Distribution (Part 2: Replicas and Failures)
(Unless stated otherwise, every SQL statement in this article is run on the coordinator node.)
Worker nodes
mydb1=# SELECT * FROM master_get_active_worker_nodes();
   node_name   | node_port
---------------+-----------
 192.168.7.131 |      5432
 192.168.7.135 |      5432
 192.168.7.136 |      5432
 192.168.7.137 |      5432
 192.168.7.133 |      5432
 192.168.7.132 |      5432
 192.168.7.134 |      5432
 192.168.7.130 |      5432
(8 rows)

Create the table test_table
create table test_table(id int, name varchar(16));

Configure the distribution method
SELECT master_create_distributed_table('test_table', 'id', 'hash');

Create the shards from the shard count and the replica count
SELECT master_create_worker_shards('test_table', 8, 2);

View the shards
mydb1=# SELECT * from pg_dist_shard order by shardid;
 logicalrelid | shardid | shardstorage | shardminvalue | shardmaxvalue
--------------+---------+--------------+---------------+---------------
 test_table   |  102032 | t            | -2147483648   | -1610612737
 test_table   |  102033 | t            | -1610612736   | -1073741825
 test_table   |  102034 | t            | -1073741824   | -536870913
 test_table   |  102035 | t            | -536870912    | -1
 test_table   |  102036 | t            | 0             | 536870911
 test_table   |  102037 | t            | 536870912     | 1073741823
 test_table   |  102038 | t            | 1073741824    | 1610612735
 test_table   |  102039 | t            | 1610612736    | 2147483647
(8 rows)

There are 8 shards in total, and together their hash ranges cover the whole signed 32-bit integer space.
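
The boundaries in the table above are consistent with cutting the signed 32-bit hash space into equal slices. The following is a minimal sketch of that arithmetic, not Citus's own code; the function name `shard_hash_ranges` is hypothetical, and the starting shard ID 102032 is simply taken from the output above.

```python
def shard_hash_ranges(shard_count):
    """Split the signed 32-bit hash space into shard_count contiguous ranges."""
    int32_min, int32_max = -2**31, 2**31 - 1
    step = 2**32 // shard_count  # number of hash values per shard
    ranges = []
    for i in range(shard_count):
        lo = int32_min + i * step
        # The last shard ends at the top of the space so nothing is left uncovered.
        hi = int32_max if i == shard_count - 1 else lo + step - 1
        ranges.append((lo, hi))
    return ranges


# Print shardid, shardminvalue, shardmaxvalue for the 8 shards of test_table.
for shard_id, (lo, hi) in enumerate(shard_hash_ranges(8), start=102032):
    print(shard_id, lo, hi)
```

With 8 shards each slice holds 536870912 hash values, which is why the fourth shard ends at -1 and the fifth starts at 0.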
View the shard placements
mydb1=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
 shardid | shardstate | shardlength |   nodename    | nodeport | placementid
---------+------------+-------------+---------------+----------+-------------
  102032 |          1 |           0 | 192.168.7.130 |     5432 |          33
  102032 |          1 |           0 | 192.168.7.131 |     5432 |          34
  102033 |          1 |           0 | 192.168.7.131 |     5432 |          35
  102033 |          1 |           0 | 192.168.7.132 |     5432 |          36
  102034 |          1 |           0 | 192.168.7.132 |     5432 |          37
  102034 |          1 |           0 | 192.168.7.133 |     5432 |          38
  102035 |          1 |           0 | 192.168.7.133 |     5432 |          39
  102035 |          1 |           0 | 192.168.7.134 |     5432 |          40
  102036 |          1 |           0 | 192.168.7.134 |     5432 |          41
  102036 |          1 |           0 | 192.168.7.135 |     5432 |          42
  102037 |          1 |           0 | 192.168.7.135 |     5432 |          43
  102037 |          1 |           0 | 192.168.7.136 |     5432 |          44
  102038 |          1 |           0 | 192.168.7.136 |     5432 |          45
  102038 |          1 |           0 | 192.168.7.137 |     5432 |          46
  102039 |          1 |           0 | 192.168.7.137 |     5432 |          47
  102039 |          1 |           0 | 192.168.7.130 |     5432 |          48
(16 rows)

Each shard has 2 replicas, placed on two different, adjacent worker nodes; the last shard wraps around from 192.168.7.137 back to 192.168.7.130.
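
The placement pattern above looks like a round-robin assignment over the worker list. The sketch below reproduces it under two assumptions that the output suggests but does not prove: that workers are ordered by address, and that replica r of the i-th shard goes to worker (i + r) modulo the worker count. The function name `round_robin_placements` is hypothetical.

```python
def round_robin_placements(shard_ids, workers, replication_factor):
    """Assign each shard to replication_factor consecutive workers, wrapping around."""
    placements = []
    for index, shard_id in enumerate(shard_ids):
        for replica in range(replication_factor):
            node = workers[(index + replica) % len(workers)]
            placements.append((shard_id, node))
    return placements


# Assumed worker order: sorted by address, 192.168.7.130 through 192.168.7.137.
workers = [f"192.168.7.{n}" for n in range(130, 138)]
placements = round_robin_placements(range(102032, 102040), workers, 2)

for shard_id, node in placements:
    print(shard_id, node)
```

The sketch yields 16 (shard, node) pairs in the same order as the query result, including the wrap-around of shard 102039 onto 192.168.7.130.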
Insert 8 rows
mydb1=# select * from test_table order by id;
 id | name
----+------
  1 | a
  2 | b
  3 | c
  4 | d
  5 | e
  6 | f
  7 | g
  8 | h
(8 rows)

Query the shard data on the worker nodes
Querying shard 102032 on node 192.168.7.130 and its replica on node 192.168.7.131 returns the same result.
mydb1=# select * from test_table_102032;
 id | name
----+------
  1 | a
  8 | h
(2 rows)

Write directly to the worker nodes to (deliberately) make the replicas diverge
On node 192.168.7.130, run:
mydb1=# INSERT INTO test_table_102032 VALUES(111,'111');
INSERT 0 1
mydb1=# select * from test_table_102032;
 id  | name
-----+------
   1 | a
   8 | h
 111 | 111
(3 rows)

On node 192.168.7.131, run:
mydb1=# INSERT INTO test_table_102032 VALUES(222,'222');
INSERT 0 1
mydb1=# select * from test_table_102032;
 id  | name
-----+------
   1 | a
   8 | h
 222 | 222
(3 rows)

View the result on the coordinator node
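mydb1=# select * from test_table order by id;
 id  | name
-----+------
   1 | a
   2 | b
   3 | c
   4 | d
   5 | e
   6 | f
   7 | g
   8 | h
 111 | 111
(9 rows)

Only row 111 shows up, not row 222. We can infer that the coordinator normally reads a shard from just one placement, which this article calls the "primary worker node" (here 192.168.7.130, the first placement listed for shard 102032).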
Unplug the network cable of the "primary worker node"
mydb1=# select * from test_table order by id;
WARNING:  could not establish asynchronous connection after 5000 ms
 id  | name
-----+------
   1 | a
   2 | b
   3 | c
   4 | d
   5 | e
   6 | f
   7 | g
   8 | h
 222 | 222
(9 rows)

We can infer that when the coordinator cannot get data from the primary worker node (192.168.7.130), it reads from the replica worker node (192.168.7.131) instead.
Restore the primary worker node's network and query again
mydb1=# select * from test_table order by id;
 id  | name
-----+------
   1 | a
   2 | b
   3 | c
   4 | d
   5 | e
   6 | f
   7 | g
   8 | h
 111 | 111
(9 rows)

We can infer that the coordinator automatically switched back to the primary worker node.
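
The read behavior observed so far can be summarized as "use the first placement of the shard that is active and reachable". The sketch below is a toy model of that rule, written only to make the three observations reproducible; it is not how Citus is implemented, and `read_shard` and the dictionary layout are hypothetical. The rows are the deliberately diverged contents of shard 102032.

```python
ACTIVE = 1  # shardstate value shown for healthy placements in pg_dist_shard_placement


def read_shard(placements, reachable):
    """Return (node, rows) from the first active placement that can be reached."""
    for placement in placements:
        if placement["state"] == ACTIVE and placement["node"] in reachable:
            return placement["node"], placement["rows"]
    raise RuntimeError("no usable placement for this shard")


# Placements of shard 102032 in placementid order, with the diverged rows.
shard_102032 = [
    {"node": "192.168.7.130", "state": ACTIVE,
     "rows": [(1, "a"), (8, "h"), (111, "111")]},
    {"node": "192.168.7.131", "state": ACTIVE,
     "rows": [(1, "a"), (8, "h"), (222, "222")]},
]

both_up = {"192.168.7.130", "192.168.7.131"}
only_131 = {"192.168.7.131"}

print(read_shard(shard_102032, both_up))   # normal operation
print(read_shard(shard_102032, only_131))  # cable of 192.168.7.130 unplugged
print(read_shard(shard_102032, both_up))   # network restored
```

In this model the first and third calls return the rows held by 192.168.7.130 (including 111) and the second returns those held by 192.168.7.131 (including 222), matching the three coordinator queries above.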
While a worker node is offline, if no write touches the offline node, neither the shard metadata nor the shard placement metadata changes. (Writes that involve only other nodes have no effect on it.)
mydb1=# INSERT INTO test_table VALUES(99,'99');
INSERT 0 1
mydb1=# SELECT * from pg_dist_shard order by shardid;
 logicalrelid | shardid | shardstorage | shardminvalue | shardmaxvalue
--------------+---------+--------------+---------------+---------------
 test_table   |  102032 | t            | -2147483648   | -1610612737
 test_table   |  102033 | t            | -1610612736   | -1073741825
 test_table   |  102034 | t            | -1073741824   | -536870913
 test_table   |  102035 | t            | -536870912    | -1
 test_table   |  102036 | t            | 0             | 536870911
 test_table   |  102037 | t            | 536870912     | 1073741823
 test_table   |  102038 | t            | 1073741824    | 1610612735
 test_table   |  102039 | t            | 1610612736    | 2147483647
(8 rows)

mydb1=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
 shardid | shardstate | shardlength |   nodename    | nodeport | placementid
---------+------------+-------------+---------------+----------+-------------
  102032 |          1 |           0 | 192.168.7.130 |     5432 |          33
  102032 |          1 |           0 | 192.168.7.131 |     5432 |          34
  102033 |          1 |           0 | 192.168.7.131 |     5432 |          35
  102033 |          1 |           0 | 192.168.7.132 |     5432 |          36
  102034 |          1 |           0 | 192.168.7.132 |     5432 |          37
  102034 |          1 |           0 | 192.168.7.133 |     5432 |          38
  102035 |          1 |           0 | 192.168.7.133 |     5432 |          39
  102035 |          1 |           0 | 192.168.7.134 |     5432 |          40
  102036 |          1 |           0 | 192.168.7.134 |     5432 |          41
  102036 |          1 |           0 | 192.168.7.135 |     5432 |          42
  102037 |          1 |           0 | 192.168.7.135 |     5432 |          43
  102037 |          1 |           0 | 192.168.7.136 |     5432 |          44
  102038 |          1 |           0 | 192.168.7.136 |     5432 |          45
  102038 |          1 |           0 | 192.168.7.137 |     5432 |          46
  102039 |          1 |           0 | 192.168.7.137 |     5432 |          47
  102039 |          1 |           0 | 192.168.7.130 |     5432 |          48
(16 rows)

While a worker node is offline, if a write does touch the offline node, the shardstate column of the affected placement changes (from 1 to 3).
mydb1=# INSERT INTO test_table VALUES(1,'1111111');
WARNING:  connection error: 192.168.7.130:5432
DETAIL:  could not send data to server: No route to host
could not send SSL negotiation packet: No route to host
INSERT 0 1
mydb1=# SELECT * from pg_dist_shard order by shardid;
 logicalrelid | shardid | shardstorage | shardminvalue | shardmaxvalue
--------------+---------+--------------+---------------+---------------
 test_table   |  102032 | t            | -2147483648   | -1610612737
 test_table   |  102033 | t            | -1610612736   | -1073741825
 test_table   |  102034 | t            | -1073741824   | -536870913
 test_table   |  102035 | t            | -536870912    | -1
 test_table   |  102036 | t            | 0             | 536870911
 test_table   |  102037 | t            | 536870912     | 1073741823
 test_table   |  102038 | t            | 1073741824    | 1610612735
 test_table   |  102039 | t            | 1610612736    | 2147483647
(8 rows)

mydb1=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
 shardid | shardstate | shardlength |   nodename    | nodeport | placementid
---------+------------+-------------+---------------+----------+-------------
  102032 |          3 |           0 | 192.168.7.130 |     5432 |          33
  102032 |          1 |           0 | 192.168.7.131 |     5432 |          34
  102033 |          1 |           0 | 192.168.7.131 |     5432 |          35
  102033 |          1 |           0 | 192.168.7.132 |     5432 |          36
  102034 |          1 |           0 | 192.168.7.132 |     5432 |          37
  102034 |          1 |           0 | 192.168.7.133 |     5432 |          38
  102035 |          1 |           0 | 192.168.7.133 |     5432 |          39
  102035 |          1 |           0 | 192.168.7.134 |     5432 |          40
  102036 |          1 |           0 | 192.168.7.134 |     5432 |          41
  102036 |          1 |           0 | 192.168.7.135 |     5432 |          42
  102037 |          1 |           0 | 192.168.7.135 |     5432 |          43
  102037 |          1 |           0 | 192.168.7.136 |     5432 |          44
  102038 |          1 |           0 | 192.168.7.136 |     5432 |          45
  102038 |          1 |           0 | 192.168.7.137 |     5432 |          46
  102039 |          1 |           0 | 192.168.7.137 |     5432 |          47
  102039 |          1 |           0 | 192.168.7.130 |     5432 |          48
(16 rows)

Only the placement of shard 102032 on 192.168.7.130 is marked; the same node's placement of shard 102039 was not written to and stays at 1. If the original "primary worker node" is now brought back, the mark is not reset, and the coordinator keeps reading from the former "replica worker node".
On node 192.168.7.130, run:
mydb1=# select * from test_table_102032 order by id;
 id  | name
-----+------
   1 | a
   8 | h
 111 | 111
(3 rows)

The row (1, '1111111') is missing.
On node 192.168.7.131, run:
mydb1=# select * from test_table_102032 order by id;
 id  |  name
-----+---------
   1 | a
   1 | 1111111
   8 | h
 222 | 222
(4 rows)

On the coordinator node, run:
mydb1=# select * from test_table order by id;
 id  |  name
-----+---------
   1 | a
   1 | 1111111
   2 | b
   3 | c
   4 | d
   5 | e
   6 | f
   7 | g
   8 | h
  99 | 99
 222 | 222       <----- evidently read from 131
(11 rows)

View the shard placement state:
mydb1=# SELECT * from pg_dist_shard_placement order by shardid, placementid;
 shardid | shardstate | shardlength |   nodename    | nodeport | placementid
---------+------------+-------------+---------------+----------+-------------
  102032 |          3 |           0 | 192.168.7.130 |     5432 |          33
  102032 |          1 |           0 | 192.168.7.131 |     5432 |          34
  102033 |          1 |           0 | 192.168.7.131 |     5432 |          35
  102033 |          1 |           0 | 192.168.7.132 |     5432 |          36
  102034 |          1 |           0 | 192.168.7.132 |     5432 |          37
  102034 |          1 |           0 | 192.168.7.133 |     5432 |          38
  102035 |          1 |           0 | 192.168.7.133 |     5432 |          39
  102035 |          1 |           0 | 192.168.7.134 |     5432 |          40
  102036 |          1 |           0 | 192.168.7.134 |     5432 |          41
  102036 |          1 |           0 | 192.168.7.135 |     5432 |          42
  102037 |          1 |           0 | 192.168.7.135 |     5432 |          43
  102037 |          1 |           0 | 192.168.7.136 |     5432 |          44
  102038 |          1 |           0 | 192.168.7.136 |     5432 |          45
  102038 |          1 |           0 | 192.168.7.137 |     5432 |          46
  102039 |          1 |           0 | 192.168.7.137 |     5432 |          47
  102039 |          1 |           0 | 192.168.7.130 |     5432 |          48
(16 rows)
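
The write-time behavior can be modeled the same way as the reads: a write goes to every active placement of the shard, a placement that cannot be reached is marked with state 3, and later reads skip it even after its node is back. The sketch below is a toy model of what this experiment observed, not Citus's implementation; `write_row`, `read_shard` and the constant name `INACTIVE` are hypothetical (the experiment itself only shows the value changing from 1 to 3).

```python
ACTIVE, INACTIVE = 1, 3  # shardstate values seen in pg_dist_shard_placement


def write_row(placements, reachable, row):
    """Write row to all active placements; mark the unreachable ones as state 3."""
    active = [p for p in placements if p["state"] == ACTIVE]
    if not any(p["node"] in reachable for p in active):
        # Nothing could be written, so leave the metadata untouched.
        raise RuntimeError("write failed on every active placement")
    for p in active:
        if p["node"] in reachable:
            p["rows"].append(row)
        else:
            p["state"] = INACTIVE


def read_shard(placements, reachable):
    """Return (node, rows) from the first active placement that can be reached."""
    for p in placements:
        if p["state"] == ACTIVE and p["node"] in reachable:
            return p["node"], p["rows"]
    raise RuntimeError("no usable placement for this shard")


shard_102032 = [
    {"node": "192.168.7.130", "state": ACTIVE,
     "rows": [(1, "a"), (8, "h"), (111, "111")]},
    {"node": "192.168.7.131", "state": ACTIVE,
     "rows": [(1, "a"), (8, "h"), (222, "222")]},
]
both_up = {"192.168.7.130", "192.168.7.131"}

# 192.168.7.130 is unplugged while the row (1, '1111111') is inserted.
write_row(shard_102032, {"192.168.7.131"}, (1, "1111111"))

# The node comes back, but its placement keeps state 3 and is skipped.
node, rows = read_shard(shard_102032, both_up)
print(node, sorted(rows))
print([(p["node"], p["state"]) for p in shard_102032])
```

In this model nothing ever moves a placement from 3 back to 1, which mirrors the observation that restoring the node alone does not clear the mark.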