Fixing Greenplum versions where a failed gprecoverseg -r leaves both primary and mirror faulted and the system unable to start
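Before I begin, a warning: operating on a database by following this article carries a very high risk of losing data or bringing the system down. Beginners, and anyone who is not thinking clearly, should close this page and go no further. After a round of careless tinkering, nobody will be able to help you anymore.
First, where the fault comes from. Because of a bug, on some versions of Greenplum running gprecoverseg -r always fails, and the failure always leaves both members of the mirror pairs faulted so that the database cannot start. On these versions, when primary and mirror roles have been swapped, the only safe fix is to restart the database and let it repair itself. Do not use gprecoverseg -r to rebalance by hand. Remember this.
Fortunately, whenever you end up in this failed-rebalance state, all mirrors are generally in sync. If they were not, a different error would be reported (some mirrors are down, so gprecoverseg -r cannot run), and the problem discussed here would never arise. At this point the database as a whole cannot be started, but the mirrors can in fact start as primaries; Greenplum has merely scrambled the segment states. We only need to restore gp_segment_configuration, the system table that describes segment state, to the role-swapped state it was in before the problem occurred, and the database will start. This restoration has to be done by hand.
Next, using real data, I will describe the structure of the gp_segment_configuration table.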
postgres=# select * from gp_segment_configuration where address like 'sdw10-%' or address like 'sdw11-%' order by dbid;
 dbid | content | role | preferred_role | mode | status | port  |  hostname   | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-------------+---------+------------------+------------
   56 |      54 | p    | p              | c    | u      | 40000 | dssln-sdw10 | sdw10-1 |            41000 |
   57 |      55 | p    | p              | c    | u      | 40001 | dssln-sdw10 | sdw10-1 |            41001 |
   58 |      56 | p    | p              | c    | u      | 40002 | dssln-sdw10 | sdw10-1 |            41002 |
   59 |      57 | p    | p              | c    | u      | 40003 | dssln-sdw10 | sdw10-2 |            41003 |
   60 |      58 | p    | p              | c    | u      | 40004 | dssln-sdw10 | sdw10-2 |            41004 |
   61 |      59 | p    | p              | c    | u      | 40005 | dssln-sdw10 | sdw10-2 |            41005 |
   62 |      60 | p    | p              | s    | u      | 40000 | dssln-sdw11 | sdw11-1 |            41000 |
   63 |      61 | p    | p              | s    | u      | 40001 | dssln-sdw11 | sdw11-1 |            41001 |
   64 |      62 | p    | p              | s    | u      | 40002 | dssln-sdw11 | sdw11-1 |            41002 |
   65 |      63 | p    | p              | s    | u      | 40003 | dssln-sdw11 | sdw11-2 |            41003 |
   66 |      64 | p    | p              | s    | u      | 40004 | dssln-sdw11 | sdw11-2 |            41004 |
   67 |      65 | p    | p              | s    | u      | 40005 | dssln-sdw11 | sdw11-2 |            41005 |
  158 |      48 | m    | m              | s    | u      | 50000 | dssln-sdw10 | sdw10-2 |            51000 |
  159 |      49 | m    | m              | s    | u      | 50001 | dssln-sdw10 | sdw10-2 |            51001 |
  160 |      50 | m    | m              | s    | u      | 50002 | dssln-sdw10 | sdw10-2 |            51002 |
  161 |      51 | m    | m              | s    | u      | 50003 | dssln-sdw10 | sdw10-1 |            51003 |
  162 |      52 | m    | m              | s    | u      | 50004 | dssln-sdw10 | sdw10-1 |            51004 |
  163 |      53 | m    | m              | s    | u      | 50005 | dssln-sdw10 | sdw10-1 |            51005 |
  164 |      54 | m    | m              | c    | d      | 50000 | dssln-sdw11 | sdw11-2 |            51000 |
  165 |      55 | m    | m              | c    | d      | 50001 | dssln-sdw11 | sdw11-2 |            51001 |
  166 |      56 | m    | m              | c    | d      | 50002 | dssln-sdw11 | sdw11-2 |            51002 |
  167 |      57 | m    | m              | c    | d      | 50003 | dssln-sdw11 | sdw11-1 |            51003 |
  168 |      58 | m    | m              | c    | d      | 50004 | dssln-sdw11 | sdw11-1 |            51004 |
  169 |      59 | m    | m              | c    | d      | 50005 | dssln-sdw11 | sdw11-1 |            51005 |
 dbid | content | role | preferred_role | mode | status | port  |  hostname   | address | replication_port | san_mounts
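dbid is the unique ID of each segment instance; content is the content ID shared by a primary and its mirror.
role is the current role (p = primary, m = mirror).
preferred_role is the preferred role, that is, the role the instance was originally meant to play.
mode takes one of three values, s/c/r, meaning synced, change logging, and resyncing.
status takes one of two values, u/d, meaning up and down.
The remaining columns are not explained here.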
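The sample data above shows that all six mirror segments on node 11 have been marked down, although the data in them is actually intact. The six primary segments on node 10 have lost their mirrors; they ought to be startable but will not start, and the data in them is intact as well. Only the state flags are wrong.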
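The pattern described above can be spotted mechanically. The following is only a minimal sketch and not part of the original procedure: it assumes a few rows of gp_segment_configuration have been copied by hand into Python tuples, and the helper name `find_broken_pairs` is made up for illustration. It reports every content ID whose primary is in change-logging mode while its mirror is marked down.

```python
from collections import defaultdict

# Hypothetical subset of the sample rows above:
# (dbid, content, role, preferred_role, mode, status, address)
ROWS = [
    (56, 54, "p", "p", "c", "u", "sdw10-1"),
    (57, 55, "p", "p", "c", "u", "sdw10-1"),
    (164, 54, "m", "m", "c", "d", "sdw11-2"),
    (165, 55, "m", "m", "c", "d", "sdw11-2"),
    (62, 60, "p", "p", "s", "u", "sdw11-1"),  # healthy primary, mirror not in this subset
]


def find_broken_pairs(rows):
    """Return content IDs whose primary is change-logging and whose mirror is down."""
    by_content = defaultdict(dict)
    for dbid, content, role, preferred_role, mode, status, address in rows:
        # Index each pair by current role so primary and mirror can be compared.
        by_content[content][role] = {"dbid": dbid, "mode": mode, "status": status}

    broken = []
    for content, pair in sorted(by_content.items()):
        primary, mirror = pair.get("p"), pair.get("m")
        if primary is None or mirror is None:
            continue  # the partner row is not in the data we were given
        if primary["mode"] == "c" and mirror["status"] == "d":
            broken.append(content)
    return broken


print(find_broken_pairs(ROWS))
```

With the full table loaded, the content IDs it reports should be exactly the pairs that need their flags corrected by hand.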
Based on the analysis above, we only need to set the rows for these two nodes back to the role-swapped state. The target is:
postgres=# select * from gp_segment_configuration where (address like 'sdw10-%' and preferred_role = 'p') or (address like 'sdw11-%' and preferred_role = 'm') order by dbid;
 dbid | content | role | preferred_role | mode | status | port  |  hostname   | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-------------+---------+------------------+------------
   56 |      54 | m    | p              | s    | u      | 40000 | dssln-sdw10 | sdw10-1 |            41000 |
   57 |      55 | m    | p              | s    | u      | 40001 | dssln-sdw10 | sdw10-1 |            41001 |
   58 |      56 | m    | p              | s    | u      | 40002 | dssln-sdw10 | sdw10-1 |            41002 |
   59 |      57 | m    | p              | s    | u      | 40003 | dssln-sdw10 | sdw10-2 |            41003 |
   60 |      58 | m    | p              | s    | u      | 40004 | dssln-sdw10 | sdw10-2 |            41004 |
   61 |      59 | m    | p              | s    | u      | 40005 | dssln-sdw10 | sdw10-2 |            41005 |
  164 |      54 | p    | m              | s    | u      | 50000 | dssln-sdw11 | sdw11-2 |            51000 |
  165 |      55 | p    | m              | s    | u      | 50001 | dssln-sdw11 | sdw11-2 |            51001 |
  166 |      56 | p    | m              | s    | u      | 50002 | dssln-sdw11 | sdw11-2 |            51002 |
  167 |      57 | p    | m              | s    | u      | 50003 | dssln-sdw11 | sdw11-1 |            51003 |
  168 |      58 | p    | m              | s    | u      | 50004 | dssln-sdw11 | sdw11-1 |            51004 |
  169 |      59 | p    | m              | s    | u      | 50005 | dssln-sdw11 | sdw11-1 |            51005 |
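All right, let's begin:
1. Start the database in maintenance mode:
gpstart -m
Connect to the database in utility mode:
PGOPTIONS='-c gp_session_role=utility' psql -d postgres
Turn on the switch that allows system tables to be modified:
set allow_system_table_mods=DML;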
2. Back up the system table:
create table mybak_segment_configuration as select * from gp_segment_configuration;
3. Change the data to the role-swapped state:
update gp_segment_configuration set role='p',mode='s',status='u' where address like 'sdw11%' and preferred_role='m';
update gp_segment_configuration set role='m',mode='s',status='u' where address like 'sdw10%' and preferred_role='p';
\q
4. Restart the database:
gpstop -m -M fast
gpstart
5. Check the mirrors. By this point they should have recovered to the normal state automatically.
gpstate -m
6. If the mirrors are not in sync, repair them:
gprecoverseg
7. If the roles are still swapped, restart the database once more:
gpstop -M fast
gpstart
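Because the two UPDATE statements in step 3 touch a system table, it can be worth rehearsing their WHERE clauses on a scratch copy first. The sketch below is an illustration only, under loud assumptions: it uses Python's built-in sqlite3 module rather than Greenplum, the table holds a hand-copied subset of the sample rows with only the relevant columns, and the function name `rehearse_updates` is hypothetical. It shows which rows the patterns hit and which they leave alone, nothing more.

```python
import sqlite3


def rehearse_updates():
    """Run the step-3 UPDATE statements against an in-memory SQLite scratch table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table gp_segment_configuration ("
        "dbid integer, content integer, role text, preferred_role text, "
        "mode text, status text, address text)"
    )
    rows = [
        # Faulted pairs from the sample data (assumed subset).
        (56, 54, "p", "p", "c", "u", "sdw10-1"),
        (59, 57, "p", "p", "c", "u", "sdw10-2"),
        (164, 54, "m", "m", "c", "d", "sdw11-2"),
        (167, 57, "m", "m", "c", "d", "sdw11-1"),
        # Healthy rows on the same hosts that must stay untouched.
        (62, 60, "p", "p", "s", "u", "sdw11-1"),
        (158, 48, "m", "m", "s", "u", "sdw10-2"),
    ]
    conn.executemany(
        "insert into gp_segment_configuration values (?, ?, ?, ?, ?, ?, ?)", rows
    )

    # The same statements as in step 3 of the procedure.
    conn.execute(
        "update gp_segment_configuration set role='p',mode='s',status='u' "
        "where address like 'sdw11%' and preferred_role='m'"
    )
    conn.execute(
        "update gp_segment_configuration set role='m',mode='s',status='u' "
        "where address like 'sdw10%' and preferred_role='p'"
    )

    result = conn.execute(
        "select dbid, role, preferred_role, mode, status "
        "from gp_segment_configuration order by dbid"
    ).fetchall()
    conn.close()
    return result


for row in rehearse_updates():
    print(row)
```

One thing such a rehearsal makes visible: the pattern 'sdw10%' would also match an address such as sdw100-1 if a host with that name existed in the cluster, so the stricter 'sdw10-%' form used in the earlier queries may be the safer choice.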
Shared from the Sina blog of 醉糊涂蟲 (original title: 解決GREENPLUM某些版本gprecoverseg -r失敗后鏡像雙壞,系統無法啟動的問題)