MHA Failover (GTID, Auto_Position=0)

Published: 2024/8/5, 豆豆


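A recent case from a colleague: after a database crashed unexpectedly in the early morning, they were asked to set up MHA for failover on top of an existing one-master, two-slave topology. They hit several problems during deployment testing and came to me, and our discussion uncovered some pitfalls I had previously overlooked. Thanks to this colleague for sharing so generously!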

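• GTID environment: after killing the master, the new master and the slaves lose data (previously known)

• Test whether MHA fails over correctly when the database process dies, the database server is shut down or rebooted, a firewall is enabled, or the network service is stopped (split-brain had not been considered before)

• Some production environments run GTID with Auto_Position=0; after a failover they become GTID with Auto_Position=1 (not considered before)

• Walk through the failover flow (already documented earlier)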

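1. GTID environment: after killing the master, the new master and the slaves lose data

The master or a Binlog Server must be listed in a [binlogN] section of the configuration file for MHA to recover the differential data from the dead master. Otherwise MHA only applies data up to the latest slave.

Follow-up question: if [binlogN] points to a Binlog Server and the master's mysqld is killed with kill -9, does MHA fetch the differential binlog from the Binlog Server or from the dead master?

If [binlogN] points to the Binlog Server, MHA fetches from the Binlog Server. If it points to the dead master, MHA fetches from the dead master. If no [binlogN] is configured, the differential data is not recovered at all.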

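2. MHA failover tests

Test whether MHA fails over correctly when the database process dies, the database server is shut down or rebooted, a firewall is enabled, or the network service is stopped.

Environment: MySQL 5.7.21, one master and two slaves using row-based replication with GTID: Master132 -> {Slave133, Slave134}. The VIP is on 132 and mha-manager 0.56 runs on 134.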

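| Test scenario | XX.132 | XX.133 | XX.134 | Notes |
| --- | --- | --- | --- | --- |
| 132: kill -9 mysqld | Unavailable | Master | Slave | MHA fails over normally, no data loss |
| 132: shut down or reboot the 132 server | Unavailable | Master | Slave | MHA fails over normally, data may be lost |
| 134: iptables -I INPUT -s XX.132 -j DROP | Available | Master | Slave | MHA fails over normally; the old master is still accessible, 133 becomes the new master, and 132 and 133 both hold the VIP |
| 132: service network stop / ifconfig eth0 down | Unavailable | Master | Slave | MHA fails over normally, data may be lost |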

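Note: the table above shows the results with [binlogN] pointing to a Binlog Server and no secondary_check_script configured.

Why data may be lost when the database server is shut down: the Binlog Server receives binlogs asynchronously, so under high concurrency it is understandable that it lags behind the master.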

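Enabling the firewall simulates a loss of communication between the master and mha-manager, and the result is split-brain. To guard against it, add "secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2" to the configuration file. remote_host1 and remote_host2 should, where possible, sit on different network segments from mha-manager and the MySQL servers.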
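To make the idea behind the secondary check concrete, here is a minimal Python sketch of the decision it makes. This is not MHA code: the names `Verdict` and `secondary_check` are hypothetical, and the numeric values follow the return codes of masterha_secondary_check as I understand them from the MHA documentation, so check them against your installed version. Each route is a pair: whether the manager reached the remote host, and whether that remote host reached the master.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Verdict(IntEnum):
    # Values assumed from the MHA docs; verify against your MHA version.
    MASTER_DEAD = 0      # every remote host was reachable, none could reach the master
    MANAGER_NETWORK = 2  # a remote host was unreachable: suspect the manager's network
    MASTER_ALIVE = 3     # at least one remote host could still reach the master


def secondary_check(routes: Iterable[Tuple[bool, bool]]) -> Verdict:
    """routes: (manager_reached_remote, remote_reached_master) per remote host."""
    routes = list(routes)
    if not routes:
        raise ValueError("at least one remote host is required")
    if any(a and b for a, b in routes):
        return Verdict.MASTER_ALIVE      # do not fail over
    if any(not a for a, _ in routes):
        return Verdict.MANAGER_NETWORK   # do not fail over
    return Verdict.MASTER_DEAD           # safe to start failover


# Firewall test from the table: the manager cannot reach the master directly,
# but both remote hosts can, so no failover should start.
print(secondary_check([(True, True), (True, True)]).name)
```

In the iptables scenario above, only the manager lost sight of the master, so a check routed through other hosts would report the master as alive and the double-VIP situation would be avoided.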

3. GTID with Auto_Position=0 becomes GTID with Auto_Position=1 after failover

3.1 Auto_Position

Some production environments run GTID with Auto_Position=0. After a failover they become GTID with Auto_Position=1.

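• What is the risk?

Suppose slave S1 has holes in its GTID set while slave S2's GTID set is contiguous. Over time, S2 purges the binlogs that contain the GTIDs missing on S1. If a failover happens at that point and S2 is chosen as the new master, pointing S1 at S2 with CHANGE MASTER TO ... master_auto_position=1 fails with: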

Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'


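GTID holes on a slave can therefore make the failover abnormal: the VIP moves correctly and the new master is usable, but replication from the new slave to the new master reports an error. The remaining steps (new master cleanup, Failover Report, send mail) only run once that replication error has been fixed.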
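The failure condition can be checked before a failover by comparing GTID sets. Below is a minimal sketch, assuming the three strings are copied from the slave's Executed_Gtid_Set, and from gtid_executed and gtid_purged on the candidate master. The helper names `parse_gtid_set` and `missing_but_purged` are mine, and the example values are made up. It expands intervals into Python sets, which is fine for illustration but wasteful for very large ranges.

```python
from typing import Dict, Set


def parse_gtid_set(text: str) -> Dict[str, Set[int]]:
    """Parse 'uuid:1-6:8-24,uuid2:5' into {uuid: {transaction numbers}}."""
    result: Dict[str, Set[int]] = {}
    for part in filter(None, (p.strip() for p in text.split(","))):
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_but_purged(slave_executed: str, master_executed: str,
                       master_purged: str) -> Dict[str, Set[int]]:
    """GTIDs the slave still needs that the master has already purged."""
    slave = parse_gtid_set(slave_executed)
    master = parse_gtid_set(master_executed)
    purged = parse_gtid_set(master_purged)
    lost: Dict[str, Set[int]] = {}
    for uuid, numbers in master.items():
        needed = numbers - slave.get(uuid, set())
        gone = needed & purged.get(uuid, set())
        if gone:
            lost[uuid] = gone
    return lost


UUID = "90b30799-9215-11e7-8645-000c29c1025c"
# S1 has a hole at transaction 7; S2 (candidate master) has purged 1-10.
print(missing_but_purged(f"{UUID}:1-6:8-24", f"{UUID}:1-24", f"{UUID}:1-10"))
```

A non-empty result means that switching this slave to master_auto_position=1 against that master would hit error 1236.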

• Why not simply change to GTID with Auto_Position=1?

Slave GTIDs

Slave GTIDs > Master GTIDs: on 5.7, replication fails with an error immediately.

• How to solve it?

Modify the MHA source code:

shell> vim /usr/share/perl5/vendor_perl/MHA/ServerManager.pm
1550    return 1 if ( $_->{use_gtid_auto_pos} );
--> change to
1550    #return 1 if ( $_->{use_gtid_auto_pos} );

shell> vim /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
367    if ( !$_server_manager->is_gtid_auto_pos_enabled() ) {
368        $log->info("GTID (with auto-pos) is not supported");
--> change to
367    if ( $_server_manager->is_gtid_auto_pos_enabled() != 1 ) {
368        $log->info("GTID (with auto-pos) is not supported");

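I do not understand why this change works. Thanks to Shunzi (another colleague) for sharing it.

Note: with traditional replication (gtid_mode=off), MHA does not use the Binlog Server to recover differential data (yet another pitfall).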

Binlog Server

Starting from MHA version 0.56, MHA supports new section [binlogN]. In binlog section, you can define mysqlbinlog streaming servers. When MHA does GTID based failover, MHA checks binlog servers, and if binlog servers are ahead of other slaves, MHA applies differential binlog events to the new master before recovery. When MHA does non-GTID based (traditional) failover, MHA ignores binlog servers.

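3.2 When do GTID holes appear?

1. The slave threads are stopped on the slave -> data is written on the master -> the master runs flush logs and purge logs -> the slave threads are started again -> error: binary log missing

The operator then manually runs CHANGE MASTER with master_auto_position=0 and an explicit binlog file and position.

2. Replication is set up with master_auto_position=0 -> the slave threads are stopped while replication is running -> CHANGE MASTER to a new file and a new position -> the slave threads are started again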

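With master_auto_position=1, the slave sends the GTIDs in its Executed_Gtid_Set to the master when it connects. The master skips those GTIDs and sends the slave the ones it has not executed yet.

Case 1: after changing back to master_auto_position=1, the slave still looks for the binlogs that were purged and raises the error.

Case 2: after changing back to master_auto_position=1, the GTID holes are filled automatically as long as the corresponding binlogs on the master have not been purged.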

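Precondition: the master has not run any DML against the rows touched by the transactions in the GTID holes. Otherwise replication would have failed long ago and this pitfall might never have surfaced. On reflection, the slave already had holes and replication still reported no errors, which indirectly shows that the master ran no DML against those rows.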
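After either of the sequences above, a quick way to see whether a slave ended up with holes is to look for gaps between the intervals of its Executed_Gtid_Set. The sketch below is my own illustration, and `gtid_holes` is a hypothetical helper name. It only reports gaps between intervals; a set that starts above transaction 1 is not treated as a hole, since that is also what a purged or freshly restored server looks like.

```python
from typing import Dict, List, Tuple


def gtid_holes(gtid_set: str) -> Dict[str, List[Tuple[int, int]]]:
    """Return, per server UUID, the inclusive (first, last) gaps between intervals."""
    holes: Dict[str, List[Tuple[int, int]]] = {}
    for part in filter(None, (p.strip() for p in gtid_set.split(","))):
        uuid, *intervals = part.split(":")
        ranges = []
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
        ranges.sort()
        # A gap exists wherever the next interval does not start right after the previous one.
        gaps = [(prev_end + 1, next_start - 1)
                for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:])
                if next_start > prev_end + 1]
        if gaps:
            holes[uuid] = gaps
    return holes


UUID = "90b30799-9215-11e7-8645-000c29c1025c"
print(gtid_holes(f"{UUID}:1-6:8-24"))  # one hole, at transaction 7
print(gtid_holes(f"{UUID}:1-24"))      # contiguous set, no holes
```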

Further reading: [MySQL FAQ] series: a case of handling a GTID replication exception in 5.6

3.3 How the relay log is fetched and applied

When Slave GTIDs > Master GTIDs, how is the relay log fetched and applied?

• GTID, auto_position=0

Master
Executed_Gtid_Set: 90b30799-9215-11e7-8645-000c29c1025c:1-14

Slave
set global Gtid_Purged='90b30799-9215-11e7-8645-000c29c1025c:1-6:8-24';

--> Master writes one row

Master
Executed_Gtid_Set: 90b30799-9215-11e7-8645-000c29c1025c:1-15

Slave
Retrieved_Gtid_Set: 90b30799-9215-11e7-8645-000c29c1025c:15
Executed_Gtid_Set: 90b30799-9215-11e7-8645-000c29c1025c:1-6:8-24

The newly written binlog events are written to the slave's relay log but are not applied (this can be confirmed by checking the data and by parsing the logs)!

• GTID, auto_position=1

change master to master_auto_position=1;

Starting replication fails:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replica

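Fetching the relay log

With Auto_Position=0, if relay log recovery (relay_log_recovery) is enabled, after a crash the slave re-fetches events from the master into the relay log, starting from the executed binlog position recorded in relay_log_info.

With Auto_Position=1, the slave sends the GTIDs in its Executed_Gtid_Set to the master when it connects. The master skips those GTIDs and sends the slave the ones it has not executed yet. If the slave has more GTIDs than the master, 5.7 reports an error immediately while 5.6 does not (if you have a 5.6 environment, verify this yourself and also check whether anything gets written to the relay log).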

Applying the relay log

If a GTID in the relay log is already contained in Executed_Gtid_Set, that transaction is not applied.
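The skip rule explains the auto_position=0 experiment above: transaction 15 arrives in the relay log, but it already falls inside the slave's Executed_Gtid_Set (1-6:8-24), so it is never applied. A minimal sketch of that filter follows. It models the rule only, not MySQL's actual implementation, and `expand` and `gtids_to_apply` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

Gtid = Tuple[str, int]  # (server UUID, transaction number)


def expand(uuid: str, *intervals: Tuple[int, int]) -> Set[Gtid]:
    """Expand inclusive (start, end) intervals into individual GTIDs."""
    return {(uuid, n) for start, end in intervals for n in range(start, end + 1)}


def gtids_to_apply(relay_log: Iterable[Gtid], executed: Set[Gtid]) -> List[Gtid]:
    """Keep only the relay-log transactions that are not yet in the executed set."""
    return [gtid for gtid in relay_log if gtid not in executed]


UUID = "90b30799-9215-11e7-8645-000c29c1025c"
slave_executed = expand(UUID, (1, 6), (8, 24))  # Executed_Gtid_Set from the example

# Transaction 15 is already "executed" on the slave, so nothing is applied.
print(gtids_to_apply([(UUID, 15)], slave_executed))
# Transactions 7 and 25 are not in the set, so both would be applied.
print(gtids_to_apply([(UUID, 7), (UUID, 25)], slave_executed))
```

This is why inflating a slave's GTID set, as the set global Gtid_Purged statement in the example does, silently drops the master's next transactions until the master's numbering passes 24.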

4. Failover flow

How MHA elects the new master and repairs the differential data when the master fails, under both traditional replication and GTID replication.

For the detailed flow, see: MHA manual failover flow (traditional replication & GTID replication)
