MySQL High-Availability Architecture with MHA (CentOS 7.5, MySQL 5.7.18, MHA 0.58)
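Contents
- Introduction
- Environment Preparation
- SSH Key Trust
- Installing Base Dependencies
- Installing the MHA Components
- Installing the MHA Node Component
- Installing the MHA Manager Component
- Building MySQL Replication: One Master, Three Slaves
- Initializing MySQL
- Starting MySQL and Basic Configuration
- Setting Up One Master and Three Slaves
- Special Notes
- MHA Manager Configuration
- MHA Configuration Files
- Configuration File Reference
- Script Configuration
- Automatic VIP Management
- Email and WeChat Alert Script
- Manual VIP Management Script
- Granting Permissions
- Verifying MHA Operations
- Verifying SSH Trust
- Verifying MySQL Replication
- Starting MHA
- Adding the VIP Manually the First Time
- Start
- Stop
- Simulated Outage Test
- Automatic Failover Steps
- Rejoining the Cluster After Repair
- node02 Operations
- node01 Operations
- manager Operations
- Before the Change
- After the Change
- Restarting MHA
- Online Switchover
- Stopping the MHA Manager Monitor
- Running the Switch Command
- Checking the Status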
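Introduction
MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at the Japanese company DeNA (he later joined Facebook) and handles failover and slave promotion in MySQL high-availability environments.
During a MySQL failover, MHA can complete the switch automatically within 0 to 30 seconds, and while doing so it preserves data consistency as far as possible, which is what makes the setup highly available in practice.
The software has two parts:
- MHA Manager (management node)
- MHA Node (data node)
MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slave nodes.
MHA Node runs on every MySQL server. MHA Manager probes the cluster's master periodically; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it.
The whole failover process is transparent to the application.
The tools that carry out this work are listed below.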
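Manager toolkit
| Tool | Purpose |
| --- | --- |
| masterha_check_ssh | Checks the SSH configuration used by MHA |
| masterha_check_repl | Checks the state of MySQL replication |
| masterha_manager | Starts MHA |
| masterha_check_status | Reports the current MHA running state |
| masterha_master_monitor | Detects whether the master is down |
| masterha_master_switch | Controls failover (automatic or manual) |
| masterha_conf_host | Adds or removes server entries in the configuration |
Node toolkit
These tools are normally triggered by MHA Manager scripts and need no manual operation.
| Tool | Purpose |
| --- | --- |
| save_binary_logs | Saves and copies the master's binary logs |
| apply_diff_relay_logs | Identifies differential relay log events and applies them to the other slaves |
| filter_mysqlbinlog | Strips unnecessary ROLLBACK events (MHA no longer uses this tool) |
| purge_relay_logs | Purges relay logs (without blocking the SQL thread) |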
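Note:
To minimize data loss when the master goes down because of a hardware failure, it is recommended to enable semi-synchronous replication (available since MySQL 5.5) alongside MHA. Readers can look up how semi-synchronous replication works on their own. This is optional.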
Environment Preparation
| OS | Kernel | Hostname | MySQL | IP | Role |
| --- | --- | --- | --- | --- | --- |
| CentOS 7.5 | 5.1.3-1.el7 | manager.mha | MySQL 5.7.18 | 10.0.20.200 | Manager |
| CentOS 7.5 | 5.1.3-1.el7 | node01.mha | MySQL 5.7.18 | 10.0.20.201 | node01 mysql-master |
| CentOS 7.5 | 5.1.3-1.el7 | node02.mha | MySQL 5.7.18 | 10.0.20.202 | node02 mysql-slave |
| CentOS 7.5 | 5.1.3-1.el7 | node03.mha | MySQL 5.7.18 | 10.0.20.203 | node03 mysql-slave |
| CentOS 7.5 | 5.1.3-1.el7 | node04.mha | MySQL 5.7.18 | 10.0.20.204 | node04 mysql-slave |

| Version | GitHub | Baidu Netdisk |
| --- | --- | --- |
| v0.58 | GitHub download link | Baidu Netdisk link, extraction code: lzb0 |
| v0.58 | GitHub download link | Baidu Netdisk link, extraction code: 4e6h |
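SSH Key Trust
Set up mutual key trust for the root user between all machines.
Run on every machine:
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
Once each machine's public key has been appended to ~/.ssh/authorized_keys on every other machine (for example with ssh-copy-id), all machines trust one another and can log in to each other over SSH without a password.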
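The ssh-keygen step only creates a key pair; the public key still has to reach the other machines. A minimal sketch of that step, assuming the five IP addresses from the environment table and that ssh-copy-id is installed. The helper name print_copy_commands is hypothetical, and it only prints the commands so they can be reviewed before anything is run:

```shell
#!/bin/bash
# Hosts taken from the environment table in this article.
HOSTS="10.0.20.200 10.0.20.201 10.0.20.202 10.0.20.203 10.0.20.204"

# Hypothetical helper: print one ssh-copy-id command per host.
# Nothing is executed here; pipe the output to "bash" once it looks right.
print_copy_commands() {
    local host
    for host in $HOSTS; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@${host}"
    done
}

print_copy_commands
```

Run on each machine in turn, `print_copy_commands | bash` would prompt once per target for the root password and then install the key.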
Installing Base Dependencies
Run on every machine:
yum install -y perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
Installing the MHA Components
Installing the MHA Node Component
Run on all nodes.
[root@node01 ~]# cd /opt/soft
[root@node01 soft]# ll
total 639152
-rw-r--r-- 1 root root 56220 Jun 12 17:59 mha4mysql-node-0.58.tar.gz
-rw-r--r-- 1 root root 654430368 Jun 11 11:21 mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz
Extract and install
The output of these commands is omitted here.
[root@node01 soft]# tar xf mha4mysql-node-0.58.tar.gz
[root@node01 soft]# cd mha4mysql-node-0.58
[root@node01 mha4mysql-node-0.58]# perl Makefile.PL
[root@node01 mha4mysql-node-0.58]# make && make install
After the Node installation finishes, four tools are available:
[root@node01 mha4mysql-node-0.58]# ll /usr/local/bin/
total 48
-r-xr-xr-x 1 root root 17639 Jun 13 15:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Jun 13 15:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root 8337 Jun 13 15:00 purge_relay_logs
-r-xr-xr-x 1 root root 7525 Jun 13 15:00 save_binary_logs
Installing the MHA Manager Component
Install on the Manager node only.
It does not need to be installed on the Node machines.
[root@manager soft]# tar xf mha4mysql-manager-0.58.tar.gz
[root@manager soft]# cd mha4mysql-manager-0.58
[root@manager mha4mysql-manager-0.58]# ls
AUTHORS bin COPYING debian inc lib Makefile.PL MANIFEST META.yml README rpm samples t tests
[root@manager mha4mysql-manager-0.58]# perl Makefile.PL
[root@manager mha4mysql-manager-0.58]# make && make install
List the Manager tools
[root@manager mha4mysql-manager-0.58]# ll /usr/local/bin/
total 88
-r-xr-xr-x 1 root root 17639 Jun 13 15:10 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Jun 13 15:10 filter_mysqlbinlog
-r-xr-xr-x 1 root root 1995 Jun 13 15:13 masterha_check_repl
-r-xr-xr-x 1 root root 1779 Jun 13 15:13 masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Jun 13 15:13 masterha_check_status
-r-xr-xr-x 1 root root 3201 Jun 13 15:13 masterha_conf_host
-r-xr-xr-x 1 root root 2517 Jun 13 15:13 masterha_manager
-r-xr-xr-x 1 root root 2165 Jun 13 15:13 masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Jun 13 15:13 masterha_master_switch
-r-xr-xr-x 1 root root 5172 Jun 13 15:13 masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Jun 13 15:13 masterha_stop
-r-xr-xr-x 1 root root 8337 Jun 13 15:10 purge_relay_logs
-r-xr-xr-x 1 root root 7525 Jun 13 15:10 save_binary_logs
Building MySQL Replication: One Master, Three Slaves
This article is mainly about the MHA cluster, so for the MySQL cluster only the commands and the my.cnf configuration are given.
On the four Node machines, node01 becomes the master and the other three nodes become slaves.
[root@node01 mysql-5.7]# rpm -qa |grep mariadb | xargs rpm -e --nodeps
[root@node01 soft]# useradd -s /sbin/nologin -M mysql
[root@node01 soft]# tar xf mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz
[root@node01 soft]# mv mysql-5.7.18-linux-glibc2.5-x86_64 mysql-5.7
[root@node01 soft]# mv mysql-5.7 /usr/local/
[root@node01 soft]# ln -s /usr/local/mysql-5.7 /usr/local/mysql
[root@node01 soft]# cd /usr/local/mysql-5.7
[root@node01 mysql-5.7]# echo 'export PATH=$PATH:/usr/local/mysql-5.7/bin' >> /etc/profile
[root@node01 mysql-5.7]# source /etc/profile
[root@node01 mysql-5.7]# mysql -V
mysql Ver 14.14 Distrib 5.7.18, for linux-glibc2.5 (x86_64) using EditLine wrapper
[root@node01 mysql-5.7]# cp support-files/mysql.server /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/etc/my.cnf@/usr/local/mysql-5.7/my.cnf@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/usr/local/mysql/data@/opt/mysql_data@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# chkconfig mysqld on
[root@node01 mysql-5.7]# mkdir /opt/mysql_data
[root@node01 mysql-5.7]# chown -R mysql.mysql /usr/local/mysql-5.7
[root@node01 mysql-5.7]# chown -R mysql.mysql /opt/mysql_data
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
my.cnf configuration file
Note: the server-id value in my.cnf must be different on each of the four nodes; otherwise replication cannot be established.
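One simple way to keep server-id unique is to derive it from each node's address. The sketch below assumes the convention that the sample configuration appears to follow (server-id=204 on 10.0.20.204), namely using the last octet of the IP; both helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: use the last octet of an IPv4 address as server-id.
server_id_from_ip() {
    local ip="$1"
    echo "${ip##*.}"
}

# Hypothetical helper: succeed only if all IDs passed as arguments are distinct.
ids_are_unique() {
    local unique
    unique=$(( $(printf '%s\n' "$@" | sort -u | wc -l) ))
    [ "$#" -eq "$unique" ]
}

for ip in 10.0.20.201 10.0.20.202 10.0.20.203 10.0.20.204; do
    echo "$ip -> server-id=$(server_id_from_ip "$ip")"
done
```

This only works while all nodes share one /24 network; with several subnets a different scheme would be needed.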
[root@node04 mysql-5.7]# cat my.cnf
[client]
socket = /tmp/mysql.sock
port=3306

[mysql]
default-character-set=utf8
socket = /tmp/mysql.sock

[mysqld]
socket = /tmp/mysql.sock
character-set-server=utf8
basedir=/usr/local/mysql-5.7
datadir=/opt/mysql_data
port=3306
pid-file=/opt/mysql_data/mysqld.pid

# must be unique across the four nodes
server-id=204

skip-name-resolve

default-storage-engine=INNODB
explicit_defaults_for_timestamp = true

gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates = 1

plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000

relay-log = mysql-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%

max_connections=2000
query_cache_size=0
table_open_cache=2000
tmp_table_size=246M
thread_cache_size=300
thread_stack = 192k
key_buffer_size=512M
read_buffer_size=4M
read_rnd_buffer_size=32M

innodb_data_home_dir = /opt/mysql_data
innodb_flush_log_at_trx_commit=0
innodb_log_buffer_size=16M

# set this to 60%-80% of the memory of the machine that actually runs MySQL
innodb_buffer_pool_size=13G

innodb_log_file_size=128M
innodb_thread_concurrency=128
innodb_autoextend_increment=1000
innodb_buffer_pool_instances=8
innodb_concurrency_tickets=5000
innodb_old_blocks_time=1000
innodb_open_files=300
innodb_stats_on_metadata=0
innodb_file_per_table=1
innodb_checksum_algorithm=0

back_log = 80
flush_time = 0
join_buffer_size = 128M
max_allowed_packet = 1024M
max_connect_errors = 2000
open_files_limit = 4161
query_cache_type = 0
sort_buffer_size = 32M
table_definition_cache = 1400
binlog_row_event_max_size = 8K
sync_master_info = 10000
sync_relay_log = 10000
sync_relay_log_info = 10000
bulk_insert_buffer_size = 64M
interactive_timeout = 120
wait_timeout = 120
log-bin-trust-function-creators=1
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[mysqld_safe]
log-error = /opt/mysql_data/error.log
pid-file = /opt/mysql_data/mysqld.pid
Initializing MySQL
node01
[root@node01 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.947482Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.056859Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.076218Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.129463Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae29152-8db1-11e9-9d54-005056990727.
2019-06-13T07:59:01.129873Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.130247Z 1 [Note] A temporary password is generated for root@localhost: 1qGoEiI7ga#U
node02
[root@node02 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.952176Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.092736Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116696Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171324Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f47b-8db1-11e9-b8bb-0050569972c0.
2019-06-13T07:59:01.171711Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172126Z 1 [Note] A temporary password is generated for root@localhost: qTwtKAOue7:o
node03
[root@node03 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.949924Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.090890Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116166Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171335Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f4ef-8db1-11e9-b6ae-0050569975f7.
2019-06-13T07:59:01.171753Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172159Z 1 [Note] A temporary password is generated for root@localhost: XIu,h#*HQ5&M
node04
2019-06-13T07:59:00.955598Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.090420Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.113972Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.166754Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae84210-8db1-11e9-b6fe-005056992c6b.
2019-06-13T07:59:01.167145Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.167537Z 1 [Note] A temporary password is generated for root@localhost: 26jvaV)XAy>G
When initialization finishes, the last line gives a temporary password for root. After logging in with it, change the root password straight away; until you do, no database operations are allowed.
Starting MySQL and Basic Configuration
# /etc/init.d/mysqld start
Starting MySQL.Logging to '/opt/mysql_data/error.log'.
.. SUCCESS!
Log in to MySQL and change the password
[root@node01 mysql-5.7]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.18

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> alter user user() identified by "123456";
Query OK, 0 rows affected (0.00 sec)
Add the replication user on all MySQL instances
mysql> grant replication slave on *.* to 'repl'@'10.0.20.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all on *.* to 'root'@'%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Setting Up One Master and Three Slaves
Run on node01's MySQL:
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
Run the following statements on node02, node03 and node04. Replace master_log_file and master_log_pos with the File and Position values that show master status reports on node01 at the time you run it.
change master to master_host='10.0.20.201',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=463;
start slave;
show slave status\G
# check that Slave_IO_Running and Slave_SQL_Running are both Yes
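Because the binary log file name and position change whenever the master writes to its log, it can be less error-prone to generate the statement from the values just read than to type them by hand. A minimal sketch, assuming the five values are passed in as arguments; the helper name build_change_master is hypothetical, and the sample call uses the File and Position from the show master status output printed in this article:

```shell
#!/bin/bash
# Hypothetical helper: print a CHANGE MASTER statement for the given values.
# Arguments: master host, replication user, password, binlog file, binlog position.
build_change_master() {
    local host="$1" user="$2" password="$3" log_file="$4" log_pos="$5"
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$host" "$user" "$password" "$log_file" "$log_pos"
}

# Sample call with the values shown in this article's "show master status" output.
build_change_master 10.0.20.201 repl 123456 mysql-bin.000002 154
```

The printed statement can then be pasted into the mysql client on each slave, or piped to it.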
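Special Notes
The Manager machine is configured next. All of my machines use bonded network interfaces, so the interface name is bond0 everywhere; change it to match your own interface name. The mailbox used for sending mail and the WeChat account settings must also be replaced with your own.
The VIP used here is 10.0.20.199.
Adjust everything to your own environment.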
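MHA Manager Configuration
Everything below is done on the manager machine.
# create the MHA configuration directory
mkdir /etc/mha
# create the MHA script directory
mkdir /etc/mha/scripts
# create the MHA log directory
mkdir /var/log/mha/
# create the application log directory
mkdir /var/log/mha/app1 -p
# create the log file
touch /var/log/mha/app1/manager.log
MHA Configuration Files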
[root@manager mha]# cat /etc/masterha_default.cnf
[server default]
user=root
password=123456
repl_user=repl
repl_password=123456
ssh_user=root
ping_interval=1
master_binlog_dir=/opt/mysql_data
manager_workdir=/var/log/mha/app1.log
manager_log=/var/log/mha/manager.log
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
report_script="/etc/mha/scripts/send_report"
remote_workdir=/tmp
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204
shutdown_script=""

[root@manager ~]# cat /etc/mha/app1.cnf
[server1]
hostname=10.0.20.201
port=3306

[server2]
hostname=10.0.20.202
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306
Configuration File Reference
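The main MHA configuration options:
- manager_workdir=/var/log/masterha/app1.log: the manager's working directory
- manager_log=/var/log/masterha/app1/manager.log: the manager's log file
- master_binlog_dir=/data/mysql: where the master stores its binlogs, so that MHA can find the master's logs
- master_ip_failover_script=/usr/local/bin/master_ip_failover: the switch script used during automatic failover
- master_ip_online_change_script=/usr/local/bin/master_ip_online_change: the switch script used during a manual switchover
- user=root: the MySQL user used for monitoring
- password=dayi123: the password of the monitoring user; the user must be granted remote login from the manager node
- ping_interval=1: the interval, in seconds, between ping packets sent to the master. The default is 3 seconds; after three attempts without a response, failover starts automatically
- remote_workdir=/tmp: where the remote MySQL host saves binlogs when a switch happens
- repl_user=repl: the MySQL user used for replication
- repl_password=replication: the password of the replication user
- report_script=/usr/local/send_report: the script that sends an alert after a switch
- shutdown_script="": the script that shuts down the failed host after a failure (its main purpose is to power the host off to prevent split brain; it is not used here)
- ssh_user=root: the SSH login user
- candidate_master=1: set under a server section; marks that server as a candidate master
- check_repl_delay=0: set under a server section. By default MHA will not pick a slave as the new master if it is more than 100 MB of relay logs behind the master; setting this to 0 makes MHA ignore replication delay when choosing, which is very useful for a host with candidate_master=1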
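Since candidate_master and check_repl_delay are set per server section, it can be handy to see at a glance which hosts a configuration file marks as candidates. The following is a minimal sketch, not part of MHA: list_mha_servers is a hypothetical helper that assumes the simple key=value layout of the app1.cnf shown in this article (no spaces around the equals sign).

```shell
#!/bin/bash
# Hypothetical helper: for every [serverN] section of an MHA config file,
# print "section hostname candidate_master" (candidate defaults to 0).
# Assumes plain key=value lines without spaces around "=".
list_mha_servers() {
    awk -F= '
        function emit() { if (section != "") print section, host, cand }
        /^\[server[0-9]+\]/ {
            emit()
            section = $0
            gsub(/^\[|\].*$/, "", section)   # "[server2]" -> "server2"
            host = ""; cand = 0
            next
        }
        /^\[/ { emit(); section = ""; next }  # any other section, e.g. [server default]
        section != "" && $1 == "hostname"         { host = $2 }
        section != "" && $1 == "candidate_master" { cand = $2 }
        END { emit() }
    ' "$1"
}

# Demo on a temporary copy of the first two server sections from this article.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[server1]
hostname=10.0.20.201
port=3306

[server2]
hostname=10.0.20.202
port=3306
candidate_master=1
check_repl_delay=0
EOF
list_mha_servers "$conf"
rm -f "$conf"
```

On the real file the call would be `list_mha_servers /etc/mha/app1.cnf`.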
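Script Configuration
Automatic VIP Management
# To prevent split brain, production environments are advised to manage the virtual IP with a script rather than with keepalived.
vim /etc/mha/scripts/master_ip_failover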
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);

my $vip = '10.0.20.199/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig bond0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig bond0:$key down";

GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
Email and WeChat Alert Script
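# install the mail-sending tool
yum install mailx -y
The mail program needs its sender information configured first.
vim /etc/mail.rc
set from=*****@163.com
set smtp=smtp.163.com
set smtp-auth-user=*****
# for a 163 mailbox this is the authorization code, not the login password
set smtp-auth-password=*****
set smtp-auth=login
The script that actually sends the mail and the WeChat message is as follows.
vim /etc/mha/scripts/send_report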
#!/bin/bash
source /root/.bash_profile
# parse the arguments passed by MHA (each has the form --name=value)
orig_master_host=`echo "$1" | awk -F = '{print $2}'`
new_master_host=`echo "$2" | awk -F = '{print $2}'`
new_slave_hosts=`echo "$3" | awk -F = '{print $2}'`
subject=`echo "$4" | awk -F = '{print $2}'`
body=`echo "$5" | awk -F = '{print $2}'`
# recipient address
email="***@***.com"

# obtain the following two values from your own WeChat account
CropID='******************'
Secret='***************************************'

GURL="https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=$CropID&corpsecret=$Secret"
Gtoken=$(/usr/bin/curl -s -G $GURL | awk -F\" '{print $10}')

PURL="https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$Gtoken"

function body() {
    # application ID in the enterprise account
    local int AppID=1000002
    # member ID
    local UserID=$1
    # department ID; it defines the scope, and every member of the group receives the message
    local PartyID='2|3'
    # take everything from the third argument on (originally the third parameter passed by Zabbix)
    local Msg=$(echo "$@" | cut -d" " -f3-)
    printf '{\n'
    printf '\t"touser": "'"$UserID"\"",\n"
    printf '\t"toparty": "'"$PartyID"\"",\n"
    printf '\t"msgtype": "text",\n'
    printf '\t"agentid": "'" $AppID "\"",\n"
    printf '\t"text": {\n'
    printf '\t\t"content": "'"$Msg"\""\n"
    printf '\t},\n'
    printf '\t"safe":"0"\n'
    printf '}\n'
}

# the second-to-last line of the manager log tells whether the failover succeeded
tac /var/log/mha/app1/manager.log | sed -n 2p | grep 'successfully' > /dev/null
if [ $? -eq 0 ]
then
    messages=`echo -e "MHA $subject master-slave failover succeeded\n master:$orig_master_host --> $new_master_host \n $body \n current slaves:$new_slave_hosts"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover succeeded" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
else
    messages=`echo -e "MHA $subject master-slave failover failed\n master:$orig_master_host --> $new_master_host \n $body" `
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover failed" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
fi
Manual VIP Management Script
vim /etc/mha/scripts/master_ip_online_change
#!/bin/bash
source /root/.bash_profile

vip=`echo '10.0.20.199/24'` # set the VIP
key=`echo '1'`

command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`

# the network interface name must be the same on all servers
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig bond0:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig bond0:$key $vip"`

if [ "$command" = 'stop' ]
then
    echo -e "\n\n\n****************************\n"
    echo -e "Disabled the VIP - $vip on old master: $orig_master_host \n"
    $stop_vip
    if [ $? -eq 0 ]
    then
        echo "Disabled the VIP successfully"
    else
        echo "Disabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi

if [ "$command" = 'start' -o "$command" = 'status' ]
then
    echo -e "\n\n\n*************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
    then
        echo "Enabled the VIP successfully"
    else
        echo "Enabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi
Granting Permissions
Finally, make the three scripts just created executable.
chmod +x /etc/mha/scripts/master_ip_failover
chmod +x /etc/mha/scripts/master_ip_online_change
chmod +x /etc/mha/scripts/send_report
Verifying MHA Operations
Verifying SSH Trust
Verify with the masterha_check_ssh command.
[root@manager scripts]# masterha_check_ssh --conf=/etc/mha/app1.cnf
# the check has passed if the output ends with the following line
Thu Jun 13 17:19:34 2019 - [info] All SSH connection tests passed successfully.
Verifying MySQL Replication
Verify with the masterha_check_repl command.
[root@manager mha]# masterha_check_repl --conf=/etc/mha/app1.cnf
# the check has passed if the output ends with the following line
MySQL Replication Health is OK.
Starting MHA
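Adding the VIP Manually the First Time
This step is performed on node01.
Bind the VIP on node01, the MySQL master, first, for example with the same command the failover script uses (/sbin/ifconfig bond0:1 10.0.20.199/24). It only has to be bound on the master this one time; afterwards it moves automatically.
[root@node01 mysql-5.7]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
Start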
This step is performed on the manager.
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
Check the MHA status
[root@manager mha]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4745) is running(0:PING_OK), master:10.0.20.201
The MHA log is written to /var/log/mha/manager.log.
[root@manager mha]# tailf /var/log/mha/manager.log
# startup has succeeded if the last lines look like this
Thu Jun 13 17:31:41 2019 - [info] Starting ping health check on 10.0.20.201(10.0.20.201:3306)..
Thu Jun 13 17:31:41 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Stop
If the manager is already monitoring, stop it with:
masterha_stop --conf=/etc/mha/app1.cnf
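A script that starts or stops the manager may want to check the monitor state first. The sketch below parses a masterha_check_status line of the form shown in this article; both helper names are hypothetical, and the only format it assumes is the one in the sample output here.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a masterha_check_status line reports PING_OK.
status_is_ok() {
    case "$1" in
        *"(0:PING_OK)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the master address that follows "master:".
status_master() {
    echo "${1##*master:}"
}

# Sample line copied from the status check shown in this article.
line='app1 (pid:4745) is running(0:PING_OK), master:10.0.20.201'
if status_is_ok "$line"; then
    echo "monitoring OK, current master: $(status_master "$line")"
fi
```

In practice the line would come from `masterha_check_status --conf=/etc/mha/app1.cnf` instead of a fixed string.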
Simulated Outage Test
Stop the MySQL master on node01 by hand, then look at the other nodes.
[root@node01 ~]# /etc/init.d/mysqld stop
Shutting down MySQL............ SUCCESS!
[root@node01 ~]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
Check the VIP on node02
[root@node02 ~]# ip a | grep 20
inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
Check the replication status and master address on node03
[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Check the replication status and master address on node04
[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Check the Manager log
[root@manager mha]# tailf manager.log Fri Jun 14 10:01:03 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Fri Jun 14 10:01:03 2019 - [info] Executing SSH check script: exit 0 Fri Jun 14 10:01:03 2019 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204 --user=root --master_host=10.0.20.201 --master_ip=10.0.20.201 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT Fri Jun 14 10:01:03 2019 - [info] HealthCheck: SSH to 10.0.20.201 is reachable. Monitoring server 10.0.20.201 is reachable, Master is not reachable from 10.0.20.201. OK. Monitoring server 10.0.20.202 is reachable, Master is not reachable from 10.0.20.202. OK. Monitoring server 10.0.20.203 is reachable, Master is not reachable from 10.0.20.203. OK. Fri Jun 14 10:01:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:04 2019 - [warning] Connection failed 2 time(s).. Monitoring server 10.0.20.204 is reachable, Master is not reachable from 10.0.20.204. OK. Fri Jun 14 10:01:04 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start. Fri Jun 14 10:01:05 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:05 2019 - [warning] Connection failed 3 time(s).. Fri Jun 14 10:01:06 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:06 2019 - [warning] Connection failed 4 time(s).. Fri Jun 14 10:01:06 2019 - [warning] Master is not reachable from health checker! Fri Jun 14 10:01:06 2019 - [warning] Master 10.0.20.201(10.0.20.201:3306) is not reachable! Fri Jun 14 10:01:06 2019 - [warning] SSH is reachable. Fri Jun 14 10:01:06 2019 - [info] Connecting to a master server failed. 
Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status.. Fri Jun 14 10:01:06 2019 - [info] Reading default configuration from /etc/masterha_default.cnf.. Fri Jun 14 10:01:06 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Jun 14 10:01:06 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Jun 14 10:01:07 2019 - [info] GTID failover mode = 1 Fri Jun 14 10:01:07 2019 - [info] Dead Servers: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Alive Servers: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.202(10.0.20.202:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.203(10.0.20.203:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.204(10.0.20.204:3306) Fri Jun 14 10:01:07 2019 - [info] Alive Slaves: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Checking slave configurations.. Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.202(10.0.20.202:3306). 
Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.203(10.0.20.203:3306). Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.204(10.0.20.204:3306). Fri Jun 14 10:01:07 2019 - [info] Checking replication filtering settings.. Fri Jun 14 10:01:07 2019 - [info] Replication filtering check ok. Fri Jun 14 10:01:07 2019 - [info] Master is down! Fri Jun 14 10:01:07 2019 - [info] Terminating monitoring script. Fri Jun 14 10:01:07 2019 - [info] Got exit code 20 (Master dead). Fri Jun 14 10:01:07 2019 - [info] MHA::MasterFailover version 0.58. Fri Jun 14 10:01:07 2019 - [info] Starting master failover. Fri Jun 14 10:01:07 2019 - [info] Fri Jun 14 10:01:07 2019 - [info] * Phase 1: Configuration Check Phase.. Fri Jun 14 10:01:07 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] GTID failover mode = 1 Fri Jun 14 10:01:08 2019 - [info] Dead Servers: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Checking master reachability via MySQL(double check)... Fri Jun 14 10:01:08 2019 - [info] ok. 
Fri Jun 14 10:01:08 2019 - [info] Alive Servers: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Fri Jun 14 10:01:08 2019 - [info] Alive Slaves: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Starting GTID based failover. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] ** Phase 1: Configuration Check Phase completed. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Forcing shutdown so that applications never connect to the current master.. 
Fri Jun 14 10:01:08 2019 - [info] Executing master IP deactivation script: Fri Jun 14 10:01:08 2019 - [info] /etc/mha/scripts/master_ip_failover --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24===Disabling the VIP on old master: 10.0.20.201 Fri Jun 14 10:01:08 2019 - [info] done. Fri Jun 14 10:01:08 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:194 Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5 Fri Jun 14 10:01:08 2019 - [info] Latest slaves (Slaves that received relay log files to the latest): Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 
10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:194 Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5 Fri Jun 14 10:01:08 2019 - [info] Oldest slaves: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: Determining New Master Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Searching new master from slaves.. 
Fri Jun 14 10:01:08 2019 - [info] Candidate masters from the configuration file:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:08 2019 - [info] Non-candidate masters:
Fri Jun 14 10:01:08 2019 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Jun 14 10:01:08 2019 - [info] New master is 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 10:01:08 2019 - [info] Starting master failover..
Fri Jun 14 10:01:08 2019 - [info]
From:
10.0.20.201(10.0.20.201:3306) (current master)
 +--10.0.20.202(10.0.20.202:3306)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)

To:
10.0.20.202(10.0.20.202:3306) (new master)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] Waiting all logs to be applied..
Fri Jun 14 10:01:08 2019 - [info] done.
Fri Jun 14 10:01:08 2019 - [info] Getting new master's binlog name and position..
Fri Jun 14 10:01:08 2019 - [info] mysql-bin.000002:194
Fri Jun 14 10:01:08 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.202', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jun 14 10:01:08 2019 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 194, 6211616e-8db3-11e9-be15-005056990727:4-5
Fri Jun 14 10:01:08 2019 - [info] Executing master IP activate script:
Fri Jun 14 10:01:08 2019 - [info]   /etc/mha/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --new_master_host=10.0.20.202 --new_master_ip=10.0.20.202 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24===
Enabling the VIP - 10.0.20.199/24 on the new master - 10.0.20.202
Fri Jun 14 10:01:08 2019 - [info] OK.
Fri Jun 14 10:01:08 2019 - [info] ** Finished master recovery successfully.
Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase completed.
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] * Phase 4: Slaves Recovery Phase..
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Jun 14 10:01:08 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.203(10.0.20.203:3306) started, pid: 2838. Check tmp log /var/log/mha/10.0.20.203_3306_20190614100107.log if it takes time..
Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.204(10.0.20.204:3306) started, pid: 2839. Check tmp log /var/log/mha/10.0.20.204_3306_20190614100107.log if it takes time..
Fri Jun 14 10:01:09 2019 - [info]
Fri Jun 14 10:01:09 2019 - [info] Log messages from 10.0.20.204 ...
Fri Jun 14 10:01:09 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306)..
Fri Jun 14 10:01:08 2019 - [info] Executed CHANGE MASTER.
Fri Jun 14 10:01:08 2019 - [info] Slave started.
Fri Jun 14 10:01:08 2019 - [info] gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events.
Fri Jun 14 10:01:09 2019 - [info] End of log messages from 10.0.20.204.
Fri Jun 14 10:01:09 2019 - [info] -- Slave on host 10.0.20.204(10.0.20.204:3306) started.
Fri Jun 14 10:01:10 2019 - [info]
Fri Jun 14 10:01:10 2019 - [info] Log messages from 10.0.20.203 ...
Fri Jun 14 10:01:10 2019 - [info]
Fri Jun 14 10:01:08 2019 - [info] Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306)..
Fri Jun 14 10:01:08 2019 - [info] Executed CHANGE MASTER.
Fri Jun 14 10:01:09 2019 - [info] Slave started.
Fri Jun 14 10:01:09 2019 - [info] gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events.
Fri Jun 14 10:01:10 2019 - [info] End of log messages from 10.0.20.203.
Fri Jun 14 10:01:10 2019 - [info] -- Slave on host 10.0.20.203(10.0.20.203:3306) started.
Fri Jun 14 10:01:10 2019 - [info] All new slave servers recovered successfully.
Fri Jun 14 10:01:10 2019 - [info]
Fri Jun 14 10:01:10 2019 - [info] * Phase 5: New master cleanup phase..
Fri Jun 14 10:01:10 2019 - [info]
Fri Jun 14 10:01:10 2019 - [info] Resetting slave info on the new master..
Fri Jun 14 10:01:10 2019 - [info] 10.0.20.202: Resetting slave info succeeded.
Fri Jun 14 10:01:10 2019 - [info] Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully.
Fri Jun 14 10:01:10 2019 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Fri Jun 14 10:01:10 2019 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.0.20.201(10.0.20.201:3306) to 10.0.20.202(10.0.20.202:3306) succeeded

Master 10.0.20.201(10.0.20.201:3306) is down!

Check MHA Manager logs at manager.mha:/var/log/mha/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.0.20.201(10.0.20.201:3306)
Selected 10.0.20.202(10.0.20.202:3306) as a new master.
10.0.20.202(10.0.20.202:3306): OK: Applying all logs succeeded.
10.0.20.202(10.0.20.202:3306): OK: Activated master IP address.
10.0.20.204(10.0.20.204:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306)
10.0.20.203(10.0.20.203:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306)
10.0.20.202(10.0.20.202:3306): Resetting slave info succeeded.
Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully.
Fri Jun 14 10:01:10 2019 - [info] Sending mail..
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 347 100 45 100 302 133 897 --:--:-- --:--:-- --:--:-- 898

The log above, together with the state of each node, shows that the VIP has automatically moved to node02, that node02 has been promoted to master, and that node03 and node04 now replicate from node02.
The WeChat and email alerts were received as well.
Automatic failover steps
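The output above shows the whole MHA failover process, which consists of the following phases:
- Phase 1: configuration check.
- Phase 2: dead master shutdown. The VIP is disabled on the old master; no shutdown_script is set here, so the dead host itself is not powered off.
- Phase 3: master recovery. MHA finds the latest slaves (3.1), determines the new master (3.3), waits for it to apply all logs, and enables the VIP on it.
- Phase 4: slaves recovery. The remaining slaves are pointed at the new master in parallel.
- Phase 5: new master cleanup. The slave info on the new master is reset.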
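The phase headers can also be pulled straight out of the manager log, which helps when checking how far a failover got. The following is a minimal sketch and not part of MHA: `mha_phases` is a hypothetical helper name, and it assumes phase headers have the form `* Phase N: <name>..` as they appear in the log above.

```shell
#!/usr/bin/env bash
# List the phases recorded in an MHA manager log, in the order they started.
# The log is read from stdin, so this also works on a saved copy of the log.
mha_phases() {
  # Header lines look like: "... - [info] * Phase 3.1: Getting Latest Slaves Phase.."
  # "Phase completed" lines are dropped so each phase is listed once.
  grep -o 'Phase [0-9][0-9.]*: [^.]*' | grep -v 'completed$'
}
```

A possible invocation is `mha_phases < /var/log/mha/app1/manager.log`; adjust the path to wherever your manager log is written.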
Rejoining the cluster after repair
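After the failover completes, note the following changes:
- The VIP has moved to node02, which is now the master.
- The MHA manager process on the manager host has stopped.
- The server1 entry for the dead master has been deleted from /etc/mha/app1.cnf.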
The crash was simulated by stopping the MySQL process on the old master. Now restart MySQL there and add it back as a slave of node02.
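The failover log reports GTID ON on every node, and the statement MHA prints for the other slaves uses MASTER_AUTO_POSITION=1. Under that assumption, the repaired node could also be rejoined with GTID auto-positioning instead of an explicit binlog file and position. This is a hedged alternative sketch, not the procedure used in the steps that follow: `change_master_gtid_sql` is a hypothetical helper, the password is a placeholder, and it assumes the old master holds no transactions that never reached the new master.

```shell
#!/usr/bin/env bash
# Print a CHANGE MASTER statement that uses GTID auto-positioning.
# Arguments: new master host, replication user, replication password,
# and an optional port (defaults to 3306).
change_master_gtid_sql() {
  local host="$1" user="$2" password="$3" port="${4:-3306}"
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1;\n" \
    "$host" "$port" "$user" "$password"
}
```

The printed statement can be reviewed first and then piped into the `mysql` client on the repaired node, followed by `start slave;`.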
node02 operations
[root@node02 ~]# mysql -uroot -p123456 -e 'show master status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 194
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 6211616e-8db3-11e9-be15-005056990727:4-5
node01 operations
mysql> change master to master_host='10.0.20.202',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=194;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
[root@node01 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
manager operations
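Note that after a failover the MHA process on the manager stops automatically. Once the old master has been repaired, MHA has to be started again by hand.
During a failover MHA also deletes the dead master's entry from the app1.cnf configuration file, so after the machine is repaired its entry must be written back into app1.cnf.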
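Writing the entry back can be scripted. The sketch below only generates the text of a server block; `mha_server_block` is a hypothetical helper rather than an MHA tool, and the values in the usage note follow this article's node01 entry with the default port 3306 spelled out.

```shell
#!/usr/bin/env bash
# Print a [serverN] block for an MHA application configuration file.
# Arguments: block name, hostname, port, then optional "key=value" parameters.
mha_server_block() {
  local name="$1" host="$2" port="$3" kv
  shift 3
  printf '[%s]\n' "$name"
  # Extra parameters such as candidate_master=1 are written one per line.
  for kv in "$@"; do
    printf '%s\n' "$kv"
  done
  printf 'hostname=%s\nport=%s\n' "$host" "$port"
}
```

For example, `mha_server_block server1 10.0.20.201 3306 candidate_master=1 check_repl_delay=0 >> /etc/mha/app1.cnf` would append the block; check the file afterwards, because the helper does not detect an entry that already exists.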
Before the change
[root@manager mha]# pwd
/etc/mha
[root@manager mha]# cat app1.cnf
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.202
port=3306

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306
After the change
[root@manager mha]# cat app1.cnf
[server1]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.201

[server2]
hostname=10.0.20.202
port=3306

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306
Restarting MHA
Once the configuration file has been updated, start MHA again:
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
The repair is now complete.
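Online switchover
There are many situations in which the current master has to be moved to another server: a hardware fault on the master, a RAID controller that needs rebuilding, a move to more powerful hardware, and so on. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing the sessions that are currently running can leave the old and new masters with inconsistent data. MHA provides a fast switchover with graceful write blocking: the switch takes only about 0.5 to 2 seconds, and writes are blocked during that time. In many cases a write block of 0.5 to 2 seconds is acceptable, so switching the master does not require a scheduled maintenance window.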
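The rough sequence of an MHA online switchover, as it appears in the switch log further below:
- Check the configuration and replication health, and identify the current master and the new master.
- Disable the VIP on the current master and block writes there (FLUSH TABLES WITH READ LOCK).
- Wait for the new master to apply all relay logs.
- Enable the VIP on the new master so that it accepts writes.
- Point the other slaves at the new master in parallel.
- Unlock the tables on the original master and, with --orig_master_is_new_slave, start it as a slave of the new master.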
Note that the application architecture has to account for the following two issues during an online switchover:
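To keep the data fully consistent and to finish the switch as quickly as possible, an MHA online switchover only succeeds when all of the following conditions are met; otherwise it fails:
- The IO thread is running on every slave.
- The SQL thread is running on every slave.
- Seconds_Behind_Master on every slave is less than or equal to --running_updates_limit seconds (1 second by default when the option is not given).
- On the master, no update query in the SHOW PROCESSLIST output has been running for longer than --running_updates_limit seconds.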
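MHA runs its own health checks before switching, but a manual pre-flight check on each slave can catch problems earlier. The following is a minimal sketch under stated assumptions: `slave_ready_for_switch` is a hypothetical helper, and it treats a slave as ready when both replication threads report `Yes` and `Seconds_Behind_Master` is a number no larger than the chosen limit.

```shell
#!/usr/bin/env bash
# Check one slave's "SHOW SLAVE STATUS\G" output, read from stdin.
# Argument: allowed replication lag in seconds (defaults to 1).
# Returns 0 when both replication threads run and the lag is within the limit.
slave_ready_for_switch() {
  local limit="${1:-1}"
  awk -v limit="$limit" '
    $1 == "Slave_IO_Running:"      { io  = $2 }
    $1 == "Slave_SQL_Running:"     { sql = $2 }
    $1 == "Seconds_Behind_Master:" { lag = $2 }
    END {
      # A NULL or missing lag means replication is not healthy: not ready.
      ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= limit + 0)
      exit (ok ? 0 : 1)
    }'
}
```

A possible use on a slave is `mysql -uroot -p -e 'show slave status\G' | slave_ready_for_switch 1 && echo ready`.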
Stopping the MHA manager monitor
[root@manager mha]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1
Running the switch command
Perform the online switchover.
This simulates switching the master online: the current master 10.0.20.202 becomes a slave and 10.0.20.201 is promoted to be the new master.
The earlier crash simulation moved the master role from 201, the original master, to 202.
[root@manager mha]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.0.20.201 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
The command produces the following log output:
Fri Jun 14 11:30:26 2019 - [info] MHA::MasterRotate version 0.58.
Fri Jun 14 11:30:26 2019 - [info] Starting online master switch..
Fri Jun 14 11:30:26 2019 - [info]
Fri Jun 14 11:30:26 2019 - [info] * Phase 1: Configuration Check Phase..
Fri Jun 14 11:30:26 2019 - [info]
Fri Jun 14 11:30:26 2019 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jun 14 11:30:26 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 14 11:30:26 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 14 11:30:27 2019 - [info] GTID failover mode = 1
Fri Jun 14 11:30:27 2019 - [info] Current Alive Master: 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Alive Slaves:
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.201(10.0.20.201:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] Checking MHA is not monitoring or doing failover..
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.201..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.203..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.204..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.201 can be new master.
Fri Jun 14 11:30:27 2019 - [info]
From:
10.0.20.202(10.0.20.202:3306) (current master)
 +--10.0.20.201(10.0.20.201:3306)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)

To:
10.0.20.201(10.0.20.201:3306) (new master)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)
 +--10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Checking whether 10.0.20.201(10.0.20.201:3306) is ok for the new master..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): Resetting slave pointing to the dummy host.
Fri Jun 14 11:30:27 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] * Phase 2: Rejecting updates Phase..
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to disable write on the current master:
Fri Jun 14 11:30:27 2019 - [info]   /etc/mha/scripts/master_ip_online_change --command=stop --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
****************************
Disabled thi VIP - 10.0.20.199/24 on old master: 10.0.20.202
Disabled the VIP successfully
***************************
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info] Orig master binlog:pos is mysql-bin.000002:194.
Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.201(10.0.20.201:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info] done.
Fri Jun 14 11:30:27 2019 - [info] Getting new master's binlog name and position..
Fri Jun 14 11:30:27 2019 - [info] mysql-bin.000005:194
Fri Jun 14 11:30:27 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to allow write on the new master:
Fri Jun 14 11:30:27 2019 - [info]   /etc/mha/scripts/master_ip_online_change --command=start --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
*************************
Enabling the VIP - 10.0.20.199/24 on new master: 10.0.20.201
Enabled the VIP successfully
***************************
Fri Jun 14 11:30:27 2019 - [info] ok.
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] * Switching slaves in parallel..
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) started, pid: 7081
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) started, pid: 7082
Fri Jun 14 11:30:27 2019 - [info]
Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.203 ...
Fri Jun 14 11:30:29 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.203(10.0.20.203:3306)..
Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info] done.
Fri Jun 14 11:30:27 2019 - [info] Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info] Executed CHANGE MASTER.
Fri Jun 14 11:30:28 2019 - [info] Slave started.
Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.203 ...
Fri Jun 14 11:30:29 2019 - [info]
Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) succeeded.
Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.204 ...
Fri Jun 14 11:30:29 2019 - [info]
Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.204(10.0.20.204:3306)..
Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info] done.
Fri Jun 14 11:30:27 2019 - [info] Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info] Executed CHANGE MASTER.
Fri Jun 14 11:30:28 2019 - [info] Slave started.
Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.204 ...
Fri Jun 14 11:30:29 2019 - [info]
Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) succeeded.
Fri Jun 14 11:30:29 2019 - [info] Unlocking all tables on the orig master:
Fri Jun 14 11:30:29 2019 - [info] Executing UNLOCK TABLES..
Fri Jun 14 11:30:29 2019 - [info] ok.
Fri Jun 14 11:30:29 2019 - [info] Starting orig master as a new slave..
Fri Jun 14 11:30:29 2019 - [info] Resetting slave 10.0.20.202(10.0.20.202:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:29 2019 - [info] Executed CHANGE MASTER.
Fri Jun 14 11:30:30 2019 - [info] Slave started.
Fri Jun 14 11:30:30 2019 - [info] All new slave servers switched successfully.
Fri Jun 14 11:30:30 2019 - [info]
Fri Jun 14 11:30:30 2019 - [info] * Phase 5: New master cleanup phase..
Fri Jun 14 11:30:30 2019 - [info]
Fri Jun 14 11:30:30 2019 - [info] 10.0.20.201: Resetting slave info succeeded.
Fri Jun 14 11:30:30 2019 - [info] Switching master to 10.0.20.201(10.0.20.201:3306) completed successfully.
Checking the status
node01
[root@node01 ~]# mysql -uroot -p123456 -e 'show slave status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@node01 ~]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
node02
[root@node02 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
[root@node02 ~]# ip a | grep 20
inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0
node03
[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
node04
[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

The state of each database above shows that node01 is now the master and that the VIP has moved to node01 as well.
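The `ip a | grep 20` check used above matches any line that contains "20", so its output still has to be read by eye. A stricter check is sketched below; `has_vip` is a hypothetical helper, and it assumes the VIP shows up as an `inet` entry in the `ip a` output, as it does in the node01 output above.

```shell
#!/usr/bin/env bash
# Succeed when the given address appears as an "inet" entry in `ip a`
# output read from stdin. The address part before the "/" must match exactly.
has_vip() {
  local vip="$1"
  awk -v vip="$vip" '
    $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

On each node, `ip a | has_vip 10.0.20.199 && echo "VIP is on this host"` should print the message only on the current master.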
Reposted from: https://www.cnblogs.com/winstom/p/11022014.html