Enabling MySQL group commit variables: source-code analysis of group-commit grouping in COMMIT_ORDER mode, with a bug case study...
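Background
Since MySQL 5.7, group commit has been widely deployed and steadily optimized. Yet the descriptions of its implementation that can be found online are still not detailed enough, and when failures pile up, ambiguous speculation and guesswork tend to follow. That is what motivated me to analyze the group commit implementation from my own perspective.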
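Source code analysis
Searching the MySQL 5.7.24 source for "last_committed" quickly leads to the key class Transaction_dependency_tracker. Going through the class's virtual function implementations and the places they are called from is enough to understand most of the implementation, as follows:
[Figure: virtual functions of Transaction_dependency_tracker and their call sites]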
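With this class as the starting point, tracing the call chain and combining it with the analysis in the earlier article "MySQL after_sync vs. after_commit: source analysis and scenario tests of the performance debate" yields the concrete implementation details.
Basic implementation
The core is implemented with two member variables of the class Commit_order_trx_dependency_tracker: m_max_committed_transaction, which holds the largest committed transaction number, and m_transaction_counter, which holds the largest prepared transaction number.
Obtaining the last_committed and sequence_number values
As the figure above shows, this is done by the get_dependency function, which converts the absolute values of last_committed and sequence_number into relative ones. The conversion happens in the binlog flush stage, inside MYSQL_BIN_LOG::write_gtid, just before the GTID event is generated.
Updating the m_max_committed_transaction value
update_max_committed is called in the commit-queue handler MYSQL_BIN_LOG::process_commit_stage_queue, at the very start of each thread's commit as the queue is traversed.
How a thread obtains the m_max_committed_transaction value
For non-transactional statements (such as DDL), the absolute last_committed value is obtained in binlog_prepare. For transactional statements, it is obtained in the commit function MYSQL_BIN_LOG::commit, before ordered_commit is executed.
Incrementing sequence_number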
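step() is also called in the binlog flush stage, in binlog_cache_data::flush, but before MYSQL_BIN_LOG::write_gtid, that is, before the absolute values are converted into relative ones. Concretely, sequence_number is incremented by 1 on each call.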
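The absolute-to-relative conversion mentioned above can be illustrated with a small model. This is a sketch under assumptions, not the server code: it assumes that the relative values are obtained by subtracting an offset recorded when the current binary log file was opened, and that an absolute last_committed at or below that offset maps to 0. The function name to_relative and the parameter offset are hypothetical.

```python
def to_relative(last_committed_abs, sequence_number_abs, offset):
    """Convert absolute logical timestamps to binlog-relative ones.

    Assumption: `offset` is the absolute counter value at the moment the
    current binary log file was opened, so numbering restarts in each file.
    """
    sequence_number = sequence_number_abs - offset
    # A parent that committed in an earlier binlog file cannot be referenced
    # from this file, so it is clamped to 0 (no dependency inside this file).
    if last_committed_abs <= offset:
        last_committed = 0
    else:
        last_committed = last_committed_abs - offset
    return last_committed, sequence_number


print(to_relative(105, 107, 100))  # (5, 7)
print(to_relative(98, 101, 100))   # (0, 1)
```

The second call shows why the first transactions in a fresh binlog file typically carry last_committed=0 in this model.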
The resulting flow is shown below:
[Figure: flow of last_committed and sequence_number assignment]
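The bookkeeping described in this section can be approximated with a toy model. It is a minimal sketch, not the MySQL implementation: the class CommitOrderTracker and its attribute names are hypothetical stand-ins for m_max_committed_transaction and m_transaction_counter, and the binlog-relative conversion is left out. It shows how two transactions that read last_committed before either has committed end up with the same value.

```python
class CommitOrderTracker:
    """Toy model of the two counters described above (not the server code)."""

    def __init__(self):
        self.max_committed = 0   # models m_max_committed_transaction
        self.counter = 0         # models m_transaction_counter

    def get_max_committed(self):
        # Read before ordered_commit: becomes the transaction's last_committed.
        return self.max_committed

    def step(self):
        # Flush stage: hand out the next sequence_number.
        self.counter += 1
        return self.counter

    def update_max_committed(self, sequence_number):
        # Commit stage: remember the newest committed transaction.
        self.max_committed = max(self.max_committed, sequence_number)


tracker = CommitOrderTracker()

# T1 and T2 both read last_committed before either one has committed.
t1 = {"last_committed": tracker.get_max_committed()}
t2 = {"last_committed": tracker.get_max_committed()}

# The flush stage assigns sequence numbers in queue order.
t1["sequence_number"] = tracker.step()
t2["sequence_number"] = tracker.step()

# The commit stage advances the committed high-water mark.
tracker.update_max_committed(t1["sequence_number"])
tracker.update_max_committed(t2["sequence_number"])

# T3 arrives afterwards and therefore depends on T2.
t3 = {"last_committed": tracker.get_max_committed()}
t3["sequence_number"] = tracker.step()

for name, trx in (("T1", t1), ("T2", t2), ("T3", t3)):
    print(name, trx)
```

In this run T1 and T2 share last_committed=0 and so count as one group, while T3 gets last_committed=2.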
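Case analysis
At first glance, it is tempting to assume that two DDL statements entered the same group and caused the deadlock, and then to focus on why they were assigned to the same group on the master yet conflicted on the slave. From the analysis above, the last_committed value is obtained before ordered_commit is executed. At that point none of the MDL locks held by the thread have been released, so the two statements conflict and cannot possibly enter commit at the same time. On a closer look, however, something is off: the two statements are not DDL at all, they are DCL! To find out where DCL statements release their MDL locks, I captured a debug log for each of them in a test environment.
flush privileges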
...
// Part 1
3343 T@5: | | | | | >MDL_context::release_transactional_locks
3344 T@5: | | | | | | >MDL_context::release_locks_stored_before
3345 T@5: | | | | | | <MDL_context::release_locks_stored_before
3346 T@5: | | | | | | >MDL_context::release_locks_stored_before
3347 T@5: | | | | | | | info: found lock to release ticket=0x7f3c64012410
3348 T@5: | | | | | | | >MDL_context::release_lock
3349 T@5: | | | | | | | | enter: db=mysql name=proxies_priv
3350 T@5: | | | | | | | <MDL_context::release_lock
3351 T@5: | | | | | | | info: found lock to release ticket=0x7f3c64012320
3352 T@5: | | | | | | | >MDL_context::release_lock
3353 T@5: | | | | | | | | enter: db=mysql name=db
3354 T@5: | | | | | | | <MDL_context::release_lock
3355 T@5: | | | | | | | info: found lock to release ticket=0x7f3c6400e860
3356 T@5: | | | | | | | >MDL_context::release_lock
3357 T@5: | | | | | | | | enter: db=mysql name=user
3358 T@5: | | | | | | | <MDL_context::release_lock
3359 T@5: | | | | | | <MDL_context::release_locks_stored_before
3360 T@5: | | | | | <MDL_context::release_transactional_locks
3361 T@5: | | | |
...
// Part 2
5073 T@5: | | | | | >MDL_context::release_transactional_locks
5074 T@5: | | | | | | >MDL_context::release_locks_stored_before
5075 T@5: | | | | | | <MDL_context::release_locks_stored_before
5076 T@5: | | | | | | >MDL_context::release_locks_stored_before
5077 T@5: | | | | | | | info: found lock to release ticket=0x7f3c640308e0
5078 T@5: | | | | | | | >MDL_context::release_lock
5079 T@5: | | | | | | | | enter: db=mysql name=procs_priv
5080 T@5: | | | | | | | <MDL_context::release_lock
5081 T@5: | | | | | | | info: found lock to release ticket=0x7f3c640273e0
5082 T@5: | | | | | | | >MDL_context::release_lock
5083 T@5: | | | | | | | | enter: db=mysql name=columns_priv
5084 T@5: | | | | | | | <MDL_context::release_lock
5085 T@5: | | | | | | | info: found lock to release ticket=0x7f3c6400e860
5086 T@5: | | | | | | | >MDL_context::release_lock
5087 T@5: | | | | | | | | enter: db=mysql name=tables_priv
5088 T@5: | | | | | | | <MDL_context::release_lock
5089 T@5: | | | | | | <MDL_context::release_locks_stored_before
5090 T@5: | | | | | <MDL_context::release_transactional_locks
5091 T@5: | | | |
...
// Part 3
5641 T@5: | | | | | >restore_backup_open_tables_state
5642 T@5: | | | | | | >MDL_context::rollback_to_savepoint
5643 T@5: | | | | | | | >MDL_context::release_locks_stored_before
5644 T@5: | | | | | | | <MDL_context::release_locks_stored_before
5645 T@5: | | | | | | | >MDL_context::release_locks_stored_before
5646 T@5: | | | | | | | | info: found lock to release ticket=0x7f3c6400e860
5647 T@5: | | | | | | | | >MDL_context::release_lock
5648 T@5: | | | | | | | | | enter: db=mysql name=servers
5649 T@5: | | | | | | | | <MDL_context::release_lock
5650 T@5: | | | | | | | <MDL_context::release_locks_stored_before
5651 T@5: | | | | | | <MDL_context::rollback_to_savepoint
5652 T@5: | | | | |
5653 T@5: | | | | | >my_hash_free
5654 T@5: | | | | | | enter: hash: 0x7f3c64002eb0
5655 T@5: | | | | |
5656 T@5: | | | | | info: unlocking servers_cache
5657 T@5: | | | |
...
// Part 4
5764 T@5: | | | | >trans_commit_stmt
5765 T@5: | | | | | debug: add_unsafe_rollback_flags: 0
5766 T@5: | | | | | >MYSQL_BIN_LOG::commit
5767 T@5: | | | | | | info: query='flush privileges'
...
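The excerpts above fall into four parts: part 1 releases the MDL locks on mysql.proxies_priv, mysql.db and mysql.user; part 2 releases the MDL locks on mysql.procs_priv, mysql.columns_priv and mysql.tables_priv; part 3 releases the MDL lock on mysql.servers; part 4 is the commit. This shows that the MDL locks are already released before the statement enters commit. To confirm this, I set a breakpoint on MDL_context::acquire_lock; the debugging result is as follows:
[Figure: debugger output at MDL_context::acquire_lock for flush privileges]
So all the locks acquired are SR locks.
grant privileges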
...
// Part 1
2477 T@4: | | | | | >trans_commit_stmt
2478 T@4: | | | | | | debug: add_unsafe_rollback_flags: 0
2479 T@4: | | | | | | >MYSQL_BIN_LOG::commit
2480 T@4: | | | | | | | info: query='grant process on *.* to 'slave'@'192.168.10.%''
...
2724 T@4: | | | | | | | | | >THD::enter_cond
2725 T@4: | | | | | | | | | | THD::enter_stage: 'Waiting for semi-sync ACK from slave' /data/mysql-5.7.24/plugin/semisync/semisync_master.cc:735
2726 T@4: | | | | | | | | | | >PROFILING::status_change
2727 T@4: | | | | | | | | | | <PROFILING::status_change
2728 T@4: | | | | | | | | | <THD::enter_cond
...
2816 T@4: | | | | |
2817 T@4: | | | | | >trans_commit_implicit
...
2843 T@4: | | | | |
// Part 2
2950 T@4: | | | | | >MDL_context::release_transactional_locks
2951 T@4: | | | | | | >MDL_context::release_locks_stored_before
2952 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000e9b0
2953 T@4: | | | | | | | >MDL_context::release_lock
2954 T@4: | | | | | | | | enter: db= name=
2955 T@4: | | | | | | | <MDL_context::release_lock
2956 T@4: | | | | | | <MDL_context::release_locks_stored_before
2957 T@4: | | | | | | >MDL_context::release_locks_stored_before
2958 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000e760
2959 T@4: | | | | | | | >MDL_context::release_lock
2960 T@4: | | | | | | | | enter: db=mysql name=db
2961 T@4: | | | | | | | <MDL_context::release_lock
2962 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000ea10
2963 T@4: | | | | | | | >MDL_context::release_lock
2964 T@4: | | | | | | | | enter: db=mysql name=user
2965 T@4: | | | | | | | <MDL_context::release_lock
2966 T@4: | | | | | | <MDL_context::release_locks_stored_before
2967 T@4: | | | | | <MDL_context::release_transactional_locks
...
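The excerpts above fall into two parts: part 1 performs the commit; part 2 releases the locks on mysql.db and mysql.user. Debugging in the same way as before gives the following result:
[Figure: debugger output at MDL_context::acquire_lock for grant privileges]
Here the first lock in the debug log above is a GLOBAL lock of type IX, and the other two are SW locks.
Deadlock analysis
The analysis so far shows that flush privileges and grant privileges execute very differently. If SW and SR locks were incompatible, one would expect that when the former runs first the latter can proceed in parallel, but not the other way round. To check this, I ran a test with the statements held in the ACK-waiting state, first flush privileges and then grant privileges: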
mysql> show processlist;
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
| Id  | User | Host      | db   | Command | Time | State                                | Info                                           |
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
| 196 | root | localhost | NULL | Query   | 2449 | Waiting for semi-sync ACK from slave | flush privileges                               |
| 197 | root | localhost | NULL | Query   | 2430 | checking permissions                 | grant process on *.* to 'slave'@'192.168.10.%' |
| 198 | root | localhost | NULL | Query   |    0 | starting                             | show processlist                               |
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
3 rows in set (0.00 sec)
The stack of thread 197 is as follows:
[Figure: stack trace of thread 197]
Conversely, running grant privileges first and then flush privileges:
mysql> show processlist;
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
| Id  | User  | Host               | db   | Command          | Time | State                                                         | Info                                           |
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
| 196 | root  | localhost          | NULL | Query            |   12 | Waiting for table level lock                                  | flush privileges                               |
| 197 | root  | localhost          | NULL | Query            |   23 | Waiting for semi-sync ACK from slave                          | grant process on *.* to 'slave'@'192.168.10.%' |
| 198 | root  | localhost          | NULL | Query            |    0 | starting                                                      | show processlist                               |
| 199 | slave | 192.168.10.4:39721 | NULL | Binlog Dump GTID |  133 | Master has sent all binlog to slave; waiting for more updates | NULL                                           |
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
4 rows in set (0.00 sec)
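The stack of thread 196 is as follows:
[Figure: stack trace of thread 196]
The stack above reveals, surprisingly, that thread 196 has already entered mysql_lock_tables, meaning it has already obtained its MDL locks. That is enough to show that SR and SW locks are compatible. The call chain makes it clear that what the thread is waiting for is a table lock [a table lock implemented by InnoDB; the storage engine layer only defines the interface for implementing table locks]. Comparing the debug logs shows that this lock is released immediately before the MDL locks, so the conclusion above still holds, except that the object is the InnoDB table lock rather than the MDL lock.
Reproducing the failure
Master settings. Note: the maximum value of binlog_group_commit_sync_delay is 1,000,000 microseconds, i.e. 1 s.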
mysql> show variables like '%group_commit%';
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| binlog_group_commit_sync_delay          | 1000000 |
| binlog_group_commit_sync_no_delay_count | 1000    |
+-----------------------------------------+---------+
2 rows in set (0.01 sec)
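Script: on the master, run flush privileges first and then grant privileges so that both enter the same group. Thread 1 sleeps 0.5 s to ensure that flush privileges obtains its sequence_number first, i.e. commits first on the slave. Finally, on the slave, make the thread that executes first commit later.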
import threading
import time
import os

def thread_1():
    # Issues GRANT half a second after FLUSH PRIVILEGES has been started.
    os.system('date +%M:%S')
    time.sleep(0.5)
    os.system("/usr/local/mysql/bin/mysql -uroot -ps3cret -e\"grant process on *.* to 'slave'@'192.168.10.%';\"")
    os.system('date +%M:%S')

def thread_2():
    # Issues FLUSH PRIVILEGES immediately so it gets the smaller sequence_number.
    os.system('date +%M:%S')
    os.system("/usr/local/mysql/bin/mysql -uroot -ps3cret -e 'flush privileges'")
    os.system('date +%M:%S')

# Start the flush thread first, then the delayed grant thread.
t1 = threading.Thread(target=thread_2)
t1.start()
t2 = threading.Thread(target=thread_1)
t2.start()
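To check that the two statements really landed in the same group, one can compare the last_committed values printed by mysqlbinlog. The helper below is a minimal sketch: it assumes the text output of mysqlbinlog has already been captured into a string, and the sample text is invented for illustration rather than taken from the failing server. The function names parse_dependencies and same_group are hypothetical.

```python
import re

# GTID event lines in mysqlbinlog output carry both logical timestamps.
GTID_LINE = re.compile(r"last_committed=(\d+)\s+sequence_number=(\d+)")


def parse_dependencies(binlog_text):
    """Return (last_committed, sequence_number) pairs in file order."""
    return [(int(lc), int(sn)) for lc, sn in GTID_LINE.findall(binlog_text)]


def same_group(a, b):
    """Treat two transactions as one group when they share last_committed."""
    return a[0] == b[0]


# Invented sample, shaped like mysqlbinlog output for the two statements.
sample = """
# at 154
#200101 10:00:00 server id 1  end_log_pos 219  GTID  last_committed=3  sequence_number=4  rbr_only=no
flush privileges
# at 400
#200101 10:00:00 server id 1  end_log_pos 465  GTID  last_committed=3  sequence_number=5  rbr_only=no
grant process on *.* to 'slave'@'192.168.10.%'
"""

deps = parse_dependencies(sample)
print(deps)                          # [(3, 4), (3, 5)]
print(same_group(deps[0], deps[1]))  # True
```

If the reproduction worked, the flush privileges event should show the smaller sequence_number and both events the same last_committed.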
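Symptom on the slave:
[Figure: symptom observed on the slave]
Conclusion
The deadlock is caused by flush privileges being designed to release its locks before commit. By the same reasoning, a DML statement against the mysql.user table combined with flush privileges may also produce a deadlock.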
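The slave-side deadlock can be pictured as a cycle in a wait-for graph. The sketch below rests on assumptions the article does not spell out: that the slave makes workers commit in sequence_number order (for example with slave_preserve_commit_order enabled), and that the grant worker acquires the table lock before the flush worker does. The worker names and the find_cycle helper are hypothetical.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph {waiter: holder}, or None."""
    for start in wait_for:
        path = [start]
        node = wait_for.get(start)
        # Follow the chain of waits until it ends or revisits a node.
        while node is not None and node not in path:
            path.append(node)
            node = wait_for.get(node)
        if node is not None:
            return path[path.index(node):] + [node]
    return None


wait_for = {
    # flush privileges (smaller sequence_number) needs the table lock
    # that the grant worker already holds.
    "worker_flush": "worker_grant",
    # grant (larger sequence_number) may not commit, and so does not release
    # its lock, until the flush worker has committed.
    "worker_grant": "worker_flush",
}

print(find_cycle(wait_for))
```

Each worker waits on the other, so neither can make progress until one of them is killed or times out.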
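Summary
The implementation of group commit relies on two-phase commit. For non-transactional statements, locks may therefore be released before the transaction commits, so two statements that actually conflict can obtain the same last_committed, i.e. be assigned to the same group, which in turn leads to a parallel replication deadlock.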
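Why a shared last_committed lets the slave run both statements at once can be shown with a simplified scheduling rule. This is a sketch of the idea only, assuming that a worker may start a transaction once every transaction with a sequence_number up to its last_committed has committed. The function can_start and the sample values are hypothetical.

```python
def can_start(trx, committed_seq_numbers):
    """True if every transaction up to trx's last_committed has committed."""
    needed = range(1, trx["last_committed"] + 1)
    return all(seq in committed_seq_numbers for seq in needed)


committed = {1, 2, 3}

flush_trx = {"last_committed": 3, "sequence_number": 4}
grant_trx = {"last_committed": 3, "sequence_number": 5}
later_trx = {"last_committed": 5, "sequence_number": 6}

print(can_start(flush_trx, committed))  # True
print(can_start(grant_trx, committed))  # True
print(can_start(later_trx, committed))  # False
```

Both statements are eligible at the same moment, so nothing prevents the grant worker from taking its locks ahead of the flush worker, even though the two statements conflict.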