Troubleshooting an Abnormal SQL Server Mirroring State

Published: 2025/1/21

title: SQLServer · CASE Analysis · Troubleshooting an Abnormal Mirroring State

author: 天銘

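Problem

One database on a customer instance stayed in SYNCHRONIZING and never reached SYNCHRONIZED. Several repair methods were tried and mirroring was rebuilt from scratch, but the database still did not reach a normal synchronized state.

Investigation

A database stuck in SYNCHRONIZING usually points to the redo queue or the send queue.

The problem database on the principal is currently running at about 3K TPS: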

-- Sample the cumulative Transactions/sec counter twice, one second apart; the difference is the TPS.
-- '****' is the redacted object-name prefix from the original post; substitute your instance's prefix.
begin tran
DECLARE @value bigint
DECLARE @value2 bigint
select @value = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where object_name like '****:database%'
  and instance_name = 'businesscard'
  and counter_name like 'Transactions/sec%'
waitfor delay '00:00:01'
select @value2 = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where object_name like '****:database%'
  and instance_name = 'businesscard'
  and counter_name like 'Transactions/sec%'
select @value2 - @value as tps
commit tran
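
The query samples the counter twice because the "/sec" counters exposed by sys.dm_os_performance_counters hold a running total, not a ready-made rate. The rate is the difference between two samples divided by the time between them. A minimal sketch of that arithmetic follows. The function name `per_second_rate` and the sample values are hypothetical, chosen only so the result lands near the 3K TPS mentioned above.

```python
def per_second_rate(first_sample: int, second_sample: int,
                    elapsed_seconds: float = 1.0) -> float:
    """Rate implied by two samples of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (second_sample - first_sample) / elapsed_seconds


# Hypothetical raw counter values taken one second apart (not from the case).
print(per_second_rate(1_250_000, 1_253_000))  # 3000.0
```

A one-second window is noisy. Widening the delay and passing the real elapsed time gives a smoother figure.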

Send queue on the principal:

-- Log Send Queue KB is a point-in-time value; two samples show its size and whether it is growing.
begin tran
DECLARE @value int
DECLARE @value2 int
select @value = CONVERT(int, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Log Send Queue KB' and instance_name = 'businesscard';
waitfor delay '00:00:01'
select @value2 = CONVERT(int, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Log Send Queue KB' and instance_name = 'businesscard';
select @value * 1. / 1024 as first_second_MB,
       @value2 * 1. / 1024 as second_second_MB,
       (@value2 - @value) * 1. / 1024 as diff_MB
commit tran

Redo queue on the mirror:

-- Same two-sample approach, run on the mirror, for Redo Queue KB.
begin tran
DECLARE @value int
DECLARE @value2 int
select @value = CONVERT(int, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Redo Queue KB' and instance_name = 'businesscard';
waitfor delay '00:00:01'
select @value2 = CONVERT(int, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Redo Queue KB' and instance_name = 'businesscard';
select @value * 1. / 1024 as first_second_MB,
       @value2 * 1. / 1024 as second_second_MB,
       (@value2 - @value) * 1. / 1024 as diff_MB
commit tran

The mirror applies (redoes) log at roughly 14 MB per second:

begin tran
DECLARE @value bigint
DECLARE @value2 bigint
select @value = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Redo Bytes/sec' and instance_name = 'businesscard';
waitfor delay '00:00:01'
select @value2 = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Redo Bytes/sec' and instance_name = 'businesscard';
-- Take the difference in bytes first, then convert to MB, so the bigint variables do not truncate the result.
select (@value2 - @value) * 1. / 1024 / 1024 as speed_MB
commit tran

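Ignoring new log generated on the principal, catching up would take about 32 minutes (backlog in MB, divided by 14 MB/s, divided by 60):
select 27338.278320/14/60   -- = 32.545569428566
Allowing for the log the principal keeps generating, the estimate is about 40 minutes.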
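
The estimate above can be reproduced as a small calculation: the backlog drains at the redo rate minus the rate at which the principal produces new log. This is only a sketch. The function name `catch_up_minutes` is hypothetical, and the 2.6 MB/s generation rate is an assumption chosen to land near the roughly 40 minutes quoted, since the original does not state the actual rate.

```python
def catch_up_minutes(backlog_mb: float, redo_mb_per_s: float,
                     new_log_mb_per_s: float = 0.0) -> float:
    """Minutes needed to drain the backlog at the net redo rate."""
    net_rate = redo_mb_per_s - new_log_mb_per_s
    if net_rate <= 0:
        # The mirror never catches up if log arrives as fast as it is applied.
        return float("inf")
    return backlog_mb / net_rate / 60


backlog_mb = 27338.278320  # backlog figure used in the article's calculation
redo_rate = 14.0           # measured redo speed, MB per second

print(round(catch_up_minutes(backlog_mb, redo_rate), 2))       # 32.55
# Assumed (not measured) log generation rate of 2.6 MB/s on the principal.
print(round(catch_up_minutes(backlog_mb, redo_rate, 2.6), 2))  # 39.97
```

If the generation rate ever matches or exceeds the redo rate, the function returns infinity, which corresponds to a mirror that stays in SYNCHRONIZING indefinitely.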

Check the other counter values:

select cntr_value, *
from sys.dm_os_performance_counters
where counter_name in (
    'Bytes Sent/sec', 'Log Bytes Sent/sec', 'Log Compressed Bytes Sent/sec',
    'Log Harden Time (ms)', 'Log Send Flow Control Time (ms)', 'Log Send Queue KB',
    'Mirrored Write Transactions/sec', 'Pages Sent/sec', 'Send/Receive Ack Time',
    'Sends/sec', 'Transaction Delay'
)
and instance_name = 'businesscard';

Send/Receive Ack Time：
Milliseconds that messages waited for acknowledgement from the partner, in the last second.
This counter is helpful in troubleshooting a problem that might be caused by a network bottleneck, such as unexplained failovers, a large send queue, or high transaction latency. In such cases, you can analyze the value of this counter to determine whether the network is causing the problem.

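At this point the network became the suspect: a network component on the principal host had already been upgraded, while the same component on the mirror host had not.
The acknowledgement wait time was stable at about 800 ms, compared with about 100 ms on hosts where both the principal and the mirror had the upgraded component. This value also depends on transaction size, but so far it was the only place that looked wrong.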

begin tran
DECLARE @value bigint
DECLARE @value2 bigint
select @value = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Send/Receive Ack Time' and instance_name = '_Total';
waitfor delay '00:00:01'
select @value2 = CONVERT(bigint, cntr_value)
from sys.dm_os_performance_counters
where counter_name = 'Send/Receive Ack Time' and instance_name = '_Total';
select (@value2 - @value) as 'Send/Receive Ack Time'
commit tran

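The errorlog also showed intermittent warnings suggesting the mirroring state was moving between SUSPENDED and SYNCHRONIZING. The error resembles one described in a KB article, which should already be fixed in 2008 R2.

Half an hour later the send queue had grown again.

The redo queue on the mirror was also shrinking very slowly.

Because the database had been running as a single point of failure for too long, the only option was to rebuild mirroring. After the rebuild, however, the same problem appeared while the mirror was catching up on log. We suspected corrupt pages or some other unknown condition and planned to run checkdb during the maintenance window.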

--- ---
checkdb found no corrupt pages, so the data is fine, which means the problem is probably in the log.

select log_reuse_wait,log_reuse_wait_desc,* from sys.databases where name='businesscard'

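log_reuse_wait_desc is ACTIVE_TRANSACTION

ACTIVE_TRANSACTION: a transaction is active.
A long-running transaction might exist at the start of the log backup. In that case, freeing the space might require another log backup.
A transaction is deferred (SQL Server 2005 Enterprise Edition and later versions only). A "deferred transaction" is effectively an active transaction whose rollback is blocked because some resource is unavailable.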

dbcc opentran('businesscard')

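The oldest open transaction dated from the 15th, and today is the 19th, so all the log in between was active and could not be truncated. After confirming with the customer we killed it, then checked the active log again: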

dbcc opentran('businesscard')

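The result kept changing, which means the start of the active log kept moving forward (the active log kept shrinking). After another backup the active log dropped back to about 7 GB, and the mirroring rebuild succeeded.

This is actually a case handled a long time ago. It took a long time to find the root cause back then, yet in hindsight the problem is simple. Hopefully you can resolve it quickly the next time you run into it.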
