Oracle RAC Fault Analysis and Troubleshooting

Published 2024/1/17 in Programming Q&A, posted by 豆豆

Original article: "Oracle RAC Fault Analysis and Handling" by 蟻巡運維平臺

I. The RAC environment

RAC architecture: information for the two nodes.

Node 1

SQL> show parameter instance
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------------------------------------------
active_instance_count                integer
cluster_database_instances           integer     2
instance_groups                      string
instance_name                        string      RACDB1
instance_number                      integer     1
instance_type                        string      RDBMS
open_links_per_instance              integer     4
parallel_instance_group              string
parallel_server_instances            integer     2

Node 2

SQL> show parameter instance
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------------------
active_instance_count                integer
cluster_database_instances           integer     2
instance_groups                      string
instance_name                        string      RACDB2
instance_number                      integer     2
instance_type                        string      RDBMS
open_links_per_instance              integer     4
parallel_instance_group              string
parallel_server_instances            integer     2

Database version

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

Operating system information

Node 1

[oracle@rac1 ~]$ uname -a
Linux rac1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 i686 i386 GNU/Linux

Node 2

[oracle@rac2 ~]$ uname -a
Linux rac2 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 i686 i386 GNU/Linux

Status of all RAC resources

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
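
When reading `crs_stat -t` output like the table above, it can help to turn it into records and see which node hosts which resource. The following is a minimal Python sketch, assuming each row splits on whitespace into Name, Type, Target, State and an optional Host column, exactly as in the output shown here. `parse_crs_stat`, `resources_by_host` and `Resource` are hypothetical helpers, not part of any Oracle tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Resource(NamedTuple):
    name: str
    res_type: str
    target: str
    state: str
    host: str  # empty when the resource is not running on any node


def parse_crs_stat(text: str) -> list[Resource]:
    """Parse `crs_stat -t` output into Resource records (hypothetical helper)."""
    resources = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator (one field) and the header row.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        host = fields[4] if len(fields) > 4 else ""
        resources.append(Resource(fields[0], fields[1], fields[2], fields[3], host))
    return resources


def resources_by_host(resources: list[Resource]) -> dict[str, list[str]]:
    """Group resource names by the node that currently hosts them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for res in resources:
        groups[res.host].append(res.name)
    return dict(groups)


SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
"""

if __name__ == "__main__":
    print(resources_by_host(parse_crs_stat(SAMPLE)))
    # prints {'rac1': ['ora....B1.inst'], 'rac2': ['ora....B2.inst', 'ora.RACDB.db']}
```

Grouping by host makes it easy to compare snapshots, for example to spot the database and service resources moving from rac2 to rac1 after node 2 is evicted later in this article.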


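II. Simulate a failed interconnect between the two nodes: what happens to RAC? Walk through the whole fault-diagnosis process

This exercise simulates an outage of the RAC private network, then locates the cause of the fault, and finally clears it.

1. First, confirm that RAC is in a fully healthy state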

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

Check the status of the CRS daemons (CRS, CSS, EVM):

[oracle@rac2 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
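
`crsctl check crs` prints one line per daemon, so its output is easy to check mechanically. The sketch below assumes, as in this 10.2 output, that a healthy daemon is reported exactly as "<NAME> appears healthy"; `unhealthy_daemons` is a hypothetical helper. The second sample is the output this article shows later, after the OCR has been overwritten.

```python
EXPECTED_DAEMONS = ("CSS", "CRS", "EVM")


def unhealthy_daemons(check_output: str) -> list[str]:
    """Return the daemons that `crsctl check crs` does not report as healthy."""
    healthy = {
        line.split()[0]
        for line in check_output.splitlines()
        if line.strip().endswith("appears healthy")
    }
    return [daemon for daemon in EXPECTED_DAEMONS if daemon not in healthy]


HEALTHY = "CSS appears healthy\nCRS appears healthy\nEVM appears healthy\n"
OCR_WIPED = (
    "Failure 1 contacting CSS daemon\n"
    "Cannot communicate with CRS\n"
    "EVM appears healthy\n"
)

if __name__ == "__main__":
    print(unhealthy_daemons(HEALTHY))    # prints []
    print(unhealthy_daemons(OCR_WIPED))  # prints ['CSS', 'CRS']
```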

Check the OCR disk status: no problems. The OCR lives on /dev/raw/raw1; the "Device/File not configured" line is the slot for an OCR mirror, which this cluster does not have.

[oracle@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4344
         Available space (kbytes) :     100000
         ID                       : 1752469369
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded
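
A quick way to summarize `ocrcheck` output is to pull out the space figures and the final integrity verdict. This is a sketch built only on the output format shown in this article; `parse_ocrcheck` and `space_adds_up` are hypothetical names. The used + available = total check holds for both healthy outputs here (4344 + 100000 and 4348 + 99996), but treat it as an observation, not a documented guarantee.

```python
import re

SPACE_KEYS = ("Total space", "Used space", "Available space")


def parse_ocrcheck(output: str) -> dict:
    """Summarize ocrcheck output: space figures (kbytes) plus the integrity verdict."""
    summary = {"integrity_ok": "Cluster registry integrity check succeeded" in output}
    for key in SPACE_KEYS:
        match = re.search(rf"{key} \(kbytes\)\s*:\s*(\d+)", output)
        summary[key] = int(match.group(1)) if match else None
    return summary


def space_adds_up(summary: dict) -> bool:
    """True when used + available == total, as in the healthy outputs in this article."""
    if any(summary[key] is None for key in SPACE_KEYS):
        return False
    return summary["Used space"] + summary["Available space"] == summary["Total space"]


HEALTHY = """\
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4344
         Available space (kbytes) :     100000
         Cluster registry integrity check succeeded
"""

if __name__ == "__main__":
    s = parse_ocrcheck(HEALTHY)
    print(s["integrity_ok"], space_adds_up(s))  # prints True True
    print(parse_ocrcheck("PROT-601: Failed to initialize ocrcheck")["integrity_ok"])  # prints False
```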

Check the voting disk status:

[oracle@rac2 ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw2      # raw device raw2 is the voting disk
located 1 votedisk(s).          # only one voting disk was found

2. Manually disable one private network interface

[oracle@rac2 ~]$ cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6
##Public Network - (eth0)
##Private Interconnect - (eth1)
##Public Virtual IP (VIP) addresses - (eth0)
192.168.1.101   rac1            # RAC public addresses
192.168.1.102   rac2
192.168.2.101   rac1-priv       # RAC private (interconnect) addresses
192.168.2.102   rac2-priv
192.168.1.201   rac1-vip        # RAC virtual IPs (VIPs)
192.168.1.202   rac2-vip
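
The hosts file above encodes two layout rules worth checking: the interconnect uses its own subnet, and the VIPs sit on the public subnet. A minimal sketch, assuming this article's naming convention ("-priv" for the interconnect, "-vip" for virtual IPs, any other non-localhost name public) and the /24 netmask shown by ifconfig; `networks_by_role` is a hypothetical helper.

```python
import ipaddress


def networks_by_role(hosts_text: str) -> dict[str, set[str]]:
    """Map public / private / vip roles to the /24 networks their addresses use.

    Assumes this article's naming convention: "-priv" = interconnect,
    "-vip" = virtual IP, any other non-localhost name = public address.
    """
    roles: dict[str, set[str]] = {"public": set(), "private": set(), "vip": set()}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and annotations
        if not line:
            continue
        ip, *names = line.split()
        # /24 matches the Mask:255.255.255.0 reported by ifconfig.
        network = str(ipaddress.ip_network(f"{ip}/24", strict=False))
        for name in names:
            if name.startswith("localhost"):
                continue
            if name.endswith("-priv"):
                roles["private"].add(network)
            elif name.endswith("-vip"):
                roles["vip"].add(network)
            else:
                roles["public"].add(network)
    return roles


HOSTS = """\
127.0.0.1       localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6
192.168.1.101   rac1
192.168.1.102   rac2
192.168.2.101   rac1-priv
192.168.2.102   rac2-priv
192.168.1.201   rac1-vip
192.168.1.202   rac2-vip
"""

if __name__ == "__main__":
    roles = networks_by_role(HOSTS)
    print(roles["private"].isdisjoint(roles["public"]))  # interconnect on its own subnet: True
    print(roles["vip"] <= roles["public"])               # VIPs on the public subnet: True
```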

Look at how the IP addresses map to network interfaces:

[oracle@rac2 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f187/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:360 errors:0 dropped:0 overruns:0 frame:0
          TX packets:593 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:46046 (44.9 KiB)  TX bytes:62812 (61.3 KiB)
          Interrupt:185 Base address:0x14a4

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Base address:0x14a4

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:91
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76588 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58002 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:65185420 (62.1 MiB)  TX bytes:37988820 (36.2 MiB)
          Interrupt:193 Base address:0x1824

eth2      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:9B
          inet addr:192.168.203.129  Bcast:192.168.203.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f19b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:83 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:42206 (41.2 KiB)  TX bytes:10199 (9.9 KiB)
          Interrupt:169 Base address:0x18a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:99403 errors:0 dropped:0 overruns:0 frame:0
          TX packets:99403 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:18134658 (17.2 MiB)  TX bytes:18134658 (17.2 MiB)

eth0 is the RAC public interface.
eth1 is the RAC private interface (the interconnect).
eth0:1 carries the RAC virtual IP (VIP).
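
The interface-to-address mapping above can also be read out of `ifconfig` programmatically, for example to confirm which interface carries rac2-priv. This sketch assumes the classic net-tools layout shown in this article (stanza header starting with the interface name, then an "inet addr:" line); newer ifconfig versions print "inet 192.168..." instead and would need a different pattern. Both function names are hypothetical.

```python
import re
from typing import Optional

INET = re.compile(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})")


def ipv4_by_interface(ifconfig_output: str) -> dict[str, str]:
    """Map each interface (aliases such as eth0:1 included) to its IPv4 address.

    Assumes the classic net-tools layout ("inet addr:...").
    """
    addresses: dict[str, str] = {}
    current: Optional[str] = None
    for line in ifconfig_output.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # stanza header starts with the interface name
            continue
        match = INET.search(line)
        if match and current is not None:
            addresses[current] = match.group(1)
    return addresses


def interface_for(ip: str, addresses: dict[str, str]) -> Optional[str]:
    """Return the interface that carries the given address, or None."""
    return next((iface for iface, addr in addresses.items() if addr == ip), None)


SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:91
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
"""

if __name__ == "__main__":
    addrs = ipv4_by_interface(SAMPLE)
    print(interface_for("192.168.2.102", addrs))  # rac2-priv, prints eth1
```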

We now disable the private interface eth1 to simulate an interconnect outage. The method is simple:

ifdown eth1    # disable the interface
ifup eth1      # enable the interface

[oracle@rac2 ~]$ su - root    # root is required; otherwise you get "Users cannot control this device."
Password:
[root@rac2 ~]# ifdown eth1

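I entered this command at 17:18:51, and four minutes later node 2 rebooted. Do you know what happened?

This is the well-known RAC split-brain problem. When the interconnect between the nodes fails, they can no longer share information and the cluster splits in two. RAC must then evict some of the nodes to protect data consistency, and an evicted node is forcibly rebooted, which is exactly why node 2 restarted on its own. That raises the question: why did node 2 reboot, and not the other node?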

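The eviction rules are:

(1) The sub-cluster with fewer nodes is evicted.
(2) If the sub-clusters are the same size, the one with the higher node number is evicted.
(3) The node with the higher load is evicted.

We hit rule (2): with the interconnect down, each node forms a one-node sub-cluster, so rule (1) cannot decide and node 2, the higher-numbered node, is evicted. Once node 2 is back up, log in with your username and password.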
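
Rules (1) and (2) can be expressed as a single ordering: prefer the larger sub-cluster, and on a tie prefer the one containing the lowest node number. The sketch below is a hedged model of just those two rules as stated in this article, not Oracle's actual CSS implementation; rule (3), load, is not modelled, and both function names are hypothetical.

```python
def surviving_subcluster(subclusters: list[set[int]]) -> set[int]:
    """Hedged model of eviction rules (1) and (2); rule (3), load, is not modelled.

    Rule (1): the larger sub-cluster survives.
    Rule (2): on a tie, the sub-cluster holding the lowest node number survives,
    so the side with the higher node numbers is evicted.
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


def evicted_nodes(subclusters: list[set[int]]) -> set[int]:
    """All nodes outside the surviving sub-cluster."""
    survivor = surviving_subcluster(subclusters)
    return set().union(*subclusters) - survivor


if __name__ == "__main__":
    # Interconnect down in this two-node cluster: each node forms its own sub-cluster.
    print(evicted_nodes([{1}, {2}]))  # prints {2}
```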

3. Locate the cause of the fault

(1) Check the operating system log

[oracle@rac2 ~]$ su - root
Password:
[root@rac2 ~]# tail -30f /var/log/messages

I ran the simulation once more. Because the log is large, I picked out only the network-related messages:

Jul 17 20:05:25 rac2 avahi-daemon[3659]: Withdrawing address record for 192.168.2.102 on eth1.

The IP address of eth1 is withdrawn. As a result, node 1 evicts node 2 and node 2 reboots automatically.

Jul 17 20:05:25 rac2 avahi-daemon[3659]: Leaving mDNS multicast group on interface eth1.IPv4 with address 192.168.2.102.

eth1 leaves the mDNS multicast group.

Jul 17 20:05:25 rac2 avahi-daemon[3659]: iface.c: interface_mdns_mcast_join() called but no local address available.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Interface eth1.IPv4 no longer relevant for mDNS.

eth1 is no longer used for mDNS.

Jul 17 20:09:54 rac2 logger: Oracle Cluster Ready Services starting up automatically.

Oracle Clusterware starts automatically after the reboot.

Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for fe80::20c:29ff:fe8f:f191 on eth1.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for 192.168.2.102 on eth1.

The IP addresses are registered on eth1 again.

Jul 17 20:10:17 rac2 logger: Cluster Ready Services completed waiting on dependencies.

CRS has finished waiting on its dependencies.
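
Filtering the log by hand works, but the same excerpt can be scanned for the address withdraw/register records on an interface and the gap between them measured. A sketch under two assumptions: lines follow the syslog layout shown above, and the year (absent from syslog) is supplied by the caller; 2013 comes from the crsd.log timestamps. For the excerpt above, eth1 is without its address from 20:05:25 to 20:09:59, i.e. 274 seconds. Function names are hypothetical.

```python
import re
from datetime import datetime

# e.g. "Jul 17 20:05:25 rac2 avahi-daemon[3659]: Withdrawing address record for ... on eth1."
SYSLOG = re.compile(r"^(\w{3} +\d+ [\d:]{8}) \S+ [^:]+: (.*)$")


def address_events(log_text: str, iface: str, year: int) -> list[tuple[datetime, str]]:
    """Return (timestamp, action) pairs for avahi address-record lines on iface.

    syslog lines carry no year, so the caller supplies one (an assumption).
    """
    events = []
    for line in log_text.splitlines():
        match = SYSLOG.match(line.strip())
        if not match:
            continue
        message = match.group(2).rstrip(".")
        if "address record" in message and message.endswith(f"on {iface}"):
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, message.split()[0]))  # "Withdrawing" or "Registering"
    return events


def outage_seconds(events: list[tuple[datetime, str]]) -> float:
    """Seconds from the first withdrawal to the next re-registration."""
    down = next(t for t, action in events if action == "Withdrawing")
    up = next(t for t, action in events if action == "Registering" and t > down)
    return (up - down).total_seconds()


LOG = """\
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Withdrawing address record for 192.168.2.102 on eth1.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Leaving mDNS multicast group on interface eth1.IPv4 with address 192.168.2.102.
Jul 17 20:09:54 rac2 logger: Oracle Cluster Ready Services starting up automatically.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for fe80::20c:29ff:fe8f:f191 on eth1.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for 192.168.2.102 on eth1.
"""

if __name__ == "__main__":
    print(outage_seconds(address_events(LOG, "eth1", 2013)))  # prints 274.0
```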

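From these messages we can roughly tell that the problem with eth1 caused node 2 to reboot. To analyze further, we also need to look at the CRS logs:

[root@rac2 crsd]# tail -100f $ORA_CRS_HOME/log/rac2/crsd/crsd.log
Abnormal termination by CSS, ret = 8

The CRS daemon was terminated abnormally by CSS.

2013-07-17 20:11:18.115: [ default][1244944]0CRS Daemon Starting
2013-07-17 20:11:18.116: [ CRSMAIN][1244944]0Checking the OCR device
2013-07-17 20:11:18.303: [ CRSMAIN][1244944]0Connecting to the CSS Daemon

The CRS daemon restarts, checks the OCR device and reconnects to the CSS daemon.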

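[root@rac2 cssd]# pwd
/u01/crs1020/log/rac2/cssd
[root@rac2 cssd]# more ocssd.log    # view the CSSD log

[CSSD]2013-07-17 17:26:18.319 [86104976] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

This entry shows the CSSD process on rac2 bringing its client listener up again, a few minutes after the interconnect was taken down at 17:18:51.

[CSSD]2013-07-17 17:26:19.296 [75615120] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[rac1] seq[13] sync[12]

Here rac2 acknowledges a synchronization request from rac1 (src[1]); this is where to confirm how the two nodes are synchronizing.

Taken together, these messages show that this is an interconnect communication problem: the two nodes could not synchronize, so they could not share information, and that caused the split brain.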

4. After the reboot, node 2 returns to a normal state automatically

[root@rac2 cssd]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f187/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:567 errors:0 dropped:0 overruns:0 frame:0
          TX packets:901 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:65402 (63.8 KiB)  TX bytes:96107 (93.8 KiB)
          Interrupt:185 Base address:0x14a4

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Base address:0x14a4

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:91
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76659 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51882 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:61625763 (58.7 MiB)  TX bytes:26779167 (25.5 MiB)
          Interrupt:193 Base address:0x1824

eth2      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:9B
          inet addr:192.168.203.129  Bcast:192.168.203.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f19b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45226 (44.1 KiB)  TX bytes:9567 (9.3 KiB)
          Interrupt:169 Base address:0x18a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:49025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49025 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11292111 (10.7 MiB)  TX bytes:11292111 (10.7 MiB)


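Look at the interface addresses: the private interface eth1, whose IP address had been withdrawn, has its address back. This is because node 2 has just rebooted: the reboot re-initializes every interface, so eth1, which we had disabled, is brought up again and gets its IP address back.

Check the CRS daemons; all of them are healthy:

[root@rac2 cssd]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy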

Check the status of the cluster, instances, database, listeners and ASM: everything is intact and started.

[root@rac2 cssd]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac1
ora.....TAF.cs application    ONLINE    ONLINE    rac1
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

Note that ora.RACDB.db, the DB1 service and the TAF service now run on rac1, whereas before the test they ran on rac2: they moved over to rac1 while node 2 was down.

This completes the fault analysis and resolution for this scenario.


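III. Simulate an unavailable OCR disk: what happens to RAC? Walk through the whole fault-diagnosis process

The OCR disk: the Oracle Cluster Registry (OCR) records every RAC resource, including the cluster, databases, instances, listeners, services, ASM, storage and networks. Only resources registered in the OCR can be managed by CRS, because the CRS daemon manages exactly the resources the OCR records. In day-to-day operations the OCR contents can be lost, for example when adding or removing nodes, or when adding or removing OCR disks. Next we simulate losing the OCR contents and show how to locate and fix the fault.

Lab

1. Check the OCR disk and the CRS daemons

(1) Check the OCR disk. CRS can manage the cluster properly only if the OCR disk is healthy.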

[root@rac2 cssd]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4344
         Available space (kbytes) :     100000
         ID                       : 1752469369
         Device/File Name         : /dev/raw/raw1    # the raw device that holds the OCR
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded  # integrity check completed, no problems

(2) Check the CRS status

[root@rac2 cssd]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

All cluster daemons are healthy.

(3) Stop the CRS daemons

[root@rac2 sysconfig]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

The shutdown request succeeded:

[root@rac2 sysconfig]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM


2. As root, export the OCR contents as a backup

[root@rac2 sysconfig]# ocrconfig -export /home/oracle/ocr.exp

[oracle@rac2 ~]$ pwd
/home/oracle
[oracle@rac2 ~]$ ll
total 108
-rw-r--r-- 1 root   root     98074 Jul 18 11:20 ocr.exp    # the OCR export file has been created


3. Restart the CRS daemons

[root@rac2 sysconfig]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

Check the CRS status. Good: after the restart everything is back to normal.

[root@rac2 sysconfig]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy


4. Overwrite the OCR raw device with zeros using dd, to simulate losing the OCR contents

[root@rac2 sysconfig]# dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 76.7348 seconds, 1.4 MB/s

Command explanation:

dd                  copies a file in blocks of a given size, optionally converting the data as it copies
if=/dev/zero        input file: the zero device, which supplies an endless stream of zero bytes
of=/dev/raw/raw1    output file: the OCR disk
bs=1024             block size of 1024 bytes (1 KiB)
count=102400        number of blocks to copy: 102,400, i.e. 100 MiB in total
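
The figures dd printed follow directly from bs and count, and comparing them with ocrcheck shows how much of the OCR area was zeroed. A small sketch; it assumes ocrcheck's "kbytes" are 1024-byte units, and `dd_totals` is a hypothetical helper. Under that assumption dd zeroed 102,400 KiB starting at offset 0, slightly less than the 104,344 kbytes ocrcheck reported as total space, so it wiped the start of the device rather than every byte, which was already enough for ocrcheck to fail.

```python
def dd_totals(block_size: int, count: int, seconds: float) -> dict:
    """Recompute the figures dd reports for a bs=... count=... run."""
    total = block_size * count
    return {
        "bytes": total,
        "kib": total // 1024,
        "mb": total / 1_000_000,  # dd's "(105 MB)" figure uses decimal megabytes
        "mb_per_s": total / 1_000_000 / seconds,
    }


OCR_TOTAL_KBYTES = 104344  # "Total space (kbytes)" from ocrcheck, assumed 1024-byte units

if __name__ == "__main__":
    t = dd_totals(1024, 102400, 76.7348)
    print(t["bytes"], round(t["mb"]), round(t["mb_per_s"], 1))  # prints 104857600 105 1.4
    print(t["kib"], "KiB zeroed vs", OCR_TOTAL_KBYTES, "kbytes of OCR space")
```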


5. Check the OCR disk status again

[root@rac2 sysconfig]# ocrcheck
PROT-601: Failed to initialize ocrcheck    # ocrcheck can no longer initialize against the OCR disk

Check the CRS status:

[root@rac2 sysconfig]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
EVM appears healthy

It is no surprise that the cluster daemons fail: with the registered resource information gone, there is nothing left for CRS to manage the cluster by.


6. Restore the OCR contents with import

[root@rac2 crs1020]# ocrconfig -import /home/oracle/ocr.exp

7. Finally, check the OCR disk status again

Thank goodness, it was restored without a hitch:

[root@rac2 crs1020]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4348
         Available space (kbytes) :      99996
         ID                       :  425383787
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded


8. Check the CRS daemons

[root@rac2 crs1020]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Very good: once the OCR disk was restored, the CRS daemons restarted automatically.

[root@rac2 crs1020]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    OFFLINE
ora....DB1.srv application    ONLINE    ONLINE    rac1
ora.....TAF.cs application    ONLINE    ONLINE    rac1
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    OFFLINE
ora....C2.lsnr application    ONLINE    OFFLINE
ora.rac2.gsd   application    ONLINE    OFFLINE
ora.rac2.ons   application    ONLINE    OFFLINE
ora.rac2.vip   application    ONLINE    ONLINE    rac2
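
In a table like the one above, the rows that need attention are those whose State differs from their Target. A minimal sketch that lists them, assuming (as in this 10.2 cluster) that every resource name starts with "ora." and that each row splits on whitespace; `not_at_target` is a hypothetical helper.

```python
def not_at_target(crs_stat_output: str) -> list[str]:
    """Names of resources whose State differs from their Target in `crs_stat -t` output."""
    mismatched = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Resource rows start with "ora."; the header and separator rows do not.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _res_type, target, state = fields[:4]
        if state != target:
            mismatched.append(name)
    return mismatched


SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    OFFLINE
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    OFFLINE
ora....C2.lsnr application    ONLINE    OFFLINE
ora.rac2.gsd   application    ONLINE    OFFLINE
ora.rac2.ons   application    ONLINE    OFFLINE
ora.rac2.vip   application    ONLINE    ONLINE    rac2
"""

if __name__ == "__main__":
    for name in not_at_target(SAMPLE):
        print("needs attention:", name)
```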

Several rac2 resources (the instance, ASM, listener, GSD and ONS) are still OFFLINE, so I restarted the CRS cluster services once:

[root@rac2 init.d]# ./init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
[root@rac2 init.d]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[root@rac2 init.d]# ./init.crs start
Startup will be queued to init within 90 seconds.

Now everything has recovered:

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2



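IV. Simulate an unavailable voting disk: what happens to RAC? Walk through the whole fault-diagnosis process

Voting disk: when a split brain occurs, the voting disk is used to decide which node is evicted. This is a split brain at the cluster layer.

Control file: if the split brain occurs at the instance layer, the control file is used to decide which node is evicted.

Voting disk redundancy strategies:

(1) External redundancy: the voting disk is protected by a mechanism outside Oracle, such as mirroring in the storage layer.
(2) Oracle's own internal redundancy: add voting disk mirrors.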
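
With internal redundancy, the number of voting disk copies matters: Clusterware keeps a node in the cluster only while it can access more than half of the configured voting disks, which is why Oracle recommends an odd count. A minimal sketch of that majority rule; the function names are hypothetical and the code models only the counting, not CSS itself.

```python
def survives(configured: int, accessible: int) -> bool:
    """A node stays in the cluster only while it sees more than half of the voting disks."""
    return accessible > configured // 2


def tolerable_failures(configured: int) -> int:
    """Largest number of voting disks that can be lost while a majority remains."""
    return (configured - 1) // 2


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} voting disk(s): can lose {tolerable_failures(n)}")
    # 1 and 2 disks tolerate no loss, 3 and 4 tolerate one, 5 tolerate two
```

The output shows why a second voting disk alone adds no fault tolerance: with two configured, losing either one leaves a node below a majority.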

Lab

1. Check the voting disk status

[oracle@rac1 ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw2      # raw device raw2 is the voting disk
located 1 votedisk(s).          # only one voting disk was found

2. Stop the CRS stack

[root@rac1 sysconfig]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

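3. Add a voting disk to get internal redundancy

crsctl add css votedisk /dev/raw/raw3 -force    # add the raw device raw3 to the voting disks

The -force option changes the voting disk configuration without contacting the Clusterware daemons, so it is meant to be run only while CRS is stopped on every node; using it while any node is still active can corrupt the configuration. After the disk is added, Oracle copies the contents of the existing voting disk onto the new one.

4. Check the voting disk status again

crsctl query css votedisk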

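5. Start the CRS stack

[root@rac2 sysconfig]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

Summary: if the voting disk /dev/raw/raw2 is damaged, its mirror /dev/raw/raw3 can stand in for it so that RAC keeps serving clients. Bear in mind, however, that a node must be able to access more than half of the configured voting disks; with exactly two, losing one already leaves no majority, which is why Oracle recommends an odd number such as three.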



Reposted from: https://blog.51cto.com/linuxzkq/1583890
