[Oracle RAC Fault Analysis and Handling]

Published: 2024/1/17

Original article: [Oracle RAC Fault Analysis and Handling], by 蚁巡运维平台

I. RAC environment

RAC architecture, two nodes

Node 1

SQL> show parameter instance

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
active_instance_count                integer
cluster_database_instances           integer     2
instance_groups                      string
instance_name                        string      RACDB1
instance_number                      integer     1
instance_type                        string      RDBMS
open_links_per_instance              integer     4
parallel_instance_group              string
parallel_server_instances            integer     2

Node 2

SQL> show parameter instance

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
active_instance_count                integer
cluster_database_instances           integer     2
instance_groups                      string
instance_name                        string      RACDB2
instance_number                      integer     2
instance_type                        string      RDBMS
open_links_per_instance              integer     4
parallel_instance_group              string
parallel_server_instances            integer     2

Database version

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

Operating system information

Node 1

[oracle@rac1 ~]$ uname -a
Linux rac1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 i686 i386 GNU/Linux

Node 2

[oracle@rac2 ~]$ uname -a
Linux rac2 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 i686 i386 GNU/Linux

All RAC resources

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2


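II. Simulating a failed interconnect between the two nodes: what happens to RAC, and how to locate the fault

This exercise simulates a failure of the RAC private network, then locates the cause of the fault, and finally resolves it.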

1. First, confirm that RAC is in a healthy state

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

Check the status of the clusterware daemons (CRS, CSS, EVM)

[oracle@rac2 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Check the OCR disk status: no problems

[oracle@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4344
         Available space (kbytes) :     100000
         ID                       : 1752469369
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded

Check the voting disk status

[oracle@rac2 ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw2          <-- raw device 2 is the voting disk

located 1 votedisk(s).              <-- only one voting disk is configured
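The daemon check above can be turned into a pass/fail test for scripts. The following is a minimal sketch, assuming the output format shown in this article (three lines ending in "appears healthy"); the function name `crs_all_healthy` is hypothetical and is not part of any Oracle tool.

```shell
#!/bin/sh
# Reads `crsctl check crs` output on stdin and succeeds only if all three
# daemons (CSS, CRS, EVM) report "appears healthy".
# The function name is hypothetical; only the output format is from the article.
crs_all_healthy() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from aborting scripts that run under "set -e".
    healthy=$(grep -c 'appears healthy' || true)
    [ "$healthy" -eq 3 ]
}

# Usage on a cluster node (not executed here):
#   crsctl check crs | crs_all_healthy && echo "clusterware stack healthy"
```

Reading from stdin keeps the function independent of where the output comes from, so it can also be run against saved transcripts.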

2. Manually disable a private network interface

[oracle@rac2 ~]$ cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6
##Public Network - (eth0)
##Private Interconnect - (eth1)
##Public Virtual IP (VIP) addresses - (eth0)
192.168.1.101   rac1          <-- RAC public addresses
192.168.1.102   rac2
192.168.2.101   rac1-priv     <-- RAC private (interconnect) addresses
192.168.2.102   rac2-priv
192.168.1.201   rac1-vip      <-- RAC virtual IP (VIP) addresses
192.168.1.202   rac2-vip

Look at how the IP addresses map to the network interfaces

[oracle@rac2 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f187/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:360 errors:0 dropped:0 overruns:0 frame:0
          TX packets:593 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:46046 (44.9 KiB)  TX bytes:62812 (61.3 KiB)
          Interrupt:185 Base address:0x14a4

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Base address:0x14a4

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:91
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76588 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58002 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:65185420 (62.1 MiB)  TX bytes:37988820 (36.2 MiB)
          Interrupt:193 Base address:0x1824

eth2      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:9B
          inet addr:192.168.203.129  Bcast:192.168.203.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f19b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:83 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:42206 (41.2 KiB)  TX bytes:10199 (9.9 KiB)
          Interrupt:169 Base address:0x18a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:99403 errors:0 dropped:0 overruns:0 frame:0
          TX packets:99403 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:18134658 (17.2 MiB)  TX bytes:18134658 (17.2 MiB)

eth0 is the RAC public interface

eth1 is the RAC private (interconnect) interface

eth0:1 carries the RAC virtual IP

We now disable the private interface eth1 to simulate a loss of the interconnect. The method is simple:

ifdown eth1      disable the interface
ifup   eth1      re-enable the interface

[oracle@rac2 ~]$ su - root          <-- root is required; otherwise ifdown reports "Users cannot control this device."
Password:
[root@rac2 ~]# ifdown eth1

I entered this command at 17:18:51, and four minutes later node 2 rebooted. Do you know what happened?

This is the well-known RAC split-brain problem. When the interconnect between the nodes fails, the nodes can no longer share state and a split-brain condition arises. To protect data consistency, RAC must evict some of the nodes, and an evicted node is forcibly rebooted, which is why node 2 rebooted by itself. That raises the question: why did node 2 reboot rather than the other node?

The eviction rules are:

(1) the sub-cluster with fewer nodes is evicted
(2) the node with the higher node number is evicted
(3) the node with the higher load is evicted

Rule 2 is the one that applied to us. Node 2 has now come back up, so we log in to the system with the user name and password.
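To make rules (1) and (2) concrete, here is a small sketch that picks the surviving sub-cluster from two groups of node numbers. It only models the rules as this article states them and is not Oracle's actual CSS algorithm; rule (3) is not modelled, and all function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: count the node numbers in a space-separated list,
# and find the smallest node number in such a list.
node_count() { set -- $1; echo $#; }
smallest() {
    min=""
    for n in $1; do
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
    done
    echo "$min"
}

# Prints the sub-cluster (space-separated node numbers) that survives.
surviving_subcluster() {
    a=$1; b=$2
    ca=$(node_count "$a"); cb=$(node_count "$b")
    if [ "$ca" -gt "$cb" ]; then
        echo "$a"                      # rule 1: the smaller group is evicted
    elif [ "$cb" -gt "$ca" ]; then
        echo "$b"
    elif [ "$(smallest "$a")" -lt "$(smallest "$b")" ]; then
        echo "$a"                      # rule 2: tie, lowest node number survives
    else
        echo "$b"
    fi
}

surviving_subcluster "1" "2"           # the two-node case from this article
```

For the two-node cluster here, both groups have one node, so the tie is broken by node number: node 1 survives and node 2 is evicted, which matches what was observed.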

3. Locate the cause of the fault

1) Check the operating system log

[oracle@rac2 ~]$ su - root
Password:
[root@rac2 ~]# tail -30f /var/log/messages

I ran the simulation a second time. The log is large, so I picked out the entries related to networking.

Jul 17 20:05:25 rac2 avahi-daemon[3659]: Withdrawing address record for 192.168.2.102 on eth1.

The IP address of eth1 is withdrawn. This leads node 1 to evict node 2, and node 2 reboots automatically.

Jul 17 20:05:25 rac2 avahi-daemon[3659]: Leaving mDNS multicast group on interface eth1.IPv4 with address 192.168.2.102.

eth1 leaves the mDNS multicast group.

Jul 17 20:05:25 rac2 avahi-daemon[3659]: iface.c: interface_mdns_mcast_join() called but no local address available.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Interface eth1.IPv4 no longer relevant for mDNS.

eth1 is no longer relevant for mDNS.

Jul 17 20:09:54 rac2 logger: Oracle Cluster Ready Services starting up automatically.

Oracle clusterware starts automatically.

Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for fe80::20c:29ff:fe8f:f191 on eth1.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for 192.168.2.102 on eth1.

New IP addresses are registered on eth1.

Jul 17 20:10:17 rac2 logger: Cluster Ready Services completed waiting on dependencies.

CRS has finished waiting on its dependencies.

From these messages we can broadly tell that the problem with the eth1 interface caused node 2 to reboot. To analyse the problem further we also need to look at the CRS diagnostic logs.

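[root@rac2 crsd]# tail -100f $ORA_CRS_HOME/log/rac2/crsd/crsd.log

Abnormal termination by CSS, ret = 8

crsd was terminated abnormally by CSS.

2013-07-17 20:11:18.115: [ default][1244944]0CRS Daemon Starting
2013-07-17 20:11:18.116: [ CRSMAIN][1244944]0Checking the OCR device
2013-07-17 20:11:18.303: [ CRSMAIN][1244944]0Connecting to the CSS Daemon

The CRS daemon restarts and reconnects to CSS.

[root@rac2 cssd]# pwd
/u01/crs1020/log/rac2/cssd
[root@rac2 cssd]# more ocssd.log          <-- view the cssd log

[CSSD]2013-07-17 17:26:18.319 [86104976] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

The cssd process on rac2 is listening on its IPC endpoint. Given the timestamp, a few minutes after the 17:18:51 ifdown, this entry appears to come from cssd starting up again after the reboot rather than from the failure itself.

[CSSD]2013-07-17 17:26:19.296 [75615120] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[rac1] seq[13] sync[12]

rac2 acknowledges a sync message from node 1 (rac1), which suggests that the two nodes are re-establishing cluster membership.

Taken together, these messages point to an interconnect communication problem: the two nodes could not synchronise with each other, so they could not share state, and that led to split-brain.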
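Picking the relevant entries out of a large syslog by hand is tedious. The following is a minimal sketch of a filter that keeps lines mentioning a given interface or the clusterware startup messages, assuming the message formats quoted in this article; the function name `interconnect_events` is hypothetical.

```shell
#!/bin/sh
# Reads a syslog file on stdin and prints lines that mention the given
# interface (for example " eth1." or " eth1 ") or Cluster Ready Services.
# The function name is hypothetical; the patterns are based on the log
# lines quoted in this article.
interconnect_events() {
    iface=$1
    # The trailing character class stops "eth1" from also matching "eth10".
    # "|| true" keeps an empty result from being treated as a failure.
    grep -E " ${iface}[.: ]|Cluster Ready Services" || true
}

# Usage as root on the affected node (not executed here):
#   interconnect_events eth1 < /var/log/messages
```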

4. Node 2 returns to a normal state automatically after the reboot

[root@rac2 cssd]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.102  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f187/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:567 errors:0 dropped:0 overruns:0 frame:0
          TX packets:901 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:65402 (63.8 KiB)  TX bytes:96107 (93.8 KiB)
          Interrupt:185 Base address:0x14a4

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:87
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Base address:0x14a4

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:91
          inet addr:192.168.2.102  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76659 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51882 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:61625763 (58.7 MiB)  TX bytes:26779167 (25.5 MiB)
          Interrupt:193 Base address:0x1824

eth2      Link encap:Ethernet  HWaddr 00:0C:29:8F:F1:9B
          inet addr:192.168.203.129  Bcast:192.168.203.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8f:f19b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45226 (44.1 KiB)  TX bytes:9567 (9.3 KiB)
          Interrupt:169 Base address:0x18a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:49025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49025 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11292111 (10.7 MiB)  TX bytes:11292111 (10.7 MiB)


Looking at the interface addresses, the IP of the private interface eth1, which had been withdrawn, is back. This is because node 2 has just rebooted: a reboot initialises all network interfaces, so the eth1 interface that we disabled was brought up again and regained its IP.

Check the clusterware daemons: all healthy

[root@rac2 cssd]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Check the cluster, instance, database, listener and ASM resources: all intact and all started

[root@rac2 cssd]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac1
ora.....TAF.cs application    ONLINE    ONLINE    rac1
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2

This completes the analysis and resolution of this RAC fault.


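III. Simulating an unavailable OCR disk: what happens to RAC, and how to locate the fault

OCR disk: the OCR disk registers all RAC resource information, including the cluster, databases, instances, listeners, services, ASM, storage and network. Only resources registered in the OCR can be managed by CRS, and the CRS daemon manages resources according to what the OCR records. During routine operations the OCR contents can be lost, for example when adding or removing nodes, or when adding or removing OCR disks. Below we simulate the loss of the OCR contents and show how to locate and resolve the fault.


Experiment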

1. Check the OCR disk and the CRS daemons

1) Check the OCR disk. CRS can manage resources properly only if the OCR disk is sound.

[root@rac2 cssd]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4344
         Available space (kbytes) :     100000
         ID                       : 1752469369
         Device/File Name         : /dev/raw/raw1          <-- the raw device that holds the OCR
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded        <-- the integrity check found no problems

2) Check the CRS status

[root@rac2 cssd]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

All cluster daemons are healthy.

3) Stop the CRS daemons

[root@rac2 sysconfig]# crsctl stop crs
Stopping resources.                       <-- resources are being stopped
Successfully stopped CRS resources        <-- CRS resources stopped
Stopping CSSD.                            <-- CSSD is being stopped
Shutting down CSS daemon.
Shutdown request successfully issued.     <-- the shutdown request was issued successfully

[root@rac2 sysconfig]# crsctl check crs
Failure 1 contacting CSS daemon           <-- cannot contact the CSS daemon
Cannot communicate with CRS               <-- cannot communicate with CRS
Cannot communicate with EVM               <-- cannot communicate with EVM


2. As root, export the OCR contents to take an OCR backup

[root@rac2 sysconfig]# ocrconfig -export /home/oracle/ocr.exp

[oracle@rac2 ~]$ pwd
/home/oracle
[oracle@rac2 ~]$ ll
total 108
-rw-r--r-- 1 root   root     98074 Jul 18 11:20 ocr.exp          <-- the OCR export file has been created
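Because a later step deliberately destroys the OCR, it is worth confirming that the export really exists and is not empty before going on. A minimal sketch of such a guard, using only the standard `[ -s file ]` test; the function name `ocr_export_ok` is hypothetical.

```shell
#!/bin/sh
# Succeeds only if the given OCR export file exists and has a size greater
# than zero. This checks presence and size only, not whether the export
# can actually be imported. The function name is hypothetical.
ocr_export_ok() {
    [ -s "$1" ]
}

# Usage as root (not executed here):
#   ocrconfig -export /home/oracle/ocr.exp
#   ocr_export_ok /home/oracle/ocr.exp || echo "OCR export missing or empty" >&2
```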


3. Restart the CRS daemons

[root@rac2 sysconfig]# crsctl start crs
Attempting to start CRS stack             <-- trying to start CRS
The CRS stack will be started shortly     <-- CRS will start shortly

Check the CRS status

[root@rac2 sysconfig]# crsctl check crs   <-- good, everything is normal again after the restart
CSS appears healthy
CRS appears healthy
EVM appears healthy


4. Use dd to overwrite the OCR raw device with zero bytes, simulating a lost OCR

[root@rac2 sysconfig]# dd if=/dev/zero of=/dev/raw/raw1 bs=1024 count=102400
102400+0 records in                       <-- 102400 records read
102400+0 records out                      <-- 102400 records written
104857600 bytes (105 MB) copied, 76.7348 seconds, 1.4 MB/s

Explanation of the command

dd                  copies a file in blocks of a given size, optionally applying conversions while copying
if=/dev/zero        source file: the zero device
of=/dev/raw/raw1    target file: the OCR disk
bs=1024             block size of 1024 bytes, i.e. 1 KiB
count=102400        number of blocks to copy: 102400
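The amount of data dd writes is simply the block size multiplied by the block count. A short sketch of that arithmetic, using the values from the command above; the function name `dd_total_bytes` is hypothetical.

```shell
#!/bin/sh
# Prints bs * count, the number of bytes dd writes when the input does not
# run out first (/dev/zero never does). The function name is hypothetical.
dd_total_bytes() {
    echo $(( $1 * $2 ))
}

dd_total_bytes 1024 102400     # prints 104857600
```

This matches the "104857600 bytes" that dd reported, which is 100 MiB, or about 105 MB in decimal units.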


5. Check the OCR disk status again

[root@rac2 sysconfig]# ocrcheck
PROT-601: Failed to initialize ocrcheck   <-- ocrcheck cannot initialise because the OCR contents are gone

Check the CRS status

[root@rac2 sysconfig]# crsctl check crs
Failure 1 contacting CSS daemon           <-- cannot contact the CSS daemon
Cannot communicate with CRS               <-- cannot communicate with CRS
EVM appears healthy

It is to be expected that CRS fails: with the recorded resource information gone, it has nothing left to manage from.


6. Restore the OCR contents with import

[root@rac2 crs1020]# ocrconfig -import /home/oracle/ocr.exp


7. Finally, check the OCR disk status

Thankfully, it was restored without any trouble.

[root@rac2 crs1020]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     104344
         Used space (kbytes)      :       4348
         Available space (kbytes) :      99996
         ID                       :  425383787
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
                                    Device/File not configured
         Cluster registry integrity check succeeded


8. Check the CRS daemons

[root@rac2 crs1020]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Very good: once the OCR disk was restored, the CRS daemons restarted automatically.

[root@rac2 crs1020]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    OFFLINE
ora....DB1.srv application    ONLINE    ONLINE    rac1
ora.....TAF.cs application    ONLINE    ONLINE    rac1
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    OFFLINE
ora....C2.lsnr application    ONLINE    OFFLINE
ora.rac2.gsd   application    ONLINE    OFFLINE
ora.rac2.ons   application    ONLINE    OFFLINE
ora.rac2.vip   application    ONLINE    ONLINE    rac2
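In the listing above several rac2 resources have Target ONLINE but State OFFLINE. A minimal sketch for spotting such rows automatically, assuming the five-column `crs_stat -t` layout shown in this article (Name, Type, Target, State, Host) with single spaces or runs of spaces between columns; the function name `crs_not_at_target` is hypothetical.

```shell
#!/bin/sh
# Reads `crs_stat -t` output on stdin and prints the names of resources
# whose State (column 4) differs from their Target (column 3).
# Header and separator lines are skipped because their second field is
# not "application". The function name is hypothetical.
crs_not_at_target() {
    awk '$2 == "application" && $3 != $4 { print $1 }'
}

# Usage on a cluster node (not executed here):
#   crs_stat -t | crs_not_at_target
```

An empty result means every listed resource has reached its target state.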

Several rac2 resources were still OFFLINE, so I restarted the CRS cluster services once.

[root@rac2 init.d]# ./init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@rac2 init.d]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac2 init.d]# ./init.crs start
Startup will be queued to init within 90 seconds.

Now everything has recovered.

[oracle@rac2 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac2
ora.....TAF.cs application    ONLINE    ONLINE    rac2
ora.RACDB.db   application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2



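IV. Simulating an unavailable voting disk: what happens to RAC, and how to locate the fault

Voting disk: when split-brain occurs, the voting disk is used to decide which node to evict. This is split-brain at the cluster layer.

Control file: when split-brain occurs at the instance layer, the control file is used to decide which node to evict.

Voting disk redundancy options:

(1) The voting disk can use external redundancy, in which an external mechanism protects it

(2) The voting disk can also use Oracle's own internal redundancy, implemented by adding voting disk mirrors

Experiment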

1. Check the voting disk status

[oracle@rac1 ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw2          <-- raw device 2 is the voting disk

located 1 votedisk(s).              <-- only one voting disk is configured

2. Stop the CRS cluster

[root@rac1 sysconfig]# crsctl stop crs
Stopping resources.                       <-- resources are being stopped
Successfully stopped CRS resources        <-- CRS resources stopped
Stopping CSSD.                            <-- CSSD is being stopped
Shutting down CSS daemon.
Shutdown request successfully issued.

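3. Add a voting disk to provide internal redundancy

crsctl add css votedisk /dev/raw/raw3 -force          <-- add raw device raw3 as a voting disk

After it is added, Oracle copies the contents of the original voting disk to the new one.

4. Check the voting disk status again

crsctl query css votedisk

5. Start the CRS cluster

[root@rac2 sysconfig]# crsctl start crs
Attempting to start CRS stack             <-- trying to start CRS
The CRS stack will be started shortly     <-- CRS will start shortly

Summary: the aim is that when the voting disk /dev/raw/raw2 is damaged, its mirror /dev/raw/raw3 takes its place so that RAC can keep providing service. Note, however, that a node must be able to access more than half of the configured voting disks, which is why an odd number (typically three) is recommended; with only two, the loss of either one leaves no majority.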
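The majority requirement for voting disks can be checked with a little arithmetic. The sketch below reads the disk count from `crsctl query css votedisk` output, assuming the "located N votedisk(s)." line shown in this article, and computes how many disks can be lost while a strict majority remains; both function names are hypothetical.

```shell
#!/bin/sh
# Reads `crsctl query css votedisk` output on stdin and prints the number
# from the "located N votedisk(s)." line. Function names are hypothetical.
votedisk_count() {
    sed -n 's/^located \([0-9][0-9]*\) votedisk.*/\1/p'
}

# Prints how many of N voting disks can fail while more than half of them
# remain accessible: integer division of (N - 1) by 2.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Usage on a cluster node (not executed here):
#   n=$(crsctl query css votedisk | votedisk_count)
#   echo "can lose $(tolerated_failures "$n") of $n voting disks"
```

With one or two voting disks the result is 0, and with three it is 1, so going from one disk to two does not add any tolerance on its own.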

Source: Internet


Reposted from: https://blog.51cto.com/linuxzkq/1583890

總結(jié)

以上是生活随笔為你收集整理的【Oracle RAC故障分析与处理】的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。