ORA-04031 Error Causing an Instance Crash: A Case Analysis

Published: 2025/1/21


Today I ran into an Oracle database outage. What follows is an analysis of its cause. Along the way I record how the incident unfolded, both to bank the experience and to sharpen my skills in analyzing and resolving problems.

Environment:

    Operating system: Oracle Linux Server release 5.7, 64-bit

    Database version: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production

Analysis:

When I received the alert and went to check the database, the instance was already down. The alert log contained the following errors:

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov  2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov  2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov  2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
Mon Nov  2 11:43:08 2015
RECO: terminating instance due to error 4031
Mon Nov  2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate  bytes of shared memory ("","","","")
Instance terminated by RECO, pid = 6569
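When an alert log repeats the same error many times, it can help to tally which pool and heap the failed allocations came from. The following is a minimal Python sketch, not part of the original investigation: the function name `summarize_ora_04031` is hypothetical, and the regular expression assumes the four-field message layout seen in the log above.

```python
import re
from collections import Counter

# Matches an ORA-04031 message and its four quoted fields:
# (pool, requesting object or SQL text, heap, allocation comment).
# The byte count may be empty, as in the PMON line of the log.
ORA_04031 = re.compile(
    r'ORA-04031: unable to allocate\s*(\d*)\s*bytes of shared memory '
    r'\("([^"]*)","([^"]*)","([^"]*)","([^"]*)"\)'
)


def summarize_ora_04031(alert_log_text):
    """Count ORA-04031 occurrences per (pool, heap, comment) triple."""
    counts = Counter()
    for match in ORA_04031.finditer(alert_log_text):
        _size, pool, _sql_text, heap, comment = match.groups()
        counts[(pool, heap, comment)] += 1
    return counts
```

Calling `summarize_ora_04031(open(path).read())` on an alert log would show at a glance whether the failures concentrate in one area, as they do here in the "sql area" of the shared pool.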

The alert log shows that ORA-00604 and ORA-04031 caused this outage (RECO: terminating instance due to error 4031):

$ oerr ora 4031
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared
//          pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".

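ORA-04031 generally has one of two causes:

1. The shared pool is heavily fragmented, so no contiguous chunk is available when memory has to be allocated. This usually needs to be tackled on the development side, for example by using bind variables to reduce hard parsing.

2. There is simply not enough memory, and the allocation needs to be enlarged.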
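To make the bind-variable point concrete: statements that differ only in their literal values each need their own hard parse and their own cursor in the shared pool. The sketch below is a rough illustration in Python, not something from the original investigation. It groups SQL texts by "shape" after replacing literals with a placeholder. The function names and the `orders` table in the usage note are hypothetical, and the regular expressions are a simplification that ignores comments and quoted identifiers.

```python
import re


def normalize_literals(sql):
    """Replace string and numeric literals with a placeholder."""
    sql = re.sub(r"'(?:[^']|'')*'", ":b", sql)       # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", ":b", sql)     # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()   # collapse whitespace


def hard_parse_candidates(statements, min_variants=2):
    """Return {shape: variant count} for shapes seen with several literal sets."""
    variants = {}
    for sql in statements:
        variants.setdefault(normalize_literals(sql), set()).add(sql)
    return {
        shape: len(texts)
        for shape, texts in variants.items()
        if len(texts) >= min_variants
    }
```

For example, `SELECT * FROM orders WHERE id = 101` and `SELECT * FROM orders WHERE id = 102` collapse to the same shape, which marks them as candidates for a single statement with a bind variable.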

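The machine has 8G of physical memory, yet on inspection the SGA turned out to be only 1168M, not even 2G. I was frankly speechless. I then used the ASH report to check the Buffer Cache and Shared Pool sizes before and after the crash.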

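The trace file shows the wait event 'SGA: allocation forcing component growth', which confirms that the failure came from the SGA being unable to grow; in other words, the SGA was exhausted. Together with the ASH report, it also shows that the Shared Pool had by then grown to roughly 69.6% of the SGA.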

SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
(session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-
          DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000
          txn branch: (nil)
          oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0
          =0, =0, =0
Dumping Session Wait History
 for 'SGA: allocation forcing component growth' count=1 wait_time=10714
          =0, =0, =0
 for 'SGA: allocation forcing component growth' count=1 wait_time=10512
          =0, =0, =0
 for 'latch: shared pool' count=1 wait_time=892
          address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=28
          address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=51
          address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=114
          address=600e7320, number=d6, tries=0
 for 'latch: shared pool' count=1 wait_time=120
          address=600e7320, number=d6, tries=0
 for 'latch: library cache' count=1 wait_time=33
          address=a3fa46e8, number=d7, tries=1
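To see which events dominate a session's wait history, the per-event wait times can be summed. This is a small Python sketch, assuming the `for '<event>' count=N wait_time=N` layout seen in the trace excerpt above. The function name `total_wait_by_event` is hypothetical, and the totals are in whatever unit the trace file reports.

```python
import re
from collections import defaultdict

# Matches wait-history entries such as:
#   for 'latch: shared pool' count=1 wait_time=892
# The "last wait for ..." line has no count= field right after the event
# name, so it is skipped and not counted twice.
WAIT_LINE = re.compile(r"for\s+'([^']+)'\s+count=(\d+)\s+wait_time=(\d+)")


def total_wait_by_event(trace_text):
    """Sum wait_time per event name across the session wait history."""
    totals = defaultdict(int)
    for event, _count, wait_time in WAIT_LINE.findall(trace_text):
        totals[event] += int(wait_time)
    return dict(totals)
```

On the excerpt above, the two 'SGA: allocation forcing component growth' entries add up to 21226, against 1205 for 'latch: shared pool' and 33 for 'latch: library cache'. That fits the conclusion that the session was mostly stuck waiting for the SGA to grow.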

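Putting the above together, we can conclude that the badly sized SGA caused the shared pool to run completely out of memory. Adjusting the SGA parameters is therefore the right fix. Since this database had been running normally for quite a long time, I also reviewed the AWR and ADDM reports and found that hard parsing on the system was quite severe. In addition, I used the script below to watch the shared pool over a period of time and found that it shrank and grew fairly frequently.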

SELECT start_time,
       component,
       oper_type,
       oper_mode,
       initial_size / 1024 / 1024 "INITIAL",
       final_size / 1024 / 1024   "FINAL",
       end_time
FROM   v$sga_resize_ops
WHERE  component IN ( 'DEFAULT buffer cache', 'shared pool' )
       AND status = 'COMPLETE'
ORDER  BY start_time,
          component;

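This can be addressed by setting the SHARED_POOL_SIZE parameter, which acts as a floor: the shared pool will not be shrunk below this value when memory is tight. The interval between SGA resize operations can also be set: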

ALTER SYSTEM SET "_memory_broker_stat_interval"=n SCOPE=SPFILE;

The problem is solved, but what really deserves reflection is why SGA_MAX_SIZE was set to 1168M in the first place, and why routine health checks never caught it.

