Oracle ORA-04031 错误 说明
?
?????? 在 老熊 的Blog上看到他們寫的有關ORA-04031的文章,轉到blog。
?
老熊的Blog:
http://www.laoxiong.net/an-ora-04031-case.html
?
?????? ORA-04031這個錯誤,幾乎每一個專業的DBA都遇到過。這是一個相當嚴重的錯誤,Oracle進程在向SGA申請內存時,如果申請失敗,則會報這個錯誤。大部分情況下是在向SGA中的shared pool申請內存時失敗,而少有向large pool等池中申請內存失敗。比如下面的報錯:
?
Wed Apr 27 16:00:25 2011
Errors in file /oracle/app/oracle/admin/zxin/bdump/zxin1_ora_2052294.trc:
ORA-04031: unable to allocate 4128 bytes of shared memory
("shared pool","unknown object","sga heap(3,0)","kgllk hash table")
?
這里很清楚地表示出來,是在向shared pool申請內存時失敗。
?
shared pool內存申請(分配)失敗,通常有如下的幾種可能:
?????? (1)shared pool過小,比如在SGA Manual Management方式下,shared pool設置過小。比如一套數千連接的大系統,shared pool只設置了幾百M。這種情況下,要解決問題很解單,增加shared pool的大小即可。
?????? (2)應用沒有使用綁定變量,硬解析非常多,導致shared pool內存碎片嚴重,分配大塊內存時不能獲得連續的內存空間。硬解析多的一個變種是雖然使用了綁定變量,但是由于某種原因,Cursor不能共享,導致Child Cursor非常多。實際上,如果shared pool較大(比如數GB大小),這種問題還是很少出現的,并且出現也通常出現在申請大塊內存時。這種情況如果使用alter system flush shared_pool可以暫時緩解問題。但是這條命令又通常不適用于shared pool較大而且比較繁忙的系統。使用綁定變量
?????? (3) Cache的cursor很多,同時cursor_space_for_time這一參數設置為TRUE,可能會使shared pool碎片化嚴重,導致不能分配到大塊的連續內存。
?????? (4)Oracle的BUG導致內存泄露,比如在一些版本中查詢v$segment_statistics這樣的視圖導致內存泄露,使shared pool內存耗光。同樣的情形還有類似于“obj stat memory”,”gcs resources”,”ges resources”等。通常這類內存為perm類型(permanet),這類內存通常是在分配時就確定了固定的用途,不能用于其他用途,因此極容易產生碎片。
?????? (5)Oracle從9i開始,根據shared pool的大小將shared pool分為多個子池(subpool),每個子池有獨立的free list,同時在分配時單獨管理(有其獨立 的shared pool latch)。Oracle的BUG或者說是內存分配策略缺陷導致某一類shared pool的內存分配只在一個子池(subpool)中,即多個子池的使用極不均衡,導致向那個使用得最多的子池申請內存時失敗。報錯信息中的”sga heap(3,0)”即指明是在第3個子池申請內存時失敗。本文案例中的ORA-04031錯誤其產生的原因可以歸結為Oracle對shared pool的分配/使用策略問題。
?????? (6)操作系統內存不足,這只會出現在shared pool的使用還沒有達到最大值時才會出現,并且在操作系統都有swap的情況下,只有部分操作系統才可能有這種情況,比如在HP-UX下,reserved 內存過多導致swap滿。
?????? (7)其他原因,多數是因為BUG。請參考下面提及的MOS參考文檔。
?
?????? 本文中的案例,其數據庫是運行在AIX 5.3系統中的10.2.0.4 RAC,RAC節點數為2。數據庫是從9i升級到10g,而目前處于正式升級前的測試階段。數據庫報的ORA-04031錯誤信息如本文前面所示(其中的數據庫名稱已經做了處理)。
?
在繼續講解案例之前,不得不提到MOS上的幾篇關于ORA-04031錯誤的文檔:
?????? (1)Master Note for Diagnosing ORA-4031 [ID 1088239.1]
?????? (2)Diagnosing and Resolving Error ORA-04031 on the Shared Pool or Other Memory Pools [Video] [ID 146599.1]
?????? (3)Interpreting the automatically generated ORA-4031 diagnostic trace. [ID 809560.1]
?????? (4)Troubleshooting and Diagnosing ORA-4031 Error [Video] [ID 396940.1]
?????? (5)ORA-4031 Common Analysis/Diagnostic Scripts [Video] [ID 430473.1]
?
其實分析ORA-04031錯誤,通常有以下幾個要點:
?????? (1)判斷錯誤發生所有的內存區域,是shared pool,large pool還是streams pool等。這個很容易從錯誤信息中判斷出來,本文主要描述shared pool的ORA-04031錯誤,這也是最常見的。
?????? (2)檢查Shared Pool的總大小以及free memory的大小。如果free memory看上去挺多,以subpool為單位檢查是否存在是由于碎片導致沒有足夠的連續內存以供分配,特別是關注報錯信息中提及的子池。
?????? (3)如果Shared Pool相較于系統規模來說足夠大(通常數GB都已經是很大的了),檢查Shared Pool中有沒有占用非常多的內存類型或內存組件,如果有,是什么樣的類型的內存,在各個子池之間是否分布均勻。如果有異常占用較多的內存類型,根據此類型在MOS上搜尋是否是會有相應的BUG引起,或者分析這種類型的內存消耗較多的原因。比如如果是sql area很大,檢查是不是硬解析特別多,或者是不是child cursor特別多引起。
?????? (4)基于以上分析的數據,來判斷shared pool內存分配失敗的原因。
?
?????? 上面的步驟寫得比較粗略,關于分析和解決ORA-04031問題,這里也有一篇不錯的文章:Simplified Approach to Resolve ORA-4031
?
?????? 這里關鍵的是分析Shared Pool的內存數據。ORA-04031錯誤發生后如果有條件可以馬上連接到數據庫中查詢相應的x$表和v$視圖得到相應的數據,否則只能通過ORA-4031錯誤發生時產生的trace文件。_4031_dump_bitvec這個隱含參數用于控制發生ORA-04031錯誤時對SGA的dump行為,而trace文件的分析就不像使用SQL那樣簡單了。
?
下面再來詳細地分析案例:
?????? 從錯誤信息來看,很顯然,是向shared pool的第3個subpool申請內存時出錯。以下的數據是shared pool的數據:
?
SQL> select sum(bytes)/1024/1024 mb from v$sgastat where pool='shared pool';
MB
----------
4096.53062
?
SQL> SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ)
? 2? SIZ,
? 3? To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')|
? 4? |'k' "AVG SIZE"
? 5? FROM X$KSMSP GROUP BY KSMCHCLS;
?
CLASS????? NUM???? SIZ AVG?? ?SIZE
-------- ---------- ---------- ------------
R-freea???????? 512????? 24576???????? .05k
freeabl????? 807395 1643969848??????? 1.99k
recr???????? 530728? 662065240??????? 1.22k
R-free????????? 256? 214910976????? 819.82k
free????????? 43063? 100605496??????? 2.28k
perm??????????? 140 1673368632?? 11,672.49k
?
注意:
?????? 在生產庫上查詢X$KSMSP時,要看下系統的繁忙或者說是負載高低,因為可能會導致db hang 住。
?
?
雖然free的數量不是太多,但是freeable的數量還是很多的。
下面是各個子池更詳細的數據:
?
SQL> SELECT KSMCHIDX,KSMCHDUR, KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ)
? 2? SIZ,
? 3? To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')|
? 4? |'k' "AVG SIZE"
? 5? FROM X$KSMSP GROUP BY KSMCHIDX,KSMCHDUR, KSMCHCLS
? 6? order by 1,2,3;
?
KSMCHIDX?? KSMCHDUR CLASS?????? NUM??????? SIZ AVG SIZE
---------- ---------- -------- ---------- ---------- ------------
???????? 1????????? 1 R-free?????????? 27?? 22666392????? 819.82k
????????????????????? R-freea????????? 54?????? 2592???????? .05k
????????????????????? free???????????? 26????? 14024???????? .53k
????????????????????? perm???????????? 32? 430299448?? 13,131.70k
??????????????????? 2 R-free?????????? 12?? 10073952????? 819.82k
????????????????????? R-freea????????? 24?????? 1152???????? .05k
????????????????????? free????????? 10531 ??24519112??????? 2.27k
????????????????????? freeabl?????? 44922?? 32457736???????? .71k
????????????????????? recr???????? 163177? 134273584???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea???? ?????18??????? 864???????? .05k
????????????????????? free?????????? 1678??? 4555704??????? 2.65k
????????????????????? freeabl?????? 79815? 102514024??????? 1.25k
????????????????????? recr????????? 32689?? 36368096??????? 1.09k
??????????????????? 4 R-free?????????? 20?? 16789920????? 819.82k
????????????????????? R-freea????????? 40?????? 1920???????? .05k
????????????????????? free?????????? 2182??? 5810056??????? 2.60k
????????????????????? freeabl?????? 66235? 254656184??????? 3.75k
????????????????? ????recr????????? 16245?? 58284480??????? 3.50k
???? ????2????????? 1 R-free?????????? 25?? 20987400????? 819.82k
????????????????????? R-freea????????? 50?????? 2400???????? .05k
????????????????????? free???????????? 23????? 20016???????? .85k
????????? ????????????perm???????????? 35? 398418384?? 11,116.58k
??????????????????? 2 R-free??????????? 4??? 3357984????? 819.82k
????????????????????? R-freea?????????? 8??????? 384???????? .05k
????????????????????? free?????????? 5137??? 6604176??????? 1.26k
? ????????????????????freeabl?????? 40377?? 12140944???????? .29k
????????????????????? recr????????? 54942?? 45005024???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea????????? 18??????? 864??????? ?.05k
????????????????????? free?????????? 1477??? 5524568??????? 3.65k
????????????????????? freeabl?????? 79548? 101879808??????? 1.25k
????????????????????? recr????????? 32380?? 36033448??????? 1.09k
??????????????????? 4 R-free?????????? 21?? 17629416????? 819.82k
????????????????????? R-freea????????? 42?????? 2016???????? .05k
????????????????????? free?????????? 2540??? 7092424??????? 2.73k
????????????????????? freeabl?????? 70133? 270332800??????? 3.76k
????????????????????? recr????????? 15924?? 57263032??????? 3.51k
?? ??????3????????? 1 R-free?????????? 26?? 21826896????? 819.82k
????????????????????? R-freea????????? 52?????? 2496???????? .05k
????????????????????? free???????????? 26????? 20520???????? .77k
????????????????????? perm????????? ???33? 414355416?? 12,261.94k
??????????????????? 2 R-free??????????? 4??? 3357984????? 819.82k
????????????????????? R-freea?????????? 8??????? 384???????? .05k
????????????????????? free?????????? 4693??? 7053032??????? 1.47k
????????????????????? freeabl?????? 49723?? 14339800???????? .28k
????????????????????? recr????????? 52771?? 42357312???????? .78k
??????????????????? 3 R-free?????????? 11??? 9234456????? 819.82k
????????????????????? R-freea????????? 22?????? 1056???????? .05k
??????????????????? ??free?????????? 3594??? 9280904??????? 2.52k
????????????????????? freeabl?????? 95823? 121934488??????? 1.24k
????????????????????? recr????????? 39643?? 44097504??????? 1.09k
??????????????????? 4 R-free?????????? 25?? 20987400????? 819.82k
??????????? ??????????R-freea????????? 50?????? 2400???????? .05k
????????????????????? free?????????? 2822??? 7291680??????? 2.52k
????????????????????? freeabl?????? 84443? 323149712??????? 3.74k
????????????????????? recr????????? 19148?? 67997008??????? 3.47k
??? ?????4????????? 1 R-free?????????? 27?? 22666392????? 819.82k
????????????????????? R-freea????????? 54?????? 2592???????? .05k
????????????????????? free???????????? 26????? 18088???????? .68k
????????????????????? perm???????????? 40? 430295384?? 10,505.26k
??????????????????? 2 R-free??????????? 6??? 5036976????? 819.82k
????????????????????? R-freea????????? 12??????? 576???????? .05k
????????????????????? free?????????? 4818?? 11473920??????? 2.33k
????????????????????? freeabl?????? 46089?? 39963224? ???????.85k
????????????????????? recr????????? 54061?? 44188072???????? .80k
??????????????????? 3 R-free??????????? 9??? 7555464????? 819.82k
????????????????????? R-freea????????? 18??????? 864???????? .05k
????????????????????? free?????????? 1427??? 4109504??????? 2.81k
????????????????????? freeabl?????? 80026? 102379080??????? 1.25k
????????????????????? recr????????? 33217?? 36949240??????? 1.09k
??????????????????? 4 R-free?????????? 21?? 17629416????? 819.82k
????????????????????? R-freea???????? ?42?????? 2016???????? .05k
????????????????????? free?????????? 1993??? 6228464??????? 3.05k
????????????????????? freeabl?????? 70581? 269087136??????? 3.72k
????????????????????? recr????????? 16614?? 59372656??????? 3.49k
?
?????? KSMCHDUR是什么意思?在9i里面這個列的值通常為1。實際上,Oracle從9i開始,將shared pool劃分為多個sub pool。而在10g以上的版本中(具體開始的版本號已經不記得),每個sub pool又分了4個更小的池,我們暫且稱之為mini heap。每個mini heap有其自己的free list。KSMCHDUR這一列即表示mini heap的編號。”heap(3,0)”中的0是指第1個mini heap。
?
?????? 在上面的數據中,可以看到這個子池的第1個mini heap的free已經很少,只有10來K。另外,我們可以觀察到,perm類型的內存塊只存在于每個sub pool的第1個min heap中。這個是一個重點,在后面會有解釋。
?
?????? 這里本應該有通過查詢v$sgastat得到shared pool的各個組件占用的內存分布,只是寫BLOG時找不到了….但是我們可以在trace文件中找到數據,下面只列出sub pool 3的數據:
?
==============================
Memory Utilization of Subpool 3
================================
???? Allocation Name????????? Size
_________________________? __________
"free memory????????????? "??? 81466568
"miscellaneous??????????? "?????????? 0
"dpslut_kfdsg???????????? "???????? 512
"trace buffer???????????? "????? 737280
"trace_knlasg???????????? "????? ???504
"gcs res hash bucket????? "???? 1048576
"gcs res latch table????? "?????? 12288
"evaluation con?????????? "?????????? 0
"sql area???????????????? "?? 344545752
"UNDO STAT INFO?????????? "?????? 59904
"txncallback????????????? "????? 141744
"transaction????????????? "???? 2103264
"ges resource pools?????? "??????? 3968
"sessions???????????????? "???? 4526488
"dlo fib struct?????????? "????? 128032
"KJCTS process batching st"???????? 240
"row cache??????????????? "??????? 3272
"KCB where statistics arra"?????? 25888
"KCB buffer wait statistic"?????? 32000
"KCB incremental ckpt entr"???????? 512
"invalid low rba queue??? "??????? 1024
"table definiti?????????? "????? 108704
"temporary tabl?????????? "??????? 4136
"KCL instance cache transf"????? 131072
"resumable??????????????? "??????? 2720
"KESTB existence bitvec se"???????? 128
"type object de?????????? "????? 392448
"enqueue_hash???????????? "????? 318960
"KSXR pending consumption "?????? 20192
"KTI SGA freeable small po"?????????? 0
"trigger defini? ?????????"????? 885472
"trigger source?????????? "?????? 99264
"trigger inform?????????? "???????? 960
"KTCN: Obj Invalidation Se"??????? 2336
"kmgsb circular statistics"????? 108800
"kgl lock hash table state"?????? 45360
"kglsim size of pinned mem"????? ??8024
"kelr system metrics table"???????? 280
"kzctxgjsi ksuseclid memor"????? 117360
"kzctxgjci ksuseclid? memo"?????????? 0
"CCursor????????????????? "??? 95912048
"ksr message pool free que"????? 188960
"ksb ci process list (each"???????? 144
"ksunfy: nodes of hierarch"???????? 320
"ksuloi: long op free list"???????? 256
"kwqmncal: allocate buffer"??????? 4048
"ksim group query request "?????????? 0
"ksuxds ksuseclid? memory "?????????? 0
"call???????????????????? "?????? 87304
"dictionary cache??????? ?"?????????? 0
"KSXR pending reply queue "????? 255488
"hng: All sessions data fo"?????????? 0
"ksfv subheap descriptor? "???????? 184
"gcs resources??????????? "?? 169082312
"gcs affinity???????????? "??????? 8320
"gcs opaque in??????????? "?????? 12312
"PCursor????????????????? "??? 50743128
"ges resource???????????? "????? 539376
"fdrec_kffsg????????????? "????????? 24
"work area tab??????????? "?????? 80640
"kglsim main lru count??? "?????? 38400
"plwpil:wa??????????????? "??????? 4264
"grptab_kfgsg??? ?????????"??????? 2464
"AW SGA?????????????????? "????????? 40
"KEWS sesstat seg tbl???? "?????????? 8
"kebm slave descriptors?? "??????? 1456
"kglsim hash table bkts?? "???? 1048576
"KSXR global channels???? "??????? 1288
"ges enqueues???????????? "??? 17333720
"PLS chunk??????????????? "???????? 352
"KSQ event description??? "??????? 1440
"KESTB existence bitvec?? "??????? 4096
"gcs shadows????????????? "?? 101246344
"qmtb_init_data?????????? "???????? 224
"Core dump directory????? "???????? 264
"sort segment handle????? "??????? 7480
"SERVICE NAME ENTRY?????? "????????? 48
"PQ/BizCard?????????????? "??????? 1536
"qtree_kwqbspse?????????? "????????? 40
"latch descriptor table?? "??????? 1576
"recovery domain????????? "?????? 29856
"parameters????????????? ?"?????? 30056
"SHARED SERVERS INFO????? "???????? 240
"qtree_kwqbsgn??????????? "????????? 40
"post agent?????????????? "?????????? 0
"pspool_kfsg????????????? "????????? 80
"plwsppwp:wa????????????? "?????????? 0
"PL/SQL DIANA???????????? "??? 14050624
"segmented arrays???????? "??????? 2072
"Checkpoint queue???????? "???? 4097024
"sim lru segments???????? "??????? 2560
"sim segment hits???????? "??????? 2560
"sim state object???????? "????????? 40
"partitioning d?????????? "????? 199616
"ASH buffers???? ?????????"???? 8388608
"message pool freequeue?? "????? 276336
"PL/SQL MPCODE??????????? "???? 4499360
"PL/SQL PPCODE??????????? "???? 3984944
"procs: ksunfy??????????? "???? 1512000
"primem_kfmdsg??????????? "??????? 1032
"SYSTEM PARAMETERS??????? "????? ?76624
"object queue hash buckets"????? 262656
"object queue hash table d"??????? 7552
"object level stats hash t"???????? 512
"object stat dummy statprv"???????? 144
"sim cache sizes????????? "???????? 320
"logout storm management? "?????? 24000
"pl/sql source??????????? "?????? 21256
"sys event stats????????? "????? 199136
"parameter handle???????? "?????? 67896
"Parameter Handle???????? "??????? 1656
"channel handle?????????? "????? 828672
"API blockers array?????? "????????? 64
"PARAMETER TABLE???????? ?"??????? 2048
"PARAMETER ENTRY????????? "?????????? 8
"LGWR post requested array"????????? 24
"bloom filter???????????? "??????? 3104
"param hash values??????? "??????? 5984
"sql area:PLSQL?????????? "??? 11477776
"PX subheap desc????????? "???????? 256
"repository?????????????? "????? 213544
"sql area:KOKA??????????? "?????? 16192
"archive_lag_target?????? "??????? 9624
"state objects??????????? "???????? 640
"latch nowait fails or sle"????? 116832
"sched job slv??????????? "??????? 5952
"pso tbs: ksunfy ?????????"????? 390000
"dummy??????????????????? "????? 269928
"Sort Segment???????????? "?????? 37440
"Cursor Stats???????????? "???? 6095760
"Banner Storage?????????? "??????? 2048
"quiescing session??????? "??????? 3872
"API data buffer????????? "????? ????16
"buffer handles?????????? "???? 1020000
"prmtzdini tz region????? "????? 408320
"sga node map???????????? "????????? 16
"savepoints?????????????? "?????????? 0
"Managed Standby Proc Arra"?????? 24576
"OS proc request holder?? "??????? 4664
"db_files???????????????? "????? 416576
"PX server msg stats????? "??????? 2288
"KQR M PO???????????????? "????? 283376
"kks stats??????????????? "????????? 40
"parameter table block??? "????? 483168
"KSFV SGA???????????????? "???????? 824
"plugin datafile array?? ?"?????? 36016
"plwda:PLW_STR_NEW_RVAL?? "????????? 24
"plwspv:PLW_STR_NEW_VAL?? "????????? 16
"KGKP sga???????????????? "????????? 32
"BRANCH TABLE SEGMENTED AR"?????? 70176
"mvobj part des?????????? "????? 306544
"parameter value memory?? "???????? 216
"multiblock re??????????? "?????? 98496
"parameter text value???? "??????? 1080
"parallel_max_servers???? "??????? 8192
"KGLS heap??????????????? "??? 13290800
"KGSKI sga??????????????? "????????? 80
"resize request state obje"????? 368000
"MTTR advisory?? ?????????"???? 1462832
"monitoring co??????????? "?????? 12480
"rules engine aggregate st"??????? 1416
"krbmror????????????????? "?????? 36400
"joxs heap??????????????? "???????? 136
"krbmrsr????????????????? "???????? 152
"ksfqpar????????????????? "????? ??4008
"SGA - SWRF DrvMet Runtime"??????? 2656
"SGA - SWRF Metrics ksuTim"????????? 72
"SGA - SWRF RawMet Runtime"??????? 1408
"SGA - SWRF Metrics WCTime"????????? 32
"SQL Memory Manager Base W"?????? 13400
"change notification regis"??????? 4096
"simulator latch/bucket st"?????? 59392
"Prefetch history buffer? "??????? 2832
"change notification obj m"??????? 4096
"KQR ENQ????????????????? "?????? 16512
"kksss??????????????????? "?????? 16464
"API data buffer length?? "?????????? 0
"kokcd?????????????????? ?"?????????? 0
"kohsg??????????????????? "?????????? 8
"Sequence Background Insta"????????? 88
"ksfqpn?????????????????? "???????? 416
"KGLS SP????????????????? "??????? 4704
"knstsg?????????????????? "????????? 48
"latch classes??????????? "???????? 352
"system default language h"???????? 568
"name-service entry?????? "??????? 2592
"API data buffer array??? "?????????? 0
"kzull??????????????????? "??????? 4096
"kzulu??????????????????? "???????? 392
"kfgsga?????????????????? "???????? 104
"library cache?? ?????????"??? 46604712
"kcrrny?????????????????? "?????? 25320
"spfile cleanup structure "?????? 16760
"xssinfo????????????????? "??????? 5952
"buffer_pool_desc_array?? "??????? 3384
"row cache child latch??? "??????? 3360
"rm request queue link??? "????? ??5320
"SCHEDULING POLICY TABLE? "???????? 160
"namhsh_kfdsg???????????? "??????? 4104
"Closed Thread SCN Bitvec "??????? 8448
"Client ID trace settings "??????? 3872
"osp allocation?????????? "?????? 21104
"os statistics??????????? "??????? 9192
"plwppwp:PLW_STR_NEW_LEN_V"????????? 16
"plwgc: plwgc_garbage_clea"?????????? 0
"plwiiw: kglpql warnings? "?????????? 0
"object queue???????????? "????? 808080
"obj stat memo??????????? "????? 599184
"obj htab chun??????????? "????? 122960
"object level??????????? ?"????? 111888
"XCT XGA????????????????? "?????????? 0
"SGA - SWRF Metric Eidbuf "????? 900840
"Processor group descripto"????????? 64
"Prefetch client count per"????????? 32
"X$SKGXPIA??????????????? "??????? 2680
"simulator hash buckets?? "???? 2101248
"State object subpools??? "???????? 896
"API data buffer length ma"?????????? 0
"AWR Table Info (KEW layer"???????? 872
"character set memory???? "??????? 4856
"sim segment num bufs???? "??????? 1280
"character set object???? "????? 129728
"session idle latches???? "??????? 2560
"qesmmaInitialize:??????? "???????? 112
"returns from remote ops? "?????? 49152
"name-service???????????? "??????? 4080
"SGA - SWRF Metric CHBs?? "?????? 10912
"listener addresses?????? "????????? 32
"db_block_hash_buckets??? "??? 67108864
"KSI resource types?????? "??????? 2704
"kglsim object batch????? "???? 4196304
"trigger condition node?? "????????? 72
"ksws service events????? "?????? 18560
"Heap0: KGL?????????????? "??? 11642128
"fixed allocation callback"???????? 392
"kqlpWrntoStr:value?????? "?????????? 0
"KEWS statistic name????? "???????? 424
"KEWS statistic maps????? "??????? 1096
"KCL partition table????? "????? 131072
"kebm slave message?????? "????????? 88
"kcbl state objects?????? "?????? 12800
"free rm request queue lin"?????????? 0
"xsoqsehift?????????????? "??????? 3104
"DBWR event stats array?? "???????? 192
"kgllk hash table???????? "????? 659456
"event descriptor table?? "???????? 192
"kpssnfy: kpsssgct??????? "????????? 32
"kpscad: kpscscon???????? "??????? 1952
"dbwriter coalesce buffer "???? 3158016
"kglsim hash table??????? "??????? 8208
"gcs resource freelist dyn"???????? 256
"gcs shadow locks dyn seg "???????? 256
"kks stats latch????????? "???????? 160
"KTC latch cleanup??????? "???????? 576
"ges enqueue max. usage pe"????????? 64
"ges lmd process descripto"??????? 2760
"KTU latch cleanup??????? "??????? 2496
"kscdnfyinithead????????? "????????? 16
"X$KSVIT table??????????? "???????? 512
"kqlpaac:value-1????????? "????????? 64
"KCL buffer header??????? "????? 192064
"kxfpdp pointers????????? "?????? 28800
"kodosgi kopfdo?????????? "???????? 104
"kglsim latches?????????? "???????? 136
"TXN TABLE SEGMENTED ARRAY"?????? 54784
"KJCT remote i??????????? "??????? 1640
"KKJ SHRD WRQS??????????? "???????? 288
"KJC dest ctx???????????? "??????? 3560
"kwrsnfy: kwrs??????????? "??????? 1624
"kwqmn:tskdata??????????? "?????????? 0
"KKKI consumer??????????? "??????? 4136
"dbwr suspend/resume ptr a"????????? 16
"dbwr actual working sets "????????? 64
"KGSKI schedule????????? ?"?????????? 0
"temp lob duration state o"??????? 3296
"ges regular msg buffers? "???? 3078008
"jsksncb: 9?????????????? "?????? 28672
"Transportable DB Converte"??????? 2552
"KTU lat struct?????????? "???????? 800
"kks stats hds??????????? "???????? 256
"KSFD SGA I/O b?????????? "???? 4190248
"HTTP fixed headers?????? "????????? 72
"UNDO INFO SEGMENTED ARRAY"????? 649856
"ges process hash table?? "????? 132000
"jsksncb-latch??????????? "??????? 1280
"kfkid hrec?????????????? "????????? 24
"KTCCC OBJECT??? ?????????"?????????? 0
"KTPR HIST TB???????????? "??????? 2808
"KTF MAPPINGS???????????? "?????? 12288
"kksss-heap?????????????? "?????? 35136
"kglsim heap????????????? "???? 3431232
"event statistics per sess"???? 7665280
"eventlist to post commits"????? ????16
?
?????? 從上面的數據可以看到,第3個sub pool中,占用較多的內存是gcs resources、gcs shadows以及sql area。但是沒有明顯的異常。
下面是第3個sub pool中第1個mini-heap中free memory的更詳細數據:
SQL> break on ksmchidx on ksmchdur
SQL> select
? 2??? ksmchidx,ksmchdur,
? 3??? case
? 4????????? when ksmchsiz < 1672 then trunc((ksmchsiz-32)/8)
? 5????????? when ksmchsiz < 4120 then trunc((ksmchsiz+7928)/48)
? 6????????? when ksmchsiz < 8216 then 250
? 7????????? when ksmchsiz < 16408 then 251
? 8????????? when ksmchsiz < 32792 then 252
? 9????????? when ksmchsiz < 65560 then 253
?10????????? when ksmchsiz >= 65560 then 253
?11???? end bucket,
?12??? sum(ksmchsiz)? free_space,
?13??? count(*)? free_chunks,
?14??? trunc(avg(ksmchsiz))? average_size,
?15??? max(ksmchsiz)? biggest
?16? from
?17??? sys.x$ksmsp
?18? where
?19??? inst_id = userenv('Instance') and
?20??? ksmchcls = 'free'
?21? group by
?22??? case
?23????????? when ksmchsiz < 1672 then trunc((ksmchsiz-32)/8)
?24????????? when ksmchsiz < 4120 then trunc((ksmchsiz+7928)/48)
?25????????? when ksmchsiz < 8216 then 250
?26????????? when ksmchsiz < 16408 then 251
?27????????? when ksmchsiz < 32792 then 252
?28????????? when ksmchsiz < 65560 then 253
?29????????? when ksmchsiz >= 65560 then 253
?30???? end ,
?31??? ksmchidx, ksmchdur
?32? order by ksmchidx , ksmchdur
?33? /
?
ksmchidx?? ksmchdur???? bucket free_space free_chunks average_size??? biggest
---------- ---------- ---------- ---------- ----------- ------------ ----------
3???????? 1????????? 5???????? 72?????????? 1?????????? 72???????? 72
?????????????????? 13 ???????136?????????? 1????????? 136??????? 136
?????????????????? 27??????? 248?????????? 1????????? 248??????? 248
???????????????? ??48??????? 416?????????? 1????????? 416??????? 416
?????????????????? 56?????? 1920?????????? 4????????? 480??????? 480
?????????????????? 66?????? 1680?????????? 3????????? 560??????? 560
?????????????????? 68?????? 4608?????????? 8????????? 576??????? 576
?????????????????? 164?????? 1344?????????? 1???????? 1344?????? 1344
?????????????????? 180?????? 1472?????????? 1?? ??????1472?????? 1472
????????????????? 188?????? 1536?????????? 1???????? 1536?????? 1536
????????????????? 190?????? 1552?????????? 1???????? 1552?????? 1552
????????????????? 199?????? 1624?????????? 1???????? 1624?????? 1624
????????????????? 204????? ?1880?????????? 1???????? 1880?????? 1880
????????????????? 207?????? 2032?????????? 1???????? 2032?????? 2032
?
?????? 可以看到,最大的free memory塊才2032字節,而報錯中提到的申請的內存大小為4128字節。由于在第3個sub pool的第1個mini heap中沒有4128字節的連續free memory,所以導致內存申請失敗。
?
?????? 那么這里的問題是,為什么這個mini heap中的free memory那么少?正如前面提及,為什么這個mini heap中的已經使用的類型全是perm類型?這個問題的答案就在于”DURATION”。Oracle在啟用了SGA自動管理的模式下,為了便于在shared pool與buffer cache或其他內存之間動態調整大小,規定了在每一個mini heap中分配內存按照duration來進行。這里duration可以理解為內存塊的持久時間。
?????? perm類型的內存塊,就是分配后不能釋放,只能用于相同組件的重用。比如gcs resources這種組件的內存是perm類型,這種內存被分配后,不能釋放給sql area使用,也不能給gcs shadows使用,只能給其他的gcs resource使用。
?????? 按DURATION分配內存時,perm類型的內存就只能從每個sub pool的第1個mini heap中分配。而其他類型的內存通常在sub pool的第2-4個mini heap中分配。由于perm類型的內存不能釋放,也不能被其他組件的內存重用,所以里面的內存會越用越少,如果沒有了free memory怎么辦?前面說到,這種模式主要是工作在SGA自動管理模式下,如果free memory沒有了,就會從SGA中的其他部分,比如buffer cache中取得memory chunk,加入到缺少內存的mini heap中。正常情況下這種機制沒有問題。
?
?????? 完全使用SGA自動管理有一個缺陷就是,如果應用系統綁定變量做得不好,或者由于BUG,child cursor過多,導致shared pool會變得很大,甚至超過10G,嚴重的比buffer cache還大,另一方面,在buffer cache和shared pool之間頻繁地調整大小,可能會導致嚴重的解析問題和其他性能問題。
?
?????? 針對這個問題,通常有2種解決辦法:一種就是關閉SGA自動管理,即將SGA_TARGET設置為0,以9i的方式來設置shared_pool_size,db_cache_size這些參數,來手動管理SGA;第二種就是sga_target仍然大于0,即自動管理SGA,但是通過設置shared_pool_size,db_cache_size等參數限制這些內存組件的最小大小,而只留給系統極少的自動調整空間。
?
?????? 而出現問題的這套系統,正是使用了第二種方式,開啟了SGA自動調整,但是留給自動調整的空間極少。SGA_TARGET為35G,buffer_cache_size為30G,shared_pool_size為4G,再加上large_pool等組件,幾乎沒有什么可自動調整的余地。這種方式下,就存在了問題。
?
下面來做一個按時間的分析:
?????? (1)時間T1,數據庫啟動,shared pool只消耗了極少量的內存。
?????? (2)時間T2至時間T3,Oracle進程請求shared pool內存,Oracle會向操作系統以指定的粒度為單位(比如16MB)請求物理內存,加入到所請求內存所在的mini heap中。直至shared pool的大小達到shared pool最大容許的大小。這個容許大小由各參數計算而來。比如說SGA_TARGET為10G,其他組件的參數設置后最小值為8G,shared_pool_size的值為1G,但是shared pool的最大容許大小為2G。這個時候,每個sub pool的mini heap的大小已經固定。在到達shared pool最大容許大小這一階段,可能會從buffer cache等組件中占用。
?????? (3)時間T4,Oracle進程請求shared pool內存,這個時候只能從free list或age out內存塊后獲取內存,對于sub pool的第1個mini heap,只能從free list中獲取,因為這個mini heap中的已用內存全是perm,是不能age out的。
?????? (4)時間T5,Oracle進程請求shared sub pool中第1個mini heap的內存,但是free list中已經沒有內存。所以報ORA-04031錯誤。
??????
?????? 在上面的時間點T5那里,如果SGA有較大的自動調整空間,比如說完全沒有限制,即buffer_cache_size等參數很少或為0,這樣在請求第1個mini heap中的內存時,完全可以從buffer cache中占用,這樣的后果是使shared pool越來越大。
?
?????? 而本文案例的ORA-04031,正是由于SGA自動管理,而自動調整的余地又太小,最終使sub pool的第1個mini heap空間用光。當然我們可以分析為什么會用光,這個就顯得更為復雜,這跟數據量、應用系統都有很大的關系。而系統中第1次出現ORA-04031錯誤的進程,是一個job進程,而此后大部分出現的錯誤均是job進程,能檢查job代碼,發現在做大量的表的大量數據的UPDATE操作,這可能是引起gcs shadows和gcs resources大量內存使用的原因。在一套RAC數據庫中,gcs和ges相關的perm內存占用可能會比較大。
?
?????? 那么除了調整應用,應該怎么樣解決這樣問題?這里的解決方法是增加shared_pool_size參數到6G,同時將sga_target設置為0,再重啟。
?????? 而另一種可能的辦法是將參數“_enable_shared_pool_durations”設置為FALSE。這一參數為FALSE,將會使shared pool內存分配時,不再使某一類型的內存(比如perm)必須要求在一個固定的mini heap中。而實際上,sga_target設置為0之后,這一個參數自動會設為FALSE(由于這一參數是靜態參數,所以修改了sga_target之后需要重啟才會使這個隱含參數改變),所以建議的解決辦法是設置sga_target參數,而不建議修改隱含參數。當然還有一種辦法是完全讓Oracle自動管理SGA,將buffer_cache_size和shared_pool_size等參數設置為0,但是正如前面所說,這種方法有比較大的缺陷。
?
?
SGA管理方式由自動管理改為手工管理 ,示例:
alter system set sga_target=0m scope=spfile;
alter system set shared_pool_size=3072m scope=spfile;
alter system set large_pool_size=512m scope=spfile;
alter system set java_pool_size=16m scope=spfile;
alter system set db_cache_size=6400m scope=both;
?
?
?
?
?
-------------------------------------------------------------------------------------------------------
Blog: http://blog.csdn.net/tianlesoftware
Email: dvd.dba@gmail.com
DBA1 群:62697716(滿);?? DBA2 群:62697977(滿)?? DBA3 群:62697850(滿)??
DBA 超級群:63306533(滿);? DBA4 群: 83829929? DBA5群: 142216823???
DBA6 群:158654907? 聊天 群:40132017?? 聊天2群:69087192
--加群需要在備注說明Oracle表空間和數據文件的關系,否則拒絕申請
轉載于:https://www.cnblogs.com/springside-example/archive/2011/06/15/2529719.html
總結
以上是生活随笔為你收集整理的Oracle ORA-04031 错误 说明的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 小D学blend-----优美简洁的mp
- 下一篇: [Ruby][3] 条件语句