Solving a performance problem caused by dynamic sampling in Oracle 11.2
[CCID Net (賽迪網) report]
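After careful preparation, I finally upgraded all three production data warehouses to 11.2.0.1 and applied the latest CPU patch. A major version change usually brings a few problems, especially in a DW system: with massive data volumes, the slightest oversight has the application team complaining that some job is running forever…
Twelve hours after the upgrade, I logged in to check the system. One JOB had already been running for 2 hours, and its wait events showed it was still reading data files; normally this JOB finishes in 2 minutes. The execution plan showed a COST of 154K, and the JOIN ORDER was clearly wrong. One table is an 80 GB fact table and the other two are small dimension tables, yet the current plan joined the FACT table with one dimension table first and then joined the result with the other table.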
SELECT DISTINCT SALES_ORD_TRX.SALES_ORD_NO,
SALES_ORD_TRX.SALES_ORD_ITEM_NO,
SALES_ORD_TRX.COMPANY_KEY,
SALES_ORD_TRX.SALES_ORD_DATE_KEY,
SAP_MATL_DOC_HDR_TRX.POSTING_DATE_IN_THE_DOC
FROM ODS.SALES_ORD_TRX,
ODSSTAGE.SAP_MATL_DOC_HDR_TRX,
ODSSTAGE.SAP_MATL_DOC_DETL_TRX
WHERE SAP_MATL_DOC_DETL_TRX.MVT_TYPE_CODE IN ('Y79', 'Y80')
AND SAP_MATL_DOC_DETL_TRX.SPL_STK_IND = 'W'
AND SAP_MATL_DOC_HDR_TRX.MATL_DOC_NO = SAP_MATL_DOC_DETL_TRX.MATL_DOC_NO
AND SAP_MATL_DOC_HDR_TRX.MATL_DOC_YEAR =
SAP_MATL_DOC_DETL_TRX.MATL_DOC_YEAR
AND SAP_MATL_DOC_HDR_TRX.REF_DOC_NO = SALES_ORD_TRX.D_DELIV_NO
---------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------
| 0 | SELECT STATEMENT | | 1010K| 83M| | 154K (1)| 00:35:59 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 1010K| 83M| | 154K (1)| 00:35:59 | | | Q1,02 | P->S | QC (RAND) |
| 3 | SORT UNIQUE | | 1010K| 83M| 100M| 154K (1)| 00:35:59 | | | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | | | | | | | | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | | | | | | | | Q1,01 | P->P | HASH |
| 6 | NESTED LOOPS | | | | | | | | | Q1,01 | PCWP | |
| 7 | NESTED LOOPS | | 1010K| 83M| | 154K (1)| 00:35:59 | | | Q1,01 | PCWP | |
|* 8 | HASH JOIN | | 44437 | 2343K| | 5629 (1)| 00:01:19 | | | Q1,01 | PCWP | |
| 9 | PX BLOCK ITERATOR | | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWC | |
|* 10 | TABLE ACCESS FULL | SAP_MATL_DOC_HDR_TRX | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWP | |
| 11 | BUFFER SORT | | | | | | | | | Q1,01 | PCWC | |
| 12 | PX RECEIVE | | 44437 | 911K| | 5530 (1)| 00:01:18 | | | Q1,01 | PCWP | |
| 13 | PX SEND BROADCAST | :TQ10000 | 44437 | 911K| | 5530 (1)| 00:01:18 | | | Q1,00 | P->P | BROADCAST |
| 14 | PX BLOCK ITERATOR | | 44437 | 911K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWC | |
|* 15 | TABLE ACCESS FULL | SAP_MATL_DOC_DETL_TRX | 44437 | 911K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWP | |
|* 16 | INDEX RANGE SCAN | GX_SALES_ORD_TRX_09 | 23 | | | 3 (0)| 00:00:01 | | | Q1,01 | PCWP | |
| 17 | TABLE ACCESS BY GLOBAL INDEX ROWID| SALES_ORD_TRX | 23 | 759 | | 6 (0)| 00:00:01 | ROWID | ROWID | Q1,01 | PCWP | |
--------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("SAP_MATL_DOC_HDR_TRX"."MATL_DOC_NO"="SAP_MATL_DOC_DETL_TRX"."MATL_DOC_NO" AND
"SAP_MATL_DOC_HDR_TRX"."MATL_DOC_YEAR"="SAP_MATL_DOC_DETL_TRX"."MATL_DOC_YEAR")
10 - filter("SAP_MATL_DOC_HDR_TRX"."REF_DOC_NO" IS NOT NULL)
15 - filter("SAP_MATL_DOC_DETL_TRX"."SPL_STK_IND"='W' AND ("SAP_MATL_DOC_DETL_TRX"."MVT_TYPE_CODE"='Y79' OR "SAP_MATL_DOC_DETL_TRX"."MVT_TYPE_CODE"='Y80'))
16 - access("SAP_MATL_DOC_HDR_TRX"."REF_DOC_NO"="SALES_ORD_TRX"."D_DELIV_NO")
Note
-----
- dynamic sampling used for this statement (level=8)
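My first suspicion was that the optimizer changes in 11.2.0.1 did not suit this JOB, so I immediately reached for the hint /*+ optimizer_features_enable('10.2.0.4') */. The SQL plan instantly changed to what I expected: the two small tables are joined first, and the result is then joined to the fact table.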
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 85235 | 7241K| | 18163 (1)| 00:04:15 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 85235 | 7241K| | 18163 (1)| 00:04:15 | | | Q1,02 | P->S | QC (RAND) |
| 3 | HASH UNIQUE | | 85235 | 7241K| 16M| 18163 (1)| 00:04:15 | | | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 23 | 759 | | 6 (0)| 00:00:01 | | | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 23 | 759 | | 6 (0)| 00:00:01 | | | Q1,01 | P->P | HASH |
| 6 | TABLE ACCESS BY GLOBAL INDEX ROWID| SALES_ORD_TRX | 23 | 759 | | 6 (0)| 00:00:01 | ROWID | ROWID | Q1,01 | PCWP | |
| 7 | NESTED LOOPS | | 85235 | 7241K| | 18162 (1)| 00:04:15 | | | Q1,01 | PCWP | |
|* 8 | HASH JOIN | | 3749 | 197K| | 5629 (1)| 00:01:19 | | | Q1,01 | PCWP | |
| 9 | PX RECEIVE | | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,01 | PCWP | |
| 10 | PX SEND BROADCAST | :TQ10000 | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | P->P | BROADCAST |
| 11 | PX BLOCK ITERATOR | | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWC | |
|* 12 | TABLE ACCESS FULL | SAP_MATL_DOC_DETL_TRX | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWP | |
| 13 | PX BLOCK ITERATOR | | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWC | |
|* 14 | TABLE ACCESS FULL | SAP_MATL_DOC_HDR_TRX | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWP | |
|* 15 | INDEX RANGE SCAN | GX_SALES_ORD_TRX_09 | 23 | | | 3 (0)| 00:00:01 | | | Q1,01 | PCWP | |
--------------------------------------------------------------
The customer used this hint to get the current JOB finished.
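A session-level alternative to the hint, sketched under the assumption that the JOB runs in a session you can configure. OPTIMIZER_FEATURES_ENABLE is an ordinary initialization parameter, so the same downgrade can be applied to every statement in a session instead of to one query. This is not what the author did; the hint is narrower and arguably the safer first test.

```sql
-- Revert optimizer behavior to the 10.2.0.4 feature set for this session only.
ALTER SESSION SET optimizer_features_enable = '10.2.0.4';

-- ... run the affected job's SQL here ...

-- Restore the feature set that matches the upgraded database version.
ALTER SESSION SET optimizer_features_enable = '11.2.0.1';
```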
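To dig deeper, I kept analyzing. The problem lies with SAP_MATL_DOC_DETL_TRX: its cardinality is 44437 in the first plan but only 5073 in the second. Had 11g significantly changed how it computes cardinality for multi-table joins?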
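Next I tried the hint /*+ LEADING(SAP_MATL_DOC_HDR_TRX SAP_MATL_DOC_DETL_TRX SALES_ORD_TRX) */, which forces the SQL to follow my JOIN ORDER. The SQL finished within 2 minutes. In this execution plan, the cardinality of SAP_MATL_DOC_DETL_TRX was still only 5073. I had not changed the optimizer feature set, which means the optimizer upgrade was not the cause.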
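For reference, a sketch of where the LEADING hint goes in the query shown earlier. Hints must directly follow the SELECT keyword, and because the query uses no table aliases, the hint names the tables themselves, without the schema prefix that appears in the FROM clause.

```sql
-- Force the join order: HDR first, then DETL, then the large SALES_ORD_TRX table.
SELECT /*+ LEADING(SAP_MATL_DOC_HDR_TRX SAP_MATL_DOC_DETL_TRX SALES_ORD_TRX) */
       DISTINCT SALES_ORD_TRX.SALES_ORD_NO,
       SALES_ORD_TRX.SALES_ORD_ITEM_NO,
       SALES_ORD_TRX.COMPANY_KEY,
       SALES_ORD_TRX.SALES_ORD_DATE_KEY,
       SAP_MATL_DOC_HDR_TRX.POSTING_DATE_IN_THE_DOC
FROM   ODS.SALES_ORD_TRX,
       ODSSTAGE.SAP_MATL_DOC_HDR_TRX,
       ODSSTAGE.SAP_MATL_DOC_DETL_TRX
WHERE  SAP_MATL_DOC_DETL_TRX.MVT_TYPE_CODE IN ('Y79', 'Y80')
AND    SAP_MATL_DOC_DETL_TRX.SPL_STK_IND = 'W'
AND    SAP_MATL_DOC_HDR_TRX.MATL_DOC_NO = SAP_MATL_DOC_DETL_TRX.MATL_DOC_NO
AND    SAP_MATL_DOC_HDR_TRX.MATL_DOC_YEAR = SAP_MATL_DOC_DETL_TRX.MATL_DOC_YEAR
AND    SAP_MATL_DOC_HDR_TRX.REF_DOC_NO = SALES_ORD_TRX.D_DELIV_NO;
```

Swapping the hint text for optimizer_features_enable('10.2.0.4') or DYNAMIC_SAMPLING(SAP_MATL_DOC_DETL_TRX 0) in the same position reproduces the other experiments in this article.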
Looking back at the first execution plan, I realized I had overlooked a very important CLUE: dynamic sampling used for this statement (level=8)
So dynamic sampling had kicked in!
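The Note section is easy to miss. A sketch of pulling the plan of the job's cursor straight from the shared pool, assuming SELECT access to V$SQL; the LIKE pattern is only an illustrative way to locate the job's statement, and &sql_id is a SQL*Plus substitution variable to be replaced with the value returned by the first query.

```sql
-- Locate the cursor for the job's statement in the shared pool.
SELECT sql_id, child_number, plan_hash_value
FROM   v$sql
WHERE  sql_text LIKE 'SELECT DISTINCT SALES_ORD_TRX.SALES_ORD_NO%';

-- Show the plan of every child cursor for that SQL_ID (NULL child number).
-- The TYPICAL format includes the Note section, where the dynamic sampling message appears.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'TYPICAL'));
```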
To verify this, I added the hint /*+ DYNAMIC_SAMPLING(SAP_MATL_DOC_DETL_TRX 0) */. The execution plan returned to normal, and the SQL finished within 2 minutes.
------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 115K| 9797K| | 22590 (1)| 00:05:17 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 115K| 9797K| | 22590 (1)| 00:05:17 | | | Q1,02 | P->S | QC (RAND) |
| 3 | SORT UNIQUE | | 115K| 9797K| 11M| 22590 (1)| 00:05:17 | | | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | | | | | | | | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | | | | | | | | Q1,01 | P->P | HASH |
| 6 | NESTED LOOPS | | | | | | | | | Q1,01 | PCWP | |
| 7 | NESTED LOOPS | | 115K| 9797K| | 22588 (1)| 00:05:17 | | | Q1,01 | PCWP | |
|* 8 | HASH JOIN | | 5073 | 267K| | 5629 (1)| 00:01:19 | | | Q1,01 | PCWP | |
| 9 | PX RECEIVE | | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,01 | PCWP | |
| 10 | PX SEND BROADCAST | :TQ10000 | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | P->P | BROADCAST |
| 11 | PX BLOCK ITERATOR | | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWC | |
|* 12 | TABLE ACCESS FULL | SAP_MATL_DOC_DETL_TRX | 5073 | 104K| | 5530 (1)| 00:01:18 | | | Q1,00 | PCWP | |
| 13 | PX BLOCK ITERATOR | | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWC | |
|* 14 | TABLE ACCESS FULL | SAP_MATL_DOC_HDR_TRX | 51443 | 1657K| | 98 (2)| 00:00:02 | | | Q1,01 | PCWP | |
|* 15 | INDEX RANGE SCAN | GX_SALES_ORD_TRX_09 | 23 | | | 3 (0)| 00:00:01 | | | Q1,01 | PCWP | |
| 16 | TABLE ACCESS BY GLOBAL INDEX ROWID| SALES_ORD_TRX | 23 | 759 | | 6 (0)| 00:00:01 | ROWID | ROWID | Q1,01 | PCWP | |
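A quick ratio check on the first and third plans, which were produced by the same optimizer version and differ in whether dynamic sampling was applied to SAP_MATL_DOC_DETL_TRX. In both plans the HASH JOIN estimate equals the DETL estimate (44437 and 5073), so the sampled overestimate flows straight into the nested-loops input (values marked K are rounded as displayed by DBMS_XPLAN):

$$
\frac{44437}{5073} \approx 8.76, \qquad
\frac{1010\,\text{K}}{115\,\text{K}} \approx 8.78, \qquad
\frac{154\,\text{K}}{22590} \approx 6.8
$$

In other words, sampling made the optimizer expect roughly 8.8 times as many driving rows, and therefore roughly 8.8 times as many index probes into SALES_ORD_TRX.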
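We know that dynamic sampling normally kicks in only when a table has no statistics, yet all of our tables had up-to-date statistics. So why did this happen? A BUG. Even a level-8 sample covers only about a thousand blocks, so its estimate is bound to be less accurate than the gathered statistics.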
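Both parts of this argument can be checked. A sketch, assuming access to DBA_TAB_STATISTICS (ALL_TAB_STATISTICS works for tables visible to the current user). The first query confirms that table-level statistics are present and not stale (STALE_STATS = 'YES' would mean Oracle considers them out of date). The second asks for the single-table estimate the optimizer derives from those statistics alone, which can be compared with the 44437 produced under sampling; looking at the DETL filter in isolation is a simplification of the full three-table query.

```sql
-- 1) Confirm that statistics exist and are not flagged as stale.
--    PARTITION_NAME IS NULL keeps only the table-level row for partitioned tables.
SELECT owner, table_name, num_rows, blocks, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  (owner, table_name) IN (('ODS',      'SALES_ORD_TRX'),
                               ('ODSSTAGE', 'SAP_MATL_DOC_HDR_TRX'),
                               ('ODSSTAGE', 'SAP_MATL_DOC_DETL_TRX'))
AND    partition_name IS NULL;

-- 2) Estimate the DETL filter with sampling switched off for that table,
--    so Rows on the TABLE ACCESS FULL line comes from the stored statistics.
EXPLAIN PLAN FOR
SELECT /*+ DYNAMIC_SAMPLING(SAP_MATL_DOC_DETL_TRX 0) */ COUNT(*)
FROM   ODSSTAGE.SAP_MATL_DOC_DETL_TRX
WHERE  SAP_MATL_DOC_DETL_TRX.MVT_TYPE_CODE IN ('Y79', 'Y80')
AND    SAP_MATL_DOC_DETL_TRX.SPL_STK_IND = 'W';

-- Display the explained plan, including the Predicate Information and Note sections.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the second statement without the hint only reproduces the sampled estimate if the effective sampling level matches the job's; a serial statement at a lower level may not sample an analyzed table at all.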
After searching Metalink and working with Oracle Support, the bug was confirmed:
Bug 9272549 - User statistics are ignored when dynamic sampling occurs 9272549.8
Solution: turn off dynamic sampling.
Fixed in version 12.1. GOD!
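Turning dynamic sampling off can be done at different scopes; the article does not say which one was used, so this sketch shows both, using the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter. It assumes that setting the parameter explicitly also stops the sampling the optimizer chose on its own for this parallel statement; checking that the dynamic sampling Note is gone from the new plan is the real test.

```sql
-- Session scope: only the current session (for example, the job's own session).
ALTER SESSION SET optimizer_dynamic_sampling = 0;

-- Instance scope: every session. SCOPE = BOTH also records the change in the spfile,
-- so the instance must be running with an spfile. This disables sampling for tables
-- that genuinely lack statistics as well.
ALTER SYSTEM SET optimizer_dynamic_sampling = 0 SCOPE = BOTH;
```

A narrower option is the statement-level hint /*+ DYNAMIC_SAMPLING(0) */, or the table-level form used in the verification step earlier.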
For reference, here are the descriptions of the dynamic sampling levels (0 through 10):
Level 0: Do not use dynamic sampling.
Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).
Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.
Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.
Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.
Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.
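The block counts behind the "about a thousand blocks" remark follow directly from this list: with the default of 32 sampling blocks, levels 5 to 9 multiply it by 2, 4, 8, 32 and 128.

$$
\begin{aligned}
\text{Level 5: } & 2 \times 32 = 64 \\
\text{Level 6: } & 4 \times 32 = 128 \\
\text{Level 7: } & 8 \times 32 = 256 \\
\text{Level 8: } & 32 \times 32 = 1024 \\
\text{Level 9: } & 128 \times 32 = 4096
\end{aligned}
$$

Assuming an 8 KB block size (the article does not state it), the level-8 sample in the first plan reads about 1024 × 8 KB = 8 MB.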