Oracle: Cleaning Up Huge Amounts of Data in a Non-Partitioned Table by Deleting in Chunks
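Table of Contents
- Overview
- Step1: rowid_chunk.sql
- Step2: Upload the script to the Oracle host and run @rowid_chunk.sql
  - Step2.1: Upload the script
  - Step2.2: Connect to the database and get the chunks
- Step3: Check foreign keys and delete chunk data with a stored procedure
  - Step3.1: Check foreign keys
  - Step3.2: Delete expired data chunk by chunk
  - Step3.3: FORALL and BULK COLLECT basics
- Step4: Analyze the table and regather statistics
- Step5: Optimization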
Overview
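Updating and deleting massive amounts of historical data in a large table has always been a headache. If the table is partitioned, we can still speed up UPDATE or DELETE with parallel DML or TRUNCATE PARTITION. If the object is an ordinary non-partitioned heap table, however, there seems to be no good way to speed things up. NOLOGGING does not help, because it only affects direct-path operations and not conventional DELETE/UPDATE. Parallel DML on such a table also brings its own coordination overhead.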
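Step1: rowid_chunk.sql
Save the SQL below as the file rowid_chunk.sql.
What it does: it splits the table into the requested number of rowid ranges based on its extents ("group sets of rows in the table into smaller chunks"). A non-partitioned table can then be deleted from or updated in parallel using those rowid ranges.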
REM rowid_ranges should be at least 21
REM utilize this script help delete large table
REM if update large table Why not online redefinition or CTAS
-- This script spits desired number of rowid ranges to be used for any parallel operations.
-- Best to use it for copying a huge table with out of row lob columns in it or CTAS/copy the data over db links.
-- This can also be used to simulate parallel insert/update/delete operations.
-- Maximum number of rowid ranges you can get here is 255.
-- Doesn't work for partitioned tables, but with minor changes it can be adopted easily.
-- Doesn't display any output if the total table blocks are less than rowid ranges times 128.
-- It can split a table into more ranges than the number of extents
set verify off
undefine rowid_ranges
undefine segment_name
undefine owner
set head off
set pages 0
set trimspool on
select 'where rowid between ''' ||
       sys.dbms_rowid.rowid_create(1, d.oid, c.fid1, c.bid1, 0) ||
       ''' and ''' ||
       sys.dbms_rowid.rowid_create(1, d.oid, c.fid2, c.bid2, 9999) || '''' || ';'
  from (select distinct b.rn,
               first_value(a.fid) over(partition by b.rn order by a.fid, a.bid
                 rows between unbounded preceding and unbounded following) fid1,
               last_value(a.fid) over(partition by b.rn order by a.fid, a.bid
                 rows between unbounded preceding and unbounded following) fid2,
               first_value(decode(sign(range2 - range1), 1,
                                  a.bid + ((b.rn - a.range1) * a.chunks1),
                                  a.bid)) over(partition by b.rn order by a.fid, a.bid
                 rows between unbounded preceding and unbounded following) bid1,
               last_value(decode(sign(range2 - range1), 1,
                                 a.bid + ((b.rn - a.range1 + 1) * a.chunks1) - 1,
                                 (a.bid + a.blocks - 1))) over(partition by b.rn order by a.fid, a.bid
                 rows between unbounded preceding and unbounded following) bid2
          from (select fid,
                       bid,
                       blocks,
                       chunks1,
                       trunc((sum2 - blocks + 1 - 0.1) / chunks1) range1,
                       trunc((sum2 - 0.1) / chunks1) range2
                  from (select /*+ rule */
                               relative_fno fid,
                               block_id bid,
                               blocks,
                               sum(blocks) over() sum1,
                               trunc((sum(blocks) over()) / &&rowid_ranges) chunks1,
                               sum(blocks) over(order by relative_fno, block_id) sum2
                          from dba_extents
                         where segment_name = upper('&&segment_name')
                           and owner = upper('&&owner'))
                 where sum1 > &&rowid_ranges) a,
               (select rownum - 1 rn
                  from dual
                connect by level <= &&rowid_ranges) b
         where b.rn between a.range1 and a.range2) c,
       (select max(data_object_id) oid
          from dba_objects
         where object_name = upper('&&segment_name')
           and owner = upper('&&owner')
           and data_object_id is not null) d
/
This script returns the start rowid and end rowid of each chunk. With a "between start_rowid and end_rowid" condition you can then build multiple DML statements. Each statement touches only rows in its own non-overlapping range. The statements can therefore run in parallel from several terminals without lock contention and without the overhead of the Oracle Parallel Execution coordinator.
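The block arithmetic in rowid_chunk.sql is dense. Here is a rough Python port of it, treating DBA_EXTENTS rows as (relative_fno, block_id, blocks) tuples. The function name chunk_block_ranges and the sample extents are hypothetical. This is a model for reasoning about the ranges, not a replacement for the SQL. Each range comes back as (fid1, bid1, fid2, bid2). The script turns these into rowids with DBMS_ROWID.ROWID_CREATE, using row 0 for the start and row 9999 for the end.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int, int]        # (relative_fno, block_id, blocks) as in DBA_EXTENTS
Range = Tuple[int, int, int, int]    # (fid1, bid1, fid2, bid2)


def chunk_block_ranges(extents: List[Extent], n: int) -> List[Range]:
    """Model of the block arithmetic in rowid_chunk.sql (not a replacement for it)."""
    ordered = sorted(extents)                          # order by relative_fno, block_id
    total = sum(blocks for _, _, blocks in ordered)    # sum1
    if total <= n:                                     # where sum1 > &&rowid_ranges
        return []
    chunk = total // n                                 # chunks1 = trunc(sum1 / rowid_ranges)
    per_rn: Dict[int, List[Tuple[int, int, int]]] = {}
    running = 0                                        # sum2: running block total
    for fid, bid, blocks in ordered:
        running += blocks
        # trunc((p - 0.1) / chunk), done in integers as (10p - 1) // (10 * chunk)
        r1 = (10 * (running - blocks + 1) - 1) // (10 * chunk)   # range1
        r2 = (10 * running - 1) // (10 * chunk)                  # range2
        for rn in range(r1, min(r2, n - 1) + 1):       # b.rn only goes up to n - 1
            if r2 > r1:                                # decode(sign(range2 - range1), 1, ...)
                b1 = bid + (rn - r1) * chunk
                b2 = bid + (rn - r1 + 1) * chunk - 1
            else:
                b1, b2 = bid, bid + blocks - 1
            per_rn.setdefault(rn, []).append((fid, b1, b2))
    ranges = []
    for rn in sorted(per_rn):
        first, last = per_rn[rn][0], per_rn[rn][-1]    # first_value / last_value
        ranges.append((first[0], first[1], last[0], last[2]))
    return ranges


if __name__ == "__main__":
    print(chunk_block_ranges([(6, 100, 40)], 4))
    print(chunk_block_ranges([(6, 100, 8), (7, 50, 16)], 3))
    print(chunk_block_ranges([(6, 100, 42)], 4))
```

Take a single 42-block extent split four ways. In this model the last range stops at block 139 while the extent runs to block 141. The few blocks left over after dividing the total into equal chunks fall outside every range. If the port is faithful to the SQL, check for target rows beyond the last range before treating a purge as complete.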
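Step2: Upload the script to the Oracle host and run @rowid_chunk.sql
Step2.1: Upload the script
Log in to the host as the oracle user and upload the script to a directory. Here it was uploaded to /oracle.
Step2.2: Connect to the database and get the chunks
As the oracle user, connect to the database with sqlplus from the /oracle directory.
If the data volume is very large, ask for more chunks, so that each delete handles a smaller amount of data over more passes.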
artisandb:[/oracle$]pwd
/oracle
artisandb:[/oracle$]sqlplus artisan/artisan2018@TB     -- username/password@TNS alias

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 31 16:08:37 2018

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @rowid_chunk.sql
Enter value for rowid_ranges: 10        -- number of chunks
Enter value for segment_name: XXXXX     -- name of the table to process
Enter value for owner: YYYY             -- owning schema
where rowid between 'AAAYHtAAGAAAAEAAAA' and 'AAAYHtAALAAAx5yCcP';
where rowid between 'AAAYHtAALAAAx5zAAA' and 'AAAYHtAANAAA75yCcP';
where rowid between 'AAAYHtAANAAA75zAAA' and 'AAAYHtAAOAACQPyCcP';
where rowid between 'AAAYHtAAOAACQPzAAA' and 'AAAYHtAAVAABJTyCcP';
where rowid between 'AAAYHtAAVAABJTzAAA' and 'AAAYHtAAWAAA7ryCcP';
where rowid between 'AAAYHtAAWAAA7rzAAA' and 'AAAYHtAAYAAA6hyCcP';
where rowid between 'AAAYHtAAYAAA6hzAAA' and 'AAAYHtAAaAACRLyCcP';
where rowid between 'AAAYHtAAaAACRLzAAA' and 'AAAYHtAAcAAAYpyCcP';
where rowid between 'AAAYHtAAcAAAYpzAAA' and 'AAAYHtAAiAACGnyCcP';
where rowid between 'AAAYHtAAiAACGnzAAA' and 'AAAYHtAAjAAA/xyCcP';

10 rows selected.

SQL>
This effectively splits a non-partitioned table by hand into [rowid_ranges] regions. The regions do not overlap, and rowids serve as the boundaries.
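Each output line is just two extended ROWIDs. An extended ROWID is 18 base-64 characters laid out as OOOOOOFFFBBBBBBRRR: data object id, relative file number, block number and row number. The small Python decoder below is a sketch for checking the ranges by eye. Inside the database, DBMS_ROWID.ROWID_OBJECT, ROWID_RELATIVE_FNO, ROWID_BLOCK_NUMBER and ROWID_ROW_NUMBER do the same job. The function name decode_rowid is hypothetical.

```python
# Oracle's base-64 alphabet for extended ROWIDs: A-Z, a-z, 0-9, '+', '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _b64_to_int(chars: str) -> int:
    """Interpret a run of ROWID characters as a base-64 number."""
    value = 0
    for ch in chars:
        value = value * 64 + ALPHABET.index(ch)  # ValueError on an invalid character
    return value


def decode_rowid(rowid: str) -> tuple:
    """Split an extended ROWID (OOOOOOFFFBBBBBBRRR) into its four fields."""
    if len(rowid) != 18:
        raise ValueError("an extended ROWID has 18 characters")
    return (
        _b64_to_int(rowid[0:6]),    # data object id
        _b64_to_int(rowid[6:9]),    # relative file number
        _b64_to_int(rowid[9:15]),   # block number
        _b64_to_int(rowid[15:18]),  # row number within the block
    )


if __name__ == "__main__":
    for rid in ("AAAYHtAAGAAAAEAAAA", "AAAYHtAALAAAx5yCcP", "AAAYHtAALAAAx5zAAA"):
        print(rid, decode_rowid(rid))
```

Decoding the first rows of the output above gives (98797, 6, 256, 0), (98797, 11, 204402, 9999) and (98797, 11, 204403, 0). The first range ends at row 9999 of block 204402 in file 11. The second range starts at row 0 of the very next block, so adjacent ranges meet without overlapping.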
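For example, suppose we need to delete all rows with attr_id < 99999999 from the non-partitioned table TB_ARTSIAN_ATTR. Without any optimization this is a single statement:
DELETE FROM TB_ARTSIAN_ATTR
 WHERE attr_id < 99999999;
COMMIT;
On a really large table, deleting data this way is neither ideal nor feasible, for several reasons:
1. A single SQL statement runs serially and is slow.
2. A long run time can trigger well-known errors such as ORA-1555 (snapshot too old).
3. If it fails, the rollback can be a disaster.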
With the approach described here, you can instead build multiple DML statements that run in parallel, each deleting only a small part of the data:
DELETE FROM TB_ARTSIAN_ATTR
 WHERE rowid BETWEEN 'AAAYHtAAGAAAAEAAAA' AND 'AAAYHtAALAAAx5yCcP'
   AND attr_id < 99999999;
COMMIT;
DELETE FROM TB_ARTSIAN_ATTR
 WHERE rowid BETWEEN 'AAAYHtAALAAAx5zAAA' AND 'AAAYHtAANAAA75yCcP'
   AND attr_id < 99999999;
COMMIT;
.......
Depending on the degree of parallelism you want, split these DML statements into several groups and run the groups at the same time from multiple terminals.
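Building these statements and dealing them out to several terminals is easy to script. The Python sketch below assumes the rowid_chunk.sql output has been spooled into a list of lines. build_scripts is a hypothetical helper. It turns each "where rowid between ..." line into a DELETE plus COMMIT and distributes the statements round-robin into one script per session.

```python
from typing import List


def build_scripts(where_lines: List[str], table: str, predicate: str, sessions: int) -> List[str]:
    """Turn rowid_chunk.sql output into DELETE + COMMIT pairs, dealt round-robin
    into one script per session (terminal)."""
    scripts: List[List[str]] = [[] for _ in range(sessions)]
    kept = 0
    for line in where_lines:
        body = line.strip().rstrip(";")
        # skip SQL*Plus noise such as blank lines or "10 rows selected."
        if not body.lower().startswith("where rowid between"):
            continue
        # parentheses keep the range intact even if the predicate contains OR
        scripts[kept % sessions].append(f"DELETE FROM {table} {body} AND ({predicate});\nCOMMIT;")
        kept += 1
    return ["\n".join(stmts) for stmts in scripts]


if __name__ == "__main__":
    output = [
        "where rowid between 'AAAYHtAAGAAAAEAAAA' and 'AAAYHtAALAAAx5yCcP';",
        "where rowid between 'AAAYHtAALAAAx5zAAA' and 'AAAYHtAANAAA75yCcP';",
        "where rowid between 'AAAYHtAANAAA75zAAA' and 'AAAYHtAAOAACQPyCcP';",
        "10 rows selected.",
    ]
    for i, script in enumerate(build_scripts(output, "TB_ARTSIAN_ATTR", "attr_id < 99999999", 2), 1):
        print(f"-- session {i}\n{script}\n")
```

Each returned string can be saved as a .sql file and started from its own SQL*Plus session. Round-robin dealing spreads the ranges evenly across sessions. Giving each session a contiguous block of ranges would work just as well.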
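Advantages of this approach:
- Parallelism is controlled manually by the user, which avoids the overhead of Oracle Parallel Execution coordination. Used properly, it can be more efficient than a PARALLEL hint or a table-level degree of parallelism.
- The data is processed in small chunks, which reduces the risk of ORA-1555 errors.
- Users can raise or lower the degree of parallelism on the fly according to host load and I/O.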
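Step3: Check foreign keys and delete chunk data with a stored procedure
Step3.1: Check foreign keys
Check the foreign keys on TB_ARTSIAN_ATTR and the foreign keys on other tables that reference it. If another table has a foreign key that references this table's primary key, drop that foreign key first. Remember to restore it after the cleanup.
Disabling the constraints and re-enabling them afterwards (ALTER TABLE ... DISABLE/ENABLE CONSTRAINT) also works.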
-- generate DROP statements for the table's own foreign keys and for foreign keys on other tables that reference it
select 'ALTER TABLE ' || TABLE_NAME || ' drop CONSTRAINT ' || constraint_name || '; ' as v_sql
  from user_constraints
 where CONSTRAINT_TYPE in ('R')
   and owner = 'ARTISAN'
   and upper(table_name) in ('TB_ARTSIAN_ATTR')
union all
select 'ALTER TABLE ' || a.TABLE_NAME || ' drop CONSTRAINT ' || a.constraint_name || '; ' as v_sql
  from user_constraints a, user_constraints b
 where a.owner = 'ARTISAN'
   and a.owner = b.owner
   and a.r_owner = b.owner
   and upper(b.table_name) in ('TB_ARTSIAN_ATTR')
   and a.r_constraint_name = b.constraint_name;
Step3.2: Delete expired data chunk by chunk
CREATE OR REPLACE PROCEDURE PROC_CLEAN_BIGDATA IS
  V_SQL        VARCHAR2(4000);
  maxrows      NUMBER DEFAULT 30000;        -- rows fetched and deleted per batch
  row_id_table DBMS_SQL.Urowid_Table;
  -- expired rows inside one rowid range
  CURSOR c1 IS
    select /*+ PARALLEL(s, 3) */ s.rowid row_id
      from TB_ARTSIAN_ATTR s
     where rowid between 'AAAo2CAAHAADDwJAAA' and 'AAAo2CAAHAADEqICcP' -- a ROWID range generated by the chunking step above
       and ATTR_ID in (select ATTR_ID from TB_ARTSIAN_INST_OLD where INST_STATE = 'C');
BEGIN
  -- NOLOGGING does not reduce redo for a conventional DELETE, so these lines can stay commented out
  --V_SQL := 'ALTER TABLE TB_ARTSIAN_ATTR nologging';
  --EXECUTE IMMEDIATE V_SQL;
  OPEN c1;
  LOOP
    FETCH c1 BULK COLLECT INTO row_id_table LIMIT maxrows;
    EXIT WHEN row_id_table.COUNT = 0;
    FORALL i IN 1 .. row_id_table.COUNT
      DELETE FROM TB_ARTSIAN_ATTR WHERE ROWID = row_id_table(i);
    -- committing while c1 is still open (fetch across commit) can itself raise ORA-1555
    -- on a busy table, so keep each rowid range small
    COMMIT;
    --dbms_lock.sleep(5000);  -- optional pause between batches; DBMS_LOCK.SLEEP takes seconds
  END LOOP;
  CLOSE c1;
  --V_SQL := 'ALTER TABLE TB_ARTSIAN_ATTR logging';
  --EXECUTE IMMEDIATE V_SQL;
END PROC_CLEAN_BIGDATA;
/
Step3.3: FORALL and BULK COLLECT basics
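When the PL/SQL runtime processes a block of code, the PL/SQL engine executes the procedural statements and sends SQL statements to the SQL engine. The SQL engine runs them and returns the results to the PL/SQL engine. This hand-off between the two engines is called a context switch, and every switch adds some overhead.
Since Oracle 8i, PL/SQL has two enhancements that collapse many context switches between the PL/SQL engine and the SQL engine into one switch per batch:
- FORALL, which speeds up the PL/SQL-to-SQL direction.
  With FORALL, many DML executions are sent to the SQL engine as one batch, minimizing the overhead of context switches.
- BULK COLLECT, which speeds up the SQL-to-PL/SQL direction.
  The BULK COLLECT clause retrieves results in bulk. The result set is bound to a collection variable in one go and sent from the SQL engine to the PL/SQL engine. BULK COLLECT can be used in SELECT INTO, FETCH INTO and RETURNING INTO clauses.
For more details, see the article "FORALL and BULK COLLECT Statements in Oracle Database" (Oracle數據庫之FORALL與BULK COLLECT語句).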
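The batching pattern in PROC_CLEAN_BIGDATA can be tried out locally without Oracle: fetch at most maxrows rowids, delete them in one bulk call, commit, repeat. The sketch below uses Python's built-in sqlite3 module as a stand-in. A LIMIT query plays the role of BULK COLLECT ... LIMIT, and executemany plays the role of FORALL. SQLite is not Oracle, and the table tb_attr and the functions make_demo_table and purge_range are hypothetical. The sketch only illustrates the loop structure. Re-querying for each batch, instead of keeping one cursor open across commits, also avoids the fetch-across-commit pattern.

```python
import sqlite3
from typing import Tuple


def make_demo_table(rows: int) -> sqlite3.Connection:
    """Hypothetical stand-in table: tb_attr(attr_id) with attr_id = 1 .. rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tb_attr (attr_id INTEGER)")
    conn.executemany("INSERT INTO tb_attr (attr_id) VALUES (?)",
                     [(i,) for i in range(1, rows + 1)])
    conn.commit()
    return conn


def purge_range(conn: sqlite3.Connection, lo: int, hi: int, max_id: int,
                batch: int = 30000) -> Tuple[int, int]:
    """Delete rows with attr_id < max_id and lo <= rowid <= hi, one batch per commit.

    Returns (rows deleted, commits issued).
    """
    deleted = commits = 0
    while True:
        # like FETCH ... BULK COLLECT ... LIMIT: grab at most `batch` rowids at once
        ids = conn.execute(
            "SELECT rowid FROM tb_attr"
            " WHERE rowid BETWEEN ? AND ? AND attr_id < ? LIMIT ?",
            (lo, hi, max_id, batch),
        ).fetchall()
        if not ids:
            break
        # like FORALL: hand the whole batch of single-row deletes over in one call
        conn.executemany("DELETE FROM tb_attr WHERE rowid = ?", ids)
        conn.commit()  # one commit per batch, as in PROC_CLEAN_BIGDATA
        deleted += len(ids)
        commits += 1
    return deleted, commits


if __name__ == "__main__":
    conn = make_demo_table(20)
    print(purge_range(conn, 1, 10, 8, batch=3))  # 7 rows deleted in 3 batches
```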
Step4: Analyze the table and regather statistics
Analyze the table so that the CBO has more accurate information and Oracle can choose better execution plans.
Regular table:
-- gather table statistics (run time depends on the data volume)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(OWNNAME          => 'ARTISAN',
                                TABNAME          => 'TB_ARTSIAN_ATTR',
                                ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE,
                                METHOD_OPT       => 'for all columns size repeat',
                                DEGREE           => 8,
                                CASCADE          => TRUE,
                                no_invalidate    => false);
END;
/
For a partitioned table, also add GRANULARITY => 'ALL',
GRANULARITY: the granularity at which statistics are collected (relevant only for partitioned tables). Possible values:
- 'ALL': gather subpartition, partition and global statistics.
- 'AUTO': decide the granularity based on the partitioning type; this is the default.
- 'DEFAULT': gather global and partition-level statistics; equivalent to 'GLOBAL AND PARTITION'.
- 'GLOBAL': gather global statistics.
- 'GLOBAL AND PARTITION': gather global and partition-level statistics.
- 'PARTITION': gather partition-level statistics.
- 'SUBPARTITION': gather subpartition-level statistics.
Step5: Optimization
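However, the method above still has some shortcomings:
- rowid_chunk.sql does not currently support partitioned tables.
- rowid_chunk.sql splits the table segment evenly by size into the requested number of ranges. The historical data to update or delete may be concentrated in a few parts of the segment, for example when all rows to delete sit in the first 200 extents of the table. Some ranges then contain none of the rows we want to process, and the DML statements built from those ranges are useless.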
With this in mind, I rewrote the SQL that generates the rowid chunks:
select 'and rowid between ''' || ora_rowid || ''' and ''' ||
       lead(ora_rowid, 1) over(order by rn asc) || '''' || ';'
  from (with cnt as (select count(*)
                       from TB_ARTSIAN_ATTR
                       -- add a WHERE condition here as needed (to chunk only the ranges that hold qualifying rows); replace this!!
                    )
        select rn, ora_rowid
          from (select rownum rn, ora_rowid
                  from (select rowid ora_rowid
                          from TB_ARTSIAN_ATTR
                          -- add the same WHERE condition here as well; replace this!!
                         order by rowid))
         where rn in (select (rownum - 1) * trunc((select * from cnt) / &row_range) + 1
                        from dba_tables
                       where rownum < &row_range -- number of chunks
                      union
                      select * from cnt))
This script also splits the table into rowid ranges. Because its rowids come straight from a SELECT, it has no trouble with partitioned tables or other complex objects. For the same reason, we can restrict the chunks to the ranges that actually contain qualifying rows.
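The boundary selection in the rewritten query can be modeled in a few lines of Python, with integers standing in for sorted rowids. boundary_ranges is a hypothetical name. The model picks the rows numbered (k - 1) * trunc(cnt / row_range) + 1 for k = 1 .. row_range - 1 and adds the last row (cnt). It then pairs each boundary with the next one, the way LEAD does.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def boundary_ranges(rowids: Sequence[T], row_range: int) -> List[Tuple[T, Optional[T]]]:
    """Model of the rewritten chunking query: pick boundary rows, pair them with LEAD."""
    ordered = sorted(rowids)                  # order by rowid, then rownum = rn
    cnt = len(ordered)                        # the cnt subquery
    if cnt == 0:
        return []
    step = cnt // row_range                   # trunc(cnt / &row_range)
    # (rownum - 1) * step + 1 for rownum = 1 .. row_range - 1, UNION cnt (duplicates removed)
    picks = sorted({k * step + 1 for k in range(row_range - 1)} | {cnt})
    bounds = [ordered[rn - 1] for rn in picks]
    # lead(ora_rowid, 1): the last boundary has no successor (NULL in Oracle)
    return [(b, bounds[i + 1] if i + 1 < len(bounds) else None)
            for i, b in enumerate(bounds)]


if __name__ == "__main__":
    print(boundary_ranges(list(range(20, 0, -1)), 4))
```

In this model, entering row_range = 4 yields three usable ranges plus a final line whose upper bound is empty, because the LEAD of the last row is NULL. That empty last line can be discarded. Each interior boundary row also belongs to two ranges, because BETWEEN is inclusive on both ends. This is harmless for DELETE, although two sessions may briefly wait on each other's lock for that one row.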
Notes:
- Run this script in a tool such as PL/SQL Developer or Toad; running it in SQL*Plus may raise ORA-00933.
- Don't forget to replace the conditions marked in the comments.
- Control commits yourself to avoid ORA-1555 errors.
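One remaining weakness of this script is that fetching the rowid chunks needs a suitable index on the large table. Without one, the full table scan plus sort can be very slow. With a suitable index, Oracle can use an INDEX FAST FULL SCAN instead. A suitable index here means an ordinary B*tree index that contains at least one NOT NULL column. The best case is a primary key index or a bitmap index.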
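Oracle 11.2 introduced the DBMS_PARALLEL_EXECUTE package to help with updating very large tables. It can split a table into rowid chunks (CREATE_CHUNKS_BY_ROWID) and run a DML statement over those chunks in parallel through scheduler jobs (RUN_TASK).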