Learning Tuning Through Case Studies: Oracle Full-Text Indexing
Full-text search (Oracle Text)
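Oracle Text gives Oracle9i powerful text-retrieval and intelligent text-management capabilities. Oracle Text is the name adopted in Oracle9i; in Oracle8/8i the feature was called Oracle interMedia Text, and before Oracle8 it was the Oracle ConText Cartridge. Its indexing and search functions are not limited to data stored in the database: it can also index and search documents stored in the file system, and it handles more than 150 document types, including Microsoft Word, PDF, and XML. Search features include fuzzy search, stem search (a search for mice also finds mouse), wildcards, proximity search, result ranking, and keyword highlighting. You can even add a thesaurus to look up related terms and find the documents that contain them.
Oracle Text must build an index on searchable data items before users can find content by searching. The indexing process is modeled as a pipeline: data passes through a series of transformations, and the resulting keywords are added to the index. The pipeline has the following stages:
1. Data retrieval (Datastore): simply fetches the data from the data store (for example a web page, a database large object, or the local file system) and passes it as a stream to the next stage.
2. Filtering (Filter): the filter converts data in various file formats to plain text. The other components of the indexing pipeline can process only plain text and do not understand formats such as MS Word or Excel.
3. Sectioning (Sectioner): the sectioner adds metadata about the structure of the original data item.
4. Lexical analysis (Lexer): splits the character stream into words according to the language of the data item.
5. Indexing (Index): the final stage adds the keywords to the actual index.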
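The stages above correspond to preferences that can be named when a CONTEXT index is created. The following is a minimal sketch, assuming a hypothetical table pipeline_demo and the system-defined preferences that ship in the CTXSYS schema; it is an illustration of how the stages map to index parameters, not a recommended configuration.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE pipeline_demo (id NUMBER PRIMARY KEY, doc VARCHAR2(4000));

-- Each PARAMETERS entry selects the preference for one pipeline stage:
--   datastore     -> stage 1, where the text comes from (here: the column itself)
--   filter        -> stage 2, format conversion (null_filter: already plain text)
--   section group -> stage 3, document structure (null_section_group: none)
--   lexer         -> stage 4, how the text is split into tokens
CREATE INDEX pipeline_demo_idx ON pipeline_demo (doc)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('
    datastore     ctxsys.default_datastore
    filter        ctxsys.null_filter
    section group ctxsys.null_section_group
    lexer         ctxsys.default_lexer
  ');
```

Any stage that is not named in PARAMETERS falls back to the system default for that stage.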
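Differences between full-text search and ordinary search
Even without Oracle Text there are several ways to search text in an Oracle database, such as the INSTR function and the LIKE operator:
1. SELECT * FROM mytext WHERE INSTR(thetext, 'Oracle') > 0;
2. SELECT * FROM mytext WHERE thetext LIKE '%Oracle%';
In many cases INSTR and LIKE are perfectly adequate, especially when the search covers only small tables. However, these approaches result in full table scans, which are expensive in resources, and the search functionality they offer is very limited. For searches over large volumes of text, Oracle's full-text search is the recommended choice.
Aside: a few notes on INSTR and LIKE.
In Oracle, the INSTR function tests whether a string contains a given substring. Its syntax is INSTR(string, substring, position, occurrence); it returns the position of the match, or 0 if there is none.
string: the source string (if a column is given, the contents of that column).
substring: the substring to look for in the source string.
position: the position at which the search starts; optional, default 1.
occurrence: which occurrence of substring to find; optional, default 1.
If position is negative, the search proceeds from right to left, with the start position counted from the end of the string.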
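The effect of the position and occurrence arguments is easiest to see on a literal. The queries below are a small sketch that uses the sample string from Oracle's own INSTR documentation; the column aliases are illustrative.

```sql
-- Start at character 3 and return the second match of 'OR' found from there.
-- 'OR' occurs at positions 2, 5 and 14, so the result is 14.
SELECT INSTR('CORPORATE FLOOR', 'OR', 3, 2) AS forward_search FROM dual;

-- A negative position starts 3 characters from the end and searches backward.
-- The first match found is at 5 and the second at 2, so the result is 2.
SELECT INSTR('CORPORATE FLOOR', 'OR', -3, 2) AS backward_search FROM dual;

-- No match at all returns 0, which is why "> 0" means "contains".
SELECT INSTR('CORPORATE FLOOR', 'XYZ') AS no_match FROM dual;
```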
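Performance of INSTR versus LIKE
From an efficiency standpoint, whichever form can use an index will be faster.
LIKE can sometimes use an index, for example name LIKE '李%', but not when the pattern starts with a wildcard, as in name LIKE '%李'. A typical search of the form '%text%' therefore cannot use an ordinary index. Oracle also supports function-based indexes: if you create an index on an INSTR expression over the name column, queries that use that same expression can use the index, which is why INSTR can outperform LIKE in such cases.
Note: INSTR(title, '手冊') > 0 is equivalent to title LIKE '%手冊%'
INSTR(title, '手冊') = 0 is equivalent to title NOT LIKE '%手冊%'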
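A function-based index on INSTR could look like the sketch below. The table books and the index name are hypothetical and exist only for this illustration. Note the limitation: the index stores the result of INSTR for one fixed search term, so it helps only queries that use exactly that expression.

```sql
-- Hypothetical table for illustration.
CREATE TABLE books (id NUMBER PRIMARY KEY, title VARCHAR2(400));

-- Index the INSTR expression itself, with the search term fixed.
CREATE INDEX books_title_manual_idx ON books (INSTR(title, '手冊'));

-- Same expression as in the index, so the optimizer may choose the index.
SELECT id, title FROM books WHERE INSTR(title, '手冊') > 0;

-- A different search term does not match the indexed expression
-- and falls back to a full table scan.
SELECT id, title FROM books WHERE INSTR(title, 'Oracle') > 0;
```

For arbitrary search terms this approach does not scale, which is the gap that a full-text index fills.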
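How Oracle Text indexing works
An Oracle Text index converts all the text into tokens; for example, www.taobao.com is converted into the tokens www, taobao, and com.
Oracle10g supports four index types:
context, ctxcat, ctxrule, ctxxpath
CONTEXT
Used for searching large amounts of continuous text. It supports many data formats, including Word, HTML, XML, and plain text, and it is the index type that supports range partitioning and parallel index creation. Supported column types: VARCHAR2, CLOB, BLOB, CHAR, BFILE, XMLType, and URIType.
After DML, the index must be synchronized manually with CTX_DDL.SYNC_INDEX.
If a query contains several words, separate them with spaces (for example, oracle itpub); CONTAINS matches such a query as a phrase.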
Case study:
Setting up full-text search
Step 1: Check and set up the database user and role
First check whether the database has the CTXSYS user and the CTXAPP role. If they are missing, the database was created without the interMedia feature (a default 10g installation includes both), and you must modify the database to install it. In a default installation the CTXSYS account is locked, so unlock it first.
11:53:13 SYS@ prod >select username,account_status from dba_users where username like 'CTX%';
USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
CTXSYS                         EXPIRED & LOCKED
11:54:17 SYS@ prod >alter user ctxsys identified by oracle account unlock;
User altered.
11:55:07 SYS@ prod >select username,account_status from dba_users where username like 'CTX%';
USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
CTXSYS                         OPEN
12:00:13 SYS@ prod >select role from dba_roles
12:00:23   2   where role like 'CTX%';
ROLE
------------------------------
CTXAPP
Step 2: Grant privileges
Grant the test user scott the following privileges from the ctxsys user or another suitably privileged account (the session below runs the script as SYS):
[oracle@RH6 ~]$ cat t.sql
GRANT resource, CONNECT, ctxapp TO scott;
GRANT EXECUTE ON ctxsys.ctx_cls TO scott;
GRANT EXECUTE ON ctxsys.ctx_ddl TO scott;
GRANT EXECUTE ON ctxsys.ctx_doc TO scott;
GRANT EXECUTE ON ctxsys.ctx_output TO scott;
GRANT EXECUTE ON ctxsys.ctx_query TO scott;
GRANT EXECUTE ON ctxsys.ctx_report TO scott;
GRANT EXECUTE ON ctxsys.ctx_thes TO scott;
GRANT EXECUTE ON ctxsys.ctx_ulexer TO scott;
11:58:04 SYS@ prod >@/home/oracle/t.sql
Grant succeeded.
Elapsed: 00:00:00.15
Grant succeeded.
Elapsed: 00:00:00.21
Grant succeeded.
Elapsed: 00:00:00.09
Grant succeeded.
Elapsed: 00:00:00.09
Grant succeeded.
Elapsed: 00:00:00.13
Grant succeeded.
Elapsed: 00:00:00.07
Grant succeeded.
Elapsed: 00:00:00.09
Grant succeeded.
Elapsed: 00:00:00.10
Grant succeeded.
Elapsed: 00:00:00.07
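Step 3: Set the lexer
The mechanism behind Oracle full-text search is quite simple. Oracle's proprietary lexer finds every unit of meaning in a document (Oracle calls them terms) and records them in a set of tables whose names begin with dr$, together with each term's positions, occurrence count, hash value, and other information. At query time Oracle looks up the relevant terms in these tables, calculates their frequency, and uses an algorithm to compute a score for each document, the so-called match rate. The lexer is the core of this mechanism and determines the efficiency of full-text search. Oracle provides different lexers for different languages; three of them are commonly used:
basic_lexer:
For English. It splits English words out of sentences using spaces and punctuation, and it automatically discards words that occur so frequently that they have no search value, such as if and is, so it is efficient. Applied to Chinese, however, it has serious problems: it recognizes only spaces and punctuation, and a Chinese sentence normally contains no spaces, so it treats the whole sentence as a single term and effectively loses the ability to search. Taking the sentence '中國人民站起來了' as an example, basic_lexer produces a single term, '中國人民站起來了'; a search for '中國' then finds nothing.
chinese_vgram_lexer:
A dedicated Chinese lexer that supports all the Chinese character sets (ZHS16CGB231280, ZHS16GBK, ZHT32EUC, ZHT16BIG5, ZHT32TRIS, ZHT16MSWIN950, ZHT16HKSCS, UTF8). It analyzes Chinese sentences character by character. The sentence '中國人民站起來了' is broken into the following terms: '中', '中國', '國人', '人民', '民站', '站起', '起來', '來了', '了'. This approach is simple to implement and catches everything, but it is not very efficient.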
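To make the overlapping-pair idea concrete, the PL/SQL block below prints every two-character window of the example sentence. It is only a simplified illustration written for this article, not Oracle's actual lexer algorithm; the variable name is arbitrary, and the database character set is assumed to be able to store the Chinese literal.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Example sentence from the text above.
  v_text VARCHAR2(100) := '中國人民站起來了';
BEGIN
  -- Slide a two-character window across the sentence, one character at a time.
  -- LENGTH and SUBSTR count characters, not bytes.
  FOR i IN 1 .. LENGTH(v_text) - 1 LOOP
    DBMS_OUTPUT.PUT_LINE(SUBSTR(v_text, i, 2));
  END LOOP;
END;
/
```

The windows include pairs such as '民站' and '站起', which are not real words; this is the inefficiency that a dictionary-based lexer avoids.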
chinese_lexer:
A newer Chinese lexer; as described here it supports only the UTF8 character set. As shown above, chinese_vgram_lexer does not know common Chinese vocabulary, so the units it produces are very mechanical: '民站' and '站起' never occur on their own in Chinese, so such terms are meaningless and only hurt efficiency. The main improvement in chinese_lexer is that it recognizes most common Chinese words and can therefore analyze sentences more efficiently; the two meaningless units above no longer appear, which improves efficiency considerably. However, because of the UTF8 restriction, a database that uses the ZHS16GBK character set can use only the cruder chinese_vgram_lexer. Later Oracle Text releases may relax this restriction, so check the Oracle Text Reference for your version. If you configure nothing, Oracle uses basic_lexer by default.
12:05:01 SYS@ prod >select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK
12:08:05 SCOTT@ prod >desc ctx_ddl
PROCEDURE CREATE_PREFERENCE
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
PREFERENCE_NAME                VARCHAR2                IN
OBJECT_NAME                    VARCHAR2                IN
12:12:25 SCOTT@ prod >EXEC ctx_ddl.create_preference ('my_lexer', 'chinese_vgram_lexer');
PL/SQL procedure successfully completed.
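After creating a preference it can be useful to confirm that it exists and which lexer object it is based on. A minimal sketch, assuming the my_lexer preference from the call above and the standard CTX_USER_PREFERENCES dictionary view:

```sql
-- List the preferences owned by the current user.
-- For my_lexer, PRE_CLASS should be LEXER and PRE_OBJECT CHINESE_VGRAM_LEXER.
SELECT pre_name, pre_class, pre_object
FROM   ctx_user_preferences;

-- To recreate the preference with a different lexer, drop it first:
-- EXEC ctx_ddl.drop_preference('my_lexer');
```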
Create the table
12:13:15 SCOTT@ prod >CREATE TABLE textdemo(
12:15:47   2       id NUMBER NOT NULL PRIMARY KEY,
12:15:47   3       book_author varchar2(100),-- author
12:15:47   4       publish_time DATE,-- publication date
12:15:47   5       title varchar2(400),-- title
12:15:47   6       book_abstract varchar2(2000),-- abstract
12:15:47   7       path varchar2(200)-- path
12:15:47   8  );
Table created.
Insert data
14:53:20 SCOTT@ prod >insert into textdemo values (10,'luyao',sysdate,'pingfan de world','zhen shi de gushi','/home/1.txt');
1 row created.
14:54:32 SCOTT@ prod >commit;
Step 4: Create the index on the book_abstract column
Use the lexer preference created above (my_lexer, based on chinese_vgram_lexer) as the lexer.
12:16:15 SCOTT@ prod >CREATE INDEX demo_abstract ON textdemo(book_abstract) indextype IS ctxsys.context parameters('lexer my_LEXER');
After that, as described above, a number of tables and indexes whose names begin with dr$ appear. The system creates four related tables:
DR$DEMO_ABSTRACT$I (the token table produced by lexing)
DR$DEMO_ABSTRACT$K
DR$DEMO_ABSTRACT$N
DR$DEMO_ABSTRACT$R
14:56:16 SCOTT@ prod >select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
BONUS                          TABLE
DEPT                           TABLE
DR$DEMO_ABSTRACT$I             TABLE
DR$DEMO_ABSTRACT$K             TABLE
DR$DEMO_ABSTRACT$N             TABLE
DR$DEMO_ABSTRACT$R             TABLE
EMP                            TABLE
SALGRADE                       TABLE
TEXTDEMO                       TABLE
9 rows selected.
14:56:36 SCOTT@ prod >desc DR$DEMO_ABSTRACT$I
Name                                                              Null?    Type
----------------------------------------------------------------- -------- --------------------------------------------
TOKEN_TEXT                                                        NOT NULL VARCHAR2(64)
TOKEN_TYPE                                                        NOT NULL NUMBER(3)
TOKEN_FIRST                                                       NOT NULL NUMBER(10)
TOKEN_LAST                                                        NOT NULL NUMBER(10)
TOKEN_COUNT                                                       NOT NULL NUMBER(10)
TOKEN_INFO                                                                 BLOB
14:57:45 SCOTT@ prod >desc DR$DEMO_ABSTRACT$K
Name                                                              Null?    Type
----------------------------------------------------------------- -------- --------------------------------------------
DOCID                                                                      NUMBER(38)
TEXTKEY                                                           NOT NULL ROWID
14:57:57 SCOTT@ prod >desc DR$DEMO_ABSTRACT$N
Name                                                              Null?    Type
----------------------------------------------------------------- -------- --------------------------------------------
NLT_DOCID                                                         NOT NULL NUMBER(38)
NLT_MARK                                                          NOT NULL CHAR(1)
14:58:11 SCOTT@ prod >desc DR$DEMO_ABSTRACT$R
Name                                                              Null?    Type
----------------------------------------------------------------- -------- --------------------------------------------
ROW_NO                                                                     NUMBER(3)
DATA                                                                       BLOB
14:58:26 SCOTT@ prod >select index_name,index_type,table_name from user_indexes;
INDEX_NAME                     INDEX_TYPE                  TABLE_NAME
------------------------------ --------------------------- ------------------------------
DEMO_ABSTRACT                  DOMAIN                      TEXTDEMO
SYS_C0013418                   NORMAL                      TEXTDEMO
PK_EMP                         NORMAL                      EMP
SYS_IL0000076525C00002$$       LOB                         DR$DEMO_ABSTRACT$R
SYS_IOT_TOP_76528              IOT - TOP                   DR$DEMO_ABSTRACT$N
SYS_IOT_TOP_76523              IOT - TOP                   DR$DEMO_ABSTRACT$K
SYS_IL0000076520C00006$$       LOB                         DR$DEMO_ABSTRACT$I
DR$DEMO_ABSTRACT$X             NORMAL                      DR$DEMO_ABSTRACT$I
PK_DEPT                        NORMAL                      DEPT
9 rows selected.
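Since DR$DEMO_ABSTRACT$I is an ordinary table in the index owner's schema, the tokens the lexer produced can be inspected directly. A minimal sketch, using only the columns shown in the desc output above:

```sql
-- Show each token and the number of documents it was recorded for.
SELECT token_text, token_type, token_count
FROM   dr$demo_abstract$i
ORDER  BY token_text;
```

Looking at this table is a quick way to see why a given search term does or does not match, because CONTAINS can only find what was stored here as a token.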
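The following statement shows whether any errors occurred during index creation:
SELECT * FROM ctx_USER_index_errors;
Aside: there are four index types that can be specified (ctxsys.context is one of them): context, ctxcat, ctxrule, and ctxxpath.
CONTEXT: used for searching large amounts of continuous text. It supports many data formats, including Word, HTML, XML, and plain text, and it is the index type that supports range partitioning and parallel index creation.
Supported column types: VARCHAR2, CLOB, BLOB, CHAR, BFILE, XMLType, and URIType. After DML, the index must be synchronized manually with CTX_DDL.SYNC_INDEX. If a query contains several words, separate them with spaces (for example, oracle itpub).
Query operator: CONTAINS
CTXCAT: suited to mixed queries (for example, conditions on product id, price, and description) and to searching smaller, somewhat structured text fragments. It is transactional: after DML the index is synchronized automatically.
Operators: and, or, >, ;
Query operator: CATSEARCH
CTXRULE: query operator MATCHES.
CTXXPATH (I have not looked into these last two index types any further.)
In general we create a CONTEXT index and query it with CONTAINS.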
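For comparison with CONTEXT, a CTXCAT index and a CATSEARCH query might look like the sketch below. The table auction_demo, its columns, and the index name are hypothetical and used only for illustration.

```sql
-- Hypothetical table with a short text column and a structured column.
CREATE TABLE auction_demo (
  id    NUMBER PRIMARY KEY,
  title VARCHAR2(200),
  price NUMBER
);

-- CTXCAT index on the short text column.
CREATE INDEX auction_demo_cat ON auction_demo (title)
  INDEXTYPE IS ctxsys.ctxcat;

INSERT INTO auction_demo VALUES (1, 'digital camera', 120);
COMMIT;

-- CATSEARCH takes the column, the text query and an optional structured
-- clause (NULL here). No manual sync is needed after the insert.
SELECT id, title
FROM   auction_demo
WHERE  CATSEARCH(title, 'camera', NULL) > 0;
```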
Step 5: Test a query and view the execution plan
15:04:36 SCOTT@ prod >r
1* select * from textdemo where contains(book_abstract,'gushi')>0
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 2570915478
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |  1392 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEXTDEMO      |     1 |  1392 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | DEMO_ABSTRACT |       |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("BOOK_ABSTRACT",'gushi')>0)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
29  recursive calls
0  db block gets
33  consistent gets
0  physical reads
0  redo size
796  bytes sent via SQL*Net to client
415  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed
15:04:37 SCOTT@ prod >
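The query above only tests whether CONTAINS returns a value greater than zero. That value is also the relevance score, and it can be selected and sorted on by giving CONTAINS a label and passing the same label to SCORE. A minimal sketch against the textdemo table; the label 1 and the alias relevance are arbitrary choices:

```sql
-- The third argument of CONTAINS (1) is a label; SCORE(1) returns the
-- score computed by the CONTAINS call carrying the same label.
SELECT SCORE(1) AS relevance, id, book_abstract
FROM   textdemo
WHERE  CONTAINS(book_abstract, 'gushi', 1) > 0
ORDER  BY SCORE(1) DESC;
```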
View the detailed plan with SQL trace (excerpt)
SQL ID: 2rsr1z6zkp24p
Plan Hash: 2570915478
select *
from
textdemo where contains(book_abstract,:"SYS_B_0")>:"SYS_B_1"
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.01          0        250          0           0
Fetch        2      0.00       0.00          0          2          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        252          0           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 101
Rows     Row Source Operation
-------  ---------------------------------------------------
1  TABLE ACCESS BY INDEX ROWID TEXTDEMO (cr=12 pr=0 pw=0 time=0 us cost=4 size=1392 card=1)
1   DOMAIN INDEX  DEMO_ABSTRACT (cr=11 pr=0 pw=0 time=0 us cost=4 size=0 card=0)
declare
cost sys.ODCICost := sys.ODCICost(NULL, NULL, NULL, NULL);
arg0 VARCHAR2(1) := null;
begin
:1 := "CTXSYS"."TEXTOPTSTATS".ODCIStatsFunctionCost(
sys.ODCIFuncInfo('CTXSYS',
'CTX_CONTAINS',
'TEXTCONTAINS',
2),
cost,
sys.ODCIARGDESCLIST(sys.ODCIARGDESC(2, 'TEXTDEMO', 'SCOTT', '"BOOK_ABSTRACT"', NULL, NULL, NULL), sys.ODCIARGDESC(1, NULL, NULL, NULL, NULL, NULL, NULL))
, arg0, :5,
sys.ODCIENV(:6,:7,:8,:9));
if cost.CPUCost IS NULL then
:2 := -1.0;
else
:2 := cost.CPUCost;
end if;
if cost.IOCost IS NULL then
:3 := -1.0;
else
:3 := cost.IOCost;
end if;
if cost.NetworkCost IS NULL then
:4 := -1.0;
else
:4 := cost.NetworkCost;
end if;
exception
when others then
raise;
end;
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0         18          0           2
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         18          0           2
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 101     (recursive depth: 1)
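The article does not show how the trace above was captured. One common way to produce this kind of report is sketched below; the identifier ctx_demo and the output file name are assumptions made for this example, and the trace file location depends on the instance configuration.

```sql
-- Tag the trace file so it is easy to find among the other trace files.
ALTER SESSION SET tracefile_identifier = 'ctx_demo';

-- Switch SQL trace on, run the statement of interest, switch it off again.
ALTER SESSION SET sql_trace = TRUE;
SELECT * FROM textdemo WHERE contains(book_abstract, 'gushi') > 0;
ALTER SESSION SET sql_trace = FALSE;

-- Then format the raw trace file from the operating system shell, e.g.:
--   tkprof <trace_file>.trc ctx_demo.txt
```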
Full-text indexes and DML operations
Insert:
15:19:21 SCOTT@ prod >insert into textdemo values (20,'huoda',sysdate,'musilin de zangli','meili de rensheng','/home/2.txt');
1 row created.
15:20:10 SCOTT@ prod >commit;
15:21:35 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo where BOOK_ABSTRACT like '%rensheng%'
ID BOOK_ABSTRACT
---------- --------------------------------------------------
20 meili de rensheng
15:23:12 SCOTT@ prod >set autotrace on
15:23:38 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo where contains(BOOK_ABSTRACT,'rensheng')>0
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2570915478
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |  1027 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEXTDEMO      |     1 |  1027 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | DEMO_ABSTRACT |       |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CTXSYS"."CONTAINS"("BOOK_ABSTRACT",'rensheng')>0)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
23  recursive calls
0  db block gets
33  consistent gets
0  physical reads
0  redo size
349  bytes sent via SQL*Net to client
404  bytes received via SQL*Net from client
1  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
0  rows processed
15:26:40 SYS@ prod >select * from ctxsys.dr$pending;
PND_CID    PND_PID PND_ROWID          PND_TIMES P
---------- ---------- ------------------ --------- -
1082          0 AAASrlAAEAAAAI1AAB 21-NOV-14 N
15:26:26 SCOTT@ prod >alter index demo_abstract rebuild parameters('sync');
Index altered.
15:30:10 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo where contains(BOOK_ABSTRACT,'rensheng')>0;
ID BOOK_ABSTRACT
---------- --------------------------------------------------
20 meili de rensheng
When you perform an insert, Oracle places a row in the CTXSYS.DR$PENDING table; the full-text index is updated only after a manual synchronization.
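Besides ALTER INDEX ... REBUILD PARAMETERS('sync'), the pending rows can be processed with the CTX_DDL.SYNC_INDEX procedure mentioned earlier. The sketch below assumes the demo_abstract index; the memory value '2M' is an illustrative choice, not a recommendation.

```sql
-- Process the pending rows for one index.
EXEC ctx_ddl.sync_index('demo_abstract', '2M');

-- Alternatively (Oracle 10g and later), have the index synchronized
-- automatically whenever a transaction commits:
-- ALTER INDEX demo_abstract REBUILD PARAMETERS ('replace metadata sync (on commit)');
```

Synchronizing on every commit keeps queries current but can fragment the index faster, so a scheduled SYNC_INDEX call is a common alternative for tables with heavy DML.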
Delete:
15:30:37 SCOTT@ prod >delete from textdemo where id=20;
1 row deleted.
15:33:06 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo;
ID BOOK_ABSTRACT
---------- --------------------------------------------------
10 zhen shi de gushi
15:33:39 SCOTT@ prod >rollback;
Rollback complete.
15:33:50 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo;
ID BOOK_ABSTRACT
---------- --------------------------------------------------
10 zhen shi de gushi
20 meili de rensheng
A delete takes effect on the index immediately; no manual synchronization is needed.
Update:
15:38:14 SCOTT@ prod >update textdemo set BOOK_ABSTRACT='meili de gushi' where id=20;
1 row updated.
15:39:48 SYS@ prod >select * from ctxsys.dr$delete;
no rows selected
15:39:59 SYS@ prod >select * from ctxsys.dr$pending;
no rows selected
15:43:03 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo where contains(BOOK_ABSTRACT,'gushi')>0;
ID BOOK_ABSTRACT
---------- --------------------------------------------------
10 zhen shi de gushi
15:43:14 SCOTT@ prod >alter index demo_abstract rebuild parameters('sync');
Index altered.
15:43:39 SCOTT@ prod >select id,BOOK_ABSTRACT from textdemo where contains(BOOK_ABSTRACT,'gushi')>0;
ID BOOK_ABSTRACT
---------- --------------------------------------------------
10 zhen shi de gushi
20 meili de gushi
An update is effectively a delete followed by an insert, so the index reflects the new text only after a manual synchronization. The queries against dr$delete and dr$pending above probably return no rows because the update had not yet been committed when the SYS session ran them.
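The index owner can check for unsynchronized rows without logging in as SYS by querying the CTX_USER_PENDING view. A minimal sketch, assuming it is run as scott after the DML has been committed:

```sql
-- One row per changed document that is still waiting to be synchronized.
SELECT pnd_index_name, pnd_rowid, pnd_timestamp
FROM   ctx_user_pending;
```

If this query returns rows for DEMO_ABSTRACT, CONTAINS queries will not yet see those documents.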
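Full-text index over multiple columns
Often you need to find rows that match a condition in any of several text columns, which requires a full-text index that covers multiple columns. For example, to run full-text search over both subjectname (topic name) and briefintro (introduction) of a pmhsubjects (topics) table, you would follow the steps below; the statements here apply the same steps to the textdemo table.
Create the preference for the multi-column index. Log in as ctxsys and run:
BEGIN
ctx_ddl.create_preference('ctx_demo_abstract_title','MULTI_COLUMN_DATASTORE');
END;
/
Set the columns for the preference (logged in as ctxsys) so that the index covers the three columns title, path, and book_abstract:
BEGIN
ctx_ddl.set_attribute('ctx_demo_abstract_title','columns','title,path,book_abstract');
END;
/
Create the full-text index. ORATEXT_LEXER must name an existing lexer preference; the earlier steps in this article created my_lexer instead. Because textdemo(book_abstract) already carries the demo_abstract index from step 4, drop that index first, since Oracle does not allow two CONTEXT indexes on the same column.
CREATE INDEX demo_abstract_title ON textdemo(book_abstract) indextype IS ctxsys.context parameters('DATASTORE ctxsys.ctx_demo_abstract_title lexer ORATEXT_LEXER');
commit;
Test:
SELECT score(20),t.* FROM textdemo t WHERE contains(book_abstract,'移動城堡 or 俄羅斯',20)>0;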
Search test on a large (CLOB) column
CREATE TABLE mytable(id NUMBER PRIMARY KEY, docs CLOB);
INSERT INTO mytable VALUES(111555,'this text will be indexed');
INSERT INTO mytable VALUES(111556,'this is a direct_datastore example');
Commit;
CREATE INDEX myindex ON mytable(docs)
indextype IS ctxsys.context
parameters ('datastore ctxsys.default_datastore');
SELECT * FROM mytable WHERE contains(docs, 'text') > 0;
--- The above is my current understanding of Oracle full-text indexing. I am still learning, and corrections are welcome.