日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 运维知识 > 数据库 >内容正文

数据库

关于PostgreSQL的GiST索引之四

發(fā)布時間:2025/7/14 数据库 26 豆豆
生活随笔 收集整理的這篇文章主要介紹了 关于PostgreSQL的GiST索引之四 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

3. hstore的GiST索引

hstore的GiST索引實現(xiàn)和全文檢索的tsvector的GiST索引的實現(xiàn)差不過,都是簽名文件索引。

3.1 GiST索引項的存儲

1)所有索引項都采用簽名向量壓縮
2)hstore中的所有鍵和所有值都進行哈希設(shè)置bit位得到一個簽名向量
3)簽名向量的大小為128bit

3.2?性能推論

1)索引尺寸比較小(因為簽名向量的大小是固定的而且一般比原始數(shù)據(jù)小)
2)每次查詢要掃描的索引頁很多(20%~40%的索引頁)
3)每次鍵查詢(?操作符)至少要recheck 1.6%(2/128)以上的數(shù)據(jù)行
? 每條記錄包含的鍵值對越多,需要recheck的數(shù)據(jù)行越多
4)每次鍵值查詢(@>操作符)至少要recheck 0.02%((2/128) * (2/128))以上的數(shù)據(jù)行
? 每條記錄包含的鍵值對越多,需要recheck的數(shù)據(jù)行越多

假設(shè)每條記錄平均包含10條鍵值對,需要recheck的數(shù)據(jù)行估算如下
鍵查詢(?操作符)匹配:7.5%的數(shù)據(jù)行

點擊(此處)折疊或打開

  • testdb=# with recursive tx1(n,sign_bits) as
  • testdb-# (select 1 n,1.0 sign_bits
  • testdb(# union all
  • testdb(# select n+1, sign_bits + 1 - sign_bits/128 from tx1 where n=10
  • testdb(# )
  • testdb-# select n,sign_bits,sign_bits/128 scan_probability from tx1 where n in(10);
  • ?n | sign_bits | scan_probability
  • ----+------------------------+------------------------
  • ?10 | 9.65566251563533320389 | 0.07543486340340104066
  • (1 row)
  • 由此可見對鍵查詢(多個鍵的?&查詢另當別論),hstore的GiST索引的查詢效率是比較低的。不過,從另外一個角度考慮,通常的應(yīng)用場景下,不同的key值數(shù)本來就不會很多,就不是索引的用武之地。

    鍵值查詢(@>操作符)匹配:0.57%(0.57%=0.07543486340340104066*0.07543486340340104066)

    但是這只是理論上的平均值,由于簽名向量的沖突,個別查詢的效率可能會很低。比如,如果大部分數(shù)據(jù)記錄都包含某個key,而查詢這個鍵值對的時候,碰巧它的值在簽名向量中對應(yīng)的bit位和這個很常用的key相同,那么大部分數(shù)據(jù)記錄都要進行recheck。

    3.3?相關(guān)代碼

    contrib/hstore/hstore_gist.c

    點擊(此處)折疊或打開

  • /* bigint defines */
  • #define BITBYTE 8
  • #define SIGLENINT 4????????????/* >122 => key will toast, so very */
  • #define SIGLEN????( sizeof(int)*SIGLENINT )
  • #define SIGLENBIT (SIGLEN*BITBYTE)

  • typedef char BITVEC[SIGLEN];
  • typedef char *BITVECP;

  • ...
  • Datum
  • ghstore_compress(PG_FUNCTION_ARGS)
  • {
  • ????GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
  • ????GISTENTRY *retval = entry;

  • ????if (entry->leafkey)
  • ????{
  • ????????GISTTYPE *res = (GISTTYPE *) palloc0(CALCGTSIZE(0));
  • ????????HStore???? *val = DatumGetHStoreP(entry->key);
  • ????????HEntry???? *hsent = ARRPTR(val);
  • ????????char???? *ptr = STRPTR(val);
  • ????????int????????????count = HS_COUNT(val);
  • ????????int????????????i;

  • ????????SET_VARSIZE(res, CALCGTSIZE(0));

  • //遍歷所有KV對
  • ????????for (i = 0; i count; ++i)
  • ????????{
  • ????????????int????????????h;
  • //對key做哈希
  • ????????????h = crc32_sz((char *) HS_KEY(hsent, ptr, i), HS_KEYLEN(hsent, i));
  • ????????????HASH(GETSIGN(res), h);
  • ????????????if (!HS_VALISNULL(hsent, i))
  • ????????????{
  • //對value做哈希
  • ????????????????h = crc32_sz((char *) HS_VAL(hsent, ptr, i), HS_VALLEN(hsent, i));
  • ????????????????HASH(GETSIGN(res), h);
  • ????????????}
  • ????????}

  • ????????retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
  • ????????gistentryinit(*retval, PointerGetDatum(res),
  • ???????????????????? entry->rel, entry->page,
  • ???????????????????? entry->offset,
  • ???????????????????? FALSE);
  • ????}
  • ????else if (!ISALLTRUE(DatumGetPointer(entry->key)))
  • ????{
  • ????????int32????????i;
  • ????????GISTTYPE *res;
  • ????????BITVECP????????sign = GETSIGN(DatumGetPointer(entry->key));

  • ????????LOOPBYTE
  • ????????{
  • ????????????if ((sign[i] & 0xff) != 0xff)
  • ????????????????PG_RETURN_POINTER(retval);
  • ????????}

  • ????????res = (GISTTYPE *) palloc(CALCGTSIZE(ALLISTRUE));
  • ????????SET_VARSIZE(res, CALCGTSIZE(ALLISTRUE));
  • ????????res->flag = ALLISTRUE;

  • ????????retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
  • ????????gistentryinit(*retval, PointerGetDatum(res),
  • ???????????????????? entry->rel, entry->page,
  • ???????????????????? entry->offset,
  • ???????????????????? FALSE);
  • ????}

  • ????PG_RETURN_POINTER(retval);
  • }

  • 3.4?性能驗證

    1)環(huán)境
    CentOS 6.5
    PostgreSQL 9.4.0


    2)測試數(shù)據(jù)準備
    點擊(此處)折疊或打開
  • testdb=# create extension hstore;
  • CREATE EXTENSION
  • Time: 235.391 ms
  • testdb=# create table tbhs(id serial,hs hstore);
  • CREATE TABLE
  • Time: 17.459 ms
  • testdb=# insert into tbhs select id,hstore(id::text, md5(id::text)) from generate_series(1,100000) id;
  • INSERT 0 100000
  • Time: 970.613 ms
  • testdb=# select pg_relation_size('tbhs');
  • ?pg_relation_size
  • ------------------
  • ??????????8445952
  • (1 row)

  • Time: 0.643 ms
  • testdb=# select * from tbhs limit 5;
  • ?id | hs
  • ----+-----------------------------------------
  • ??1 | "1"=>"c4ca4238a0b923820dcc509a6f75849b"
  • ??2 | "2"=>"c81e728d9d4c2f636f067f89cc14862c"
  • ??3 | "3"=>"eccbc87e4b5ce2fe28308fd9f2a7baf3"
  • ??4 | "4"=>"a87ff679a2f3e71d9181a67b7542122c"
  • ??5 | "5"=>"e4da3b7fbbce2345d7772b0674a318d5"
  • (5 rows)

  • Time: 1.147 ms
  • 注)上面每個記錄只有一個鍵值對,且每個key都是唯一的,這樣的數(shù)據(jù)模型就不適合使用hstore存儲,本文故意這么做只是為了驗證性能。


    2)無索引的查詢
    點擊(此處)折疊或打開
  • testdb=# explain (analyze,buffers) select * from tbhs where hs ? '10';
  • ?????????????????????????????????????????????QUERY PLAN
  • -----------------------------------------------------------------------------------------------------
  • ?Seq Scan on tbhs (cost=0.00..2281.00 rows=100 width=53) (actual time=0.025..31.587 rows=1 loops=1)
  • ???Filter: (hs ? '10'::text)
  • ???Rows Removed by Filter: 99999
  • ???Buffers: shared hit=1031
  • ?Planning time: 0.055 ms
  • ?Execution time: 31.619 ms
  • (6 rows)

  • Time: 32.150 ms

  • 3)建立GiST索引再查詢

    點擊(此處)折疊或打開

  • testdb=# create index tbhs_idx_gist on tbhs using gist(hs);
  • CREATE INDEX
  • Time: 6243.003 ms
  • testdb=# analyze tbhs;
  • ANALYZE
  • Time: 77.956 ms
  • testdb=# select pg_relation_size('tbhs_idx_gist'),pg_relation_size('tbhs_idx_gist')/8192 index_pages;
  • ?pg_relation_size | index_pages
  • ------------------+-------------
  • ??????????4825088 | 589
  • (1 row)

  • Time: 0.456 ms

  • ?查詢掃描了45%(265/589)的索引頁,Recheck了1.6%的數(shù)據(jù)行。

    點擊(此處)折疊或打開

  • testdb=# explain (analyze,buffers) select * from tbhs where hs ? '10';
  • ????????????????????????????????????????????????????????QUERY PLAN
  • ---------------------------------------------------------------------------------------------------------------------------
  • ?Bitmap Heap Scan on tbhs (cost=5.06..302.42 rows=100 width=53) (actual time=3.003..4.637 rows=1 loops=1)
  • ???Recheck Cond: (hs ? '10'::text)
  • ???Rows Removed by Index Recheck: 1558
  • ???Heap Blocks: exact=863
  • ???Buffers: shared hit=1128
  • ???-> Bitmap Index Scan on tbhs_idx_gist (cost=0.00..5.03 rows=100 width=0) (actual time=2.869..2.869 rows=1559 loops=1)
  • ?????????Index Cond: (hs ? '10'::text)
  • ?????????Buffers: shared hit=265
  • ?Planning time: 0.153 ms
  • ?Execution time: 5.073 ms
  • (10 rows)

  • Time: 6.209 ms

  • @>查詢掃描了19%(114/589)的索引頁,Recheck了0.012%的數(shù)據(jù)行。

    點擊(此處)折疊或打開

  • testdb=# explain (analyze,buffers) select * from tbhs where hs @> '1=>c4ca4238a0b923820dcc509a6f75849b';
  • ???????????????????????????????????????????????????????QUERY PLAN
  • -------------------------------------------------------------------------------------------------------------------------
  • ?Bitmap Heap Scan on tbhs (cost=5.06..302.42 rows=100 width=53) (actual time=2.123..2.205 rows=1 loops=1)
  • ???Recheck Cond: (hs @> '"1"=>"c4ca4238a0b923820dcc509a6f75849b"'::hstore)
  • ???Rows Removed by Index Recheck: 11
  • ???Heap Blocks: exact=12
  • ???Buffers: shared hit=126
  • ???-> Bitmap Index Scan on tbhs_idx_gist (cost=0.00..5.03 rows=100 width=0) (actual time=2.097..2.097 rows=12 loops=1)
  • ?????????Index Cond: (hs @> '"1"=>"c4ca4238a0b923820dcc509a6f75849b"'::hstore)
  • ?????????Buffers: shared hit=114
  • ?Planning time: 0.407 ms
  • ?Execution time: 2.294 ms
  • (10 rows)

  • Time: 3.332 ms

  • 4)建立GIN索引再查詢
    建立GIN索引對比一下。

    點擊(此處)折疊或打開

  • testdb=# drop index tbhs_idx_gist;
  • DROP INDEX
  • Time: 7.036 ms
  • testdb=# create index tbhs_idx_gin on tbhs using gin(hs);
  • CREATE INDEX
  • Time: 2244.241 ms
  • testdb=# analyze tbhs;
  • ANALYZE
  • Time: 64.608 ms
  • testdb=# select pg_relation_size('tbhs_idx_gin'),pg_relation_size('tbhs_idx_gin')/8192 index_pages;
  • ?pg_relation_size | index_pages
  • ------------------+-------------
  • ?????????17629184 | 2152
  • (1 row)

  • Time: 0.641 ms
  • testdb=# explain (analyze,buffers) select * from tbhs where hs ? '10';
  • ???????????????????????????????????????????????????????QUERY PLAN
  • ------------------------------------------------------------------------------------------------------------------------
  • ?Bitmap Heap Scan on tbhs (cost=16.77..314.14 rows=100 width=53) (actual time=0.096..0.097 rows=1 loops=1)
  • ???Recheck Cond: (hs ? '10'::text)
  • ???Heap Blocks: exact=1
  • ???Buffers: shared hit=5
  • ???-> Bitmap Index Scan on tbhs_idx_gin (cost=0.00..16.75 rows=100 width=0) (actual time=0.086..0.086 rows=1 loops=1)
  • ?????????Index Cond: (hs ? '10'::text)
  • ?????????Buffers: shared hit=4
  • ?Planning time: 0.349 ms
  • ?Execution time: 0.144 ms
  • (9 rows)

  • Time: 1.014 ms
  • testdb=# explain (analyze,buffers) select * from tbhs where hs @> '1=>c4ca4238a0b923820dcc509a6f75849b';
  • ???????????????????????????????????????????????????????QUERY PLAN
  • ------------------------------------------------------------------------------------------------------------------------
  • ?Bitmap Heap Scan on tbhs (cost=28.77..326.14 rows=100 width=53) (actual time=0.070..0.071 rows=1 loops=1)
  • ???Recheck Cond: (hs @> '"1"=>"c4ca4238a0b923820dcc509a6f75849b"'::hstore)
  • ???Heap Blocks: exact=1
  • ???Buffers: shared hit=8
  • ???-> Bitmap Index Scan on tbhs_idx_gin (cost=0.00..28.75 rows=100 width=0) (actual time=0.047..0.047 rows=1 loops=1)
  • ?????????Index Cond: (hs @> '"1"=>"c4ca4238a0b923820dcc509a6f75849b"'::hstore)
  • ?????????Buffers: shared hit=7
  • ?Planning time: 0.131 ms
  • ?Execution time: 0.105 ms
  • (9 rows)

  • Time: 0.661 ms
  • GIN索引的查詢效率相當不錯。GIN的缺點是:索引比較大,對更新速度有一定影響。

    3.5 參考

    http://blog.chinaunix.net/uid-20726500-id-4884626.html
    http://blog.chinaunix.net/uid-20726500-id-4884681.html
    http://blog.chinaunix.net/uid-20726500-id-4886571.html
    http://blog.chinaunix.net/uid-259788-id-4279627.html


    總結(jié)

    以上是生活随笔為你收集整理的关于PostgreSQL的GiST索引之四的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

    如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。