TCTDB Storage Structure
TCTDB is the table database of the Tokyo Cabinet family (see the figure above). It is implemented on top of TCHDB (hash database) and TCBDB (B-tree database).
For TCHDB, see: http://blog.chinaunix.net/space.php?uid=20196318&do=blog&id=327754
I have not read the TCBDB code yet and plan to when I have time; its structure is shown in the figure below.
[Figure: TCBDB structure]
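Main features of TCTDB:
1. Loose table implementation. Each row is identified by its primary key and holds multiple columns, and each column is identified by its name.
Rows are stored in TCHDB. All columns of a row are stored together as a single value (in memory this is a map from column name to column value), so a row must also be updated as a whole: even a change to a single column requires reading the entire row, modifying the relevant part, and writing it back.
2. Flexible data structure. All data is stored as char[]: no schema, no data types.
3. Query mechanism. Several kinds of query conditions are supported (string matching, regular expressions, numeric comparison, and so on), and results can be sorted by a column.
4. Column indexes. Per-column indexes are stored in TCBDB, and queries use them (including token and q-gram indexes): a query first fetches the inverted list from the index of the relevant column and then matches against the table.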
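To make feature 1 concrete, here is a minimal sketch of "one row = one value" and of why a single-column change turns into a read-modify-write of the whole row. Everything in it is hypothetical: the `Row` type, the `row_*` helpers, and the `name\0value\0` byte layout are assumptions made for illustration, not Tokyo Cabinet's actual code or on-disk format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_COLS 8
#define MAX_STR 64

/* Hypothetical in-memory row: a small map of column name -> column value. */
typedef struct {
    int ncols;
    char names[MAX_COLS][MAX_STR];
    char values[MAX_COLS][MAX_STR];
} Row;

/* Set a column in the in-memory map, adding it if it does not exist yet. */
void row_set(Row *row, const char *name, const char *value) {
    int i;
    for (i = 0; i < row->ncols; i++) {
        if (strcmp(row->names[i], name) == 0) break;
    }
    if (i == row->ncols) {
        assert(row->ncols < MAX_COLS);
        snprintf(row->names[i], MAX_STR, "%s", name);
        row->ncols++;
    }
    snprintf(row->values[i], MAX_STR, "%s", value);
}

/* Look up a column value, or return NULL if the row has no such column. */
const char *row_get(const Row *row, const char *name) {
    for (int i = 0; i < row->ncols; i++) {
        if (strcmp(row->names[i], name) == 0) return row->values[i];
    }
    return NULL;
}

/* Pack the whole row into one value: name\0value\0name\0value\0...
   Returns the number of bytes written. */
size_t row_pack(const Row *row, char *buf, size_t cap) {
    size_t off = 0;
    for (int i = 0; i < row->ncols; i++) {
        size_t nlen = strlen(row->names[i]) + 1;
        size_t vlen = strlen(row->values[i]) + 1;
        assert(off + nlen + vlen <= cap);
        memcpy(buf + off, row->names[i], nlen);
        off += nlen;
        memcpy(buf + off, row->values[i], vlen);
        off += vlen;
    }
    return off;
}

/* Rebuild the in-memory map from a packed value (assumed well-formed). */
void row_unpack(Row *row, const char *buf, size_t size) {
    size_t off = 0;
    row->ncols = 0;
    while (off < size) {
        const char *name = buf + off;
        off += strlen(name) + 1;
        const char *value = buf + off;
        off += strlen(value) + 1;
        row_set(row, name, value);
    }
}

/* Update one column of a stored row: the whole value is read,
   modified in memory, and written back. Returns the new size. */
size_t stored_row_update(char *buf, size_t size, size_t cap,
                         const char *name, const char *value) {
    Row row;
    row_unpack(&row, buf, size);
    row_set(&row, name, value);
    return row_pack(&row, buf, cap);
}
```

In this sketch, changing only `age` still goes through `row_unpack` and `row_pack` for every column, which is the cost the feature list describes.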
Match conditions supported by queries
enum {                                   /* enumeration for query conditions */
  TDBQCSTREQ,                            /* string is equal to */
  TDBQCSTRINC,                           /* string is included in */
  TDBQCSTRBW,                            /* string begins with */
  TDBQCSTREW,                            /* string ends with */
  TDBQCSTRAND,                           /* string includes all tokens in */
  TDBQCSTROR,                            /* string includes at least one token in */
  TDBQCSTROREQ,                          /* string is equal to at least one token in */
  TDBQCSTRRX,                            /* string matches regular expressions of */
  TDBQCNUMEQ,                            /* number is equal to */
  TDBQCNUMGT,                            /* number is greater than */
  TDBQCNUMGE,                            /* number is greater than or equal to */
  TDBQCNUMLT,                            /* number is less than */
  TDBQCNUMLE,                            /* number is less than or equal to */
  TDBQCNUMBT,                            /* number is between two tokens of */
  TDBQCNUMOREQ,                          /* number is equal to at least one token in */
  TDBQCFTSPH,                            /* full-text search with the phrase of */
  TDBQCFTSAND,                           /* full-text search with all tokens in */
  TDBQCFTSOR,                            /* full-text search with at least one token in */
  TDBQCFTSEX,                            /* full-text search with the compound expression of */
  TDBQCNEGATE = 1 << 24,                 /* negation flag */
  TDBQCNOIDX = 1 << 25                   /* no index flag */
};
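TC supports many kinds of query conditions, matching mainly on strings and numbers. String conditions support regular expressions, substring matching, and prefix/suffix matching; numeric conditions are based mainly on comparison operators.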
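As a rough illustration of how such conditions could be evaluated against a single column value, here is a small sketch. The `QC_*` constants and `cond_matches` are hypothetical names that mirror only a subset of the conditions listed above, and `strtod` stands in for whatever numeric parsing the library really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the query conditions, plus the negation flag. */
enum {
    QC_STREQ,              /* string is equal to */
    QC_STRINC,             /* string is included in */
    QC_STRBW,              /* string begins with */
    QC_STREW,              /* string ends with */
    QC_NUMEQ,              /* number is equal to */
    QC_NUMGT,              /* number is greater than */
    QC_NUMLT,              /* number is less than */
    QC_NEGATE = 1 << 24    /* negation flag, OR-ed onto a condition */
};

/* Check one column value against one condition expression. */
bool cond_matches(int op, const char *value, const char *expr) {
    assert(value != NULL && expr != NULL);
    bool negate = (op & QC_NEGATE) != 0;
    int base = op & ~QC_NEGATE;
    size_t vlen = strlen(value);
    size_t elen = strlen(expr);
    bool hit = false;
    switch (base) {
    case QC_STREQ:
        hit = strcmp(value, expr) == 0;
        break;
    case QC_STRINC:
        hit = strstr(value, expr) != NULL;
        break;
    case QC_STRBW:
        hit = vlen >= elen && strncmp(value, expr, elen) == 0;
        break;
    case QC_STREW:
        hit = vlen >= elen && strcmp(value + (vlen - elen), expr) == 0;
        break;
    case QC_NUMEQ:
        hit = strtod(value, NULL) == strtod(expr, NULL);
        break;
    case QC_NUMGT:
        hit = strtod(value, NULL) > strtod(expr, NULL);
        break;
    case QC_NUMLT:
        hit = strtod(value, NULL) < strtod(expr, NULL);
        break;
    default:
        hit = false;       /* condition not covered by this sketch */
        break;
    }
    return negate ? !hit : hit;
}
```

Because columns are plain strings, the same value can be tested either as a string or as a number; the condition code alone decides which interpretation applies.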
Result ordering
enum {                                   /* enumeration for order types */
  TDBQOSTRASC,                           /* string ascending */
  TDBQOSTRDESC,                          /* string descending */
  TDBQONUMASC,                           /* number ascending */
  TDBQONUMDESC                           /* number descending */
};
TCTDB can sort query results in the four ways above: ascending or descending as strings, and ascending or descending as numbers.
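The difference between string and numeric ordering matters because every column is stored as text. The sketch below shows the four orderings with `qsort`; the `ORDER_*` constants and `sort_values` are hypothetical names for this example, not part of the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical order codes mirroring the four order types. */
enum { ORDER_STRASC, ORDER_STRDESC, ORDER_NUMASC, ORDER_NUMDESC };

/* Compare two column values byte by byte, as strings. */
static int cmp_str_asc(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Compare two column values after parsing them as numbers. */
static int cmp_num_asc(const void *a, const void *b) {
    double x = strtod(*(const char *const *)a, NULL);
    double y = strtod(*(const char *const *)b, NULL);
    return (x > y) - (x < y);
}

/* Descending orders reuse the ascending comparators with swapped operands. */
static int cmp_str_desc(const void *a, const void *b) {
    return cmp_str_asc(b, a);
}

static int cmp_num_desc(const void *a, const void *b) {
    return cmp_num_asc(b, a);
}

/* Sort an array of column values in place according to the order code. */
void sort_values(const char **values, size_t n, int order) {
    assert(values != NULL);
    switch (order) {
    case ORDER_STRASC:
        qsort(values, n, sizeof *values, cmp_str_asc);
        break;
    case ORDER_STRDESC:
        qsort(values, n, sizeof *values, cmp_str_desc);
        break;
    case ORDER_NUMASC:
        qsort(values, n, sizeof *values, cmp_num_asc);
        break;
    case ORDER_NUMDESC:
        qsort(values, n, sizeof *values, cmp_num_desc);
        break;
    default:
        break;             /* unknown order: leave the array untouched */
    }
}
```

Sorting the values "9", "100", "25" as strings puts "100" first, since '1' sorts before '2' and '9'; sorting them as numbers gives 9, 25, 100.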
Index types
enum {                                   /* enumeration for index types */
  TDBITLEXICAL,                          /* lexical string */
  TDBITDECIMAL,                          /* decimal string */
  TDBITTOKEN,                            /* token inverted index */
  TDBITQGRAM,                            /* q-gram inverted index */
};
When creating a column index, TCTDB lets you choose the index type: lexical string index, decimal string index, token index, or q-gram index. Once a column is indexed, every newly inserted row also adds an entry for that column to the index (each index is backed by its own TCBDB, which stores the index data).
| Name (primary key) | Age | Company | Interest |
| --- | --- | --- | --- |
| Jack | 23 | baidu | basketball,sanguosha |
| Rose | 22 | tencent | pingpong,poker |
| Joe | 25 | taobao | kfc bicycle |
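Suppose an index of type TDBITDECIMAL is created on Age.
After the three rows are inserted, the resulting index (a TCBDB) contains three entries, in order: 22(rose), 23(jack), 25(joe). The index is sorted according to its type, and the value of each entry is shown in parentheses.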
If an index is also created on Company, with type TDBITLEXICAL, then after the three rows are inserted the index holds three entries, in order: baidu(jack), taobao(joe), tencent(rose).
If an index is also created on Interest, with type TDBITTOKEN, then after the three rows are inserted the index holds, in order: basketball(jack), bicycle(joe), kfc(joe), pingpong(rose), poker(rose), sanguosha(jack).
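The token index example can be reproduced with a short sketch. It assumes that tokens are separated by spaces and commas, as the sample data suggests, and it uses a sorted array in place of the B-tree. `TokenIndex`, `IndexEntry`, and `token_index_put` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 32
#define MAX_STR 32

/* One index entry: a token (the key) and the primary key of its row. */
typedef struct {
    char token[MAX_STR];
    char pkey[MAX_STR];
} IndexEntry;

/* Hypothetical token index: a sorted array standing in for a B-tree. */
typedef struct {
    IndexEntry entries[MAX_ENTRIES];
    size_t count;
} TokenIndex;

/* Order entries by token, then by primary key. */
static int entry_cmp(const void *a, const void *b) {
    const IndexEntry *x = a;
    const IndexEntry *y = b;
    int c = strcmp(x->token, y->token);
    return c != 0 ? c : strcmp(x->pkey, y->pkey);
}

/* Split a column value on spaces and commas and add one
   (token, primary key) entry per token, keeping the index sorted. */
void token_index_put(TokenIndex *idx, const char *pkey, const char *value) {
    const char *p = value;
    while (*p != '\0') {
        p += strspn(p, " ,");              /* skip separators */
        size_t len = strcspn(p, " ,");     /* length of the next token */
        if (len == 0) break;
        assert(idx->count < MAX_ENTRIES && len < MAX_STR);
        IndexEntry *e = &idx->entries[idx->count++];
        snprintf(e->token, MAX_STR, "%.*s", (int)len, p);
        snprintf(e->pkey, MAX_STR, "%s", pkey);
        p += len;
    }
    qsort(idx->entries, idx->count, sizeof idx->entries[0], entry_cmp);
}
```

Feeding in the three Interest values from the table, with the lower-case primary keys used in the text, yields the same six entries in the same order as listed above.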
With these indexes in place, a query on the value of an indexed column is quite efficient, because the lookup goes through the B-tree instead of scanning every row.
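To show where the speedup comes from, here is a minimal sketch of a lookup in the Age index. A sorted array with binary search stands in for the B-tree; a real B-tree gives the same logarithmic lookup but also handles inserts and disk pages. `NumEntry`, `index_lower_bound`, and `index_find` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a decimal index: numeric key -> primary key of the row. */
typedef struct {
    long key;
    const char *pkey;
} NumEntry;

/* Return the position of the first entry whose key is >= target.
   The entries must already be sorted by key. */
size_t index_lower_bound(const NumEntry *entries, size_t n, long target) {
    size_t lo = 0;
    size_t hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].key < target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Exact-match lookup: the primary key stored under target, or NULL. */
const char *index_find(const NumEntry *entries, size_t n, long target) {
    assert(entries != NULL);
    size_t pos = index_lower_bound(entries, n, target);
    if (pos < n && entries[pos].key == target) {
        return entries[pos].pkey;
    }
    return NULL;
}
```

An equality condition maps to `index_find`, while a range condition such as "Age greater than or equal to 23" can start at `index_lower_bound` and walk forward through the sorted entries.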
Reposted from: https://www.cnblogs.com/yunnotes/archive/2013/04/19/3032349.html