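Trie (Dictionary Tree): Key Concepts and Applications
A trie is also commonly called a prefix tree or a dictionary tree. It has many variants, such as the suffix tree, Radix Tree/Trie, PATRICIA tree, and the bitwise crit-bit tree. Many of these names overlap in meaning.
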
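Definition
In computer science, a trie, also called a prefix tree or dictionary tree, is an ordered tree used to store an associative array whose keys are usually strings. Unlike a binary search tree, a key is not stored directly in a node; it is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string corresponding to that node, and the root corresponds to the empty string. In general, not every node has an associated value: only the keys corresponding to leaves and to some internal nodes have values.
Keys in a trie are usually strings, but they can be other structures. The trie algorithms are easily adapted to ordered sequences of other kinds, such as a sequence of digits or a permutation of shapes. For example, the keys of a bitwise trie are bit strings, which can represent integers or memory addresses.
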
Basic properties
1. The root contains no character; every node other than the root contains exactly one character.
2. Concatenating the characters along the path from the root to a node gives the string corresponding to that node.
3. The children of any given node all carry different characters.

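The three properties above can be sketched in Python. This is a minimal illustration, not a reference implementation; the names `TrieNode`, `Trie`, `insert`, `contains`, and `starts_with` are chosen here and do not come from the original post.

```python
class TrieNode:
    """One node; each edge to a child is labelled with a single character."""

    def __init__(self):
        self.children = {}    # char -> TrieNode, so sibling characters are distinct
        self.is_word = False  # True if the path from the root spells a stored key


class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root corresponds to the empty string

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True

    def _walk(self, s):
        """Follow s from the root; return the node reached, or None."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

Both lookups cost time proportional to the length of the query string, independent of how many keys are stored.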
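Advantages:
A trie keeps unnecessary string comparisons to a minimum, so it is suitable for word-frequency counting and for sorting large numbers of strings.
Compared with a hash table:
1. The worst-case time complexity is better than that of a hash table.
2. There are no collisions, unless one key maps to several values (information other than the key itself).
3. Sorting comes for free (similar to Radix Sort): a pre-order traversal of the trie, visiting children in character order, yields the keys in sorted order.
Disadvantages:
1. Although different words share prefixes, a trie is in fact a structure that trades space for time. Each node may hold up to as many pointers as there are characters in the alphabet (not counting satellite data).
There are several ways to organize the children of a node: (a) allocate a slot for every character of the alphabet, which makes lookup fast but wastes space, especially near the leaves at the bottom of the tree; (b) link the children in a list (for example left-child right-sibling), which saves space but requires walking part of the list on each lookup; (c) alphabet reduction, which uses narrower symbols so that the alphabet is smaller; (d) a bitmap over the alphabet, combined with the linked representation.
2. If the data lives on slow storage such as external memory, a trie is slower than a hash table (a hash table makes O(1) external accesses, a trie makes O(tree height)).
3. Long keys such as floating-point numbers produce long chains. A bitwise trie can mitigate this.
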
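Bitwise trie
A bitwise trie is like an ordinary trie, except that the alphabet is a single bit, so each node has only two children.
It can be used for address allocation, routing tables, and similar tasks.
Although keys are stored and tested bit by bit, the structure is cache-local and highly parallelizable, so it performs well. A red-black tree looks better on paper, but it is cache-unfriendly and does much of its work serially, so its bottleneck is memory access latency rather than CPU speed.
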
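As a rough sketch of the idea, the following Python class stores fixed-width unsigned integers in a binary trie, most significant bit first. The class name `BitTrie` and the `width` parameter are assumptions made for this illustration. It shows only the two-children structure and does not model the cache or parallelism behaviour described above.

```python
class BitTrie:
    """Binary trie over fixed-width unsigned integers, most significant bit first."""

    def __init__(self, width=32):
        self.width = width
        self.root = [None, None]  # [child for bit 0, child for bit 1]

    def _bits(self, key):
        if not 0 <= key < (1 << self.width):
            raise ValueError("key does not fit in the configured width")
        for shift in range(self.width - 1, -1, -1):
            yield (key >> shift) & 1

    def insert(self, key):
        node = self.root
        for bit in self._bits(key):
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]

    def contains(self, key):
        # All keys have the same length, so reaching full depth means the key is stored.
        node = self.root
        for bit in self._bits(key):
            node = node[bit]
            if node is None:
                return False
        return True
```

Because every key has exactly `width` bits, no end-of-key marker is needed, and the tree height is bounded by `width` regardless of the number of keys.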
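Compressed trie
Compressing branches works best when:
1. the trie is mostly static;
2. only lookups are performed;
3. keys do not depend on node-specific data;
4. branches are very sparse.
If insertions and deletions are allowed, nodes may need to be split and merged, so there is a trade-off between compression ratio and the frequency of updates (splits and merges).
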
External-memory trie
Some variants, such as the suffix tree, are suitable for external storage; there is also the B-trie, among others.

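Applications:
(1) String retrieval
Store information about a set of known strings (a dictionary) in a trie in advance, then check whether other, unknown strings have appeared, or how often.
Examples:
1. Given a list of N known words and an article written entirely in lowercase English, output every word that is not in the list, in order of first appearance.
2. Given a dictionary of bad words, all in lowercase letters, and a text whose lines also consist of lowercase letters, decide whether the text contains any bad word. For example, if rob is a bad word, then the text problem contains a bad word.
3. Given 10 million strings, some of which are duplicates, remove the duplicates so that only distinct strings remain.
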
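Example 2 can be sketched as follows: build a trie from the bad words, then start a trie walk at every position of the text. The helper names `build_trie` and `find_bad_word` and the `END` sentinel are hypothetical. This is the straightforward approach, not the Aho-Corasick automaton, which avoids restarting the walk at each position.

```python
END = "$end"  # sentinel key; longer than one character, so it cannot clash with a letter


def build_trie(words):
    """Nested-dict trie; the END key marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_bad_word(text, trie):
    """Return (start, word) for the first stored word occurring in text, else None."""
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if END in node:
                return start, text[start:end + 1]
    return None
```

With the dictionary `["rob"]`, scanning `"problem"` fails at position 0 (no word starts with `p`) and succeeds at position 1, where the walk follows `r`, `o`, `b` to a node marked as the end of a word.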
(2) Text prediction, autocompletion, "see also" suggestions, spell checking

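A minimal autocompletion sketch: walk down to the node for the typed prefix, then enumerate the words stored below it. The function names, the `END` sentinel, and the `limit` parameter are assumptions made for this illustration.

```python
END = "$end"  # sentinel key marking the end of a stored word


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def complete(trie, prefix, limit=10):
    """Return up to `limit` stored words that start with prefix, in lexicographic order."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return []
    results = []
    stack = [(node, prefix)]
    while stack and len(results) < limit:
        node, word = stack.pop()
        if END in node:
            results.append(word)
        # Push children in reverse order so that the smallest character is popped first.
        for ch in sorted((c for c in node if c != END), reverse=True):
            stack.append((node[ch], word + ch))
    return results
```

The cost is proportional to the prefix length plus the size of the subtree actually visited, which is why a small `limit` keeps suggestions cheap even for a large dictionary.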
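(3) Word-frequency counting
1. A 1 GB file has one word per line, each word at most 16 bytes, and the memory limit is 1 MB. Return the 100 most frequent words.
2. A text file has about ten thousand lines, one word per line. Find the 10 most frequent words; describe the approach and analyze the time complexity.
3. Finding popular queries: a search engine logs every query string its users submit, each 1 to 255 bytes long. Suppose there are currently 10 million records. The queries are highly repetitive: although there are 10 million in total, there are no more than 3 million distinct ones. The more often a query string repeats, the more users issued it and the more popular it is. Find the 10 most popular query strings using no more than 1 GB of memory.
(1) Describe your approach to the problem.
(2) Give the main processing steps, the algorithm, and its complexity.
==> Without a memory limit: a trie plus a heap of size k, where k is the number of items to find (a min-heap when looking for the k largest).
Otherwise, first hash the input into partitions, count word frequencies within each partition using a hash table (with a different hash function), and then combine the results either with a partial sort (for example partial_sort) or with an external-memory method. See:
"Applying merge, heap sort, and top-K methods to massive data: an interview question" http://www.dataguru.cn/thread-485388-1-1.html
"Algorithm interview question: top-k word frequencies" http://blog.csdn.net/u011077606/article/details/42640867
Introduction to Algorithms notes, Chapter 9: Medians and Order Statistics
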
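The "trie plus heap of size k" approach for the case without a memory limit can be sketched like this. The function names and the `COUNT` sentinel are hypothetical; the sketch counts in memory and does not address the partitioning needed under a tight memory limit.

```python
import heapq

COUNT = "$count"  # sentinel key holding the number of occurrences of a word


def count_words(words):
    """Build a nested-dict trie whose word-ending nodes carry an occurrence count."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[COUNT] = node.get(COUNT, 0) + 1
    return root


def iter_counts(node, prefix=""):
    """Yield (word, count) for every word stored in the trie."""
    if COUNT in node:
        yield prefix, node[COUNT]
    for ch, child in node.items():
        if ch != COUNT:
            yield from iter_counts(child, prefix + ch)


def top_k(words, k):
    """Return the k most frequent words as (count, word) pairs, most frequent first."""
    if k <= 0:
        return []
    heap = []  # min-heap of (count, word) holding at most k entries
    for word, count in iter_counts(count_words(words)):
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif (count, word) > heap[0]:
            heapq.heapreplace(heap, (count, word))
    return sorted(heap, reverse=True)
```

Counting costs time proportional to the total input length, and selecting the top k among m distinct words costs O(m log k), since the heap never holds more than k entries.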
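(4) Sorting
A trie is a multiway tree. A pre-order traversal of the whole tree, visiting children in character order and outputting the corresponding strings, gives the strings in lexicographic order.
For example, given N distinct English names, each a single word, output them in ascending lexicographic order.
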
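A small sketch of sorting by pre-order traversal, assuming distinct input strings as in the example. The name `trie_sort` and the `END` sentinel are chosen for this illustration.

```python
END = "$end"  # sentinel key marking the end of a stored word


def trie_sort(words):
    """Sort distinct strings by inserting them into a trie and walking it in pre-order."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    out = []

    def preorder(node, prefix):
        if END in node:  # visit the node itself before any of its children
            out.append(prefix)
        for ch in sorted(k for k in node if k != END):
            preorder(node[ch], prefix + ch)

    preorder(root, "")
    return out
```

The per-node `sorted` call is needed only because a dict does not keep children in character order; with an array of children indexed by character, the traversal visits them in order with no sorting step.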
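(5) Longest common prefix of strings
A trie saves space by sharing the common prefixes of its strings, so once a large number of strings is stored in a trie, the common prefix of any of them can be obtained quickly.
Example:
Given N strings of lowercase English letters and Q queries, each query asks for the length of the longest common prefix of two of the strings.
Solution: first build the trie of all the strings. The length of the longest common prefix of two strings equals the number of common ancestors of their nodes, not counting the root, that is, the depth of their lowest common ancestor. The problem therefore reduces to the offline Lowest Common Ancestor (LCA) problem.
LCA is itself a classic problem and can be solved in several ways:
1. With a disjoint-set structure (Disjoint Set), using Tarjan's classic algorithm.
2. Compute the Euler sequence (Euler Sequence) of the trie, which turns the problem into the classic Range Minimum Query (RMQ) problem.
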
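The reduction to LCA can be illustrated with a deliberately simple sketch in which each trie node records its parent and depth, and the LCA is found by climbing. The names `Node`, `insert`, and `lcp_length` are hypothetical. Each query costs time proportional to the string lengths; Tarjan's offline algorithm or the Euler sequence plus RMQ mentioned above would answer many queries faster.

```python
class Node:
    def __init__(self, parent=None):
        self.children = {}
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1


def insert(root, word):
    """Insert word and return the node at which it ends."""
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = Node(node)
        node = node.children[ch]
    return node


def lcp_length(a, b):
    """Depth of the lowest common ancestor of nodes a and b."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a.depth
```

The depth of a node equals the length of the string it represents, so the depth of the LCA is exactly the length of the longest common prefix.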
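(6) Prefix matching in string search
Tries are often used for search suggestions. For example, when a URL is typed in, the possible completions can be offered automatically; when no result matches exactly, the candidates with the most similar prefix can be returned.
A trie lookup takes O(n) time, where n is the length of the word being looked up, whereas a brute-force search takes quadratic, O(n^2), time.
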
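One way to read "the most similar prefix" is the longest prefix of the query that still exists in the trie. The sketch below makes that assumption; `build_trie` and `longest_known_prefix` are hypothetical names.

```python
def build_trie(words):
    """Nested-dict trie; only the path structure is needed here."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def longest_known_prefix(trie, query):
    """Return the longest prefix of query that is a prefix of some stored word."""
    node = trie
    matched = 0
    for ch in query:
        node = node.get(ch)
        if node is None:
            break
        matched += 1
    return query[:matched]
```

The node reached after the last matched character is the natural starting point for listing suggestions when the full query has no exact match.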
(7) As a building block for other data structures and algorithms
Examples include the suffix tree and the Aho-Corasick (AC) automaton.
A suffix tree can be used for full-text search.

A performance comparison of several trie implementations: http://www.hankcs.com/nlp/performance-comparison-of-several-trie-tree.html
A comparison of the trie with other data structures: http://www.raychase.net/1783

References:
[1] Wikipedia: Trie, https://en.wikipedia.org/wiki/Trie
[2] LeetCode trie summary, http://www.jianshu.com/p/bbfe4874f66f
[3] Implementing and applying the trie, http://www.cnblogs.com/binyue/p/3771040.html#undefined
[4] Mastering tree structures in 6 days, day 5: the trie, http://www.cnblogs.com/huangxincheng/archive/2012/11/25/2788268.html

============= Excerpt from Wikipedia =============
In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree: an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree.
Figure: A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".
In the example shown, keys are listed in the nodes and values below them. Each complete English word has an arbitrary integer value associated with it. A trie can be seen as a tree-shaped deterministic finite automaton. Each finite language is generated by a trie automaton, and each trie can be compressed into a deterministic acyclic finite state automaton.
Though tries are usually keyed by character strings, they need not be. The same algorithms can be adapted to serve similar functions of ordered lists of any construct, e.g. permutations on a list of digits or shapes. In particular, a bitwise trie is keyed on the individual bits making up any fixed-length binary datum, such as an integer or memory address.
History and etymology
Tries were first described by René de la Briandais in 1959.[1][2]:336 The term trie was coined two years later by Edward Fredkin, who pronounces it /ˈtriː/ (as "tree"), after the middle syllable of retrieval.[3][4] However, other authors pronounce it /ˈtraɪ/ (as "try"), in an attempt to distinguish it verbally from "tree".[3][4][5]
Applications
As a replacement for other data structures
As discussed below, a trie has a number of advantages over binary search trees.[6] A trie can also be used to replace a hash table, over which it has the following advantages:
- Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
- There are no collisions of different keys in a trie.
- Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
- There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
- A trie can provide an alphabetical ordering of the entries by key.
Tries do have some drawbacks as well:
- Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.[7]
- Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless, a bitwise trie can handle standard IEEE single and double format floating point numbers.
- Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables.
Dictionary representation
A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.
Tries are also well suited for implementing approximate matching algorithms,[8] including those used in spell checking and hyphenation[4] software.
Term indexing
A discrimination tree term index stores its information in a trie data structure.[9]
Algorithms
Lookup and membership are easily described. The listing below implements a recursive trie node as a Haskell data type. It stores an optional value and a map of children tries, indexed by the next character:

    import Data.Map

    data Trie a = Trie { value    :: Maybe a
                       , children :: Map Char (Trie a) }

We can look up a value in the trie as follows:

    find :: String -> Trie a -> Maybe a
    find []     t = value t
    find (k:ks) t = do
        ct <- Data.Map.lookup k (children t)
        find ks ct

In an imperative style, and assuming an appropriate data type in place, we can describe the same algorithm in Python (here, specifically for testing membership). Note that children is a mapping from characters to a node's children; and we say that a "terminal" node is one which contains a valid word.

    def find(node, key):
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return False
        # Assumes a terminal node stores its own word in node.value.
        return node.value == key

Insertion proceeds by walking the trie according to the string to be inserted, then appending new nodes for the suffix of the string that is not contained in the trie. In imperative Pascal pseudocode:

    algorithm insert(root : node, s : string, value : any):
        node = root
        i    = 0
        n    = length(s)

        while i < n:
            if node.child(s[i]) != nil:
                node = node.child(s[i])
                i = i + 1
            else:
                break

        (* append new nodes, if necessary *)
        while i < n:
            node.child(s[i]) = new node
            node = node.child(s[i])
            i = i + 1

        node.value = value

Sorting
Lexicographic sorting of a set of keys can be accomplished with a pre-order traversal of the trie, visiting the children of each node in symbol order.
This algorithm is a form of radix sort.
A trie forms the fundamental data structure of Burstsort, which (in 2007) was the fastest known string sorting algorithm.[10] However, now there are faster string sorting algorithms.[11]
Full text search
A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.
Implementation strategies
Figure: A trie implemented as a doubly chained tree: vertical arrows are child pointers, dashed horizontal arrows are next pointers. The set of strings stored in this trie is {baby, bad, bank, box, dad, dance}. The lists are sorted to allow traversal in lexicographic order.
There are several ways to represent tries, corresponding to different trade-offs between memory use and speed of the operations. The basic form is that of a linked set of nodes, where each node contains an array of child pointers, one for each symbol in the alphabet (so for the English alphabet, one would store 26 child pointers and for the alphabet of bytes, 256 pointers). This is simple but wasteful in terms of memory: using the alphabet of bytes (size 256) and four-byte pointers, each node requires a kilobyte of storage, and when there is little overlap in the strings' prefixes, the number of required nodes is roughly the combined length of the stored strings.[2]:341 Put another way, the nodes near the bottom of the tree tend to have few children and there are many of them, so the structure wastes space storing null pointers.[12]
The storage problem can be alleviated by an implementation technique called alphabet reduction, whereby the original strings are reinterpreted as longer strings over a smaller alphabet. E.g., a string of n bytes can alternatively be regarded as a string of 2n four-bit units and stored in a trie with sixteen pointers per node. Lookups need to visit twice as many nodes in the worst case, but the storage requirements go down by a factor of eight.[2]:347–352
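Alphabet reduction can be sketched by splitting each byte into two four-bit units before walking a 16-way trie. The names `to_nibbles`, `NibbleNode`, `insert`, and `contains` are assumptions made for this illustration. Sixteen pointers per node instead of 256 is a 16-fold saving per node, and with up to twice as many nodes the net saving is the factor of eight quoted above.

```python
def to_nibbles(data):
    """Split each byte into its high and low four-bit halves."""
    out = []
    for byte in data:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return out


class NibbleNode:
    def __init__(self):
        self.children = [None] * 16  # one slot per four-bit symbol
        self.terminal = False


def insert(root, key):
    node = root
    for nib in to_nibbles(key):
        if node.children[nib] is None:
            node.children[nib] = NibbleNode()
        node = node.children[nib]
    node.terminal = True


def contains(root, key):
    node = root
    for nib in to_nibbles(key):
        node = node.children[nib]
        if node is None:
            return False
    return node.terminal
```

Keys always contribute an even number of nibbles, so terminal flags only ever appear at even depths.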
An alternative implementation represents a node as a triple (symbol, child, next) and links the children of a node together as a singly linked list: child points to the node's first child, next to the parent node's next child.[12][13] The set of children can also be represented as a binary search tree; one instance of this idea is the ternary search tree developed by Bentley and Sedgewick.[2]:353
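The (symbol, child, next) representation can be sketched as follows, keeping each sibling list sorted as in the figure. The `terminal` flag and the function names are additions made for this illustration, not part of the quoted description.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.child = None      # first child
        self.next = None       # next sibling, i.e. the parent's next child
        self.terminal = False  # True if a stored string ends here


def insert(root, word):
    node = root
    for ch in word:
        prev, cur = None, node.child
        while cur is not None and cur.symbol < ch:  # keep siblings sorted
            prev, cur = cur, cur.next
        if cur is None or cur.symbol != ch:
            new = Node(ch)
            new.next = cur
            if prev is None:
                node.child = new
            else:
                prev.next = new
            cur = new
        node = cur
    node.terminal = True


def contains(root, word):
    node = root
    for ch in word:
        cur = node.child
        while cur is not None and cur.symbol < ch:
            cur = cur.next
        if cur is None or cur.symbol != ch:
            return False
        node = cur
    return node.terminal
```

Each node now holds two pointers regardless of alphabet size, at the price of a linear scan through the siblings at every level.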
Another alternative in order to avoid the use of an array of 256 pointers (ASCII), as suggested before, is to store the alphabet array as a bitmap of 256 bits representing the ASCII alphabet, reducing dramatically the size of the nodes.[14]
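One common way to use such a bitmap, shown here as an assumption and not necessarily the scheme of the cited paper, is to keep only the existing children in a compact list and locate a child by counting the set bits below its position. The names `BitmapNode`, `insert`, and `contains` are hypothetical.

```python
class BitmapNode:
    def __init__(self):
        self.bitmap = 0     # bit b is set if a child exists for byte value b
        self.children = []  # existing children only, ordered by byte value
        self.terminal = False

    def _slot(self, byte):
        """Index in self.children: the number of set bits below position byte."""
        below = self.bitmap & ((1 << byte) - 1)
        return bin(below).count("1")

    def get(self, byte):
        if not (self.bitmap >> byte) & 1:
            return None
        return self.children[self._slot(byte)]

    def get_or_add(self, byte):
        slot = self._slot(byte)
        if not (self.bitmap >> byte) & 1:
            self.bitmap |= 1 << byte
            self.children.insert(slot, BitmapNode())
        return self.children[slot]


def insert(root, key):
    node = root
    for byte in key:
        node = node.get_or_add(byte)
    node.terminal = True


def contains(root, key):
    node = root
    for byte in key:
        node = node.get(byte)
        if node is None:
            return False
    return node.terminal
```

A node with two children stores a 256-bit bitmap and two references instead of 256 pointers, which is where the space saving comes from.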
Bitwise tries
Bitwise tries are much the same as a normal character-based trie except that individual bits are used to traverse what effectively becomes a form of binary tree. Generally, implementations use a special CPU instruction to very quickly find the first set bit in a fixed length key (e.g., GCC's __builtin_clz() intrinsic). This value is then used to index a 32- or 64-entry table which points to the first item in the bitwise trie with that number of leading zero bits. The search then proceeds by testing each subsequent bit in the key and choosing child[0] or child[1] appropriately until the item is found.
Although this process might sound slow, it is very cache-local and highly parallelizable due to the lack of register dependencies and therefore in fact has excellent performance on modern out-of-order execution CPUs. A red-black tree for example performs much better on paper, but is highly cache-unfriendly and causes multiple pipeline and TLB stalls on modern CPUs which makes that algorithm bound by memory latency rather than CPU speed. In comparison, a bitwise trie rarely accesses memory, and when it does, it does so only to read, thus avoiding SMP cache coherency overhead. Hence, it is increasingly becoming the algorithm of choice for code that performs many rapid insertions and deletions, such as memory allocators (e.g., recent versions of the famous Doug Lea's allocator (dlmalloc) and its descendants).
Compressing tries
Compressing the trie and merging the common branches can sometimes yield large performance gains. This works best under the following conditions:
- The trie is mostly static (key insertions to or deletions from a pre-filled trie are disabled).
- Only lookups are needed.
- The trie nodes are not keyed by node-specific data, or the nodes' data are common.[15]
- The total set of stored keys is very sparse within their representation space.
For example, it may be used to represent sparse bitsets, i.e., subsets of a much larger, fixed enumerable set. In such a case, the trie is keyed by the bit element position within the full set. The key is created from the string of bits needed to encode the integral position of each element. Such tries have a very degenerate form with many missing branches. After detecting the repetition of common patterns or filling the unused gaps, the unique leaf nodes (bit strings) can be stored and compressed easily, reducing the overall size of the trie.
Such compression is also used in the implementation of the various fast lookup tables for retrieving Unicode character properties. These could include case-mapping tables (e.g. for the Greek letter pi, from Π to π), or lookup tables normalizing the combination of base and combining characters (like the a-umlaut in German, ä, or the dalet-patah-dagesh-ole in Biblical Hebrew, דַּ֫). For such applications, the representation is similar to transforming a very large, unidimensional, sparse table (e.g. Unicode code points) into a multidimensional matrix of their combinations, and then using the coordinates in the hyper-matrix as the string key of an uncompressed trie to represent the resulting character. The compression will then consist of detecting and merging the common columns within the hyper-matrix to compress the last dimension in the key. For example, to avoid storing the full, multibyte Unicode code point of each element forming a matrix column, the groupings of similar code points can be exploited. Each dimension of the hyper-matrix stores the start position of the next dimension, so that only the offset (typically a single byte) need be stored. The resulting vector is itself compressible when it is also sparse, so each dimension (associated to a layer level in the trie) can be compressed separately.
Some implementations do support such data compression within dynamic sparse tries and allow insertions and deletions in compressed tries. However, this usually has a significant cost when compressed segments need to be split or merged. Some tradeoff has to be made between data compression and update speed. A typical strategy is to limit the range of global lookups for comparing the common branches in the sparse trie.
The result of such compression may look similar to trying to transform the trie into a directed acyclic graph (DAG), because the reverse transform from a DAG to a trie is obvious and always possible. However, the shape of the DAG is determined by the form of the key chosen to index the nodes, in turn constraining the compression possible.
Another compression strategy is to "unravel" the data structure into a single byte array.[16] This approach eliminates the need for node pointers, substantially reducing the memory requirements. This in turn permits memory mapping and the use of virtual memory to efficiently load the data from disk.
One more approach is to "pack" the trie.[4] Liang describes a space-efficient implementation of a sparse packed trie applied to automatic hyphenation, in which the descendants of each node may be interleaved in memory.
External memory tries
Several trie variants are suitable for maintaining sets of strings in external memory, including suffix trees. A combination of trie and B-tree, called the B-trie, has also been suggested for this task; compared to suffix trees, they are limited in the supported operations but also more compact, while performing update operations faster.[17]
See also
- Suffix tree
- Radix tree
- Directed acyclic word graph (aka DAWG)
- Acyclic deterministic finite automata
- Hash trie
- Deterministic finite automata
- Judy array
- Search algorithm
- Extendible hashing
- Hash array mapped trie
- Prefix Hash Tree
- Burstsort
- Luleå algorithm
- Huffman coding
- Ctrie
- HAT-trie
References
[1] de la Briandais, René (1959). File searching using variable length keys. Proc. Western J. Computer Conf. pp. 295–298. Cited by Brass.
[2] Brass, Peter (2008). Advanced Data Structures. Cambridge University Press.
[3] Black, Paul E. (2009-11-16). "trie". Dictionary of Algorithms and Data Structures. National Institute of Standards and Technology. Archived from the original on 2010-05-19.
[4] Liang, Franklin Mark (1983). Word Hy-phen-a-tion By Com-put-er (Doctor of Philosophy thesis). Stanford University. Archived from the original (PDF) on 2010-05-19. Retrieved 2010-03-28.
[5] Knuth, Donald (1997). "6.3: Digital Searching". The Art of Computer Programming Volume 3: Sorting and Searching (2nd ed.). Addison-Wesley. p. 492. ISBN 0-201-89685-0.
[6] Bentley, Jon; Sedgewick, Robert (1998-04-01). "Ternary Search Trees". Dr. Dobb's Journal. Dr Dobb's. Archived from the original on 2008-06-23.
[7] Fredkin, Edward (1960). "Trie Memory". Communications of the ACM. 3 (9): 490–499. doi:10.1145/367390.367400.
[8] Aho, Alfred V.; Corasick, Margaret J. (Jun 1975). "Efficient String Matching: An Aid to Bibliographic Search" (PDF). Communications of the ACM. 18 (6): 333–340. doi:10.1145/360825.360855.
[9] Wheeler, John W.; Jordan, Guarionex (2004). "An Empirical Study of Term Indexing in the Darwin Implementation of the Model Evolution Calculus". p. 5.
[10] "Cache-Efficient String Sorting Using Copying" (PDF). Retrieved 2008-11-15.
[11] "Engineering Radix Sort for Strings". Lecture Notes in Computer Science: 3–14. doi:10.1007/978-3-540-89097-3_3.
[12] Allison, Lloyd. "Tries". Retrieved 18 February 2014.
[13] Sahni, Sartaj. "Tries". Data Structures, Algorithms, & Applications in Java. University of Florida. Retrieved 18 February 2014.
[14] Bellekens, Xavier (2014). A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems. Glasgow, Scotland, UK: ACM. pp. 302:302–302:309. ISBN 978-1-4503-3033-6. Retrieved 21 October 2015.
[15] Daciuk, Jan; Mihov, Stoyan; Watson, Bruce W.; Watson, Richard E. (2000). "Incremental Construction of Minimal Acyclic Finite-State Automata". Computational Linguistics. Association for Computational Linguistics. 26: 3. doi:10.1162/089120100561601. Archived from the original on 2006-03-13. Retrieved 2009-05-28. This paper presents a method for direct building of minimal acyclic finite states automaton which recognizes a given finite list of words in lexicographical order. Our approach is to construct a minimal automaton in a single phase by adding new strings one by one and minimizing the resulting automaton on-the-fly
[16] Germann, Ulrich; Joanis, Eric; Larkin, Samuel (2009). "Tightly packed tries: how to fit large models into memory, and make them load fast, too" (PDF). ACL Workshops: Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing. Association for Computational Linguistics. pp. 31–39. We present Tightly Packed Tries (TPTs), a compact implementation of read-only, compressed trie structures with fast on-demand paging and short load times. We demonstrate the benefits of TPTs for storing n-gram back-off language models and phrase tables for statistical machine translation. Encoded as TPTs, these databases require less space than flat text file representations of the same data compressed with the gzip utility. At the same time, they can be mapped into memory quickly and be searched directly in time linear in the length of the key, without the need to decompress the entire file. The overhead for local decompression during search is marginal.
External links
- NIST's Dictionary of Algorithms and Data Structures: Trie
Reposted from https://www.cnblogs.com/justinh/p/7716421.html