Finding Prefix Strings with a Trie: The Trie Data Structure (Prefix Tree)
by Julia Geist
A trie (also known as a prefix tree) is a special type of tree used to store associative data structures.
A trie (pronounced "try") gets its name from retrieval; its structure makes it very well suited to string matching.
Context
Write your own shuffle method to randomly shuffle characters in a string. Use the words text file, located at /usr/share/dict/words, and your shuffle method to create an anagram generator that only produces real words. Given a string as a command line argument, print one of its anagrams.

I was presented with this challenge this week at Make School’s Product Academy.
The words in the text file are separated by newlines, which makes it a lot easier to put them into a data structure. For now, I’m storing them in a list, with each element being a single word from the file.
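As a rough illustration of that loading step, here is a minimal sketch. The function name `load_words` and its default path argument are assumptions made for this example; the post does not show the author's own loading code.

```python
from pathlib import Path


def load_words(path="/usr/share/dict/words"):
    """Return the words in a newline-separated file as a list (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the newline characters; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `load_words()` with no argument reads the system words file mentioned in the challenge, if it exists on your machine.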
One approach to this challenge is to:
- randomly shuffle the characters in the string
- then check it against all the words in /usr/share/dict/words to verify that it’s a real word.
However, this approach requires checking whether the randomly shuffled string matches one of the 235,887 words in that file. That means up to 235,887 comparisons for each string that I want to verify as a real word.
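To make the cost of this approach concrete, here is a small sketch of it. Both function names are hypothetical, and the shuffle shown is a standard Fisher-Yates shuffle, which may or may not match the shuffle method written for the challenge.

```python
import random


def shuffle_characters(string):
    """Return a random permutation of the characters (Fisher-Yates shuffle)."""
    chars = list(string)
    for i in range(len(chars) - 1, 0, -1):
        j = random.randint(0, i)  # randint is inclusive on both ends
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def is_real_word_linear(candidate, words):
    """Compare the candidate against every word in the list, one by one."""
    for word in words:
        if word == candidate:
            return True
    return False
```

In the worst case (the candidate is not a word at all), `is_real_word_linear` walks the entire list before it can answer, and that is repeated for every shuffle that gets tried.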
This was an unacceptable solution for me. I looked for existing libraries that check whether a word exists in a language, and found pyenchant. I first completed the challenge using that library, in a few lines of code.
    import math

    import enchant  # provided by the pyenchant package


    def generateAnagram(string, language="en_US"):
        languageDict = enchant.Dict(language)
        # Number of orderings of the characters, used as the number of attempts.
        numOfPossibleCombinationsForString = math.factorial(len(string))
        for i in range(0, numOfPossibleCombinationsForString):
            # shuffleCharactersOf is the shuffle method from the challenge (not shown here).
            wordWithShuffledCharacters = shuffleCharactersOf(string)
            if languageDict.check(wordWithShuffledCharacters):
                return wordWithShuffledCharacters
        return "There is no anagram in %s for %s." % (language, string)

Using a couple of library functions in my code was a quick and easy solution. However, I didn’t learn much by finding a library to solve the problem for me.
I was positive that the library wasn’t using the approach I mentioned earlier. I was curious and dug through the source code — I found a trie.
Trie
A trie stores data in “steps”. Each step is a node in the trie.
Storing words is a perfect use case for this kind of tree, since there is a finite number of letters that can be put together to make a string.
Each step, or node, in a language trie represents one letter of a word. The steps begin to branch off when the order of the letters diverges from the other words in the trie, or when a word ends.
I created a trie out of directories on my Desktop to visualize stepping down through nodes. This is a trie that contains two words: apple and app.
Note that the end of a word is denoted with a ‘$’. I’m using ‘$’ because it is a character that will not appear in any of the words being stored.
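In place of the directory picture, the same two-word trie can be written out by hand as nested dictionaries. This is only a sketch: the `END` constant and the `render` helper are names invented for this example, and the indented printout is meant to mimic a directory listing.

```python
END = "$"  # end-of-word marker, as described above

# The trie holding "app" and "apple", written out by hand.
trie = {
    "a": {
        "p": {
            "p": {
                END: END,          # "app" ends here
                "l": {
                    "e": {
                        END: END,  # "apple" ends here
                    },
                },
            },
        },
    },
}


def render(node, depth=0):
    """Return one line per node, indented by depth, like a directory listing."""
    lines = []
    for key, child in node.items():
        lines.append("  " * depth + key)
        if key != END:
            lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(trie)))
```

The second ‘p’ node has two children: the ‘$’ marker that closes "app", and the ‘l’ node that continues on to "apple".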
If I were to add the word ‘aperture’ to this trie, I would loop over the letters in the word ‘aperture’ while simultaneously stepping down the nodes in the trie. If the letter exists as a child of the current node, step down into it. If the letter does not exist as a child of the current node, create it and then step down into it. To visualize these steps using my directories:
While stepping down the trie, the first two letters of ‘aperture’ are already present in the trie, so I step down into those nodes.
However, the third letter, ‘e’, is not a child of the ‘p’ node, so a new node is created to represent it, branching off from the other words in the trie. New nodes are created for the letters that follow as well.
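The walk described above can be traced in code. This is a sketch under assumptions: `insert_with_trace` is a hypothetical helper that uses nested dictionaries for nodes and records, for each letter, whether an existing node was reused or a new one was created.

```python
END = "$"  # end-of-word marker


def insert_with_trace(trie, word):
    """Insert word into a nested-dict trie; return (letter, action) pairs."""
    trace = []
    node = trie
    for letter in word:
        if letter in node:
            trace.append((letter, "reused"))
        else:
            node[letter] = {}
            trace.append((letter, "created"))
        node = node[letter]  # step down into the child
    node[END] = END
    return trace


trie = {}
for w in ("apple", "app"):
    insert_with_trace(trie, w)

print(insert_with_trace(trie, "aperture"))
```

The trace for ‘aperture’ reports the first two letters, ‘a’ and ‘p’, as reused and every letter from ‘e’ onward as created.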
To generate a trie from a words file, this process is repeated for each word, until every word in the file is stored.
You might be thinking: “Wait, won’t it take really long to generate the trie from that text file with 235,887 words in it? What’s the point of looping over every single character in every single word?”
Yes, iterating over each character of every word to generate a trie does take some time. However, the time taken to create the trie is well worth it, because checking whether a word exists in the text file then takes at most as many operations as the length of the word itself. That is much better than the 235,887 operations it was going to take before.
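To see the difference in a small case, the sketch below counts the steps each kind of lookup takes. All three function names are hypothetical, and the step counts are a simplification: a "step" is one node visited in the trie, or one word compared in the list.

```python
END = "$"  # end-of-word marker


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = END
    return root


def trie_lookup_steps(trie, word):
    """Return (found, steps), where steps counts nodes stepped into."""
    node = trie
    steps = 0
    for letter in word:
        if letter not in node:
            return False, steps
        node = node[letter]
        steps += 1
    return END in node, steps


def list_lookup_steps(words, word):
    """Return (found, steps), where steps counts words compared."""
    for steps, candidate in enumerate(words, start=1):
        if candidate == word:
            return True, steps
    return False, len(words)
```

The step count for the trie is bounded by the length of the word being looked up, however many words are stored, while the step count for the list grows with the number of words in it.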
I wrote the simplest version of a trie, using nested dictionaries. This isn’t the most efficient way to implement one, but it is a good exercise to understand the logic behind a trie.
    endOfWord = "$"


    def generateTrieFromWordsArray(words):
        root = {}
        for word in words:
            currentDict = root
            for letter in word:
                # Step into the child for this letter, creating it if needed.
                currentDict = currentDict.setdefault(letter, {})
            currentDict[endOfWord] = endOfWord
        return root


    def isWordPresentInTrie(trie, word):
        currentDict = trie
        for letter in word:
            if letter in currentDict:
                currentDict = currentDict[letter]
            else:
                return False
        # The letters form a stored word only if an end marker sits here.
        return endOfWord in currentDict

You can see my solution for the anagram generator on my GitHub. Since exploring this algorithm, I’ve decided to make this blog post one of many, with each post covering one algorithm or data structure. The code is available on my Algorithms and Data Structures repo; star it to stay updated!
Next Steps
I suggest checking out Ray Wenderlich’s trie repo. Although written in Swift, it’s a valuable source for explanations of various algorithms.
Similar to the trie (but more memory efficient) is a radix tree, also known as a compressed trie. In short, instead of storing a single character at every node, a run of characters that no other word branches from, such as the remaining suffix of a word, is stored in one node, and the paths are created relative to it.
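As a rough illustration of the radix idea, the sketch below takes a nested-dictionary trie and merges every chain of single-child nodes into one node with a longer label. The `compress` function is a hypothetical helper written for this example; it is not how a production radix tree is built, since real implementations split and merge nodes during insertion.

```python
END = "$"  # end-of-word marker


def compress(node):
    """Return a copy of a nested-dict trie with single-child chains merged."""
    compressed = {}
    for key, child in node.items():
        if key == END:
            compressed[END] = END
            continue
        label = key
        # While the child has exactly one outgoing edge and it is a letter,
        # fold that letter into the current label.
        while len(child) == 1 and END not in child:
            next_key, next_child = next(iter(child.items()))
            label += next_key
            child = next_child
        compressed[label] = compress(child)
    return compressed


trie = {}
for word in ("app", "apple", "aperture"):
    node = trie
    for letter in word:
        node = node.setdefault(letter, {})
    node[END] = END

print(compress(trie))
# {'ap': {'p': {'$': '$', 'le': {'$': '$'}}, 'erture': {'$': '$'}}}
```

The three words now share a single "ap" node, and the tail of "aperture" is stored as one "erture" node instead of six single-letter nodes.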
However, a radix tree is more complicated to implement than a trie. I suggest taking a look at Ray Wenderlich’s radix repo if you’re interested.
This is the first post of my algorithm and data structures series. In each post, I’ll present a problem that can be better solved with an algorithm or data structure to illustrate the algorithm/data structure itself.
Star my algorithms repo on Github and follow me on Twitter if you’d like to follow along!
Translated from: https://www.freecodecamp.org/news/trie-prefix-tree-algorithm-ee7ab3fe3413/