08.suggester02term_suggester
Table of contents
- 1. Term suggester
- 1. Common parameters
- 2. Other parameters
- 3. Sample requests
1. Term suggester
In order to understand the format of suggestions, please read the Suggesters page first.
The term suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested, and suggestions are returned for each analyzed token of the suggest text. The term suggester does not take into account the query that is part of the request.
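To make the per-token behaviour concrete, here is a minimal Python sketch of the idea. It is not how Elasticsearch implements the suggester: the whitespace-style tokenizer, the tiny `VOCABULARY` list, and the function names `edit_distance` and `toy_term_suggest` are all assumptions made for illustration, and plain Levenshtein distance stands in for the real distance measure.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def toy_term_suggest(text, vocabulary, max_edits=2):
    """Return candidate terms for every token of the suggest text."""
    # Stand-in for analysis: lowercase the text and split it into word tokens.
    tokens = re.findall(r"\w+", text.lower())
    result = {}
    for token in tokens:
        scored = [(edit_distance(token, term), term)
                  for term in vocabulary if term != token]
        # Keep only candidates within max_edits, closest first.
        result[token] = [term for dist, term in sorted(scored) if dist <= max_edits]
    return result


# Hypothetical set of indexed terms, invented for this example.
VOCABULARY = ["trying", "out", "elasticsearch", "string"]
suggestions = toy_term_suggest("tring out Elasticsearch", VOCABULARY)
print(suggestions)
```

Each token is handled on its own, so "tring" gets candidates while "out" and "elasticsearch" get none; the surrounding words never influence the result.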
1. Common parameters
Common suggest options:
The suggest_mode option controls which suggestions are included. It accepts three values:
missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
popular: Only suggest terms that occur in more documents than the original suggest text term.
always: Suggest any matching suggestions based on terms in the suggest text.
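The three modes can be read as a filter over candidate terms. The sketch below follows the descriptions above; the function name `filter_by_suggest_mode` and the document frequencies are hypothetical, and the real filtering happens inside Elasticsearch.

```python
def filter_by_suggest_mode(mode, token_doc_freq, candidates):
    """Filter (term, doc_freq) candidates for one suggest text token.

    token_doc_freq is the number of documents containing the original token.
    """
    if mode == "missing":
        # Suggest only when the original token is not in the index at all.
        return [] if token_doc_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only candidates that occur in more documents than the token.
        return [(term, freq) for term, freq in candidates if freq > token_doc_freq]
    if mode == "always":
        # Keep every matching candidate.
        return list(candidates)
    raise ValueError(f"unknown suggest mode: {mode!r}")


# Invented frequencies for illustration.
candidates = [("trying", 12), ("string", 3)]
print(filter_by_suggest_mode("missing", 5, candidates))
print(filter_by_suggest_mode("popular", 5, candidates))
print(filter_by_suggest_mode("always", 5, candidates))
```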
2. Other parameters
Other term suggest options:
1.max_edits
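The maximum edit distance a candidate suggestion can have in order to be considered. Can only be 1 or 2; any other value results in a bad request error. Defaults to 2.
2.prefix_length
The minimum number of prefix characters that must match for a term to be a suggestion candidate. Defaults to 1. Increasing this number improves spellcheck performance. It suits cases where misspellings rarely occur in the first few characters, as with English words. (The old name prefix_len is deprecated.)
3.min_word_length
The minimum length a suggest text term must have in order to be included. Defaults to 4. (The old name min_word_len is deprecated.)
4.shard_size
Sets the maximum number of suggestions to retrieve from each individual shard. During the reduce phase, only the top N suggestions are returned, based on the size option. Defaults to the size option. Setting this to a value higher than size can be useful to get a more accurate document frequency for spelling corrections, at the cost of performance. Because terms are partitioned among shards, the shard-level document frequencies of spelling corrections may not be precise; increasing this value makes them more precise.
5.max_inspections
A factor that is multiplied with shard_size in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
6.min_doc_freq
The minimum threshold for the number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. Suggesting only high-frequency terms can improve quality. Defaults to 0f, which means the option is not enabled. If a value higher than 1 is specified, the number cannot be fractional. Shard-level document frequencies are used for this option.
7.max_term_freq
The maximum threshold for the number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage (e.g., 0.4) or an absolute number representing document frequencies. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms, which are usually spelled correctly, from being spellchecked. This also improves spellcheck performance. Shard-level document frequencies are used for this option.
8.string_distance
Which string distance implementation to use for comparing how similar suggested terms are. Five possible values can be specified: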
internal: The default based on damerau_levenshtein but highly optimized for comparing string distance for terms inside the index.
damerau_levenshtein: String distance algorithm based on Damerau-Levenshtein algorithm.
levenshtein: String distance algorithm based on Levenshtein edit distance algorithm.
jaro_winkler: String distance algorithm based on Jaro-Winkler algorithm.
ngram: String distance algorithm based on character n-grams.
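As a rough illustration of how these options fit into a request, the following Python sketch builds a suggest body as a dictionary and checks a few of the constraints described above before sending. The helper `build_term_suggest` is hypothetical and the client-side checks are only a convenience; Elasticsearch performs its own validation, and the option values chosen here are arbitrary.

```python
import json

# The five string_distance values listed above.
ALLOWED_DISTANCES = {"internal", "damerau_levenshtein", "levenshtein",
                     "jaro_winkler", "ngram"}


def build_term_suggest(field, **options):
    """Build the {"term": {...}} part of a suggest request (hypothetical helper)."""
    if options.get("max_edits", 2) not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    for name in ("min_doc_freq", "max_term_freq"):
        value = options.get(name)
        # Values above 1 are absolute document counts and must be whole numbers.
        if value is not None and value > 1 and value != int(value):
            raise ValueError(f"{name} above 1 cannot be fractional")
    if options.get("string_distance", "internal") not in ALLOWED_DISTANCES:
        raise ValueError("unknown string_distance")
    return {"term": {"field": field, **options}}


body = {
    "suggest": {
        "text": "tring out Elasticsearch",
        "my-suggestion": build_term_suggest(
            "message",
            suggest_mode="popular",
            max_edits=1,
            prefix_length=1,
            min_word_length=4,
            max_term_freq=0.01,
            string_distance="levenshtein",
        ),
    }
}
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST _search request.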
3. Sample requests
POST _search
{
  "suggest": {
    "text": "tring out Elasticsearch",
    "my-suggest-1": {
      "term": {
        "field": "message"
      }
    },
    "my-suggest-2": {
      "term": {
        "field": "user"
      }
    }
  }
}

POST twitter/_search
{
  "query": {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest": {
    "my-suggestion": {
      "text": "tring out Elasticsearch",
      "term": {
        "field": "message"
      }
    }
  }
}
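Once such a request returns, the suggestions arrive grouped by suggestion name, with one entry per analyzed token and a list of options for each. The Python sketch below reads the top option for every token. The response shape follows the general layout described on the Suggesters page, but `sample_response` and all of its values are invented for illustration, and `best_corrections` is a hypothetical helper.

```python
# Illustrative response only; scores and frequencies are made up.
sample_response = {
    "suggest": {
        "my-suggestion": [
            {"text": "tring", "offset": 0, "length": 5,
             "options": [{"text": "trying", "score": 0.8, "freq": 1}]},
            {"text": "out", "offset": 6, "length": 3, "options": []},
            {"text": "elasticsearch", "offset": 10, "length": 13, "options": []},
        ]
    }
}


def best_corrections(response, suggestion_name):
    """Map each token that has suggestions to its first listed option."""
    corrections = {}
    for entry in response.get("suggest", {}).get(suggestion_name, []):
        if entry["options"]:
            corrections[entry["text"]] = entry["options"][0]["text"]
    return corrections


corrections = best_corrections(sample_response, "my-suggestion")
print(corrections)
```

Tokens with an empty options list are skipped, so only terms that actually received a suggestion appear in the result.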