
08.suggester02term_suggester

Published: 2024/2/28


Contents

- 1. Term suggester
  - 1. Common parameters
  - 2. Other parameters
  - 3. Request examples

1. Term suggester

In order to understand the format of suggestions, please read the Suggesters page first.

The term suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested, and suggestions are returned for each analyzed suggest text token. The term suggester does not take the query that is part of the request into account.
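To make "suggesting terms based on edit distance" concrete, here is a minimal Python sketch. It is not how Elasticsearch implements the suggester: the whitespace/lowercase tokenizer stands in for a real analyzer, the in-memory `index_terms` list stands in for the index, and the function names `osa_distance` and `suggest_terms` are made up for this illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (a restricted Damerau-Levenshtein):
    insertions, deletions, substitutions and adjacent transpositions cost 1."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            # adjacent transposition, e.g. "teh" -> "the"
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def suggest_terms(text, index_terms, max_edits=2):
    """Return candidate corrections for each token of the suggest text.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real analyzer may behave very differently.
    """
    suggestions = {}
    for token in text.lower().split():
        candidates = []
        for term in index_terms:
            distance = osa_distance(token, term)
            if 0 < distance <= max_edits:  # skip the token itself
                candidates.append((distance, term))
        # closest candidates first, ties broken alphabetically
        suggestions[token] = [term for _, term in sorted(candidates)]
    return suggestions


print(suggest_terms("tring out", ["trying", "out", "string", "bring"]))
```

Each token gets its own list of candidates, which mirrors the per-token behaviour described above; the query itself plays no role.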

1. Common parameters

Common suggest options:

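text: The suggest text. It can be set globally or per suggestion.

field: The field to fetch candidate suggestions from. It can be set globally or per suggestion.

analyzer: The analyzer used to analyze the suggest text. Defaults to the search analyzer of the suggest field.

size: The maximum number of corrections returned per suggest text token.

sort: Defines how suggestions are sorted per suggest text term. Two possible values:

score: Sort by score first, then by document frequency, then by the term itself.

frequency: Sort by document frequency first, then by similarity score, then by the term itself.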

suggest_mode: Controls which suggestions are included, or for which suggest text terms suggestions are generated. Three possible values:

missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
popular: Only suggest terms that occur in more documents than the original suggest text term.
always: Suggest any matching suggestions based on terms in the suggest text.
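The `sort` and `suggest_mode` rules can be read as a filter followed by an ordering. The Python sketch below illustrates that reading only; the `Candidate` type and both function names are hypothetical, and the real suggester applies these rules internally on each shard.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str     # the suggested term
    score: float  # similarity to the original token (higher is closer)
    freq: int     # number of documents containing the term


def apply_suggest_mode(mode, original_freq, candidates):
    """Filter candidates according to suggest_mode."""
    if mode == "missing":
        # only suggest when the original term is absent from the index
        return [] if original_freq > 0 else list(candidates)
    if mode == "popular":
        # only keep terms that appear in more documents than the original
        return [c for c in candidates if c.freq > original_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode!r}")


def sort_candidates(sort, candidates):
    """Order candidates according to the sort option."""
    if sort == "score":
        key = lambda c: (-c.score, -c.freq, c.text)
    elif sort == "frequency":
        key = lambda c: (-c.freq, -c.score, c.text)
    else:
        raise ValueError(f"unknown sort: {sort!r}")
    return sorted(candidates, key=key)


candidates = [
    Candidate("trying", 0.8, 3),
    Candidate("string", 0.8, 10),
    Candidate("bring", 0.6, 20),
]
kept = apply_suggest_mode("popular", original_freq=5, candidates=candidates)
print([c.text for c in sort_candidates("score", kept)])
```

With `popular` and an original frequency of 5, "trying" is dropped because it appears in fewer documents than the original term.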

2. Other parameters

Other term suggest options:

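1. max_edits
The maximum edit distance a candidate suggestion can have in order to be considered a suggestion. It can only be 1 or 2; any other value results in a bad request error. Defaults to 2.

2. prefix_length
The minimum number of prefix characters that must match for a term to be a suggestion candidate. Defaults to 1. Increasing this number improves spellcheck performance. It suits cases where misspellings rarely occur in the first few characters, as with English words. (The old name "prefix_len" is deprecated.)

3. min_word_length
The minimum length a suggest text term must have in order to be included. Defaults to 4. (The old name "min_word_len" is deprecated.)

4. shard_size
The maximum number of suggestions to retrieve from each individual shard. During the reduce phase, only the top N suggestions are returned, based on the size option. Defaults to the size option. Setting this to a value higher than size can be useful to get a more accurate document frequency for spelling corrections, at the cost of performance. Because terms are partitioned among shards, shard-level document frequencies of spelling corrections may not be precise; increasing this value makes them more precise.

5. max_inspections
A factor that is multiplied by shard_size in order to inspect more candidate spelling corrections at the shard level. It can improve accuracy at the cost of performance. Defaults to 5.

6. min_doc_freq
The minimum number of documents a suggestion should appear in. It can be specified as an absolute number or as a relative percentage of the number of documents. Suggesting only high-frequency terms can improve quality. Defaults to 0f, which means it is not enabled. If a value higher than 1 is specified, the number cannot be fractional. Shard-level document frequencies are used for this option.

7. max_term_freq
The maximum number of documents in which a suggest text token may exist in order to be included. It can be a relative percentage (for example 0.4) or an absolute number representing document frequency. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms, which are usually spelled correctly, from being spellchecked. It also improves spellcheck performance. Shard-level document frequencies are used for this option.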

8.string_distance

The string distance implementation used to compare how similar suggested terms are. Five possible values can be specified:

internal: The default based on damerau_levenshtein but highly optimized for comparing string distance for terms inside the index.
damerau_levenshtein: String distance algorithm based on Damerau-Levenshtein algorithm.
levenshtein: String distance algorithm based on Levenshtein edit distance algorithm.
jaro_winkler: String distance algorithm based on Jaro-Winkler algorithm.
ngram: String distance algorithm based on character n-grams.
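Several of the numeric rules above (the 1 to 2 range for `max_edits`, the "no fractions above 1" rule for the frequency thresholds, and the `shard_size` times `max_inspections` factor) can be checked with a small Python sketch. All function names here are hypothetical helpers, not part of any Elasticsearch client. How the boundary value 1 and rounding are treated in `resolve_threshold` is an assumption of this sketch, not something the text above specifies.

```python
import math


def validate_term_options(options: dict) -> dict:
    """Reject option values that the rules above describe as invalid."""
    max_edits = options.get("max_edits", 2)
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    for name in ("min_doc_freq", "max_term_freq"):
        value = options.get(name)
        if value is not None and value > 1 and value != int(value):
            raise ValueError(f"{name} above 1 cannot be fractional")
    return options


def inspected_per_shard(size, shard_size=None, max_inspections=5):
    """Candidates inspected per shard: shard_size (default: size) times the factor."""
    effective_shard_size = size if shard_size is None else shard_size
    return effective_shard_size * max_inspections


def resolve_threshold(value, shard_doc_count):
    """Turn a relative or absolute threshold into a document count.

    Assumption: values above 1 are absolute counts; values up to 1 are a
    fraction of the shard's documents, rounded up.
    """
    if value > 1:
        return int(value)
    return math.ceil(value * shard_doc_count)


validate_term_options({"max_edits": 1, "min_doc_freq": 3, "max_term_freq": 0.4})
print(inspected_per_shard(size=5))        # shard_size falls back to size
print(resolve_threshold(0.5, 200))        # relative threshold
print(resolve_threshold(3, 200))          # absolute threshold
```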

3. Request examples

POST _search
{
  "suggest": {
    "text": "tring out Elasticsearch",
    "my-suggest-1": {
      "term": { "field": "message" }
    },
    "my-suggest-2": {
      "term": { "field": "user" }
    }
  }
}

POST twitter/_search
{
  "query": {
    "match": { "message": "tring out Elasticsearch" }
  },
  "suggest": {
    "my-suggestion": {
      "text": "tring out Elasticsearch",
      "term": { "field": "message" }
    }
  }
}
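For readers scripting these requests, the following Python sketch builds the first request body and reads corrections out of a response. It makes no network call. The helper names are hypothetical, and `sample_response` is invented data that only follows the general response layout described on the Suggesters page (one entry per token, each with `text`, `offset`, `length`, and `options`); it is not real output. The sketch also assumes the options arrive already ordered, so the first one is taken as the best.

```python
import json


def build_suggest_body(text, fields):
    """Build a suggest body with one global text and one term suggestion per field."""
    body = {"suggest": {"text": text}}
    for i, field in enumerate(fields, start=1):
        body["suggest"][f"my-suggest-{i}"] = {"term": {"field": field}}
    return body


def top_corrections(response, name):
    """Map each suggest text token to its first option, or None if there is none."""
    corrections = {}
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        corrections[entry["text"]] = options[0]["text"] if options else None
    return corrections


# Invented response for illustration only; values are not real output.
sample_response = {
    "suggest": {
        "my-suggest-1": [
            {"text": "tring", "offset": 0, "length": 5,
             "options": [{"text": "trying", "score": 0.8, "freq": 1}]},
            {"text": "out", "offset": 6, "length": 3, "options": []},
        ]
    }
}

body = build_suggest_body("tring out Elasticsearch", ["message", "user"])
print(json.dumps(body, indent=2))
print(top_corrections(sample_response, "my-suggest-1"))
```

The body produced by `build_suggest_body` matches the first request above and could be sent to `_search` with any HTTP client.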
