10.completion_suggester
Table of Contents
- 1. Introduction to the Completion Suggester
- 2. Indexing documents
- 3. Querying
- 4. Skipping duplicate suggestions
- 5. Fuzzy queries
- 6. Regex queries
1. Introduction to the Completion Suggester
To understand the format of suggestions, please read the Suggesters page first. For more flexible search-as-you-type searches that do not use suggesters, see the search_as_you_type field type.
The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature that guides users to relevant results as they type, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.
Ideally, auto-complete should be as fast as the user types, giving instant feedback relevant to what the user has already typed. Hence, the completion suggester is optimized for speed. It uses data structures that enable fast lookups but are costly to build and are stored in memory.
Mapping
To use this feature, specify a special mapping for the field, which indexes the field values for fast completions.
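The snippet below is a minimal sketch of such a mapping. It builds the JSON body for `PUT music` and assumes the completion field is called `suggest`, which is the name used in the later examples. The helper `completion_mapping` is hypothetical. It only constructs the body and does not contact a cluster. Optional mapping parameters (listed next) can be passed as keyword arguments.

```python
import json


def completion_mapping(field_name="suggest", **options):
    """Build an index-creation body whose `field_name` uses the completion type (hypothetical helper)."""
    field = {"type": "completion"}
    # Optional mapping parameters such as analyzer or max_input_length.
    field.update(options)
    return {"mappings": {"properties": {field_name: field}}}


if __name__ == "__main__":
    # Body for: PUT music
    print(json.dumps(completion_mapping(), indent=2))
```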
Mapping supports the following parameters:
1. analyzer: The index analyzer to use, defaults to simple.
2. search_analyzer: The search analyzer to use, defaults to the value of analyzer.
3. preserve_separators: Preserves the separators, defaults to true. If disabled, you could find a field starting with Foo Fighters if you suggest for foof.
4. preserve_position_increments: Enables position increments, defaults to true. If disabled and using a stopwords analyzer, you could get a field starting with The Beatles if you suggest for b. Note: you could also achieve this by indexing two inputs, Beatles and The Beatles, with no need to change a simple analyzer, if you are able to enrich your data.
5. max_input_length: Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string, in order to prevent massive inputs from bloating the underlying data structure. Most use cases won't be influenced by the default value, since prefix completions seldom grow beyond a handful of characters.
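The effect of preserve_separators can be illustrated with a toy string model. This is not how Lucene implements the suggester, which works on finite-state structures. The sketch approximates the simple analyzer as "lowercase and split on non-letters". It then either keeps a separator between tokens or drops it before a plain prefix comparison. The function names are hypothetical.

```python
import re


def simple_analyze(text: str, preserve_separators: bool = True) -> str:
    """Rough stand-in for the `simple` analyzer: lowercase and split on non-letters."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # With preserve_separators=False the token boundaries disappear entirely.
    return (" " if preserve_separators else "").join(tokens)


def prefix_suggest(prefix: str, inputs: list, preserve_separators: bool = True) -> list:
    """Return the inputs whose analyzed form starts with the analyzed prefix."""
    wanted = simple_analyze(prefix, preserve_separators)
    return [text for text in inputs
            if simple_analyze(text, preserve_separators).startswith(wanted)]


if __name__ == "__main__":
    print(prefix_suggest("foof", ["Foo Fighters"], preserve_separators=False))
    print(prefix_suggest("foof", ["Foo Fighters"]))
```

With separators dropped, "foof" reaches "Foo Fighters". With the default behavior it does not, and only a prefix such as "foo f" does.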
2. Indexing documents
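Indexing works just like indexing any other document. Note that the suggest field in the examples below is not special in any way. It is simply the name of the completion field defined in the mapping, and any other name would work.
When indexing, you can supply parameters such as input and weight.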
The following parameters are supported:
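1. input: The input to store. This can be an array of strings or just a string. This field is mandatory.
This value cannot contain the following UTF-16 control characters: \u0000 (null), \u001f (information separator one), \u001e (information separator two).
2. weight: A positive integer or a string containing a positive integer, which defines a weight and allows you to rank your suggestions. This field is optional.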
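A client might check these two parameters before sending a document. The sketch below uses a hypothetical helper, `build_suggest_doc`. It rejects inputs containing the reserved control characters and accepts a weight given as a positive integer or as a string holding one, following the wording above. The server's exact bounds on weight may differ from this client-side check.

```python
import json

# Reserved control characters listed above; an input must not contain them.
RESERVED_CHARS = ("\u0000", "\u001f", "\u001e")


def build_suggest_doc(inputs, weight=None, field="suggest"):
    """Build a document body for a completion field (hypothetical helper)."""
    if isinstance(inputs, str):
        inputs = [inputs]
    for text in inputs:
        if any(ch in text for ch in RESERVED_CHARS):
            raise ValueError(f"input {text!r} contains a reserved control character")
    entry = {"input": list(inputs)}
    if weight is not None:
        # Accept a positive integer or a string holding one.
        if isinstance(weight, str) and weight.isdigit():
            weight = int(weight)
        if isinstance(weight, bool) or not isinstance(weight, int) or weight < 1:
            raise ValueError(f"weight must be a positive integer, got {weight!r}")
        entry["weight"] = weight
    return {field: entry}


if __name__ == "__main__":
    # Body for: PUT music/_doc/1?refresh
    print(json.dumps(build_suggest_doc(["Nevermind", "Nirvana"], weight=34), indent=2))
```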
A document can carry several inputs, each with its own weight:

    PUT music/_doc/1?refresh
    {
      "suggest" : [
        {
          "input": "Nevermind",
          "weight" : 10
        },
        {
          "input": "Nirvana",
          "weight" : 3
        }
      ]
    }
Or, in the shorthand form (a weight cannot be specified in this form):

    PUT music/_doc/1?refresh
    {
      "suggest" : [ "Nevermind", "Nirvana" ]
    }
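The field value can therefore take several shapes: a string, an object with input and weight, or an array of either. The sketch below flattens all of them into (input, weight) pairs, which can be handy when inspecting `_source` on the client side. The helper `normalize_suggest` is hypothetical. It reports `None` where no weight was given instead of guessing a server-side default.

```python
def normalize_suggest(value):
    """Flatten a completion field value into a list of (input, weight) pairs (hypothetical helper)."""
    if isinstance(value, str):
        # Shorthand: a bare string, no weight.
        return [(value, None)]
    if isinstance(value, dict):
        # Object form: "input" may be a string or an array; "weight" is optional.
        inputs = value["input"]
        if isinstance(inputs, str):
            inputs = [inputs]
        return [(text, value.get("weight")) for text in inputs]
    if isinstance(value, list):
        # Array form: each element is itself a string or an object.
        pairs = []
        for item in value:
            pairs.extend(normalize_suggest(item))
        return pairs
    raise TypeError(f"unsupported completion value: {value!r}")


if __name__ == "__main__":
    print(normalize_suggest([{"input": "Nevermind", "weight": 10}, {"input": "Nirvana", "weight": 3}]))
    print(normalize_suggest(["Nevermind", "Nirvana"]))
```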
3. Querying
Suggesting works as usual, except that you have to specify the suggest type as completion. Suggestions are near real-time. New suggestions become visible after a refresh, and documents, once deleted, are never shown.
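    POST music/_search?pretty
    {
      "suggest": {
        "song-suggest" : {
          "prefix" : "nir",
          "completion" : {
            "field" : "suggest"
          }
        }
      }
    }

Here song-suggest is the name of the suggestion, prefix is the prefix used to search for suggestions, completion selects the type of suggestion, and field is the name of the field to search for suggestions in.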
Returns:

    {
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits": ...
      "took": 2,
      "timed_out": false,
      "suggest": {
        "song-suggest" : [ {
          "text" : "nir",
          "offset" : 0,
          "length" : 3,
          "options" : [ {
            "text" : "Nirvana",
            "_index": "music",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "suggest": ["Nevermind", "Nirvana"]
            }
          } ]
        } ]
      }
    }

Returning the _source with suggestions requires the _source meta-field to be enabled, which is the default behavior.
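On the client side, the useful part of such a response is usually `suggest.<name>[*].options`. The sketch below assumes the response has already been decoded into a Python dict, for example with `json.loads`. The function name `extract_options` is hypothetical.

```python
import json


def extract_options(response: dict, suggestion_name: str) -> list:
    """Return (text, _score, _id) for every option of one named suggestion."""
    results = []
    # Each named suggestion maps to a list of entries; each entry holds its own options list.
    for entry in response.get("suggest", {}).get(suggestion_name, []):
        for option in entry.get("options", []):
            # _score carries the weight configured at index time.
            results.append((option["text"], option["_score"], option["_id"]))
    return results


if __name__ == "__main__":
    # Trimmed copy of the response shown above ("hits" omitted).
    raw = """
    {"suggest": {"song-suggest": [{"text": "nir", "offset": 0, "length": 3,
      "options": [{"text": "Nirvana", "_index": "music", "_type": "_doc",
                   "_id": "1", "_score": 1.0,
                   "_source": {"suggest": ["Nevermind", "Nirvana"]}}]}]}}
    """
    print(extract_options(json.loads(raw), "song-suggest"))
```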
The weight configured for a suggestion is returned as _score. The text field uses the input of your indexed suggestion. By default, suggestions return the full document _source. A large _source can hurt performance because of disk fetch and network transport overhead. To save some network overhead, use source filtering to drop unnecessary fields from the _source and minimize its size. Note that the _suggest endpoint doesn't support source filtering, but using suggest on the _search endpoint does:
    POST music/_search
    {
      "_source": "suggest",
      "suggest": {
        "song-suggest" : {
          "prefix" : "nir",
          "completion" : {
            "field" : "suggest",
            "size" : 5
          }
        }
      }
    }

- "_source": "suggest" filters the source to return only the suggest field.
- "field": "suggest" is the name of the field to search for suggestions in.
- "size": 5 is the number of suggestions to return.
The basic completion suggester query supports the following parameters:
1. field: The name of the field on which to run the query (required).
2. size: The number of suggestions to return (defaults to 5).
3. skip_duplicates: Whether duplicate suggestions should be filtered out (defaults to false).
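These parameters can be combined into a single request body. The sketch below uses a hypothetical helper, `completion_suggest_body`, that assembles a `_search` body with the three parameters above plus optional source filtering. The suggestion name `song-suggest` and the field name `suggest` are simply the ones used in this page's examples. The helper only builds JSON.

```python
import json


def completion_suggest_body(prefix, field="suggest", name="song-suggest",
                            size=5, skip_duplicates=False, source=None):
    """Build a _search body that runs one completion suggestion (hypothetical helper)."""
    completion = {"field": field, "size": size}
    if skip_duplicates:
        completion["skip_duplicates"] = True
    body = {"suggest": {name: {"prefix": prefix, "completion": completion}}}
    if source is not None:
        # Source filtering works on the _search endpoint, not on _suggest.
        body["_source"] = source
    return body


if __name__ == "__main__":
    # Body for: POST music/_search
    print(json.dumps(completion_suggest_body("nir", source="suggest"), indent=2))
```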
The completion suggester considers all documents in the index. See Context Suggester for an explanation of how to query a subset of documents instead.
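NOTE: When a completion query spans more than one shard, the suggest is executed in two phases, and the last phase fetches the relevant documents from the shards. Executing completion requests against a single shard is therefore more performant, because it avoids the document fetch overhead that arises when the suggest spans multiple shards. For the best completion performance, it is recommended to index completions into a single-shard index. If shard size leads to high heap usage, it is still recommended to split the index into multiple shards rather than optimize for completion performance.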
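The two-phase flow can be pictured with a conceptual toy model. It is not Elasticsearch's implementation, which runs on per-shard FSTs and is more involved. In the first phase, each shard picks its local top matches and a coordinating step keeps the global top `size`. In the second phase, the `_source` of each winning document is fetched. Shards here are plain lists of (doc_id, text, weight) tuples, and all names are hypothetical.

```python
import heapq


def query_phase(shards, prefix, size):
    """Phase 1 (toy): each shard returns its local top-`size`; the coordinator keeps the global top-`size`."""
    candidates = []
    for shard in shards:
        local = [(weight, doc_id, text) for doc_id, text, weight in shard
                 if text.lower().startswith(prefix.lower())]
        candidates.extend(heapq.nlargest(size, local))
    return heapq.nlargest(size, candidates)


def fetch_phase(top, sources):
    """Phase 2 (toy): load the stored _source of each winning document by id."""
    return [(text, weight, sources[doc_id]) for weight, doc_id, text in top]


if __name__ == "__main__":
    shards = [[("1", "Nirvana", 3), ("2", "Nevermind", 10)], [("3", "Nirvana", 7)]]
    sources = {"1": {"suggest": "Nirvana"}, "2": {"suggest": "Nevermind"}, "3": {"suggest": "Nirvana"}}
    print(fetch_phase(query_phase(shards, "nir", 5), sources))
```

With a single shard, the merge step has nothing to combine and every fetch is local, which is the intuition behind the single-shard recommendation.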
4. Skipping duplicate suggestions
Queries can return duplicate suggestions coming from different documents. You can change this behavior by setting skip_duplicates to true. When set, this option filters out documents with duplicate suggestions from the result.
    POST music/_search?pretty
    {
      "suggest": {
        "song-suggest" : {
          "prefix" : "nor",
          "completion" : {
            "field" : "suggest",
            "skip_duplicates": true
          }
        }
      }
    }

When set to true, this option can slow down search, because more suggestions need to be visited to find the top N.
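The cost of deduplication can be illustrated with a toy model over an in-memory list of (text, score) options. It walks options in descending score order, skips any text already seen, and counts how many options it had to visit to collect `size` distinct ones. The function name is hypothetical, and comparing by exact text is an assumption of the toy.

```python
def top_n_unique(options, size):
    """Keep the first `size` distinct texts in descending score order; also report how many were visited."""
    seen = set()
    result = []
    visited = 0
    for text, score in sorted(options, key=lambda o: o[1], reverse=True):
        visited += 1
        if text in seen:
            continue  # duplicate suggestion from another document: skip it
        seen.add(text)
        result.append((text, score))
        if len(result) == size:
            break
    return result, visited


if __name__ == "__main__":
    options = [("Nirvana", 10), ("Nirvana", 9), ("Nevermind", 8), ("Nirvana", 7)]
    print(top_n_unique(options, 2))
```

Here three options are visited to return two distinct suggestions, where a plain top 2 would have stopped after two.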
5. Fuzzy queries
The completion suggester also supports fuzzy queries. This means you can have a typo in your search and still get results back.
    POST music/_search?pretty
    {
      "suggest": {
        "song-suggest" : {
          "prefix" : "nor",
          "completion" : {
            "field" : "suggest",
            "fuzzy" : {
              "fuzziness" : 2
            }
          }
        }
      }
    }
Suggestions that share the longest prefix with the query prefix are scored higher.
The fuzzy query can take specific fuzzy parameters. The following parameters are supported:
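1. fuzziness: The fuzziness factor, defaults to AUTO. See Fuzziness for allowed settings.
2. transpositions: If set to true, a transposition is counted as one change instead of two, defaults to true.
3. min_length: Minimum length of the input before fuzzy suggestions are returned, defaults to 3.
4. prefix_length: Minimum length of the input that is not checked for fuzzy alternatives, defaults to 1.
5. unicode_aware: If true, all measurements (like fuzzy edit distance, transpositions, and lengths) are measured in Unicode code points instead of in bytes. This is slightly slower than raw bytes, so it is set to false by default.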
If you want to stick with the default values but still use fuzzy, you can use either fuzzy: {} or fuzzy: true.
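The parameters above can be read against a toy model of fuzzy prefix matching. The real suggester uses Lucene automata, not this dynamic-programming edit distance. The sketch makes several assumptions:

- Edit distance is the optimal-string-alignment variant, in which an adjacent swap costs 1 when `transpositions` is true.
- AUTO is taken as 0 edits for terms of length 0 to 2, 1 edit for 3 to 5, and 2 beyond.
- The first `prefix_length` characters must match exactly.
- Inputs shorter than `min_length` fall back to an exact prefix test.
- Lengths are counted in Python code points, which corresponds to unicode_aware: true.

All function names are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance; with transpositions=True an adjacent swap costs 1 (OSA variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    """Toy AUTO: 0 edits for length 0-2, 1 for 3-5, 2 for longer terms."""
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2


def fuzzy_prefix_match(query, candidate, fuzziness="AUTO", transpositions=True,
                       min_length=3, prefix_length=1):
    """True if some prefix of `candidate` is within the allowed edits of `query` (toy model)."""
    q, c = query.lower(), candidate.lower()
    if len(q) < min_length:
        # Too short for fuzzy matching in this toy: require an exact prefix.
        return c.startswith(q)
    if c[:prefix_length] != q[:prefix_length]:
        # The first prefix_length characters are never checked for fuzzy alternatives.
        return False
    max_edits = auto_fuzziness(q) if fuzziness == "AUTO" else int(fuzziness)
    return any(edit_distance(q, c[:n], transpositions) <= max_edits
               for n in range(len(c) + 1))


if __name__ == "__main__":
    print(fuzzy_prefix_match("nor", "Nirvana", fuzziness=2))
```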
6. Regex queries
The completion suggester also supports regex queries, meaning you can express a prefix as a regular expression.
    POST music/_search?pretty
    {
      "suggest": {
        "song-suggest" : {
          "regex" : "n[ever|i]r",
          "completion" : {
            "field" : "suggest"
          }
        }
      }
    }
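The snippet below is a toy model of how the example pattern behaves. Python's `re` module stands in for Lucene's regular-expression syntax, and the two are assumed to agree for the constructs used here, though they differ in general. Lowercasing stands in for the default simple analyzer. Note that `[ever|i]` is a character class matching one character (e, v, r, | or i), not an alternation. The pattern therefore matches the start of "Nirvana" but not of "Nevermind", and an alternation would need parentheses, as in `n(ever|ir)`. The function name is hypothetical.

```python
import re


def regex_prefix_matches(pattern: str, inputs: list) -> list:
    """Return the inputs whose lowercased text begins with a match for `pattern` (toy model)."""
    compiled = re.compile(pattern)
    # re.match anchors at the start but not at the end, so it behaves like a prefix test.
    return [text for text in inputs if compiled.match(text.lower())]


if __name__ == "__main__":
    print(regex_prefix_matches(r"n[ever|i]r", ["Nevermind", "Nirvana"]))
    print(regex_prefix_matches(r"n(ever|ir)", ["Nevermind", "Nirvana"]))
```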
The regex query can take specific regex parameters. The following parameters are supported:
1. flags: Possible flags are ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE. See regexp-syntax for their meaning.
2. max_determinized_states: Regular expressions are dangerous because it's easy to accidentally create an innocuous-looking one that requires an exponential number of internal determinized automaton states (and corresponding RAM and CPU) for Lucene to execute. Lucene prevents these using the max_determinized_states setting (defaults to 10000). You can raise this limit to allow more complex regular expressions to execute.