Elasticsearch in Depth, Part 3

Published: 2023/12/2

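What an analyzer is made of, and an introduction to the built-in analyzers

1. What is an analyzer

An analyzer splits text into terms and normalizes them; normalization improves recall.

Given a sentence, the analyzer breaks it into individual words and normalizes each one (for example, converting tenses and reducing plurals to singular).
Recall: normalization increases the number of relevant results a search can find.

An analyzer has three parts:

character filter: pre-processes the text before tokenization. The most common examples are stripping HTML tags (<span>hello</span> --> hello) and converting & to and (I&you --> I and you).
tokenizer: splits the text into tokens, e.g. hello you and me --> hello, you, and, me
token filter: transforms the tokens: lowercasing (Tom --> tom), stop-word removal (a/the/an are dropped), stemming (dogs --> dog, liked --> like), and synonyms (mother --> mom, small --> little).

The analyzer matters because only its final output is used to build the inverted index.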
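
The three stages can be sketched as a toy pipeline in Python. This is only an illustration of the character filter --> tokenizer --> token filter order, not how Elasticsearch implements it. The function names and the small stop-word, stemming and synonym tables are assumptions made up for the example.

```python
import re

# Hypothetical lookup tables standing in for real stop-word, stemming and synonym filters.
STOP_WORDS = {"a", "an", "the"}
STEMS = {"dogs": "dog", "liked": "like"}
SYNONYMS = {"mother": "mom", "small": "little"}


def char_filter(text: str) -> str:
    """Pre-process raw text: strip HTML tags and spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")


def tokenize(text: str) -> list[str]:
    """Split the filtered text into tokens on whitespace."""
    return text.split()


def token_filter(tokens: list[str]) -> list[str]:
    """Lowercase each token, drop stop words, then apply stemming and synonyms."""
    result = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        token = STEMS.get(token, token)
        result.append(SYNONYMS.get(token, token))
    return result


def analyze(text: str) -> list[str]:
    """Run the three stages in order: character filter, tokenizer, token filter."""
    return token_filter(tokenize(char_filter(text)))


print(analyze("<span>Tom</span> liked the small dogs"))  # ['tom', 'like', 'little', 'dog']
print(analyze("I&you"))                                  # ['i', 'and', 'you']
```

The terms printed here are what would go into the inverted index in this toy model.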

2. Built-in analyzers

Sample text: Set the shape to semi-transparent by calling set_trans(5)

standard analyzer (the default): set, the, shape, to, semi, transparent, by, calling, set_trans, 5. It lowercases the tokens and strips punctuation such as parentheses.
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (language-specific, e.g. english): set, shape, semi, transpar, call, set_tran, 5
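
The difference between the simple and whitespace analyzers is easy to reproduce. The sketch below approximates both in Python, assuming that the simple analyzer splits on anything that is not a letter and lowercases, and that the whitespace analyzer splits on whitespace only. The function names are illustrative; the standard and english analyzers are not reproduced because their rules are more involved.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def simple_analyze(text: str) -> list[str]:
    """Approximate the simple analyzer: keep runs of letters only, lowercased."""
    # [^\W\d_] matches a word character that is neither a digit nor an underscore.
    return re.findall(r"[^\W\d_]+", text.lower())


def whitespace_analyze(text: str) -> list[str]:
    """Approximate the whitespace analyzer: split on whitespace, change nothing else."""
    return text.split()


print(simple_analyze(TEXT))
print(whitespace_analyze(TEXT))
```

Both functions return the same token lists shown above for the sample text.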

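Query string analysis, and the question left over from the mapping introduction example

1. Query string analysis

The query string must be analyzed with the same analyzer that was used when the index was built.
The query string is handled differently for exact-value fields and full-text fields:

date: exact value
_all: full text

Suppose a document has a field whose value is hello you and me, and an inverted index is built from it.
To search the index that holds this document with the text hello me, that search text is the query string.
By default, Elasticsearch tokenizes and normalizes the query string with the same analyzer that was used to build the inverted index for the target field. Only then can the search match correctly.

If dogs was normalized to dog at index time but the search still looks for dogs, nothing is found. The dogs in the query string must also become dog before it can match.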
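
A toy inverted index makes the point concrete. In this sketch, `analyze`, `build_index` and `search` are hypothetical helpers and the one-entry stemming table is an assumption. It only shows why the query string has to pass through the same analyzer as the indexed text.

```python
from collections import defaultdict

STEMS = {"dogs": "dog"}  # hypothetical normalization table


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and normalize plurals via the table above."""
    return [STEMS.get(token, token) for token in text.lower().split()]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index: term -> ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query: str, analyzer) -> set[int]:
    """Return ids of documents that contain any term of the analyzed query."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "hello you and me", 2: "I like dogs"})

print(search(index, "dogs", analyze))      # {2}: the query term became dog
print(search(index, "dogs", str.split))    # set(): dogs is not in the index
print(search(index, "hello me", analyze))  # {1}
```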

Key point: fields of different types behave differently; some are full text and some are exact value.

post_date (date): exact value
_all: full text, tokenized and normalized

3. Testing an analyzer

GET /_analyze
{
    "analyzer": "standard",
    "text": "Text to analyze"
}

Core mapping data types and dynamic mapping

1. Core data types

string (split into text and keyword since 5.0)
byte, short, integer, long
float, double
boolean
date

2. Dynamic mapping

true or false --> boolean
123 --> long
123.45 --> float (double before 5.0)
2017-01-01 --> date
"hello world" --> text (string before 5.0)

3. Viewing the mapping

GET /index/_mapping/type
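
The dynamic mapping rules listed in item 2 can be mimicked with a small type-inference function. This is a rough sketch: `infer_type` is a made-up name, date detection is reduced to the single yyyy-MM-dd pattern, and a floating-point value is mapped to float on the assumption of Elasticsearch 5.0 or later (earlier versions used double).

```python
from datetime import datetime


def infer_type(value) -> str:
    """Guess the field type that dynamic mapping would assign to a JSON value."""
    if isinstance(value, bool):   # check bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"            # assumption: 5.0+ behaviour; double before 5.0
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # simplified date detection
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")


for sample in (True, 123, 123.45, "2017-01-01", "hello world"):
    print(repr(sample), "-->", infer_type(sample))
```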

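Creating and modifying mappings manually, and controlling whether string fields are analyzed

1. How a field is indexed

analyzed: the value is analyzed, then indexed (full text)
not_analyzed: the value is indexed as-is, as a single exact value
no: the value is not indexed and cannot be searched

2. Modifying a mapping

A mapping can only be defined when the index is created, or extended by adding mappings for new fields. The mapping of an existing field cannot be updated.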

PUT /website
{
    "mappings": {
        "article": {
            "properties": {
                "author_id": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    "analyzer": "english"
                },
                "content": {
                    "type": "text"
                },
                "post_date": {
                    "type": "date"
                },
                "publisher_id": {
                    "type": "text",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Trying to change the type of the existing author_id field by re-creating the index is rejected:

PUT /website
{
    "mappings": {
        "article": {
            "properties": {
                "author_id": {
                    "type": "text"
                }
            }
        }
    }
}

{
    "error": {
        "root_cause": [
            {
                "type": "index_already_exists_exception",
                "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
                "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
                "index": "website"
            }
        ],
        "type": "index_already_exists_exception",
        "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
        "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
        "index": "website"
    },
    "status": 400
}

Adding a mapping for a new field to the existing type is allowed:

PUT /website/_mapping/article
{
    "properties": {
        "new_field": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

3. Testing the mapping

GET /website/_analyze
{
    "field": "content",
    "text": "my-dogs"
}

GET /website/_analyze
{
    "field": "new_field",
    "text": "my dogs"
}

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[4onsTYV][127.0.0.1:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
    },
    "status": 400
}

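Filter vs. query in depth: relevance and performance

1. Filter vs. query

filter: only selects the documents that match the search conditions. It computes no relevance score and has no effect on relevance.
query: computes a relevance score for each document against the search conditions and sorts the results by that score.

In general, when you are searching and want the best matches returned first, use a query. When you only need to select a subset of the data by some conditions and do not care about ranking, use a filter.
Put a condition in the query if documents that match it better should rank higher; put it in the filter if it should not influence the ranking.

2. Filter vs. query performance

filter: no relevance score is computed and no sorting by score is needed, and Elasticsearch automatically caches the results of the most frequently used filters.
query: the opposite. Scores must be computed and results sorted by score, and the results cannot be cached.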
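
The contrast can be illustrated with a toy model in Python. The sample documents, the function names and the scoring rule (plain term frequency) are assumptions for the sketch. Real Elasticsearch relevance scoring is more elaborate, and filter caching is not modelled here.

```python
# Hypothetical sample documents.
DOCS = {
    1: {"title": "elasticsearch search search", "year": 2017},
    2: {"title": "search basics", "year": 2016},
    3: {"title": "elasticsearch search", "year": 2017},
}


def filter_docs(docs: dict, field: str, value) -> set[int]:
    """Filter context: a yes/no match per document, with no score and no ordering."""
    return {doc_id for doc_id, doc in docs.items() if doc[field] == value}


def query_docs(docs: dict, field: str, term: str) -> list[tuple[int, int]]:
    """Query context: score each matching document, then sort by score, highest first."""
    scored = []
    for doc_id, doc in docs.items():
        score = doc[field].split().count(term)  # toy score: term frequency
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(filter_docs(DOCS, "year", 2017))      # {1, 3}
print(query_docs(DOCS, "title", "search"))  # [(1, 2), (2, 1), (3, 1)]

# Combining both: the filter narrows the candidates, the query ranks what is left.
allowed = filter_docs(DOCS, "year", 2017)
print([hit for hit in query_docs(DOCS, "title", "search") if hit[0] in allowed])
```

The filter returns an unordered set because there is nothing to rank by, while the query returns an ordered list of (id, score) pairs.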

Text vs. keyword

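Since Elasticsearch 5.0 the string type has been removed and split into two new data types: text, for full-text search, and keyword, for keyword search.

Elasticsearch can search strings in two quite different ways. It can match against the whole value, which is keyword search, or against the individual terms produced by analysis, which is full-text search. In versions before 5.0, strings of the first kind were called not_analyzed strings and strings of the second kind were called analyzed strings.

text: analyzed, then indexed

    supports fuzzy and exact queries

    does not support aggregations (by default)

keyword: indexed as-is, without analysis

    supports fuzzy and exact queries

    supports aggregations

In short, text is for full-text search and keyword is for keyword search.

For something like SQL's LIKE, define the field as keyword and use a wildcard query.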
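
To see why a LIKE-style pattern behaves differently on the two types, the sketch below models a keyword field as a single unmodified term and a text field as lowercased whitespace tokens, then matches a wildcard pattern against whole terms. All function names are illustrative assumptions. The only wildcard syntax assumed is * (any sequence of characters) and ? (any single character).

```python
import re


def index_as_text(value: str) -> list[str]:
    """text field (simplified): analyzed into lowercased whitespace tokens."""
    return value.lower().split()


def index_as_keyword(value: str) -> list[str]:
    """keyword field: the whole value is indexed as one term, unchanged."""
    return [value]


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * and ? into a regular expression; escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_match(terms: list[str], pattern: str) -> bool:
    """A document matches if any of its indexed terms matches the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(term) for term in terms)


value = "Hello World"
print(wildcard_match(index_as_keyword(value), "*llo Wor*"))  # True, like LIKE '%llo Wor%'
print(wildcard_match(index_as_text(value), "*llo Wor*"))     # False, no single term spans the space
```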

Reposted from: https://www.cnblogs.com/jiahaoJAVA/p/11009392.html
