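Elasticsearch in Plain Language 06: A Deep Dive into Search, Manually Controlling the Precision of Full-Text Search Results
Table of contents
- Overview
- Data
- Small examples
- Search for blogs whose title contains java or elasticsearch
- Search for blogs whose title contains java and elasticsearch
- Search for blogs containing at least 3 of the 4 keywords java, elasticsearch, spark, hadoop
- Use bool to combine multiple search conditions on title
- How the relevance score is computed when bool combines multiple conditions
- Search for java, hadoop, spark, elasticsearch, matching at least 3 of the keywords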
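Overview
Continuing to learn ES from teacher 中華石杉 (Zhonghua Shishan); this is part six.
Course address: https://www.roncoo.com/view/55
What if we want finer-grained control over how full-text search behaves? Here we look at several ways to manually control the precision of full-text search results.
match query
Version 6.4:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-match-query.html
Version 7.0:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-match-query.html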
Data
To illustrate this part, we add a title field to the existing post data:
POST /forum/article/_bulk
{"update":{"_id":"1"}}
{"doc":{"title":"this is java and elasticsearch blog"}}
{"update":{"_id":"2"}}
{"doc":{"title":"this is java blog"}}
{"update":{"_id":"3"}}
{"doc":{"title":"this is elasticsearch blog"}}
{"update":{"_id":"4"}}
{"doc":{"title":"this is java, elasticsearch, hadoop blog"}}
{"update":{"_id":"5"}}
{"doc":{"title":"this is spark blog"}}
Look at one of the documents to check the title field.
mapping:
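Small examples
Search for blogs whose title contains java or elasticsearch
The key point is: or
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text
This is different from the term query we used earlier. It does not search for an exact value; it performs a full-text search.
The match query is the one responsible for full-text search. That said, if the field being searched is not_analyzed (in older versions) or of type keyword, a match query behaves like a term query.
The mapping of the title field is:
First, let's look at how "this is java and elasticsearch blog" is analyzed.
It is split into the tokens this, is, java, and, elasticsearch, blog, which are stored in the inverted index.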
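As a rough illustration of the tokens listed above, the following minimal Python sketch builds a toy inverted index over the five sample titles. The `analyze` function is only an assumed stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and `build_inverted_index` is a hypothetical helper, not part of Elasticsearch.

```python
import re
from collections import defaultdict

# The five sample titles from the bulk request in the Data section.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Assumed approximation of the standard analyzer: lowercase + split."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every token to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


index = build_inverted_index(TITLES)
print(analyze("this is java and elasticsearch blog"))
print(sorted(index["java"]))           # documents whose title contains java
print(sorted(index["elasticsearch"]))  # documents whose title contains elasticsearch
```

A full-text query is answered by looking up the query's tokens in this kind of structure, which is why the query text is analyzed the same way as the indexed text.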
We want to search for blogs whose title contains java or elasticsearch. How should we do that?
Look at how java elasticsearch is analyzed:
GET /forum/_analyze
{
  "field": "title",
  "text": "java elasticsearch"
}
So a plain match query is all that is needed:
GET /forum/_search
{
  "query": {
    "match": {
      "title": "java elasticsearch"
    }
  }
}
Four documents are returned, which matches the "or" semantics:
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 4,
    "max_score": 0.8092568,
    "hits": [
      {
        "_index": "forum", "_type": "article", "_id": "4", "_score": 0.8092568,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true, "postDate": "2017-01-02",
          "tag": ["java", "elasticsearch"], "tag_cnt": 2, "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "forum", "_type": "article", "_id": "1", "_score": 0.5753642,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": false, "postDate": "2017-01-01",
          "tag": ["java", "hadoop"], "tag_cnt": 2, "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "forum", "_type": "article", "_id": "3", "_score": 0.2876821,
        "_source": {
          "articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": false, "postDate": "2017-01-01",
          "tag": ["hadoop"], "tag_cnt": 1, "view_cnt": 100,
          "title": "this is elasticsearch blog"
        }
      },
      {
        "_index": "forum", "_type": "article", "_id": "2", "_score": 0.19856805,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": false, "postDate": "2017-01-02",
          "tag": ["java"], "tag_cnt": 1, "view_cnt": 50,
          "title": "this is java blog"
        }
      }
    ]
  }
}
Search for blogs whose title contains java and elasticsearch
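The key point is: and
The operator flag can be set to or or and to control the boolean clauses (defaults to or).
If you want every search keyword to match, use and. This gives a result that a plain match query with the default or operator cannot.
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}
Two documents are returned, as expected.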
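To make the difference between the two operators concrete, here is a small Python sketch that mimics the matching step (not the scoring) over the five sample titles. The `analyze` and `match` functions are hypothetical helpers written for this illustration, and `analyze` only approximates the standard analyzer.

```python
import re

# The five sample titles from the bulk request in the Data section.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Assumed approximation of the standard analyzer: lowercase + split."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(docs, query, operator="or"):
    """Return the ids of documents matching the query terms (no scoring)."""
    terms = set(analyze(query))
    hits = []
    for doc_id, title in docs.items():
        tokens = set(analyze(title))
        if operator == "and":
            matched = terms <= tokens       # every term must be present
        else:
            matched = bool(terms & tokens)  # any single term is enough
        if matched:
            hits.append(doc_id)
    return sorted(hits)


print(match(TITLES, "java elasticsearch"))                  # or: four documents
print(match(TITLES, "java elasticsearch", operator="and"))  # and: two documents
```

The sketch returns ids in sorted order only; the real response is ordered by relevance score.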
Search for blogs containing at least 3 of the 4 keywords java, elasticsearch, spark, hadoop
Here we specify how many of the given keywords must match, at minimum, for a document to be returned.
The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.
Documentation for minimum_should_match:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-minimum-should-match.html
As a percentage:
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}
As a number:
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": 3
      }
    }
  }
}
One document is returned, the only one that contains at least 3 of the keywords.
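The two requests above are equivalent because a positive percentage is rounded down to a clause count, so 75% of 4 terms is 3. The Python sketch below illustrates that under simplifying assumptions: it handles only the two forms used above (a positive integer and a positive percentage), and `resolve_minimum_should_match` and `match_at_least` are hypothetical names for this illustration.

```python
import re

# The five sample titles from the bulk request in the Data section.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Assumed approximation of the standard analyzer: lowercase + split."""
    return re.findall(r"[a-z0-9]+", text.lower())


def resolve_minimum_should_match(spec, clause_count):
    """Turn 3 or "75%" into a required clause count (positive forms only)."""
    if isinstance(spec, str) and spec.endswith("%"):
        # A positive percentage is rounded down.
        return clause_count * int(spec[:-1]) // 100
    return int(spec)


def match_at_least(docs, query, spec):
    """Return ids of documents containing at least the required number of terms."""
    terms = analyze(query)
    required = resolve_minimum_should_match(spec, len(terms))
    hits = []
    for doc_id, title in docs.items():
        tokens = set(analyze(title))
        matched = sum(1 for term in terms if term in tokens)
        if matched >= required:
            hits.append(doc_id)
    return sorted(hits)


query = "java elasticsearch spark hadoop"
print(resolve_minimum_should_match("75%", 4))  # required number of terms
print(match_at_least(TITLES, query, "75%"))
print(match_at_least(TITLES, query, 3))
```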
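Use bool to combine multiple search conditions on title
GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": {"match": {"title": "java"}},
      "must_not": {"match": {"title": "spark"}},
      "should": [
        {"match": {"title": "hadoop"}},
        {"match": {"title": "elasticsearch"}}
      ]
    }
  }
}
When matching, match first analyzes the search keywords into tokens and then looks up documents by those tokens.
term looks up the keyword directly, without analysis, so the value must equal a token exactly as it is stored in the inverted index. In general, match is used for fuzzy (full-text) lookups, and term can be used for exact lookups.
The same search can also be written with term for exact lookups:
GET /forum/_search
{
  "query": {
    "bool": {
      "must": {"term": {"title": "java"}},
      "must_not": {"term": {"title": "spark"}},
      "should": [
        {"term": {"title": "hadoop"}},
        {"term": {"title": "elasticsearch"}}
      ]
    }
  }
}
How the relevance score is computed when bool combines multiple conditions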
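The scores of the matching must and should clauses are added together to give the document's final score. (The course describes this as the sum divided by the total number of must and should clauses; that kind of normalization corresponds to the coordination factor of older versions, which is no longer applied as of Elasticsearch 6.x.)
- Ranked first: contains java, and also contains all of the should keywords, hadoop and elasticsearch
- Ranked second: contains java, and also contains elasticsearch from the should clauses
- Ranked third: contains java, and contains none of the should keywords
should clauses can affect the relevance score.
must guarantees that a document contains the keyword, and the must condition is also used to compute the document's relevance score for the search.
Once must is satisfied, the should conditions do not have to match, but the more of them match, the higher the document's relevance score.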
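The ranking described above can be reproduced with a deliberately simplified Python sketch. It assumes every matching must or should clause contributes a flat score of 1.0, whereas Elasticsearch computes a real per-clause relevance score; only the resulting order is meaningful here. `bool_search` and `analyze` are hypothetical helpers for this illustration.

```python
import re

# The five sample titles from the bulk request in the Data section.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Assumed approximation of the standard analyzer: lowercase + split."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_search(docs, must, must_not, should):
    """Filter by must / must_not, then rank by a toy score.

    Toy assumption: each matching must or should clause adds 1.0.
    """
    results = []
    for doc_id, title in docs.items():
        tokens = set(analyze(title))
        if not all(term in tokens for term in must):
            continue  # a must clause failed
        if any(term in tokens for term in must_not):
            continue  # a must_not clause matched
        score = float(len(must) + sum(1 for term in should if term in tokens))
        results.append((doc_id, score))
    # Highest score first, like the default sort by _score.
    return sorted(results, key=lambda item: item[1], reverse=True)


ranking = bool_search(
    TITLES,
    must=["java"],
    must_not=["spark"],
    should=["hadoop", "elasticsearch"],
)
print(ranking)
```

Dividing every score by the same constant number of clauses would not change this order, so the ranking holds under either description of the formula.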
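Search for java, hadoop, spark, elasticsearch, matching at least 3 of the keywords
By default, should clauses do not have to match at all. In the search above, for example, "this is java blog" matches none of the should conditions.
There is one exception: if there is no must clause, at least one of the should clauses has to match.
In the search below, should has 4 conditions. By default a document is returned as soon as it satisfies any one of them, but we can control precisely how many of the 4 should conditions must match before a document is returned.
GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": "java"}},
        {"match": {"title": "elasticsearch"}},
        {"match": {"title": "hadoop"}},
        {"match": {"title": "spark"}}
      ],
      "minimum_should_match": 3
    }
  }
}
To summarize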
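- 1. For full-text search over multiple values there are two approaches: a match query, or should clauses in a bool query
- 2. To control the precision of the results: the and operator and minimum_should_match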
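The two approaches in point 1 are closely related, since the match query builds a boolean query from the analyzed text. As a conceptual sketch only, the hypothetical Python helper `match_to_bool` below shows how a match query over several terms could be written by hand as a bool query of term clauses; it is not the literal query Elasticsearch constructs internally, and `analyze` only approximates the standard analyzer.

```python
import re


def analyze(text):
    """Assumed approximation of the standard analyzer: lowercase + split."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_to_bool(field, text, operator="or", minimum_should_match=None):
    """Build a bool query body that expresses the same intent as a match query."""
    clauses = [{"term": {field: token}} for token in analyze(text)]
    if operator == "and":
        # Every term is required.
        return {"bool": {"must": clauses}}
    query = {"bool": {"should": clauses}}
    if minimum_should_match is not None:
        query["bool"]["minimum_should_match"] = minimum_should_match
    return query


print(match_to_bool("title", "java elasticsearch"))
print(match_to_bool("title", "java elasticsearch", operator="and"))
print(match_to_bool("title", "java elasticsearch spark hadoop", minimum_should_match=3))
```

The returned dictionaries have the same shape as the "query" bodies used in the requests throughout this article.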