A painstaking roundup: ES retrieval (the R in CRUD), queries, and aggregations in one article
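The examples below query an ES index named company. It has the following fields; here is one sample document:
"id": "1", // id
"name": "張三", // name
"sex": "男", // sex
"age": 49, // age
"birthday": "1970-01-01", // date of birth
"position": "董事長", // job title
"joinTime": "1990-01-01", // join date, date format
"modified": "1562167817000", // last-modified time, in milliseconds
"created": "1562167817000" // creation time, in milliseconds
Each search below translates a relational-database (SQL) statement into the corresponding ES search API call and its parameters.
Most requests are sent with POST and carry a DSL (structured query) statement in the request body.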
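As a minimal sketch of what "POST with a DSL body" means in practice, the snippet below builds (but does not send) such a request with Python's standard library. The helper name build_search_request is made up for illustration, and the base URL is simply the one used in the article's examples.

```python
import json
import urllib.request

# Base URL copied from the article's examples; change it for your own cluster.
ES_BASE = "http://192.168.197.100:9200"

def build_search_request(index, body):
    """Build (but do not send) a POST request for the _search endpoint."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_BASE}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("company", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually run the search, pass `req` to urllib.request.urlopen(req).
```

Keeping the request construction separate from sending makes it easy to inspect the exact JSON body before it reaches the cluster.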
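I. Queries
1. Simple search
【sql】select * from company
【ES】There are two ways:
1) GET http://192.168.197.100:9200/company/_search
2) POST http://192.168.197.100:9200/company/_search
{"query":{"match_all":{}}}
2. Exact match (the query text is not analyzed)
【sql】select * from company where name='張三'
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"term":{"name.keyword":"張三"}}}
term is for exact matching, similar to "=" in SQL. The name field uses the default standard analyzer, which splits "張三" into "張" and "三", so a term query on name would not match the person named "張三". The name.keyword sub-field is not analyzed, so it stores the whole value and the term query matches it.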
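The difference between name and name.keyword can be imitated locally. This is only a rough sketch: standard_like_tokens is a hypothetical stand-in that emits one token per character, which is roughly how the standard analyzer treats Chinese text; the real analyzer does more than this.

```python
def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer on CJK text: one token per character."""
    return [ch for ch in text if not ch.isspace()]

def term_matches(indexed_tokens, term):
    """A term query looks for the term, unanalyzed, among the indexed tokens."""
    return term in indexed_tokens

name = "張三"
name_tokens = standard_like_tokens(name)  # what the analyzed `name` field indexes
keyword_tokens = [name]                   # what `name.keyword` indexes

print(term_matches(name_tokens, "張三"))     # False
print(term_matches(keyword_tokens, "張三"))  # True
```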
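terms works the same way but matches one field against several values, for example:
【sql】select * from company where name in ('張三','李四')
【ES】POST http://192.168.197.100:9200/company/_search
{"query": {"terms": {"name.keyword": ["張三", "李四"]}}}
3. Fuzzy matching
【sql】select * from company where name like '%張%'
【ES】POST http://192.168.197.100:9200/company/_search
{"query": {"match": {"name": "張"}}}
The query above returns documents whose name contains the character "張". Note that match analyzes the query text and matches on tokens, so it is only an approximation of SQL LIKE.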
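To see why match is only an approximation of LIKE, here is a small local imitation. It assumes one token per character (a simplification of the standard analyzer for Chinese text) and ignores scoring; the name "三張" is invented for the demonstration.

```python
def tokens(text):
    # Assumption: one token per character, roughly what the standard analyzer does for CJK text.
    return [ch for ch in text if not ch.isspace()]

def match(field_value, query_text, operator="or"):
    """Approximate a match query: analyze both sides, then combine the token hits."""
    indexed = set(tokens(field_value))
    hits = [tok in indexed for tok in tokens(query_text)]
    return all(hits) if operator == "and" else any(hits)

names = ["張三", "李四", "張十", "三張"]
print([n for n in names if match(n, "張")])           # ['張三', '張十', '三張']
print([n for n in names if match(n, "張三")])         # ['張三', '張十', '三張']
print([n for n in names if match(n, "張三", "and")])  # ['張三', '三張']
```

With the default "or" operator a two-character query matches any document containing either character, and even with "and" the order of the characters is ignored, which LIKE '%張三%' would not allow.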
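4. Pagination
【sql】select * from company limit 0,10
【ES】POST http://192.168.197.100:9200/company/_search
{"from":0,"size":10}
[Note] By default from + size must not exceed 10000. The limit (the index.max_result_window setting) can be raised, but that is not recommended: ES is sharded, so every shard runs the same query, and the per-shard results are then merged and sorted. If the window is too large this can exhaust memory. For example, with 5 shards each returning 10000 hits, 50000 hits have to be sorted before the requested page is taken.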
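The arithmetic in the note can be written down as a small function. This is a back-of-the-envelope sketch, not ES code; hits_to_merge is a hypothetical name, and 10000 is the default window mentioned above.

```python
MAX_RESULT_WINDOW = 10_000  # default limit on from + size

def hits_to_merge(from_, size, shards):
    """Number of hits that have to be merged and sorted to serve one page."""
    window = from_ + size
    if window > MAX_RESULT_WINDOW:
        raise ValueError(f"from + size = {window} exceeds {MAX_RESULT_WINDOW}")
    return window * shards

print(hits_to_merge(0, 10, shards=5))     # 50
print(hits_to_merge(9990, 10, shards=5))  # 50000
```

The cost grows with how deep the page is, not with the page size alone, which is why deep pagination is expensive.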
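5. Range query with sorting
【sql】select * from company where age>=10 and age<=50 order by age desc
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"range":{"age":{"gte":10,"lte":50}}},"sort":{"age":{"order":"desc"}}}
The range query takes four parameters:
(1) gte: greater than or equal to
(2) gt: greater than
(3) lte: less than or equal to
(4) lt: less than
Sorting uses sort; desc is descending, asc is ascending, and several sort fields can be given.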
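A request with a range condition and more than one sort field might be assembled as below. The helpers range_query and sort_by are hypothetical conveniences, and sorting on joinTime as a second key assumes that field is mapped as a date.

```python
import json

def range_query(field, **bounds):
    """Build a range clause; only gte, gt, lte and lt are accepted as bounds."""
    allowed = {"gte", "gt", "lte", "lt"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError(f"unsupported range parameters: {sorted(unknown)}")
    return {"range": {field: bounds}}

def sort_by(*pairs):
    """Build a sort list from (field, order) pairs, highest priority first."""
    return [{field: {"order": order}} for field, order in pairs]

body = {
    "query": range_query("age", gte=10, lte=50),
    "sort": sort_by(("age", "desc"), ("joinTime", "asc")),
}
print(json.dumps(body))
```

Documents with the same age are then ordered by joinTime, ascending.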
6. Multi-field match
【sql】select * from company where sex like '%男%' or name like '%男%'
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"multi_match":{"query":"男","fields":["name","sex"]}}}
7. bool query (structured query)
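A bool query has four kinds of clauses: must, should, must_not, and filter.
(1) must: the conditions are ANDed; every one of them must match
(2) should: the conditions are ORed; matching any one of them is enough
(3) must_not: the conditions are ANDed; none of them may match
(4) filter: filters documents without computing the _score relevance that the other clauses compute, so it is faster than a scoring query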
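For example:
Conditions:
age is between 10 and 50, and sex is 男
sex must not be 女
id is 1 to 8, or position contains the character "董"
department is 市場部
【sql】select * from company where (age>=10 and age<50 and sex='男') and (sex!='女') and (id in (1,2,3,4,5,6,7,8) or position like '%董%') and departments in ('市場部')
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"bool":{"must":[{"term":{"sex":"男"}},{"range":{"age":{"gte":10,"lt":50}}}],"must_not":[{"term":{"sex":"女"}}],"should":[{"terms":{"id":[1,2,3,4,5,6,7,8]}},{"match":{"position":"董"}}],"minimum_should_match":1,"filter":[{"match":{"departments.keyword":"市場部"}}]}}}
minimum_should_match is set to 1 because, when must or filter clauses are present, should clauses are optional by default and only influence the score; setting it makes the OR group a real condition, as in the SQL.
bool queries can also be nested: must, must_not, should, and filter can each contain a complete bool query.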
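The way the four clause types combine can be imitated with plain Python predicates. This is a sketch of the matching logic only (no scoring); bool_match and the two sample documents are made up for illustration, and the parameter is called filter_ to avoid shadowing Python's built-in filter.

```python
def bool_match(doc, must=(), should=(), must_not=(), filter_=(), minimum_should_match=None):
    """Evaluate bool-query matching semantics over plain Python predicates."""
    if minimum_should_match is None:
        # By default should clauses are required only when there is no must/filter clause.
        minimum_should_match = 0 if (must or filter_) else (1 if should else 0)
    return (
        all(p(doc) for p in must)
        and all(p(doc) for p in filter_)
        and not any(p(doc) for p in must_not)
        and sum(1 for p in should if p(doc)) >= minimum_should_match
    )

# Hypothetical documents, shaped like the article's sample data.
docs = [
    {"id": 1, "sex": "男", "age": 49, "position": "董事長", "departments": "市場部"},
    {"id": 9, "sex": "男", "age": 30, "position": "程序員", "departments": "市場部"},
]
clauses = dict(
    must=[lambda d: d["sex"] == "男", lambda d: 10 <= d["age"] < 50],
    must_not=[lambda d: d["sex"] == "女"],
    should=[lambda d: d["id"] in range(1, 9), lambda d: "董" in d["position"]],
    filter_=[lambda d: d["departments"] == "市場部"],
)
print([d["id"] for d in docs if bool_match(d, **clauses)])                          # [1, 9]
print([d["id"] for d in docs if bool_match(d, minimum_should_match=1, **clauses)])  # [1]
```

Nesting works the same way: any predicate in the lists can itself be a call to bool_match with its own clauses.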
8. Wildcard query
?: matches exactly one character
*: matches zero or more characters
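The two wildcards listed above can be tried out locally by translating a pattern into a regular expression. This is an illustrative sketch, not how ES evaluates wildcard queries internally; wildcard_to_regex and wildcard_match are hypothetical helper names.

```python
import re

def wildcard_to_regex(pattern):
    """Translate ? (exactly one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(pattern, value):
    """The pattern has to cover the whole value, as with a keyword field."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None

print(wildcard_match("*部", "市場部"))   # True
print(wildcard_match("?部", "市場部"))   # False
print(wildcard_match("??部", "市場部"))  # True
print(wildcard_match("市*", "市場部"))   # True
```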
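【sql】select * from company where departments like '%部'
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"wildcard":{"departments.keyword":"*部"}}}
9. Prefix query
【sql】select * from company where departments like '市%'
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"match_phrase_prefix":{"departments.keyword":"市"}}}
match_phrase_prefix is designed for analyzed text fields; for a non-analyzed keyword field, the dedicated prefix query ({"query":{"prefix":{"departments.keyword":"市"}}}) expresses the same condition more directly.
10. Querying null values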
Suppose a document is added without a sex field, or with sex set to null. How can such documents be queried?
// add documents
POST http://192.168.197.100:9200/company/_doc
// a document without the sex field
{"id": "1","name": "張十","age": 54,"birthday": "1960-01-01","position": "程序員","joinTime": "1980-01-01","modified": "1562167817000","created": "1562167817000"}
// a document whose sex field is null
{"id": "1","name": "張十一","age": 64,"sex":null,"birthday": "1960-01-01","position": "程序員","joinTime": "1980-01-01","modified": "1562167817000","created": "1562167817000"}
Both cases are queried the same way, with an exists query. The query below matches both of the documents added above.
【sql】select * from company where sex is null
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"bool":{"must_not":[{"exists":{"field":"sex"}}]}}}
II. Filters (removed after ES 5)
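Filtering is very similar to querying: both retrieve data. The difference is that a filter maintains a cached array recording which documents matched. For example, if an index holds two documents and one matches the filter while the other does not, the array is [1,0], where 1 marks a matching document.
For searches that are repeated frequently, prefer a filter over a query.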
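The cached array described above can be pictured with a toy cache. This is a conceptual sketch only and does not reflect ES internals; FilterCache is a hypothetical class and the department value "研發部" is invented for the example.

```python
class FilterCache:
    """Toy cache: one list of 0/1 flags per filter, one flag per document."""

    def __init__(self, docs):
        self.docs = docs
        self.bitsets = {}      # filter key -> list of 0/1 flags
        self.evaluations = 0   # how many times a filter was actually evaluated

    def bitset(self, key, predicate):
        if key not in self.bitsets:
            self.evaluations += 1
            self.bitsets[key] = [1 if predicate(d) else 0 for d in self.docs]
        return self.bitsets[key]

docs = [{"departments": "市場部"}, {"departments": "研發部"}]
cache = FilterCache(docs)
is_marketing = lambda d: d["departments"] == "市場部"

print(cache.bitset("departments=市場部", is_marketing))  # [1, 0]
print(cache.bitset("departments=市場部", is_marketing))  # [1, 0], served from the cache
print(cache.evaluations)                                 # 1
```

The second lookup does no matching work at all, which is the reason repeated filters are cheap.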
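The request body of a filter is almost the same as that of a query, with one extra level of nesting, filtered.
For example:
【sql】select * from company where departments like '%市%'
【ES】POST http://192.168.197.100:9200/company/_search
{"query":{"filtered":{"filter":{"match":{"departments":"市"}}}}}
The analyzed departments field is used here, because a match on departments.keyword would only hit values that are exactly "市". In ES 5.0 and later the filtered query is no longer accepted; the filter clause of a bool query (section 7 above) takes its place.
III. Aggregations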
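Aggregations let you run statistical analysis over ES documents, similar to group by in a relational database. There are many other aggregations as well, such as maximum and average.
The syntax is:
POST http://192.168.197.100:9200/company/_search
{
  "aggs": {
    "NAME": { // name under which the result is returned
      "AGG_TYPE": { // the aggregation method to use
        // the field to aggregate on is specified inside the aggregation body
      }
      // sub-aggregations can be nested here under a further "aggs" key
    }
  }
}
Aggregation analysis comprises metric, bucket, pipeline, and matrix aggregations. Metric and bucket aggregations are the most commonly used, and this article shows how to use those two.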
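The skeleton above maps naturally onto nested dictionaries. The sketch below assembles such a request body in Python; agg and agg_request are hypothetical helper names, and the aggregation names reuse ones that appear in the examples of this article.

```python
import json

def agg(name, agg_type, body, sub_aggs=None):
    """One named aggregation, optionally with nested sub-aggregations."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs
    return {name: node}

def agg_request(*aggs):
    """Merge sibling aggregations into one request; size=0 hides the hits."""
    merged = {}
    for a in aggs:
        merged.update(a)
    return {"size": 0, "aggs": merged}

body = agg_request(
    agg("sex_groupby", "terms", {"field": "sex"},
        sub_aggs=agg("avg_age", "avg", {"field": "age"})),
    agg("max_age", "max", {"field": "age"}),
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Sibling aggregations sit side by side under the top-level aggs key, while a sub-aggregation sits inside its parent, next to the parent's aggregation type.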
1. Metric aggregations
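Most of the metrics in this subsection follow from three numbers: the count, the sum, and the sum of squares. As a sanity check, the sketch below recomputes the derived values from the count, sum and sum_of_squares that the extended_stats result for age reports later in this subsection. It assumes the reported variance is the population variance, which the numbers bear out.

```python
import math

# Values reported by the extended_stats result for age later in this subsection.
count, total, sum_of_squares = 9, 315, 13857

avg = total / count
variance = sum_of_squares / count - avg ** 2  # population variance: E[x^2] - (E[x])^2
std_deviation = math.sqrt(variance)
bounds = {
    "upper": avg + 2 * std_deviation,  # mean plus two standard deviations
    "lower": avg - 2 * std_deviation,  # mean minus two standard deviations
}

print(avg)                      # 35.0
print(round(variance, 4))       # 314.6667
print(round(std_deviation, 4))  # 17.7388
print(bounds)
```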
(1) max: the maximum value of a field
【sql】select max(age) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"max_age":{"max":{"field":"age"}}},"size":0} // size=0 returns only the aggregation result
The result is:
{"aggregations": {"max_age": {"value": 64}}}
(2) min: the minimum value of a field
【sql】select min(age) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"min_age":{"min":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"min_age": {"value": 1}}}
(3) sum: the total of a field
【sql】select sum(age) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sum_age":{"sum":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"sum_age": {"value": 315}}}
(4) avg: the average of a field's values
【sql】select avg(age) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"age_avg":{"avg":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"age_avg": {"value": 35}}}
(5) cardinality: the number of distinct values of a field
【sql】select count(distinct(sex)) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_distinct":{"cardinality":{"field":"sex"}}},"size":0}
The result is (note that cardinality returns an approximate count):
{"aggregations": {"sex_distinct": {"value": 2}}}
(6) stats aggregation: returns the five metrics count, max, min, avg, and sum for a field in one request
【sql】select count(age),sum(age),avg(age),max(age),min(age) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"age_stats":{"stats":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"age_stats": {"count": 9,"min": 1,"max": 64,"avg": 35,"sum": 315}}}
(7) extended stats aggregation: a step beyond stats; it additionally returns the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations
【sql】-- I could not write the SQL for this one; a maths graduate who has forgotten the formulas, what a disgrace
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"age_extended_stats":{"extended_stats":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"age_extended_stats": {"count": 9,"min": 1,"max": 64,"avg": 35,"sum": 315,"sum_of_squares": 13857,"variance": 314.6666666666667,"std_deviation": 17.73884626086676,"std_deviation_bounds": {"upper": 70.47769252173353,"lower": -0.4776925217335233}}}}
(8) percentiles aggregation: percentile statistics over a field's values
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"age_percentiles":{"percentiles":{"field":"age"}}},"size":0}
The result is:
{"aggregations": {"age_percentiles": {"values": {"1.0": 1,"5.0": 1,"25.0": 26,"50.0": 29,"75.0": 50.25,"95.0": 64,"99.0": 64}}}}
(9) value count aggregation: counts the documents that have a given field
【sql】select sum(case when sex is null then 0 else 1 end) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_value_count":{"value_count":{"field":"sex"}}},"size":0}
Result: 8 documents in total; recall that two documents without a sex field were added earlier.
2. Bucket aggregations
Bucket aggregations are the equivalent of the group by clause in SQL.
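As a rough local picture of what grouping into buckets means, the sketch below groups a handful of made-up documents by one field and computes an average per group. The function terms_with_avg and the sample documents are hypothetical; the output shape only loosely mirrors an ES bucket list.

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, value_field):
    """Group documents by one field, then count and average another field per group."""
    groups = defaultdict(list)
    for doc in docs:
        key = doc.get(group_field)
        if key is not None:  # documents without the field get no bucket
            groups[key].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    ]
    # Largest buckets first, as terms buckets are ordered by default.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)

docs = [
    {"sex": "男", "age": 49}, {"sex": "男", "age": 29},
    {"sex": "女", "age": 27}, {"age": 54},
]
print(terms_with_avg(docs, "sex", "age"))
# [{'key': '男', 'doc_count': 2, 'avg': 39.0}, {'key': '女', 'doc_count': 1, 'avg': 27.0}]
```

The document without a sex value falls into no bucket, which is why a separate way of counting missing values is needed.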
(1) terms aggregation: group and count
【sql】select sex,count(1) from company group by sex
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_groupby":{"terms":{"field":"sex"}}},"size":0}
The result is:
{"aggregations": {"sex_groupby": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "男","doc_count": 5},{"key": "女","doc_count": 1}]}}}
(2) Further aggregations on other fields can be nested inside each terms group
【sql】SELECT sex,count(1),AVG(age) from company group by sex
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_groupby":{"terms":{"field":"sex"},"aggs":{"avg_age":{"avg":{"field":"age"}}}}},"size":0}
The result is:
{"aggregations": {"sex_groupby": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "男","doc_count": 5,"avg_age": {"value": 33.8}},{"key": "女","doc_count": 1,"avg_age": {"value": 27}}]}}}
(3) filter aggregation: aggregates only the documents that satisfy the filter condition
【sql】select sum(age) from company where sex = '男'
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_filter":{"filter":{"term":{"sex":"男"}},"aggs":{"sum_age":{"sum":{"field":"age"}}}}},"size":0}
The result is:
{"aggregations": {"sex_filter": {"doc_count": 5,"sum_age": {"value": 169}}}}
(4) filters aggregation: several filters at once, one bucket per filter
【sql】SELECT sex,count(1),sum(age) from company where sex in ('男','女') group by sex
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"sex_filter":{"filters":{"filters":[{"term":{"sex":"男"}},{"term":{"sex":"女"}}]},"aggs":{"sum_age":{"sum":{"field":"age"}}}}},"size":0}
The result is:
{"aggregations": {"sex_filter": {"buckets": [{"doc_count": 5,"sum_age": {"value": 169}},{"doc_count": 1,"sum_age": {"value": 27}}]}}}
(6) range aggregation: shows how the data is distributed
【sql】SELECT sum(case when age<30 then 1 else 0 end), sum(case when age>=30 and age<50 then 1 else 0 end), sum(case when age>=50 then 1 else 0 end) from company
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"age_range":{"range":{"field":"age","ranges":[{"to":30},{"from":30,"to":50},{"from":50}]}}},"size":0}
Each range includes its from value and excludes its to value, which is why the SQL uses < for the upper bounds.
The result is:
{"aggregations": {"age_range": {"buckets": [{"key": "*-30.0","to": 30,"doc_count": 5},{"key": "30.0-50.0","from": 30,"to": 50,"doc_count": 2},{"key": "50.0-*","from": 50,"doc_count": 2}]}}}
(7) missing aggregation: counts the documents that lack a given field
【sql】SELECT count(1) from company where sex is null
【ES】POST http://192.168.197.100:9200/company/_search
{"aggs":{"missing_sex":{"missing":{"field":"sex"}}},"size":0}
The result is:
{"aggregations": {"missing_sex": {"doc_count": 4}}}
The same can be done with a filter aggregation, which gives the same result:
POST http://192.168.197.100:9200/company/_search
{"aggs":{"missing_sex":{"filter":{"bool":{"must_not":[{"exists":{"field":"sex"}}]}}}},"size":0}
OK, those are the commonly used ES query and aggregation operations. (It looks like ES deserves a deeper study.)
=======================================================
I am Liusy, a programmer who enjoys working out.
Follow the WeChat public account 【Liusy01】 to talk about Java and fitness and to get more practical content.