02. Elasticsearch bucket aggregation queries

Published: 2024/2/28


Contents

- 1. Overview of bucket aggregation types
- 2. Data preparation
- 3. Usage examples
  - 1. Terms Aggregation
    - 1. Plain terms agg
    - 2. Nesting a metric agg as a sub agg
    - 3. Nesting a terms agg as a sub agg
  - 2. Range Aggregation
  - 3. Date Histogram Aggregation
  - 4. Date Range Aggregation
  - 5. Filter Aggregation
  - 6. Filters Aggregation
  - 7. Histogram Aggregation
  - 8. Missing Aggregation: counting docs in which a field is absent
  - 9. Nested aggs: aggregating over nested docs, usually with a sub agg that computes the statistic
  - 10. Children agg: queries over join-type data
  - 11. Parent agg: queries over join-type data
  - 12. Composite Aggregation: combines terms across several dimensions, like multi-level nested terms but with a flat result, similar to GROUP BY on several columns in MySQL
  - 13. Adjacency Matrix Aggregation
  - 14. Global agg: queries over all data
  - 15. Significant Terms Aggregation: automatically finding significant keywords
  - 16. Significant Text Aggregation: automatically finding significant keywords
  - 17. Sampler Aggregation: aggregating over sampled data
  - 18. Reverse nested Aggregation: aggregating on parent data from inside a nested agg

Elasticsearch's aggregation queries keep getting richer. There are currently four families.

metric aggregation: single statistics such as min, max, avg, sum, and percentile

bucket aggregation: query operations similar to GROUP BY

matrix aggregation: computes over the values of several fields to produce a multi-dimensional matrix

pipeline aggregation: applies additional processing to the output of other aggregations to enrich the data

This article focuses on bucket aggregations. A bucket aggregation resembles a GROUP BY query and, unlike a metric aggregation, it can contain sub aggregations, that is, it can be nested. A nested sub agg can be either a bucket agg or a metric agg.

1. Overview of bucket aggregation types

Terms Aggregation: the typical GROUP BY style. Documents are bucketed by the value of a field; if the field's value is an array, the document is counted in several buckets
Range Aggregation: usually applied to a numeric field; buckets are defined by one or more ranges
Date Histogram Aggregation: buckets by time, split automatically by a calendar unit such as month
Date Range Aggregation: buckets by time range, similar to the range aggregation
Filter Aggregation: a single filter, similar to a filter in a query
Filters Aggregation: several filters, one bucket per filter
Histogram Aggregation: fixed-width buckets over a numeric field

Missing Aggregation: counts docs in which a field is absent
Adjacency Matrix Aggregation
Auto-interval Date Histogram Aggregation
Children Aggregation
Composite Aggregation
Diversified Sampler Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
GeoTile Grid Aggregation
Global Aggregation
IP Range Aggregation
Nested Aggregation
Parent Aggregation
Reverse nested Aggregation
Sampler Aggregation
Significant Terms Aggregation
Significant Text Aggregation

2. Data preparation

Ticket information for shows:
GET seats1028/_search

{
  "play" : "Auntie Jo",        # name of the show
  "date" : "2018-11-6",        # date
  "theatre" : "Skyline",       # venue
  "sold" : false,              # whether this ticket has been sold
  "actors" : [                 # cast
    "Jo Hangum", "Jon Hittle", "Rob Kettleman", "Laura Conrad", "Simon Hower", "Nora Blue"
  ],
  "datetime" : 1541497200000,
  "price" : 8321,              # ticket price
  "tip" : 17.5,                # discount
  "time" : "5:40PM"
}

The index holds more than 30,000 documents like this one. The queries below also use the row and number fields, which this sample does not show.

3. Usage examples

1. Terms Aggregation:

The typical GROUP BY style: documents are bucketed by the value of a field. If the field's value is an array, the document is counted in several buckets.

1. Plain terms agg

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "price", "min_doc_count": 13, "size": 50 }
    }
  }
}

Returns:

"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 35384,
    "buckets" : [
      { "key" : 910, "doc_count" : 13 },
      { "key" : 3273, "doc_count" : 13 },
      { "key" : 3648, "doc_count" : 13 }
    ]
  }
}
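To make the array behaviour concrete, here is a minimal Python sketch that imitates what a terms aggregation counts. It runs on in-memory dicts, not against Elasticsearch. The function name terms_agg and the sample docs are made up for illustration, and breaking ties by key is an assumption of the sketch.

```python
from collections import Counter


def terms_agg(docs, field, size=10, min_doc_count=1):
    """Count documents per distinct value of `field`, largest buckets first."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # a doc without the field lands in no bucket
        values = value if isinstance(value, list) else [value]
        # A doc is counted once for every distinct value it holds.
        for v in set(values):
            counts[v] += 1
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.items()
        if count >= min_doc_count
    ]
    # Order by doc_count descending; ties are broken by key (sketch assumption).
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]


# Hypothetical docs, not taken from the seats1028 index.
docs = [
    {"actors": ["Jo Hangum", "Nora Blue"]},
    {"actors": ["Nora Blue"]},
    {"actors": "Jo Hangum"},
]
result = terms_agg(docs, "actors")
```

The first doc holds two actors, so it contributes to both buckets, which is why three docs produce a total doc_count of four.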

2. Nesting a metric agg as a sub agg

Group by row, keep the 3 buckets with the most docs, and compute the maximum price within each bucket.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "row", "min_doc_count": 13, "size": 3, "order": { "_count": "desc" } },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}

Returns:

"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 13608,
    "buckets" : [
      { "key" : 2, "doc_count" : 5796, "max_price" : { "value" : 9998.0 } },
      { "key" : 3, "doc_count" : 5796, "max_price" : { "value" : 9999.0 } },
      { "key" : 1, "doc_count" : 5791, "max_price" : { "value" : 9999.0 } }
    ]
  }
}

3. Nesting a terms agg as a sub agg

First bucket by row and keep the 3 rows with the most docs. Then, inside each of those buckets, bucket again by number and keep the 3 number values with the most docs.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": { "field": "row", "min_doc_count": 13, "size": 3, "order": { "_count": "desc" } },
      "aggs": {
        "number_term": {
          "terms": { "field": "number", "size": 3, "order": { "_count": "desc" } }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 13608,
    "buckets" : [
      {
        "key" : 2,
        "doc_count" : 5796,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4368,
          "buckets" : [
            { "key" : 1, "doc_count" : 476 },
            { "key" : 2, "doc_count" : 476 },
            { "key" : 3, "doc_count" : 476 }
          ]
        }
      },
      {
        "key" : 3,
        "doc_count" : 5796,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4368,
          "buckets" : [
            { "key" : 1, "doc_count" : 476 },
            { "key" : 2, "doc_count" : 476 },
            { "key" : 3, "doc_count" : 476 }
          ]
        }
      },
      {
        "key" : 1,
        "doc_count" : 5791,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4363,
          "buckets" : [
            { "key" : 5, "doc_count" : 476 },
            { "key" : 6, "doc_count" : 476 },
            { "key" : 7, "doc_count" : 476 }
          ]
        }
      }
    ]
  }
}

2. Range Aggregation:

Usually applied to a numeric field: buckets are defined by one or more ranges. Each range includes its from value and excludes its to value.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 5000, "to": 6000 }
        ]
      }
    }
  }
}

Returns:

"aggregations" : {
  "price_range" : {
    "buckets" : [
      { "key" : "5000.0-6000.0", "from" : 5000.0, "to" : 6000.0, "doc_count" : 3646 }
    ]
  }
}
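The boundary rule (from is included, to is excluded) is easy to get wrong, so here is a small Python sketch of it. This is an in-memory imitation, not the Elasticsearch implementation; range_agg and the price list are hypothetical.

```python
def range_agg(values, ranges):
    """Count values per range, with `from` inclusive and `to` exclusive."""
    buckets = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        count = sum(
            1
            for v in values
            # A missing bound means the range is open on that side.
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


# Hypothetical prices sitting right on the boundaries.
prices = [4999, 5000, 5999, 6000]
buckets = range_agg(prices, [{"from": 5000, "to": 6000}])
```

Only 5000 and 5999 fall in the bucket: 5000 is admitted by the inclusive lower bound and 6000 is rejected by the exclusive upper bound.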

3. Date Histogram Aggregation:

Buckets by time, split automatically by a calendar unit such as month.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_histogram": { "field": "datetime", "calendar_interval": "month" }
    }
  }
}

Returns:

"aggregations" : {
  "price_date_histogram" : {
    "buckets" : [
      { "key_as_string" : "2018-03-01T00:00:00.000Z", "key" : 1519862400000, "doc_count" : 2310 },
      { "key_as_string" : "2018-04-01T00:00:00.000Z", "key" : 1522540800000, "doc_count" : 3946 },
      { "key_as_string" : "2018-05-01T00:00:00.000Z", "key" : 1525132800000, "doc_count" : 3948 },
      { "key_as_string" : "2018-06-01T00:00:00.000Z", "key" : 1527811200000, "doc_count" : 3948 },
      { "key_as_string" : "2018-07-01T00:00:00.000Z", "key" : 1530403200000, "doc_count" : 3948 }
    ]
  }
}
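Each bucket key in the response is the first instant of a calendar month, expressed in epoch milliseconds. The following Python sketch shows how a timestamp maps to such a key. It assumes UTC, which is consistent with the Z suffix in key_as_string above; month_bucket_key is a hypothetical helper name.

```python
from datetime import datetime, timezone


def month_bucket_key(epoch_millis):
    """Epoch-millisecond key of the UTC calendar month containing the timestamp."""
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    month_start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(month_start.timestamp() * 1000)


def key_as_string(key):
    """Format a bucket key the way the response prints it."""
    return datetime.fromtimestamp(key / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S.000Z"
    )


# 1541497200000 is the datetime value of the sample document in section 2.
key = month_bucket_key(1541497200000)
label = key_as_string(key)
```

With this mapping the sample document from the data preparation section would be counted in the November 2018 bucket.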

4. Date Range Aggregation

Buckets by time range, similar to the range aggregation.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_range": {
        "field": "datetime",
        "ranges": [
          { "from": "2018-10-01T00:00:00.000Z", "to": "2018-11-01T00:00:00.000Z" }
        ]
      }
    }
  }
}

Returns:

"aggregations" : {
  "price_date_histogram" : {
    "buckets" : [
      {
        "key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z",
        "from" : 1.538352E12,
        "from_as_string" : "2018-10-01T00:00:00.000Z",
        "to" : 1.5410304E12,
        "to_as_string" : "2018-11-01T00:00:00.000Z",
        "doc_count" : 3948
      }
    ]
  }
}

5. Filter Aggregation

A single filter, similar to a filter in a query. Sub aggregations only see the docs that pass the filter.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filter": {
        "range": { "tip": { "gte": 10, "lte": 20 } }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}

Returns:

"aggregations" : {
  "sold_filter" : {
    "doc_count" : 6300,    # doc count after the filter
    "max_price" : { "value" : 9996.0 }
  }
}

6. Filters Aggregation

Applies several filters and runs the sub agg on the result of each filter.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filters": {
        "filters": {    # the usage here is rather odd ...
          "tip_filter": { "range": { "tip": { "gte": 10, "lte": 20 } } },
          "number_filter": { "range": { "number": { "gte": 5, "lte": 10 } } }
        }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}

Returns:

"aggregations" : {
  "sold_filter" : {
    "buckets" : {
      "number_filter" : { "doc_count" : 16072, "max_price" : { "value" : 9999.0 } },
      "tip_filter" : { "doc_count" : 6300, "max_price" : { "value" : 9996.0 } }
    }
  }
}

As the response shows, each named filter produces its own bucket, and the sub agg is computed once per bucket.

7. Histogram Aggregation

Histogram aggregation. The field being aggregated is usually numeric, which makes it easy to split into fixed-width groups.

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "tip_histogram": {
      "histogram": { "field": "tip", "interval": 4 }
    }
  }
}

Returns:

"aggregations" : {
  "tip_histogram" : {
    "buckets" : [
      { "key" : 16.0, "doc_count" : 4200 },
      { "key" : 20.0, "doc_count" : 8400 },
      { "key" : 24.0, "doc_count" : 17808 },
      { "key" : 28.0, "doc_count" : 5794 }
    ]
  }
}
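The bucket keys follow from rounding each value down to a multiple of the interval. A minimal Python sketch of that rule is below; histogram_key is a hypothetical name, and the offset parameter is included only to show where it would enter the formula.

```python
import math


def histogram_key(value, interval, offset=0.0):
    """Round `value` down to the start of its fixed-width bucket."""
    return math.floor((value - offset) / interval) * interval + offset


# The sample document has tip = 17.5; with interval 4 it falls in bucket 16.0.
key = histogram_key(17.5, 4)
```

Every tip in the half-open range from 16 up to (but not including) 20 gets the same key, 16.0, which is the first bucket in the response above.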

8. Missing Aggregation: counting docs in which a field is absent

GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "miss_f": {
      "missing": { "field": "row" }
    }
  }
}

Returns:

"aggregations" : {
  "miss_f" : { "doc_count" : 1 }
}

9. Nested aggs: aggregating over nested docs, usually with a sub agg that computes the statistic

Sample data: a class holds a list of students, and each student has age and name attributes.

GET nest_test/_mapping

Returns:

{
  "mappings" : {
    "properties" : {
      "c_name" : { "type" : "text" },
      "class" : {
        "type" : "nested",
        "properties" : {
          "students" : {
            "type" : "nested",
            "properties" : {
              "age" : { "type" : "integer" },
              "name" : { "type" : "text" }
            }
          }
        }
      }
    }
  }
}

The index contains two documents:

"_source" : {
  "c_name" : "start_class",
  "class" : {
    "students" : [
      { "name" : "jack chen", "age" : 30 },
      { "name" : "jack man", "age" : 20 },
      { "name" : "pony wang", "age" : 60 },
      { "name" : "gebi wang", "age" : 90 }
    ]
  }
}

"_source" : {
  "c_name" : "sun_class",
  "class" : {
    "students" : [
      { "name" : "lucy chen", "age" : 30 },
      { "name" : "lucy man", "age" : 20 },
      { "name" : "dong wang", "age" : 60 },
      { "name" : "chess wang", "age" : 90 }
    ]
  }
}

The corresponding query:

GET nest_test/_search
{
  "size": 0,
  "aggs": {
    "nested_agg": {
      "nested": { "path": "class.students" },
      "aggs": {
        "min_age": { "min": { "field": "class.students.age" } }
      }
    }
  }
}

Returns:

"aggregations" : {
  "nested_agg" : {
    "doc_count" : 8,
    "min_age" : { "value" : 20.0 }
  }
}

The doc_count of 8 is the number of nested student docs (4 per class), not the number of class docs.

10. Children agg: queries over join-type data

Data preparation: each classroom (class_room) can offer several subjects (subject), and each student (student) can choose one or more class_rooms, so class_room and student form a parent/child relationship. In the sample below every student doc is attached to a single class_room parent.

PUT join_class
{
  "mappings": {
    "properties": {
      "subject": { "type": "keyword" },
      "class_student": {
        "type": "join",
        "relations": { "class_room": "student" }
      }
    }
  }
}

PUT join_class/_doc/1
{
  "subject": ["english", "Chinese", "Russia"],
  "class_student": { "name": "class_room" },
  "des": "this class room teach english, Chinese, Russia"
}

PUT join_class/_doc/2?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "jack"
}

PUT join_class/_doc/3?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "pony"
}

The following query finds which students belong to each subject.

GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "subject_term": {
      "terms": { "field": "subject", "size": 10 },
      "aggs": {
        "subject_student": {
          "children": { "type": "student" },
          "aggs": {
            "term_name": {
              "terms": { "field": "name.keyword", "size": 10 }
            }
          }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "subject_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "Chinese",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "Russia",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "english",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      }
    ]
  }
}

11. Parent agg: queries over join-type data

Continuing with the sample data above, the following request finds the subjects each student has chosen. Note that the parent aggregation takes the child type (student) in its type parameter.

GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "student_term": {
      "terms": { "field": "name.keyword", "size": 10 },
      "aggs": {
        "subject_student": {
          "parent": { "type": "student" },
          "aggs": {
            "choose_subject": {
              "terms": { "field": "subject", "size": 10 }
            }
          }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "student_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "jack",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 1,
          "choose_subject" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Chinese", "doc_count" : 1 },
              { "key" : "Russia", "doc_count" : 1 },
              { "key" : "english", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "pony",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 1,
          "choose_subject" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Chinese", "doc_count" : 1 },
              { "key" : "Russia", "doc_count" : 1 },
              { "key" : "english", "doc_count" : 1 }
            ]
          }
        }
      }
    ]
  }
}

12. Composite Aggregation: combines terms across several dimensions, like multi-level nested terms but with a flat result, similar to GROUP BY on several columns in MySQL

Data initialization (on Elasticsearch 7.x the _type entry in each bulk action line is deprecated and can be dropped):

PUT composite_test
{
  "mappings": {
    "properties": {
      "area": { "type": "keyword" },
      "userid": { "type": "keyword" },
      "sendtime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

POST composite_test/_bulk
{ "index" : { "_type" : "_doc" } }
{ "area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00" }
{ "index" : { "_type" : "_doc" } }
{ "area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00" }
{ "index" : { "_type" : "_doc" } }
{ "area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00" }
{ "index" : { "_type" : "_doc" } }
{ "area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00" }
{ "index" : { "_type" : "_doc" } }
{ "area": "33", "userid": "400017", "sendtime": "2019-01-17 00:00:00" }

The query in this section groups by the three fields area, userid, and sendtime.
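In plain Python terms, that grouping amounts to counting distinct (area, userid, day) tuples. The sketch below does this over the five bulk documents; composite_agg is a hypothetical helper, and truncating sendtime to its date part stands in for the one-day date_histogram source.

```python
from collections import Counter

# The five documents from the bulk request above.
docs = [
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "33", "userid": "400017", "sendtime": "2019-01-17 00:00:00"},
]


def composite_agg(docs):
    """Flat GROUP BY over (area, userid, day), sorted by the composite key."""
    counts = Counter(
        # Keep only yyyy-MM-dd so that docs from the same day share a key.
        (d["area"], d["userid"], d["sendtime"][:10])
        for d in docs
    )
    return [
        {"key": {"area": a, "userid": u, "sendtime": s}, "doc_count": c}
        for (a, u, s), c in sorted(counts.items())
    ]


buckets = composite_agg(docs)
# The key of the last bucket plays the role of after_key for pagination.
after_key = buckets[-1]["key"]
```

The result is a flat list of three buckets rather than a tree, which is the main practical difference from nesting three terms aggregations.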

GET composite_test/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "area": { "terms": { "field": "area" } } },
          { "userid": { "terms": { "field": "userid" } } },
          { "sendtime": { "date_histogram": { "field": "sendtime", "fixed_interval": "1d", "format": "yyyy-MM-dd" } } }
        ]
      }
    }
  }
}

Returns:

"aggregations" : {
  "my_buckets" : {
    "after_key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" },
    "buckets" : [
      { "key" : { "area" : "33", "userid" : "400015", "sendtime" : "2019-01-17" }, "doc_count" : 2 },
      { "key" : { "area" : "33", "userid" : "400017", "sendtime" : "2019-01-17" }, "doc_count" : 1 },
      { "key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" }, "doc_count" : 2 }
    ]
  }
}

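13. Adjacency Matrix Aggregation

The composite aggregation above intersects terms across several dimensions. The adjacency matrix aggregation is more limited: the matrix is built only from explicitly named filters on chosen field values.
Using the same sample data, the query in this section returns the doc count for area=33, the doc count for userid=400015, and the doc count for the intersection area=33 & userid=400015.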
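These three counts can be checked by hand against the five composite_test documents. The Python sketch below imitates the idea: one bucket per named filter plus one per pair of filters. The helper adjacency_matrix is hypothetical, the filters are written as plain predicates, and dropping empty buckets is an assumption of the sketch.

```python
from itertools import combinations

# area and userid of the five composite_test documents.
docs = [
    {"area": "33", "userid": "400015"},
    {"area": "33", "userid": "400015"},
    {"area": "35", "userid": "400016"},
    {"area": "35", "userid": "400016"},
    {"area": "33", "userid": "400017"},
]

# Named filters expressed as Python predicates.
filters = {
    "area_filter": lambda d: d["area"] == "33",
    "user_id_filter": lambda d: d["userid"] == "400015",
}


def adjacency_matrix(docs, filters):
    """Count docs per filter and per pairwise intersection of filters."""
    names = sorted(filters)
    counts = {}
    for name in names:
        counts[name] = sum(1 for d in docs if filters[name](d))
    for a, b in combinations(names, 2):
        counts[f"{a}&{b}"] = sum(1 for d in docs if filters[a](d) and filters[b](d))
    # Keep non-empty buckets only, ordered by key (sketch assumption).
    return {key: n for key, n in sorted(counts.items()) if n > 0}


matrix = adjacency_matrix(docs, filters)
```

With N filters there can be up to N + N*(N-1)/2 buckets, so the matrix grows quadratically as filters are added.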

GET composite_test/_search
{
  "size": 0,
  "aggs": {
    "composite_two": {
      "adjacency_matrix": {
        "filters": {
          "area_filter": { "terms": { "area": ["33"] } },
          "user_id_filter": { "terms": { "userid": ["400015"] } }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "composite_two" : {
    "buckets" : [
      { "key" : "area_filter", "doc_count" : 3 },
      { "key" : "area_filter&user_id_filter", "doc_count" : 2 },
      { "key" : "user_id_filter", "doc_count" : 2 }
    ]
  }
}

14. Global agg: queries over all data

This aggregation ignores the filtering done by the query and runs its sub aggregations over all docs in the index.

GET seats1028/_search
{
  "size": 0,
  "query": {
    "term": { "row": { "value": 5 } }
  },
  "aggs": {
    "global_row": {
      "global": {},
      "aggs": {
        "avg_row": { "avg": { "field": "row" } }
      }
    },
    "avg_row02": { "avg": { "field": "row" } }
  }
}

Returns:

"aggregations" : {
  "global_row" : {
    "doc_count" : 30992,
    "avg_row" : {
      "value" : 4.333871123874673    # computed over all docs
    }
  },
  "avg_row02" : {
    "value" : 5.0    # computed over the docs that match the query
  }
}

15. Significant Terms Aggregation: automatically finding significant keywords

This aggregation looks at a keyword field and finds the terms that are significant for the current result set, that is, terms that occur noticeably more often in the matched docs (the foreground) than in the index as a whole (the background).
An example explains it best. Here the docs are web news items (news), each with an author, a title, a topic, and so on.
The data is set up as follows:

PUT news
{
  "mappings": {
    "properties": {
      "published": { "type": "date", "format": "dateOptionalTime" },
      "author": { "type": "keyword" },
      "title": { "type": "text" },
      "topic": { "type": "keyword" },
      "views": { "type": "integer" }
    }
  }
}

POST news/_bulk
{ "index": { "_index": "news" } }
{ "author": "John Michael", "published": "2018-07-08", "title": "Tesla is flirting with its lowest close in over 1 1/2 years (TSLA)", "topic": "automobile", "views": "431" }
{ "index": { "_index": "news" } }
{ "author": "John Michael", "published": "2018-07-22", "title": "Tesla to end up like Lehman Brothers (TSLA)", "topic": "automobile", "views": "1921" }
{ "index": { "_index": "news" } }
{ "author": "John Michael", "published": "2018-07-29", "title": "Tesla (TSLA) official says that they are going to release a new self-driving car model in the coming year", "topic": "automobile", "views": "1849" }
{ "index": { "_index": "news" } }
{ "author": "John Michael", "published": "2018-08-14", "title": "Five ways Tesla uses AI and Big Data", "topic": "ai", "views": "871" }
{ "index": { "_index": "news" } }
{ "author": "John Michael", "published": "2018-08-14", "title": "Toyota partners with Tesla (TSLA) to improve the security of self-driving cars", "topic": "automobile", "views": "871" }
{ "index": { "_index": "news" } }
{ "author": "Robert Cann", "published": "2018-08-25", "title": "Is AI dangerous for humanity", "topic": "ai", "views": "981" }
{ "index": { "_index": "news" } }
{ "author": "Robert Cann", "published": "2018-09-13", "title": "Is AI dangerous for humanity", "topic": "ai", "views": "871" }
{ "index": { "_index": "news" } }
{ "author": "Robert Cann", "published": "2018-09-27", "title": "Introduction to Generative Adversarial Networks (GANs) in self-driving cars", "topic": "automobile", "views": "1183" }
{ "index": { "_index": "news" } }
{ "author": "Robert Cann", "published": "2018-10-09", "title": "Introduction to Natural Language Processing", "topic": "ai", "views": "786" }
{ "index": { "_index": "news" } }
{ "author": "Robert Cann", "published": "2018-10-15", "title": "New Distant Objects Found in the Fight for Planet X ", "topic": "astronomy", "views": "542" }

Find the topic that is most characteristic of each author, that is, the topic the author writes about disproportionately often compared with the index as a whole.

GET news/_search
{
  "size": 0,
  "aggregations": {
    "authors": {
      "terms": { "field": "author" },
      "aggregations": {
        "significant_topic_types": {
          "significant_terms": { "field": "topic" }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "authors" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "John Michael",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,
          "bg_count" : 10,
          "buckets" : [
            { "key" : "automobile", "doc_count" : 4, "score" : 0.4800000000000001, "bg_count" : 5 }
          ]
        }
      },
      {
        "key" : "Robert Cann",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,     # Robert Cann has 5 docs in total
          "bg_count" : 10,     # the index holds 10 docs in total
          "buckets" : [
            {
              "key" : "ai",
              "doc_count" : 3,     # 3 of Robert Cann's docs have topic ai
              "score" : 0.2999999999999999,
              "bg_count" : 4       # 4 docs in the whole index have topic ai
            }
          ]
        }
      }
    ]
  }
}

These statistics show that the topic most characteristic of John Michael is automobile, while for Robert Cann it is ai. See the comments in the response for the meaning of bg_count.
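The score values can be reproduced by hand. The sketch below assumes the aggregation is using its default JLH heuristic, which combines the absolute and the relative change in a term's frequency between foreground and background; the function name jlh_score is hypothetical. Under that assumption it yields the same numbers as the response above.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style significance: (fg% - bg%) * (fg% / bg%)."""
    fg_pct = fg_count / fg_total  # share of foreground docs containing the term
    bg_pct = bg_count / bg_total  # share of background docs containing the term
    if fg_pct <= bg_pct:
        return 0.0  # not more frequent than in the background
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# John Michael: 4 of his 5 docs are automobile; 5 of all 10 docs are automobile.
automobile = jlh_score(4, 5, 5, 10)

# Robert Cann: 3 of his 5 docs are ai; 4 of all 10 docs are ai.
ai = jlh_score(3, 5, 4, 10)
```

Both values agree with the response up to floating-point rounding: about 0.48 for automobile and about 0.3 for ai.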

16. Significant Text Aggregation: automatically finding significant keywords

This is similar to the Significant Terms Aggregation above, except that it works on text fields and analyzes (tokenizes) their content.
Run the following query on the data above:

GET news/_search
{
  "query": {
    "match": { "title": " AI " }
  },
  "size": 0,
  "aggs": {
    "significant_title": {
      "significant_text": { "field": "title" }
    }
  }
}

Returns:

"aggregations" : {
  "significant_title" : {
    "doc_count" : 3,
    "bg_count" : 10,
    "buckets" : [
      { "key" : "ai", "doc_count" : 3, "score" : 2.3333333333333335, "bg_count" : 3 }
    ]
  }
}

17. Sampler Aggregation: aggregating over sampled data

This is typically used together with significant_terms. When an index is very large, the aggregation can become slow; a sampler restricts the sub aggregations to a sample of the most relevant docs.

POST /stackoverflow/_search?size=0
{
  "query": {
    "query_string": { "query": "tags:kibana OR tags:javascript" }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "keywords": {
          "significant_terms": {
            "field": "tags",
            "exclude": ["kibana", "javascript"]
          }
        }
      }
    }
  }
}

The shard_size parameter is the number of sample docs collected on each shard; the default is 100.
Returns:

{
  ...
  "aggregations": {
    "sample": {
      "doc_count": 200,
      "keywords": {
        "doc_count": 200,
        "bg_count": 650,
        "buckets": [
          { "key": "elasticsearch", "doc_count": 150, "score": 1.078125, "bg_count": 200 },
          { "key": "logstash", "doc_count": 50, "score": 0.5625, "bg_count": 50 }
        ]
      }
    }
  }
}

18. Reverse nested Aggregation: aggregating on parent data from inside a nested agg

When used as a sub aggregation of a Nested Aggregation, the Reverse nested Aggregation steps back out of the nested type so that its own sub aggregations run on the root documents.
An example:

PUT /issues
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "username": { "type": "keyword" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}

PUT issues/_doc/1
{
  "tags": ["bug", "improve"],
  "comments": [
    { "username": "jack", "comment": " this is a bug" },
    { "username": "pony", "comment": " this is a improve" }
  ]
}

PUT issues/_doc/2
{
  "tags": ["advice", "improve"],
  "comments": [
    { "username": "jack", "comment": " this is a good job " },
    { "username": "nacy", "comment": " this is a improvement" }
  ]
}

Query:

GET /issues/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_usernames": {
          "terms": { "field": "comments.username" },
          "aggs": {
            "comment_to_issue": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_comment": {
                  "terms": { "field": "tags" }
                }
              }
            }
          }
        }
      }
    }
  }
}

Returns:

"aggregations" : {
  "comments" : {
    "doc_count" : 4,
    "top_usernames" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "jack",
          "doc_count" : 2,
          "comment_to_issue" : {
            "doc_count" : 2,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "improve", "doc_count" : 2 },
                { "key" : "advice", "doc_count" : 1 },
                { "key" : "bug", "doc_count" : 1 }
              ]
            }
          }
        },
        {
          "key" : "nacy",
          "doc_count" : 1,
          "comment_to_issue" : {
            "doc_count" : 1,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "advice", "doc_count" : 1 },
                { "key" : "improve", "doc_count" : 1 }
              ]
            }
          }
        },
        {
          "key" : "pony",
          "doc_count" : 1,
          "comment_to_issue" : {
            "doc_count" : 1,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "bug", "doc_count" : 1 },
                { "key" : "improve", "doc_count" : 1 }
              ]
            }
          }
        }
      ]
    }
  }
}

Inside a Nested Aggregation, the sub aggregations of a Reverse nested Aggregation operate on the root documents of the nested docs in the current bucket.
Given this purpose, the Reverse nested Aggregation is meant specifically to be a sub aggregation of a Nested Aggregation. Using it at the top level, or under an aggregation that is not nested, serves no purpose.
By default the Reverse nested Aggregation goes back to the root document. With several levels of nesting, the path parameter can be used to name the level to return to.
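The jump from comments back to their issues can be pictured with a small Python sketch over the two issues documents. It is an in-memory imitation with a hypothetical helper name, tags_per_commenter: first collect, per username, the set of root issues that contain a comment by that user, then count the tags of those root issues.

```python
from collections import Counter, defaultdict

# The two issues documents, reduced to the fields the query touches.
issues = [
    {"tags": ["bug", "improve"],
     "comments": [{"username": "jack"}, {"username": "pony"}]},
    {"tags": ["advice", "improve"],
     "comments": [{"username": "jack"}, {"username": "nacy"}]},
]


def tags_per_commenter(issues):
    """For each commenter, count the tags of the root issues they commented on."""
    roots_by_user = defaultdict(set)
    for idx, issue in enumerate(issues):
        for comment in issue["comments"]:
            # "Reverse nested" step: remember the root doc, not the comment.
            roots_by_user[comment["username"]].add(idx)
    result = {}
    for user, roots in roots_by_user.items():
        tag_counts = Counter()
        for idx in roots:
            # Each root issue counts once per tag, however many comments it has.
            tag_counts.update(set(issues[idx]["tags"]))
        result[user] = dict(tag_counts)
    return result


per_user = tags_per_commenter(issues)
```

jack commented on both issues, so improve is counted twice for him, while pony and nacy each reach only the tags of the single issue they commented on.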
