02. Elasticsearch bucket aggregation queries

Published: 2024/2/28

Contents

    • 1. Overview of bucket aggregation types
    • 2. Data preparation
    • 3. Usage examples
      • 1. Terms Aggregation:
        • 1. A plain terms agg
        • 2. Nesting a metric agg as a sub agg
        • 3. Nesting a terms agg as a sub agg
      • 2. Range Aggregation:
      • 3. Date Histogram Aggregation:
      • 4. Date Range Aggregation
      • 5. Filter Aggregation
      • 6. Filters Aggregation
      • 7. Histogram Aggregation
      • 8. Missing Aggregation: count the docs in which a field does not exist
      • 9. nested aggs: aggregations over nested docs, usually with a sub aggregation that computes the statistic
      • 10. children agg query, for join-type data
      • 11. parent agg query, for join-type data
      • 12. Composite Aggregation: combines terms over several dimensions, like multi-level nested terms but with flat results, similar to GROUP BY on several columns in MySQL
      • 13. Adjacency Matrix Aggregation
      • 14. global agg query, over all data
      • 15. Significant Terms Aggregation: automatically find significant keywords
      • 16. Significant Text Aggregation: automatically find significant keywords
      • 17. Sampler Aggregation: aggregate over sampled data
      • 18. Reverse nested Aggregation: aggregate over the parent data from inside a nested agg

Elasticsearch's aggregation queries keep getting richer. There are currently four families:

  • metric aggregation: single statistics such as min, max, avg, sum and percentile
  • bucket aggregation: group-by style queries
  • matrix aggregation: computes over the values of several fields to produce a multi-dimensional matrix
  • pipeline aggregation: post-processes the output of other aggregations to enrich the data

This post focuses on bucket aggregations. A bucket aggregation works like a group-by query and, unlike a metric aggregation, it can have sub aggregations, which means it can be nested. A nested sub agg can be either a bucket agg or a metric agg.

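    1. Overview of bucket aggregation types

    Terms Aggregation: the typical group-by aggregation. Documents are bucketed by the value of a field; if the field holds an array, the document is counted in several buckets
    Range Aggregation: usually applied to a numeric field, with buckets defined by several explicit ranges
    Date Histogram Aggregation: buckets by time at a given interval, such as one bucket per month
    Date Range Aggregation: buckets by explicit time ranges, similar to the range aggregation
    Filter Aggregation: a simple filter, similar to filter in a query
    Filters Aggregation: several filters, each producing its own bucket
    Histogram Aggregation: histogram buckets over a numeric field at a fixed interval
    Missing Aggregation: counts the docs in which a field does not exist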
    Adjacency Matrix Aggregation
    Auto-interval Date Histogram Aggregation
    Children Aggregation
    Composite Aggregation
    Diversified Sampler Aggregation
    Geo Distance Aggregation
    GeoHash grid Aggregation
    GeoTile Grid Aggregation
    Global Aggregation
    IP Range Aggregation
    Nested Aggregation
    Parent Aggregation
    Reverse nested Aggregation
    Sampler Aggregation
    Significant Terms Aggregation
    Significant Text Aggregation

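    2. Data preparation

    Ticket information for shows

    GET seats1028/_search

    {
      "play" : "Auntie Jo",        # name of the show
      "date" : "2018-11-6",        # date
      "theatre" : "Skyline",       # venue
      "sold" : false,              # whether this ticket has been sold
      "actors" : [                 # cast
        "Jo Hangum",
        "Jon Hittle",
        "Rob Kettleman",
        "Laura Conrad",
        "Simon Hower",
        "Nora Blue"
      ],
      "datetime" : 1541497200000,
      "price" : 8321,              # ticket price
      "tip" : 17.5,                # discount
      "time" : "5:40PM"
    }

    There are more than 30,000 documents like this in total. The queries below also use the row and number fields, which are not shown in this sample document.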

    3. Usage examples

    1. Terms Aggregation:

    The typical group-by aggregation. Documents are bucketed by the value of a field; if the field holds an array, the document is counted in several buckets.
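
The array behaviour described above can be illustrated outside Elasticsearch with a minimal Python sketch. It only simulates the counting rule; the helper `terms_buckets` and the second document ("Test Run") are made up for illustration and are not part of the original data set.

```python
from collections import Counter


def terms_buckets(docs, field):
    """Simulate the counting rule of a terms agg: one bucket per distinct value."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field land in no bucket
        values = value if isinstance(value, list) else [value]
        # set(): a doc is counted once per distinct term it contains
        counts.update(set(values))
    return counts


# Hypothetical documents: the first has two actors, the second has one.
docs = [
    {"play": "Auntie Jo", "actors": ["Jo Hangum", "Jon Hittle"]},
    {"play": "Test Run", "actors": ["Jon Hittle"]},
]
buckets = terms_buckets(docs, "actors")
```

Here the first document contributes to two buckets, so the bucket counts can add up to more than the number of documents.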

    1. A plain terms agg

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "term_price": {
          "terms": {
            "field": "price",
            "min_doc_count": 13,
            "size": 50
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "term_price" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 35384,
        "buckets" : [
          { "key" : 910, "doc_count" : 13 },
          { "key" : 3273, "doc_count" : 13 },
          { "key" : 3648, "doc_count" : 13 }
        ]
      }
    }

    2. Nesting a metric agg as a sub agg

    Group by row, keep the 3 buckets with the most docs, and compute the maximum price within each bucket.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "term_price": {
          "terms": {
            "field": "row",
            "min_doc_count": 13,
            "size": 3,
            "order": { "_count": "desc" }
          },
          "aggs": {
            "max_price": {
              "max": { "field": "price" }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "term_price" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 13608,
        "buckets" : [
          { "key" : 2, "doc_count" : 5796, "max_price" : { "value" : 9998.0 } },
          { "key" : 3, "doc_count" : 5796, "max_price" : { "value" : 9999.0 } },
          { "key" : 1, "doc_count" : 5791, "max_price" : { "value" : 9999.0 } }
        ]
      }
    }

    3. Nesting a terms agg as a sub agg

    First bucket by row and keep the 3 rows with the most docs. Then split each of those buckets by number and keep the 3 number values with the most docs.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "term_price": {
          "terms": {
            "field": "row",
            "min_doc_count": 13,
            "size": 3,
            "order": { "_count": "desc" }
          },
          "aggs": {
            "number_term": {
              "terms": {
                "field": "number",
                "size": 3,
                "order": { "_count": "desc" }
              }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "term_price" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 13608,
        "buckets" : [
          {
            "key" : 2,
            "doc_count" : 5796,
            "number_term" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 4368,
              "buckets" : [
                { "key" : 1, "doc_count" : 476 },
                { "key" : 2, "doc_count" : 476 },
                { "key" : 3, "doc_count" : 476 }
              ]
            }
          },
          {
            "key" : 3,
            "doc_count" : 5796,
            "number_term" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 4368,
              "buckets" : [
                { "key" : 1, "doc_count" : 476 },
                { "key" : 2, "doc_count" : 476 },
                { "key" : 3, "doc_count" : 476 }
              ]
            }
          },
          {
            "key" : 1,
            "doc_count" : 5791,
            "number_term" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 4363,
              "buckets" : [
                { "key" : 5, "doc_count" : 476 },
                { "key" : 6, "doc_count" : 476 },
                { "key" : 7, "doc_count" : 476 }
              ]
            }
          }
        ]
      }
    }

    2. Range Aggregation:

    Usually applied to a numeric field, with buckets defined by several explicit ranges. Each range includes its from value and excludes its to value.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "price_range": {
          "range": {
            "field": "price",
            "ranges": [
              { "from": 5000, "to": 6000 }
            ]
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "price_range" : {
        "buckets" : [
          {
            "key" : "5000.0-6000.0",
            "from" : 5000.0,
            "to" : 6000.0,
            "doc_count" : 3646
          }
        ]
      }
    }
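
The boundary rule (from inclusive, to exclusive) is easy to get wrong, so here is a minimal Python sketch of it. The function name `range_bucket_count` and the price list are hypothetical and only illustrate the half-open interval.

```python
def range_bucket_count(values, lower, upper):
    """Count values v with lower <= v < upper (from inclusive, to exclusive)."""
    return sum(1 for v in values if lower <= v < upper)


# Hypothetical prices sitting right on the bucket boundaries.
prices = [4999, 5000, 5999, 6000]
count = range_bucket_count(prices, 5000, 6000)  # 5000 and 5999 match, 6000 does not
```

With adjacent ranges such as 5000-6000 and 6000-7000, this rule means a price of exactly 6000 falls only into the second bucket.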

    3. Date Histogram Aggregation:

    Buckets by time at a given calendar interval, here one bucket per month.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "price_date_histogram": {
          "date_histogram": {
            "field": "datetime",
            "calendar_interval": "month"
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "price_date_histogram" : {
        "buckets" : [
          { "key_as_string" : "2018-03-01T00:00:00.000Z", "key" : 1519862400000, "doc_count" : 2310 },
          { "key_as_string" : "2018-04-01T00:00:00.000Z", "key" : 1522540800000, "doc_count" : 3946 },
          { "key_as_string" : "2018-05-01T00:00:00.000Z", "key" : 1525132800000, "doc_count" : 3948 },
          { "key_as_string" : "2018-06-01T00:00:00.000Z", "key" : 1527811200000, "doc_count" : 3948 },
          { "key_as_string" : "2018-07-01T00:00:00.000Z", "key" : 1530403200000, "doc_count" : 3948 }
        ]
      }
    }
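
To see where a monthly bucket key comes from, the following Python sketch rounds a timestamp down to the start of its month. It assumes UTC bucketing (no time_zone parameter is set in the query above); the function name `month_bucket_key` is made up for illustration.

```python
from datetime import datetime, timezone


def month_bucket_key(epoch_millis):
    """Round an epoch-millisecond timestamp down to the first instant of its UTC month."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(start.timestamp() * 1000)


# "datetime" value of the sample document from the data preparation section.
key = month_bucket_key(1541497200000)
```

For the sample document this gives 1541030400000, which is 2018-11-01T00:00:00.000Z, so that ticket would be counted in the November 2018 bucket.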

    4. Date Range Aggregation

    Buckets by explicit time ranges, similar to the range aggregation.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "price_date_histogram": {
          "date_range": {
            "field": "datetime",
            "ranges": [
              {
                "from": "2018-10-01T00:00:00.000Z",
                "to": "2018-11-01T00:00:00.000Z"
              }
            ]
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "price_date_histogram" : {
        "buckets" : [
          {
            "key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z",
            "from" : 1.538352E12,
            "from_as_string" : "2018-10-01T00:00:00.000Z",
            "to" : 1.5410304E12,
            "to_as_string" : "2018-11-01T00:00:00.000Z",
            "doc_count" : 3948
          }
        ]
      }
    }

    5. Filter Aggregation

    A simple filter, similar to filter in a query.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "sold_filter": {
          "filter": {
            "range": { "tip": { "gte": 10, "lte": 20 } }
          },
          "aggs": {
            "max_price": {
              "max": { "field": "price" }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "sold_filter" : {
        "doc_count" : 6300,    # doc count after the filter
        "max_price" : { "value" : 9996.0 }
      }
    }

    6. Filters Aggregation

    Several filters at once. The sub agg is applied to the result of each filter.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "sold_filter": {
          "filters": {
            "filters": {    # the syntax here looks rather odd
              "tip_filter": {
                "range": { "tip": { "gte": 10, "lte": 20 } }
              },
              "number_filter": {
                "range": { "number": { "gte": 5, "lte": 10 } }
              }
            }
          },
          "aggs": {
            "max_price": {
              "max": { "field": "price" }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "sold_filter" : {
        "buckets" : {
          "number_filter" : {
            "doc_count" : 16072,
            "max_price" : { "value" : 9999.0 }
          },
          "tip_filter" : {
            "doc_count" : 6300,
            "max_price" : { "value" : 9996.0 }
          }
        }
      }
    }

    As the response shows, each named filter gets its own bucket, and the sub agg is computed per bucket.

    7. Histogram Aggregation

    Histogram buckets. The field being aggregated is usually numeric, which makes it easy to group values at a fixed interval.

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "tip_histogram": {
          "histogram": {
            "field": "tip",
            "interval": 4
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "tip_histogram" : {
        "buckets" : [
          { "key" : 16.0, "doc_count" : 4200 },
          { "key" : 20.0, "doc_count" : 8400 },
          { "key" : 24.0, "doc_count" : 17808 },
          { "key" : 28.0, "doc_count" : 5794 }
        ]
      }
    }
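
The bucket keys follow from rounding each value down to a multiple of the interval: key = floor((value - offset) / interval) * interval + offset. A minimal Python sketch of that rule is below; the function name `histogram_key` and the tip values other than 17.5 are hypothetical.

```python
import math
from collections import Counter


def histogram_key(value, interval, offset=0.0):
    """Round a value down to the lower bound of its histogram bucket."""
    return math.floor((value - offset) / interval) * interval + offset


# 17.5 is the tip of the sample document; the other tips are made up.
tips = [17.5, 19.9, 20.0, 23.99]
counts = Counter(histogram_key(tip, 4) for tip in tips)
```

With an interval of 4, the sample tip of 17.5 lands in the bucket with key 16.0, and a tip of exactly 20.0 starts the next bucket.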

    8. Missing Aggregation: count the docs in which a field does not exist

    GET seats1028/_search
    {
      "size": 0,
      "aggs": {
        "miss_f": {
          "missing": { "field": "row" }
        }
      }
    }

    Response:

    "aggregations" : {
      "miss_f" : {
        "doc_count" : 1
      }
    }

    9. nested aggs: aggregations over nested docs, usually with a sub aggregation that computes the statistic

    Sample data: each class holds a list of students, and each student has an age and a name.

    GET nest_test/_mapping

    Response:

    {
      "mappings" : {
        "properties" : {
          "c_name" : { "type" : "text" },
          "class" : {
            "type" : "nested",
            "properties" : {
              "students" : {
                "type" : "nested",
                "properties" : {
                  "age" : { "type" : "integer" },
                  "name" : { "type" : "text" }
                }
              }
            }
          }
        }
      }
    }

    The index holds two documents:

    "_source" : {
      "c_name" : "start_class",
      "class" : {
        "students" : [
          { "name" : "jack chen", "age" : 30 },
          { "name" : "jack man", "age" : 20 },
          { "name" : "pony wang", "age" : 60 },
          { "name" : "gebi wang", "age" : 90 }
        ]
      }
    }

    "_source" : {
      "c_name" : "sun_class",
      "class" : {
        "students" : [
          { "name" : "lucy chen", "age" : 30 },
          { "name" : "lucy man", "age" : 20 },
          { "name" : "dong wang", "age" : 60 },
          { "name" : "chess wang", "age" : 90 }
        ]
      }
    }

    The query:

    GET nest_test/_search
    {
      "size": 0,
      "aggs": {
        "nested_agg": {
          "nested": { "path": "class.students" },
          "aggs": {
            "min_age": {
              "min": { "field": "class.students.age" }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "nested_agg" : {
        "doc_count" : 8,
        "min_age" : { "value" : 20.0 }
      }
    }
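
The doc_count of 8 counts nested student docs, not the 2 top-level class docs. The Python sketch below reproduces that on the two sample documents; it is only a plain-data simulation of what the nested agg operates on, not Elasticsearch code.

```python
# The two sample documents from above, reduced to the fields the agg touches.
classes = [
    {"c_name": "start_class", "class": {"students": [
        {"name": "jack chen", "age": 30}, {"name": "jack man", "age": 20},
        {"name": "pony wang", "age": 60}, {"name": "gebi wang", "age": 90}]}},
    {"c_name": "sun_class", "class": {"students": [
        {"name": "lucy chen", "age": 30}, {"name": "lucy man", "age": 20},
        {"name": "dong wang", "age": 60}, {"name": "chess wang", "age": 90}]}},
]

# Stepping into the path "class.students": every student becomes its own doc.
students = [s for c in classes for s in c["class"]["students"]]

doc_count = len(students)                  # number of nested docs
min_age = min(s["age"] for s in students)  # the "min_age" sub agg
```

This matches the response: 8 nested docs and a minimum age of 20.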

    10. children agg query, for join-type data

    Data preparation: each classroom (class_room) can teach several subjects (subject), and students (student) attend a class_room, so class_room and student form a parent/child relationship. In the join field each student document points to a single parent class_room.

    PUT join_class
    {
      "mappings": {
        "properties": {
          "subject": { "type": "keyword" },
          "class_student": {
            "type": "join",
            "relations": { "class_room": "student" }
          }
        }
      }
    }

    PUT join_class/_doc/1
    {
      "subject": ["english", "Chinese", "Russia"],
      "class_student": { "name": "class_room" },
      "des": "this class room teach english, Chinese, Russia"
    }

    PUT join_class/_doc/2?routing=1
    {
      "class_student": { "name": "student", "parent": 1 },
      "name": "jack"
    }

    PUT join_class/_doc/3?routing=1
    {
      "class_student": { "name": "student", "parent": 1 },
      "name": "pony"
    }

    The query below finds which students belong to each subject.

    GET join_class/_search
    {
      "size": 0,
      "query": { "match_all": {} },
      "aggs": {
        "subject_term": {
          "terms": { "field": "subject", "size": 10 },
          "aggs": {
            "subject_student": {
              "children": { "type": "student" },
              "aggs": {
                "term_name": {
                  "terms": { "field": "name.keyword", "size": 10 }
                }
              }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "subject_term" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Chinese",
            "doc_count" : 1,
            "subject_student" : {
              "doc_count" : 2,
              "term_name" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "jack", "doc_count" : 1 },
                  { "key" : "pony", "doc_count" : 1 }
                ]
              }
            }
          },
          {
            "key" : "Russia",
            "doc_count" : 1,
            "subject_student" : {
              "doc_count" : 2,
              "term_name" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "jack", "doc_count" : 1 },
                  { "key" : "pony", "doc_count" : 1 }
                ]
              }
            }
          },
          {
            "key" : "english",
            "doc_count" : 1,
            "subject_student" : {
              "doc_count" : 2,
              "term_name" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "jack", "doc_count" : 1 },
                  { "key" : "pony", "doc_count" : 1 }
                ]
              }
            }
          }
        ]
      }
    }

    11. parent agg query, for join-type data

    Using the same sample data as above, the request below finds the subjects each student takes.

    GET join_class/_search
    {
      "size": 0,
      "query": { "match_all": {} },
      "aggs": {
        "student_term": {
          "terms": { "field": "name.keyword", "size": 10 },
          "aggs": {
            "subject_student": {
              "parent": { "type": "student" },
              "aggs": {
                "choose_subject": {
                  "terms": { "field": "subject", "size": 10 }
                }
              }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "student_term" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "jack",
            "doc_count" : 1,
            "subject_student" : {
              "doc_count" : 1,
              "choose_subject" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "Chinese", "doc_count" : 1 },
                  { "key" : "Russia", "doc_count" : 1 },
                  { "key" : "english", "doc_count" : 1 }
                ]
              }
            }
          },
          {
            "key" : "pony",
            "doc_count" : 1,
            "subject_student" : {
              "doc_count" : 1,
              "choose_subject" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  { "key" : "Chinese", "doc_count" : 1 },
                  { "key" : "Russia", "doc_count" : 1 },
                  { "key" : "english", "doc_count" : 1 }
                ]
              }
            }
          }
        ]
      }
    }

    12. Composite Aggregation: combines terms over several dimensions, like multi-level nested terms but with flat results, similar to GROUP BY on several columns in MySQL

    Data initialization

    PUT composite_test
    {
      "mappings": {
        "properties": {
          "area": { "type": "keyword" },
          "userid": { "type": "keyword" },
          "sendtime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
        }
      }
    }

    POST composite_test/_bulk
    { "index" : {"_type" : "_doc"}}
    {"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
    { "index" : {"_type" : "_doc"}}
    {"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
    { "index" : {"_type" : "_doc"}}
    {"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
    { "index" : {"_type" : "_doc"}}
    {"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
    { "index" : {"_type" : "_doc"}}
    {"area":"33","userid":"400017","sendtime":"2019-01-17 00:00:00"}

    The query below groups by the three fields area, userid and sendtime.

    GET composite_test/_search
    {
      "size": 0,
      "aggs": {
        "my_buckets": {
          "composite": {
            "sources": [
              { "area": { "terms": { "field": "area" } } },
              { "userid": { "terms": { "field": "userid" } } },
              {
                "sendtime": {
                  "date_histogram": {
                    "field": "sendtime",
                    "fixed_interval": "1d",
                    "format": "yyyy-MM-dd"
                  }
                }
              }
            ]
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "my_buckets" : {
        "after_key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" },
        "buckets" : [
          {
            "key" : { "area" : "33", "userid" : "400015", "sendtime" : "2019-01-17" },
            "doc_count" : 2
          },
          {
            "key" : { "area" : "33", "userid" : "400017", "sendtime" : "2019-01-17" },
            "doc_count" : 1
          },
          {
            "key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" },
            "doc_count" : 2
          }
        ]
      }
    }

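The flat, GROUP BY-like result of the composite agg can be reproduced with a short Python sketch over the five bulk documents. It is a simulation only: truncating sendtime to its date part stands in for the 1d date_histogram source, which is an assumption that holds for these midnight timestamps, and the function name `composite_buckets` is made up.

```python
from collections import Counter

# The five documents indexed by the bulk request above.
docs = [
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "33", "userid": "400017", "sendtime": "2019-01-17 00:00:00"},
]


def composite_buckets(docs):
    """Group by (area, userid, day) and return flat buckets sorted by key."""
    counts = Counter(
        (d["area"], d["userid"], d["sendtime"][:10])  # [:10] keeps yyyy-MM-dd
        for d in docs
    )
    return sorted(counts.items())


buckets = composite_buckets(docs)
after_key = buckets[-1][0]  # key of the last bucket on this "page"
```

The last key plays the role of after_key in the response: passing it back as the after parameter of the composite agg is how the next page of buckets is requested.
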
    13. Adjacency Matrix Aggregation

    The composite agg above combines terms along several dimensions. The adjacency matrix agg is more limited: it builds the matrix only for explicitly named filters on chosen field values.
    Using the sample data above, the query below returns the doc count for area=33, the doc count for userid=400015, and also the doc count for area=33 & userid=400015.

    GET composite_test/_search
    {
      "size": 0,
      "aggs": {
        "composite_two": {
          "adjacency_matrix": {
            "filters": {
              "area_filter": { "terms": { "area": ["33"] } },
              "user_id_filter": { "terms": { "userid": ["400015"] } }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "composite_two" : {
        "buckets" : [
          { "key" : "area_filter", "doc_count" : 3 },
          { "key" : "area_filter&user_id_filter", "doc_count" : 2 },
          { "key" : "user_id_filter", "doc_count" : 2 }
        ]
      }
    }

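The three buckets in the response (each filter alone plus their intersection) can be reproduced with a small Python sketch over the same five documents. The function `adjacency_matrix` and the lambda predicates are illustrative stand-ins for the two terms filters, and dropping empty buckets is an assumption about how the response is shaped.

```python
from itertools import combinations

docs = [
    {"area": "33", "userid": "400015"},
    {"area": "33", "userid": "400015"},
    {"area": "35", "userid": "400016"},
    {"area": "35", "userid": "400016"},
    {"area": "33", "userid": "400017"},
]

# Stand-ins for the two named terms filters of the query.
filters = {
    "area_filter": lambda d: d["area"] == "33",
    "user_id_filter": lambda d: d["userid"] == "400015",
}


def adjacency_matrix(docs, filters):
    """Count docs per filter and per pairwise intersection of filters."""
    names = sorted(filters)
    result = {name: sum(1 for d in docs if filters[name](d)) for name in names}
    for a, b in combinations(names, 2):
        result[f"{a}&{b}"] = sum(1 for d in docs if filters[a](d) and filters[b](d))
    # Assumption: buckets that match no docs are left out of the result.
    return {key: count for key, count in result.items() if count > 0}


matrix = adjacency_matrix(docs, filters)
```

With N filters there are up to N(N+1)/2 buckets, which is why this agg is practical only for a modest number of filters.
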
    14. global agg query, over all data

    This ignores the filtering done by the query and runs its sub aggregations over all the data in the index.

    GET seats1028/_search
    {
      "size": 0,
      "query": {
        "term": { "row": { "value": 5 } }
      },
      "aggs": {
        "global_row": {
          "global": {},
          "aggs": {
            "avg_row": {
              "avg": { "field": "row" }
            }
          }
        },
        "avg_row02": {
          "avg": { "field": "row" }
        }
      }
    }

    Response:

    "aggregations" : {
      "global_row" : {
        "doc_count" : 30992,
        "avg_row" : {
          "value" : 4.333871123874673    # computed over all docs
        }
      },
      "avg_row02" : {
        "value" : 5.0    # computed over the docs that match the query
      }
    }

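The difference between the two averages comes down to which document set each agg sees. A minimal Python sketch with hypothetical rows (not the real data set) makes the scoping explicit.

```python
def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical documents; only the "row" field matters here.
docs = [{"row": 1}, {"row": 2}, {"row": 5}, {"row": 5}]

# The query ("term": row == 5) narrows the set seen by ordinary aggs.
matching = [d for d in docs if d["row"] == 5]
avg_row02 = avg([d["row"] for d in matching])

# The global agg steps outside the query scope and sees every document.
global_doc_count = len(docs)
global_avg_row = avg([d["row"] for d in docs])
```

Because the query keeps only row 5, avg_row02 is always 5.0, while the global average reflects the whole index.
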
    15. Significant Terms Aggregation: automatically find significant keywords

    This looks for significant terms in a keyword field, that is, terms that occur unusually often in the current set of docs compared with the whole index.
    An example explains it best. The data here is web news: each news item has an author, a title, a topic and so on.
    The data is set up as follows:

    PUT news
    {
      "mappings": {
        "properties": {
          "published": { "type": "date", "format": "dateOptionalTime" },
          "author": { "type": "keyword" },
          "title": { "type": "text" },
          "topic": { "type": "keyword" },
          "views": { "type": "integer" }
        }
      }
    }

    POST news/_bulk
    {"index": {"_index": "news"} }
    {"author": "John Michael","published": "2018-07-08","title": "Tesla is flirting with its lowest close in over 1 1/2 years (TSLA)","topic": "automobile","views": "431" }
    {"index": {"_index": "news"} }
    {"author": "John Michael","published": "2018-07-22","title": "Tesla to end up like Lehman Brothers (TSLA)","topic": "automobile","views": "1921" }
    {"index": {"_index": "news"} }
    {"author": "John Michael","published": "2018-07-29","title": "Tesla (TSLA) official says that they are going to release a new self-driving car model in the coming year","topic": "automobile","views": "1849" }
    {"index": {"_index": "news"} }
    {"author": "John Michael","published": "2018-08-14","title": "Five ways Tesla uses AI and Big Data","topic": "ai","views": "871" }
    {"index": {"_index": "news"} }
    {"author": "John Michael","published": "2018-08-14","title": "Toyota partners with Tesla (TSLA) to improve the security of self-driving cars","topic": "automobile","views": "871" }
    {"index": {"_index": "news"} }
    {"author": "Robert Cann","published": "2018-08-25","title": "Is AI dangerous for humanity","topic": "ai","views": "981" }
    {"index": {"_index": "news"} }
    {"author": "Robert Cann","published": "2018-09-13","title": "Is AI dangerous for humanity","topic": "ai","views": "871" }
    {"index": {"_index": "news"} }
    {"author": "Robert Cann","published": "2018-09-27","title": "Introduction to Generative Adversarial Networks (GANs) in self-driving cars","topic": "automobile","views": "1183" }
    {"index": {"_index": "news"} }
    {"author": "Robert Cann","published": "2018-10-09","title": "Introduction to Natural Language Processing","topic": "ai","views": "786" }
    {"index": {"_index": "news"} }
    {"author": "Robert Cann","published": "2018-10-15","title": "New Distant Objects Found in the Fight for Planet X ","topic": "astronomy","views": "542" }

    Find the topic each author focuses on most, that is, the topic in which the author has published a disproportionate share of articles.

    GET news/_search
    {
      "size": 0,
      "aggregations": {
        "authors": {
          "terms": { "field": "author" },
          "aggregations": {
            "significant_topic_types": {
              "significant_terms": { "field": "topic" }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "authors" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "John Michael",
            "doc_count" : 5,
            "significant_topic_types" : {
              "doc_count" : 5,
              "bg_count" : 10,
              "buckets" : [
                {
                  "key" : "automobile",
                  "doc_count" : 4,
                  "score" : 0.4800000000000001,
                  "bg_count" : 5
                }
              ]
            }
          },
          {
            "key" : "Robert Cann",
            "doc_count" : 5,
            "significant_topic_types" : {
              "doc_count" : 5,    # Robert Cann has 5 docs in total
              "bg_count" : 10,    # the index holds 10 docs in total
              "buckets" : [
                {
                  "key" : "ai",
                  "doc_count" : 3,    # 3 of Robert Cann's docs have the topic ai
                  "score" : 0.2999999999999999,
                  "bg_count" : 4    # 4 docs in the whole index have the topic ai
                }
              ]
            }
          }
        ]
      }
    }

    The result shows that the topic John Michael focuses on most is automobile, while Robert Cann focuses most on ai. See the comments in the response above for the meaning of bg_count.
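
The score values in the response are consistent with the JLH heuristic, which multiplies the absolute and the relative change between the foreground and background frequency of a term. The Python sketch below recomputes them from the counts in the response; it is an illustration under that assumption, and the function name `jlh_score` is made up.

```python
import math


def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: (fg% - bg%) * (fg% / bg%), or 0 if the term is not over-represented."""
    fg = fg_count / fg_total  # share of the term in the foreground set (one author)
    bg = bg_count / bg_total  # share of the term in the background set (whole index)
    if fg <= bg:
        return 0.0  # also covers bg == 0, since the background includes the foreground
    return (fg - bg) * (fg / bg)


# Counts taken from the response above.
automobile = jlh_score(4, 5, 5, 10)  # John Michael: 4 of 5 docs, 5 of 10 in the index
ai = jlh_score(3, 5, 4, 10)          # Robert Cann: 3 of 5 docs, 4 of 10 in the index

assert math.isclose(automobile, 0.48)
assert math.isclose(ai, 0.3)
```

A term that is equally common in both sets scores 0, which is why a topic that an author covers at the index-wide rate does not show up as significant.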

    16. Significant Text Aggregation: automatically find significant keywords

    This is similar to the Significant Terms Aggregation above, except that it works on text fields and analyzes (tokenizes) the text.
    Run the query below against the same data.

    GET news/_search
    {
      "query": {
        "match": { "title": " AI " }
      },
      "size": 0,
      "aggs": {
        "significant_title": {
          "significant_text": { "field": "title" }
        }
      }
    }

    Response:

    "aggregations" : {
      "significant_title" : {
        "doc_count" : 3,
        "bg_count" : 10,
        "buckets" : [
          {
            "key" : "ai",
            "doc_count" : 3,
            "score" : 2.3333333333333335,
            "bg_count" : 3
          }
        ]
      }
    }

    17. Sampler Aggregation: aggregate over sampled data

    This is typically used together with significant_terms. When the index is very large the query can become slow, so a sampler limits the aggregation to a sample of the more relevant docs.

    POST /stackoverflow/_search?size=0
    {
      "query": {
        "query_string": {
          "query": "tags:kibana OR tags:javascript"
        }
      },
      "aggs": {
        "sample": {
          "sampler": { "shard_size": 200 },
          "aggs": {
            "keywords": {
              "significant_terms": {
                "field": "tags",
                "exclude": ["kibana", "javascript"]
              }
            }
          }
        }
      }
    }

    The shard_size parameter is the number of docs sampled on each shard. It defaults to 100.
    Response:

    {
      ...
      "aggregations": {
        "sample": {
          "doc_count": 200,
          "keywords": {
            "doc_count": 200,
            "bg_count": 650,
            "buckets": [
              {
                "key": "elasticsearch",
                "doc_count": 150,
                "score": 1.078125,
                "bg_count": 200
              },
              {
                "key": "logstash",
                "doc_count": 50,
                "score": 0.5625,
                "bg_count": 50
              }
            ]
          }
        }
      }
    }

    18. Reverse nested Aggregation: aggregate over the parent data from inside a nested agg

    The Reverse nested Aggregation lets an aggregation that runs as a sub aggregation of a Nested Aggregation step back out of the nested type and aggregate over the root documents.
    An example:

    PUT /issues
    {
      "mappings": {
        "properties": {
          "tags": { "type": "keyword" },
          "comments": {
            "type": "nested",
            "properties": {
              "username": { "type": "keyword" },
              "comment": { "type": "text" }
            }
          }
        }
      }
    }

    PUT issues/_doc/1
    {
      "tags": ["bug", "improve"],
      "comments": [
        { "username": "jack", "comment": " this is a bug" },
        { "username": "pony", "comment": " this is a improve" }
      ]
    }

    PUT issues/_doc/2
    {
      "tags": ["advice", "improve"],
      "comments": [
        { "username": "jack", "comment": " this is a good job " },
        { "username": "nacy", "comment": " this is a improvement" }
      ]
    }

    Query:

    GET /issues/_search
    {
      "size": 0,
      "query": { "match_all": {} },
      "aggs": {
        "comments": {
          "nested": { "path": "comments" },
          "aggs": {
            "top_usernames": {
              "terms": { "field": "comments.username" },
              "aggs": {
                "comment_to_issue": {
                  "reverse_nested": {},
                  "aggs": {
                    "top_tags_per_comment": {
                      "terms": { "field": "tags" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }

    Response:

    "aggregations" : {
      "comments" : {
        "doc_count" : 4,
        "top_usernames" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "jack",
              "doc_count" : 2,
              "comment_to_issue" : {
                "doc_count" : 2,
                "top_tags_per_comment" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    { "key" : "improve", "doc_count" : 2 },
                    { "key" : "advice", "doc_count" : 1 },
                    { "key" : "bug", "doc_count" : 1 }
                  ]
                }
              }
            },
            {
              "key" : "nacy",
              "doc_count" : 1,
              "comment_to_issue" : {
                "doc_count" : 1,
                "top_tags_per_comment" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    { "key" : "advice", "doc_count" : 1 },
                    { "key" : "improve", "doc_count" : 1 }
                  ]
                }
              }
            },
            {
              "key" : "pony",
              "doc_count" : 1,
              "comment_to_issue" : {
                "doc_count" : 1,
                "top_tags_per_comment" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    { "key" : "bug", "doc_count" : 1 },
                    { "key" : "improve", "doc_count" : 1 }
                  ]
                }
              }
            }
          ]
        }
      }
    }

    Under a Nested Aggregation, the sub aggregations of a Reverse nested Aggregation operate on the root documents that the nested docs belong to.
    Given what it does, the Reverse nested Aggregation is meant specifically as a sub aggregation of a Nested Aggregation. Using it as a top-level aggregation, or as a sub aggregation of something other than a Nested Aggregation, makes no sense.
    By default the Reverse nested Aggregation goes back to the root document. With several levels of nesting, the path parameter can be used to choose which level to go back to.
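
The step from nested comments back to their root issues can be simulated in Python on the two sample documents. This is a plain-data sketch only: the comment text is left out because the aggregation never reads it, and the function name `tags_per_commenter` is made up.

```python
from collections import Counter, defaultdict

# The two issues from above, reduced to the fields the aggregation touches.
issues = [
    {"tags": ["bug", "improve"],
     "comments": [{"username": "jack"}, {"username": "pony"}]},
    {"tags": ["advice", "improve"],
     "comments": [{"username": "jack"}, {"username": "nacy"}]},
]


def tags_per_commenter(issues):
    """For each comment author, count the tags of the root issues they commented on."""
    # nested + terms: bucket the comment docs by username, remembering their root issue.
    roots_by_user = defaultdict(set)
    for issue_id, issue in enumerate(issues):
        for comment in issue["comments"]:
            roots_by_user[comment["username"]].add(issue_id)

    # reverse_nested + terms: go back to the root docs and aggregate on "tags".
    result = {}
    for user, root_ids in roots_by_user.items():
        tag_counts = Counter()
        for issue_id in root_ids:
            tag_counts.update(set(issues[issue_id]["tags"]))
        result[user] = dict(tag_counts)
    return result


tags = tags_per_commenter(issues)
```

For jack, who commented on both issues, the counts are improve 2, advice 1 and bug 1, matching the comment_to_issue bucket in the response.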
