Elasticsearch (6): Nested Aggregations, Drill-Down Analysis, and Aggregation Analysis in Kibana
Two core concepts: bucket and metric
city      name
Beijing   小李
Beijing   小王
Shanghai  小張
Shanghai  小麗
Shanghai  小陳
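Bucketing by city
This produces two buckets: a Beijing bucket and a Shanghai bucket.
Beijing bucket: contains 2 people, 小李 and 小王
Shanghai bucket: contains 3 people, 小張, 小麗, and 小陳
Bucketing is done on a field: all documents that share the same value for that field are placed in the same bucket.
If you know some MySQL, this will look familiar: the first step of any aggregation is grouping, and the analysis is then run over the data in each group. That group is our bucket.
metric: a statistic computed over one group of data
Once we have a set of buckets, we can run an aggregation analysis over the data in each one, for example counting the documents in a bucket, or computing the average, maximum, or minimum of the values in a bucket.
bucket: group by user_id --> documents with the same user_id are placed in the same bucket
metric: an aggregation operation performed on a bucket, such as computing the average, maximum, or minimum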
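The bucket/metric split can be sketched in a few lines of plain Python. This is only an illustration of the idea using the city/name rows from the table above; it is not how Elasticsearch implements aggregations internally.

```python
from collections import defaultdict

# Rows from the city/name table: (city, name)
rows = [
    ("Beijing", "小李"),
    ("Beijing", "小王"),
    ("Shanghai", "小張"),
    ("Shanghai", "小麗"),
    ("Shanghai", "小陳"),
]

# Bucketing: rows that share the same city value land in the same bucket.
buckets = defaultdict(list)
for city, name in rows:
    buckets[city].append(name)

# Metric: a statistic computed per bucket, here a simple count.
counts = {city: len(names) for city, names in buckets.items()}
print(counts)
```

The grouping step corresponds to the bucket and the `len(...)` call corresponds to the metric.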
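Counting: the number of products under each tag
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
size: 0 returns only the aggregation results, not the source documents the aggregation ran over
aggs: fixed syntax, declaring that a grouping aggregation is to be run over the data
group_by_tags: every aggregation must be given a name; the name is arbitrary, so pick whatever you like
terms: group by the values of a field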
field: the field whose values are used for grouping
Set the fielddata property of the text field to true (fielddata is a forward index, used for nested aggregation queries; it is described in detail later)
PUT /ecommerce/_mapping/product
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}
{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 4, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 2 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
hits.hits: because we set size to 0, hits.hits is empty; otherwise the source documents the aggregation ran over would be returned as well
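aggregations: the aggregation results
group_by_tags: the name we gave this aggregation
buckets: the buckets produced from the field we specified
key: the field value that each bucket corresponds to
doc_count: the number of documents in this bucket, that is, how many documents fall under each tag
Default sort order: descending by doc_count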
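To make the response fields concrete, here is a minimal Python sketch of what a terms aggregation computes. The `products` list and the helper `terms_agg` are hypothetical and exist only for illustration; breaking ties by key is an assumption made here so the output is deterministic.

```python
from collections import Counter

# Hypothetical documents; names and tag lists are assumptions for illustration.
products = [
    {"name": "product A", "tags": ["meibai", "fangzhu"]},
    {"name": "product B", "tags": ["fangzhu"]},
    {"name": "product C", "tags": ["qingxin"]},
    {"name": "product D", "tags": ["meibai"]},
]

def terms_agg(docs, field):
    """Count documents per distinct field value, highest doc_count first."""
    counts = Counter()
    for doc in docs:
        # set() so a document is counted once per bucket even if a value repeats
        for value in set(doc[field]):
            counts[value] += 1
    # Sort by doc_count descending; ties are broken by key (an assumption here).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]

buckets = terms_agg(products, "tags")
print(buckets)
```

Note that a document with several tags is counted in several buckets, which is why the doc_count values can add up to more than the total number of hits.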
For products whose name contains yagao, count the number of products under each tag
GET /ecommerce/product/_search
{
  "size": 0,
  "query": {
    "match": { "name": "yagao" }
  },
  "aggs": {
    "all_tags": {
      "terms": { "field": "tags" }
    }
  }
}
{
  "took": 35,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "all_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "fangzhu", "doc_count": 2 },
        { "key": "meibai", "doc_count": 1 },
        { "key": "qingxin", "doc_count": 1 }
      ]
    }
  }
}
top_hits: fetch the top few documents in each bucket
_source: return only the specified fields
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" },
      "aggs": {
        "top_tags": {
          "top_hits": {
            "_source": { "include": "name" },
            "size": 1
          }
        }
      }
    }
  }
}
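Compute the average, minimum, maximum, and total price of the products under each tag
count: a terms bucket automatically carries a doc_count, which is equivalent to a count
avg: the avg aggregation, computes the average
max: the largest value of the specified field within a bucket
min: the smallest value of the specified field within a bucket
sum: the total of the specified field's values within a bucket
Group first, then compute the metrics for each group: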
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "min_price": { "min": { "field": "price" } },
        "max_price": { "max": { "field": "price" } },
        "sum_price": { "sum": { "field": "price" } }
      }
    }
  }
}
avg_price: the name we chose for the metric aggregation
value: the result of the metric, for example the average of the price field over the documents in each bucket
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "group_by_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fangzhu",
          "doc_count": 2,
          "max_price": { "value": 30 },
          "min_price": { "value": 25 },
          "avg_price": { "value": 27.5 },
          "sum_price": { "value": 55 }
        },
        {
          "key": "meibai",
          "doc_count": 1,
          "max_price": { "value": 30 },
          "min_price": { "value": 30 },
          "avg_price": { "value": 30 },
          "sum_price": { "value": 30 }
        },
        {
          "key": "qingxin",
          "doc_count": 1,
          "max_price": { "value": 40 },
          "min_price": { "value": 40 },
          "avg_price": { "value": 40 },
          "sum_price": { "value": 40 }
        }
      ]
    }
  }
}
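The per-bucket metrics above can be reproduced with a short Python sketch. The three products below are hypothetical; their prices and tags were chosen only so that the numbers line up with the response shown above.

```python
from collections import defaultdict

# Hypothetical products (an assumption, not the author's actual index data).
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
]

# Step 1 (bucket): collect the prices that fall under each tag.
prices_by_tag = defaultdict(list)
for product in products:
    for tag in product["tags"]:
        prices_by_tag[tag].append(product["price"])

# Step 2 (metric): compute avg/min/max/sum inside every bucket.
metrics = {
    tag: {
        "doc_count": len(prices),
        "avg_price": sum(prices) / len(prices),
        "min_price": min(prices),
        "max_price": max(prices),
        "sum_price": sum(prices),
    }
    for tag, prices in prices_by_tag.items()
}
print(metrics["fangzhu"])
```

Sorting these buckets by `avg_price` in descending order would be the local analogue of `"order": { "avg_price": "desc" }` on the terms aggregation.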
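collect_mode
There are two ways to compute sub-aggregations:
- depth_first: compute the sub-aggregations directly, together with the parent
- breadth_first: compute the result of the current aggregation first, then compute the sub-aggregations against that result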
Compute the average price of the products under each tag, sorted by average price in descending order
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "all_tags": {
      "terms": {
        "field": "tags",
        "collect_mode": "breadth_first",
        "order": { "avg_price": "desc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "all_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } },
        { "key": "meibai", "doc_count": 1, "avg_price": { "value": 30 } },
        { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } }
      ]
    }
  }
}
"ranges": [{}, {}]: group by the specified price ranges, then group by tag within each range, and finally compute the average price of each group
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 20 },
          { "from": 20, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags" },
          "aggs": {
            "average_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}
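Since no response is shown for the range request, the following Python sketch walks through the same three levels (price range, then tag, then average price) on hypothetical data. It assumes the usual range semantics where `from` is inclusive and `to` is exclusive; the product list is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical products (assumed data, not taken from the author's index).
products = [
    {"price": 30, "tags": ["meibai", "fangzhu"]},
    {"price": 25, "tags": ["fangzhu"]},
    {"price": 40, "tags": ["qingxin"]},
]
ranges = [(0, 20), (20, 40), (40, 50)]

result = {}
for low, high in ranges:
    # Level 1: range bucket, "from" inclusive and "to" exclusive.
    in_range = [p for p in products if low <= p["price"] < high]
    # Level 2: terms bucket on tags, inside the range bucket.
    by_tag = defaultdict(list)
    for product in in_range:
        for tag in product["tags"]:
            by_tag[tag].append(product["price"])
    # Level 3: avg metric inside every tag bucket.
    result[(low, high)] = {
        "doc_count": len(in_range),
        "average_price": {tag: sum(v) / len(v) for tag, v in by_tag.items()},
    }

for bounds, bucket in result.items():
    print(bounds, bucket)
```

A product priced at exactly 40 lands in the 40 to 50 bucket, not in 20 to 40, because the upper bound is exclusive.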
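histogram
Similar to terms, this is also a bucketing operation: it takes a field and groups documents into buckets according to which value interval the field falls into.
interval: 10 divides the values into ranges of 0~10, 10~20, 20~30, and so on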
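The bucketing rule can be sketched in Python: each value is rounded down to the nearest multiple of the interval, and that multiple becomes the bucket key. The prices below are hypothetical, and the `revenue` sum mirrors a sum sub-aggregation on the same field.

```python
import math
from collections import defaultdict

prices = [25, 30, 40]  # hypothetical prices, assumed for illustration
interval = 10

buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
for price in prices:
    # Round down to the start of the interval the value falls into.
    key = math.floor(price / interval) * interval
    buckets[key]["doc_count"] += 1
    buckets[key]["revenue"] += price

for key in sorted(buckets):
    print(key, buckets[key])
```

A price of 25 falls into the bucket keyed 20 (covering 20 up to, but not including, 30), while a price of exactly 30 opens the next bucket.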
GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 10
      },
      "aggs": {
        "revenue": {
          "sum": { "field": "price" }
        }
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "price": {
      "buckets": [
        { "key": 20, "doc_count": 1, "revenue": { "value": 25 } },
        { "key": 30, "doc_count": 1, "revenue": { "value": 30 } },
        { "key": 40, "doc_count": 1, "revenue": { "value": 40 } }
      ]
    }
  }
}
date histogram
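Buckets are divided at a fixed date interval, based on a date-typed field and a date interval that we specify.
With a date interval of one month (written month or 1M; lowercase 1m means one minute):
2017-01-01~2017-01-31 is one bucket
2017-02-01~2017-02-28 is one bucket
Elasticsearch then scans the date field of every document, determines which bucket the date falls into, and places the document in that bucket.
min_doc_count: when set to 0, a date interval such as 2017-01-01~2017-01-31 is still returned even if it contains no documents; with a higher minimum, that interval would be filtered out
extended_bounds, min, max: when dividing buckets, the buckets are generated across the range between this start date and end date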
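As an illustration of monthly buckets combined with the effect of min_doc_count: 0 and extended_bounds, here is a small Python sketch. The sale dates, the bounds, and the helper `month_range` are all hypothetical; the point is only that empty months inside the bounds still appear with a doc_count of 0.

```python
from collections import Counter
from datetime import date

# Hypothetical sale dates (assumed data for illustration).
sold = [date(2017, 1, 5), date(2017, 1, 20), date(2017, 3, 2)]

def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

# Each date is mapped to the first day of its month, which acts as the bucket key.
counts = Counter(d.replace(day=1) for d in sold)

# Walk the whole bounded range so that empty months are kept (like min_doc_count: 0).
buckets = [
    {"key": month.isoformat(), "doc_count": counts.get(month, 0)}
    for month in month_range(date(2017, 1, 1), date(2017, 4, 30))
]
for bucket in buckets:
    print(bucket)
```

Without the bounded walk, February and April would simply be absent from the result, which makes time series harder to chart.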
global
This is the global bucket: it brings all documents into the scope of the aggregation, regardless of the preceding query.
GET /tvs/sales/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": { "value": "長虹" }
    }
  },
  "aggs": {
    "single_brand_avg_price": {
      "avg": { "field": "price" }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_brand_avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 0, "hits": [] },
  "aggregations": {
    "all": {
      "doc_count": 8,
      "all_brand_avg_price": { "value": 2650 }
    },
    "single_brand_avg_price": { "value": 1666.6666666666667 }
  }
}
Two results come back: one aggregated over the query's search results, and one aggregated over all documents.
single_brand_avg_price: computed over the query's search results, so it is the average price of the 長虹 brand
all.all_brand_avg_price: the average price across all brands
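The difference between the two scopes can be shown with a few lines of Python. The sales list is hypothetical: only the brand 長虹 comes from the request above, the other brand names are placeholders, and the prices were chosen so that the averages agree with the response.

```python
# Hypothetical sales data (an assumption; not the author's actual index contents).
sales = [
    {"brand": "長虹", "price": 1000},
    {"brand": "長虹", "price": 2000},
    {"brand": "長虹", "price": 2000},
    {"brand": "brand-B", "price": 1500},
    {"brand": "brand-B", "price": 1200},
    {"brand": "brand-C", "price": 3000},
    {"brand": "brand-C", "price": 2500},
    {"brand": "brand-C", "price": 8000},
]

def avg_price(docs):
    """Average of the price field over the given documents."""
    return sum(doc["price"] for doc in docs) / len(docs)

# Scope 1: the aggregation sees only the documents matched by the query.
query_hits = [s for s in sales if s["brand"] == "長虹"]
single_brand_avg_price = avg_price(query_hits)

# Scope 2: the global bucket ignores the query and sees every document.
all_brand_avg_price = avg_price(sales)

print(single_brand_avg_price, all_brand_avg_price)
```

This pairing is handy when a single request should report one brand's figure next to the overall baseline.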