Advanced Elasticsearch Queries

Published: 2024/8/23

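Contents

  • Preparation
  • Use cases
    • 1. constant_score query: ignore term-frequency scoring so that documents matching more of the search keywords rank higher
    • 2. sort: when scores tie, order by the price field
    • 3. Ignoring TF/IDF, recall with different scoring weights per field
    • 4. Ignoring TF/IDF, different scoring weights per field plus the value of a specified field, returned as the final score, e.g. title^3 + content^1 + time
    • 5. Ignoring the TF/IDF score, different weights per brand within the same region
    • 6. How to search by geographic location, similar to a Ziroom rental search for nearby listings that are both cheap and close, without a hard distance cutoff?
    • 7. Some scenarios need ranking by a configured parameter value, e.g. xiaomi phones scoring highest among all phones?
    • 8. BM25 similarity tuning: disabling normalization
    • 9. Using query_string
    • 10. 黃桃 (yellow peach) / 罐頭 (canned food) bad case: rank products that match both terms first and partial matches after them
  • Monitoring
    • _stats index monitoring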

Preparation

Index mappings:

{"shop_titled_index": {"mappings": {"properties": {"brand": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"price": {"type": "long"},"region": {"type": "long"},"shopId": {"type": "long"},"skuId": {"type": "long"},"title": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}}} }

Sample data:

{"_index": "shop_titled_index","_type": "_doc","_id": "dJAM3HYByj_ONITHr0gq","_score": 1,"_source": {"brand": "iphone","price": 8000,"title": "iphone 12 64G red 5G","skuId": 2020122201,"shopId": 2,"region": 1001}} {"_index": "shop_titled_index","_type": "_doc","_id": "9ZA6inYByj_ONITHT0bH","_score": 1,"_source": {"brand": "iphone","price": 8000,"title": "iphone 12 64G red 5G","skuId": 2020122201,"shopId": 1,"region": 1001}}
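One way to load the two sample documents is the `_bulk` API, whose body is newline-delimited JSON: an action line, then the document source, with a trailing newline at the end. The sketch below only builds that body with the standard library; the helper name `build_bulk_body` is hypothetical, and sending the request to a cluster is left out.

```python
import json

INDEX = "shop_titled_index"

# The _source of the two sample documents shown above.
DOCS = [
    {"brand": "iphone", "price": 8000, "title": "iphone 12 64G red 5G",
     "skuId": 2020122201, "shopId": 2, "region": 1001},
    {"brand": "iphone", "price": 8000, "title": "iphone 12 64G red 5G",
     "skuId": 2020122201, "shopId": 1, "region": 1001},
]


def build_bulk_body(index, docs):
    """Return an NDJSON body for POST /_bulk (hypothetical helper)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    print(build_bulk_body(INDEX, DOCS), end="")
```

Omitting `_id` in the action line lets Elasticsearch generate the ids, which is consistent with the auto-generated ids in the sample hits.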

Use cases

1. constant_score query: ignore term-frequency scoring so that documents matching more of the search keywords rank higher

{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "iphone"}},"boost": 1}},{"constant_score": {"filter": {"match": {"title": "12"}}}}]}} }
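To see why this ranks documents by the number of matched keywords, the scoring can be imitated offline. The sketch below is a simplification, not Elasticsearch itself: matching is reduced to lowercase whitespace tokens (an assumption), and each matching `constant_score` clause simply contributes its boost, which is 1.0 when omitted.

```python
def constant_score_should(title, clauses):
    """Sum the boosts of the clauses whose term occurs in the title.

    Imitates a bool/should of constant_score clauses: a matching clause
    adds its boost once, no matter how often the term occurs.
    """
    tokens = set(title.lower().split())  # simplified analysis (assumption)
    return sum(boost for term, boost in clauses if term.lower() in tokens)


# Same two clauses as the query above; the second one uses the default boost.
CLAUSES = [("iphone", 1.0), ("12", 1.0)]

if __name__ == "__main__":
    for title in ["iphone 12 64G red 5G", "iphone iphone 11", "huawei p40"]:
        print(title, "->", constant_score_should(title, CLAUSES))
```

A title that repeats `iphone` still scores 1.0, while a title containing both `iphone` and `12` scores 2.0, so more matched keywords means a higher score.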

2. sort: when scores tie, order by the price field

{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "iphone"}},"boost": 1}},{"constant_score": {"filter": {"match": {"title": "12"}}}}]}},"sort": [{"_score": {"order": "desc"}},{"price": {"order": "asc"}}] }
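The effect of the two sort keys can be reproduced on a plain list: score descending first, then price ascending as the tie-breaker. The hits below are made-up values for illustration only.

```python
# Hypothetical hits: two of them tie on _score and differ only in price.
HITS = [
    {"skuId": 1, "_score": 2.0, "price": 8000},
    {"skuId": 2, "_score": 1.0, "price": 3000},
    {"skuId": 3, "_score": 2.0, "price": 6500},
]


def sort_hits(hits):
    """Order by _score descending, then by price ascending."""
    return sorted(hits, key=lambda h: (-h["_score"], h["price"]))


if __name__ == "__main__":
    for hit in sort_hits(HITS):
        print(hit)
```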

3. Ignoring TF/IDF, recall with different scoring weights per field

{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "red"}},"boost": 1}},{"constant_score": {"filter": {"match": {"brand": "iphone"}},"boost":3}}]}},"sort":[{"_score":{"order":"desc"}},{"price":{"order":"asc"}}] }

4. Ignoring TF/IDF, different scoring weights per field plus the value of a specified field, returned as the final score, e.g. title^3 + content^1 + time

{"query": {"function_score": {"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "red"}},"boost": 1}},{"constant_score": {"filter": {"match": {"brand": "iphone"}},"boost": 3}}]}},"field_value_factor": {"field": "shopId"},"boost_mode": "sum"}} }
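The arithmetic behind this query can be written out for the sample documents. The sketch assumes simplified whitespace matching and relies on `field_value_factor` being used with no `factor` or `modifier`, so the function value is the raw `shopId`; `boost_mode: sum` then adds it to the constant-score part.

```python
def constant_part(doc):
    """Score of the bool/should part: 1 for title:red, 3 for brand:iphone."""
    score = 0.0
    if "red" in doc["title"].lower().split():
        score += 1.0
    if "iphone" in doc["brand"].lower().split():
        score += 3.0
    return score


def final_score(doc, factor=1.0):
    """boost_mode "sum": query score plus factor * shopId."""
    return constant_part(doc) + factor * doc["shopId"]


DOCS = [
    {"brand": "iphone", "title": "iphone 12 64G red 5G", "shopId": 2},
    {"brand": "iphone", "title": "iphone 12 64G red 5G", "shopId": 1},
]

if __name__ == "__main__":
    for doc in DOCS:
        print(doc["shopId"], final_score(doc))
```

Because the field value is added unscaled, a large numeric field can dominate the text weights; a `factor` or `modifier` is the usual way to tame it.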

5. Ignoring the TF/IDF score, different weights per brand within the same region

Docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/function-score-filters.html

{"query": {"function_score": {"query": {"term": {"region":1002}},"boost": "1","functions": [{"filter": {"term": {"brand.keyword": "huawei"}},"weight": 3},{"filter":{"match":{"brand":"xiaomi"}},"weight":1}],"score_mode": "sum","boost_mode": "sum"}} }

Note: the following query has no main query inside function_score, so it matches every document.

{"query": {"function_score": {"functions": [{"filter": {"term": {"brand.keyword": "huawei"}},"weight": 3},{"filter":{"match":{"brand":"xiaomi"}},"weight":1}],"score_mode": "sum","boost_mode": "sum"}} }
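A rough model of how the filtered weights turn into a score, under stated assumptions: the main query contributes a score of 1.0 (assumed here for a `term` query on a numeric field with boost 1), `score_mode: sum` adds the weights of the matching filters, and `boost_mode: sum` adds that to the query score. The neutral value used when no filter matches is also an assumption of this sketch.

```python
# (filter predicate, weight) pairs mirroring the two functions in the query.
FUNCTIONS = [
    (lambda doc: doc["brand"] == "huawei", 3.0),
    (lambda doc: doc["brand"] == "xiaomi", 1.0),
]


def brand_weighted_score(doc, query_score=1.0, functions=FUNCTIONS):
    """Combine the query score with the weights of the matching filters."""
    weights = [weight for matches, weight in functions if matches(doc)]
    # Assumption: with no matching filter the function part is a neutral 1.0.
    function_part = sum(weights) if weights else 1.0  # score_mode "sum"
    return query_score + function_part                # boost_mode "sum"


if __name__ == "__main__":
    for brand in ["huawei", "xiaomi"]:
        print(brand, brand_weighted_score({"brand": brand}))
```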

6. How to search by geographic location, similar to a Ziroom rental search for nearby listings that are both cheap and close, without a hard distance cutoff?

Reference: https://www.cnblogs.com/xiaoxiaoliu/p/11054405.html

  1. Create the index
  2. Create the mappings

    POST geo_index/_mappings {"properties": {"location": {"type": "geo_point"},"price": {"type": "double"},"name": {"type": "text"}} }

    3. Prepare the data

    {"location":{"lon":"116.488781","lat":"39.950565"},"price":"4000","name":"Chaoyang Park, 2 bedrooms 1 living room, 12m" }
    {"location":{"lon":"116.327805","lat":"39.900988"},"price":"2400","name":"Beijing West Railway Station, 3 bedrooms 1 living room, 9m" }
    {"location": {"lon": "116.403981","lat": "39.916485"},"price": "88888","name": "Forbidden City, priceless" }
    {"location": {"lon": "116.341316","lat": "39.948795"},"price": "3700","name": "Beijing Zoo, 3 bedrooms 1 living room, 19m" }

    4. geo_distance: find listings within 2 km

    GET geo_index/_search {"query": {"constant_score": {"filter": {"geo_distance": {"distance": "2km","location": {"lat": 39.93869837,"lon": 116.48357391}}},"boost": 1.2}} }

    Output:

    {"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 1.2,"hits": [{"_index": "geo_index","_type": "_doc","_id": "1JC14HYByj_ONITHikiw","_score": 1.2,"_source": {"location": {"lon": "116.488781","lat": "39.950565"},"price": "4000","name": "Chaoyang Park, 2 bedrooms 1 living room, 12m"}}]} }

    5. Find listings and sort them by distance

    Docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/sorting-by-distance.html

    {"query": {"constant_score": {"filter": {"geo_distance": {"distance": "10km","location": {"lat": 39.93869837,"lon": 116.48357391}}},"boost": 1.2}},"sort": {"_geo_distance": {"location": [{"lat": 39.93869837,"lon": 116.48357391}],"unit": "km","distance_type": "arc","order": "asc"}} }
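Steps 4 and 5 can be sanity-checked offline with a great-circle distance. The sketch below uses the haversine formula with a mean Earth radius (both are assumptions of this sketch, not necessarily what Elasticsearch computes internally) and refers to the four sample listings by English place names.

```python
import math

EARTH_RADIUS_KM = 6371.0088               # mean Earth radius (assumption)
CENTER = (39.93869837, 116.48357391)      # lat, lon used in the queries above

# lat, lon of the four sample listings.
LISTINGS = {
    "Chaoyang Park": (39.950565, 116.488781),
    "Beijing West Railway Station": (39.900988, 116.327805),
    "Forbidden City": (39.916485, 116.403981),
    "Beijing Zoo": (39.948795, 116.341316),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(radius_km):
    """Names of listings inside the radius, nearest first."""
    distances = {name: haversine_km(*CENTER, *pos) for name, pos in LISTINGS.items()}
    inside = [name for name, d in distances.items() if d <= radius_km]
    return sorted(inside, key=distances.get)


if __name__ == "__main__":
    print(within(2))
    print(within(10))
```

With these coordinates only the Chaoyang Park listing falls inside 2 km, which agrees with the single hit in the output of step 4.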

    6. Search rentals by proximity and price

    I care more about proximity, so the location function gets the higher weight.
    Reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/decay-functions.html#CO119-4

    {"query": {"function_score": {"query": {"range":{"price":{"gte":2000,"lte":5000}}},"functions": [{"gauss": {"location": {"origin": {"lon": "116.47464752","lat": "39.94606859"},"offset": "100m","scale": "1000m"}},"weight":2.0},{"gauss": {"price": {"origin": 3000,"offset": 100,"scale":500}}}],"score_mode": "sum","boost_mode": "replace"}} }

    Result:

    {"took": 5,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 0.7460326,"hits": [{"_index": "geo_index","_type": "_doc","_id": "95A14XYByj_ONITHg0if","_score": 0.7460326,"_source": {"location": {"lon": "116.47155762","lat": "39.9523853"},"price": "3500","name": "Liangmaqiao, 2 bedrooms 1 living room, 12m"}},{"_index": "geo_index","_type": "_doc","_id": "1JC14HYByj_ONITHikiw","_score": 0.36586136,"_source": {"location": {"lon": "116.488781","lat": "39.950565"},"price": "4000","name": "Chaoyang Park, 2 bedrooms 1 living room, 12m"}},{"_index": "geo_index","_type": "_doc","_id": "1ZC34HYByj_ONITHRkht","_score": 5.823735e-39,"_source": {"location": {"lon": "116.341316","lat": "39.948795"},"price": "3700","name": "Beijing Zoo, 3 bedrooms 1 living room, 19m"}},{"_index": "geo_index","_type": "_doc","_id": "1pC44HYByj_ONITHAkgJ","_score": 0,"_source": {"location": {"lon": "116.327805","lat": "39.900988"},"price": "2400","name": "Beijing West Railway Station, 3 bedrooms 1 living room, 9m"}}]} }
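The decay scores can be approximated by hand. The sketch below implements the Gaussian decay as the function_score documentation describes it (1.0 inside `offset`, the `decay` value, 0.5 by default, at `offset + scale`) and uses a haversine distance with a mean Earth radius, which is an assumption of this sketch. Under these assumptions the location decay alone comes out at roughly 0.746 for the Liangmaqiao listing and 0.366 for Chaoyang Park, close to the listed `_score` values. That suggests the listed result came from a run with only the location function; with the weight of 2.0 and the price function as in the query above, the summed scores would be larger.

```python
import math

EARTH_RADIUS_M = 6371008.8             # mean Earth radius in metres (assumption)
ORIGIN = (39.94606859, 116.47464752)   # lat, lon of the gauss origin


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gauss(distance, offset, scale, decay=0.5):
    """Gaussian decay: 1.0 within offset, `decay` at offset + scale."""
    d = max(0.0, distance - offset)
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-(d ** 2) / (2 * sigma_sq))


def location_decay(lat, lon):
    return gauss(haversine_m(*ORIGIN, lat, lon), offset=100, scale=1000)


def price_decay(price):
    return gauss(abs(price - 3000), offset=100, scale=500)


def combined(lat, lon, price, location_weight=2.0):
    """score_mode "sum" of both functions; boost_mode "replace" keeps only this."""
    return location_weight * location_decay(lat, lon) + price_decay(price)


if __name__ == "__main__":
    print(location_decay(39.9523853, 116.47155762))   # Liangmaqiao
    print(location_decay(39.950565, 116.488781))      # Chaoyang Park
    print(combined(39.9523853, 116.47155762, 3500))
```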

7. Some scenarios need ranking by a configured parameter value, e.g. xiaomi phones scoring highest among all phones?

Ranking with function_score combined with script_score

    {"query": {"function_score": {"query": {"match_all":{}},"functions": [{"script_score": {"script": {"lang": "painless","params": {"brand": "xiaomi"},"source": "if(doc['brand.keyword'].size() == 0)return 0f; String brandStr = doc['brand.keyword'].value ?: new String();if(params.brand.compareTo(brandStr) == 0){return 1f;}return 0f;"}}}],"score_mode":"sum","boost_mode":"replace"}} }

score_mode defines how the scores of the individual functions are merged into one combined score; boost_mode defines how that combined score is applied to the score produced by the original query.
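The two-stage combination can be modelled in a few lines. This is an illustrative sketch: the function names are hypothetical, `avg` is left out of `score_mode` to keep the model simple, and `brand_script` is a Python rendering of the Painless script above.

```python
import math


def combine_functions(scores, score_mode="multiply"):
    """Stage 1: merge the per-function scores into one value (score_mode)."""
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "multiply":
        return math.prod(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "min":
        return min(scores)
    if score_mode == "first":
        return scores[0]
    raise ValueError(f"unsupported score_mode: {score_mode}")


def apply_boost_mode(query_score, function_score, boost_mode="multiply"):
    """Stage 2: apply the merged value to the query score (boost_mode)."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "replace":
        return function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "avg":
        return (query_score + function_score) / 2
    if boost_mode == "max":
        return max(query_score, function_score)
    if boost_mode == "min":
        return min(query_score, function_score)
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


def brand_script(brand, params):
    """1.0 when the document brand equals the configured brand, else 0.0."""
    if brand is None:
        return 0.0
    return 1.0 if brand == params["brand"] else 0.0


def script_query_score(brand, params, query_score=1.0):
    """The query above: one script function, score_mode sum, boost_mode replace."""
    merged = combine_functions([brand_script(brand, params)], "sum")
    return apply_boost_mode(query_score, merged, "replace")


if __name__ == "__main__":
    for brand in ["xiaomi", "huawei", None]:
        print(brand, script_query_score(brand, {"brand": "xiaomi"}))
```

With `boost_mode: replace` the `match_all` score is discarded, so xiaomi documents end up at 1.0 and everything else at 0.0.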

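8. BM25 similarity tuning: disabling normalization

BM25 exposes two tuning parameters.
k1: controls how quickly the term-frequency contribution rises toward saturation. The default is 1.2. The smaller the value, the faster saturation is reached; the larger the value, the slower. The official documentation plots score against term frequency, and k1 shapes the BM25 tf curve in that plot.

b: controls how strongly field-length normalization applies; 0.0 disables normalization and 1.0 applies full normalization. The default is 0.75.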
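The roles of k1 and b are easiest to see in the term-frequency part of the BM25 formula. The sketch below implements only that component, in its textbook form; the function name and the sample lengths are illustrative.

```python
def tf_component(freq, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency part: freq / (freq + k1 * (1 - b + b * doc_len / avg_len))."""
    length_norm = 1 - b + b * doc_len / avg_len
    return freq / (freq + k1 * length_norm)


if __name__ == "__main__":
    # Saturation: each extra occurrence adds less, and the value stays below 1.
    for freq in (1, 2, 5, 20):
        print(freq, round(tf_component(freq, doc_len=20, avg_len=20), 4))

    # b = 0 removes the dependence on document length entirely.
    print(tf_component(3, doc_len=10, avg_len=20, b=0.0))
    print(tf_component(3, doc_len=40, avg_len=20, b=0.0))
```

With the default b, a shorter field scores higher for the same term frequency; with b set to 0 both calls return the same value.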

  1. Mapping settings (shown as returned by Elasticsearch; creation_date, uuid, version and provided_name are generated by the cluster)

    {"settings": {"index": {"number_of_shards": "1","provided_name": "my_sim_index","similarity": {"cbm25": {"type": "BM25","b": "0"}},"creation_date": "1610181315498","number_of_replicas": "1","uuid": "V8NhMRofQRu-oPFt6hheWA","version": {"created": "7070099"}}},"mappings": {"_doc": {"properties": {"body": {"similarity": "BM25","type": "text"},"title": {"similarity": "cbm25","type": "text"}}}} }

  2. Sample data

    {"title": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.","body": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF." }
    {"title": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost.","body": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost." }
    {"title": "or similarity per field. The similarity setting provides a simple way of choosing a similarity","body": "or similarity per field. The similarity setting provides a simple way of choosing a similarity" }

  3. Search

    The title field uses cbm25, which ignores document-length normalization, so its scores do not depend on document length.

    GET my_sim_index/_search {"query":{"match":{"title":"similarity"}} }

Output:

    {"took": 1,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 0.20983505,"hits": [{"_index": "my_sim_index","_type": "_doc","_id": "nZBO5nYByj_ONITHhknJ","_score": 0.20983505,"_source": {"title": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.","body": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF."}},{"_index": "my_sim_index","_type": "_doc","_id": "oZBW5nYByj_ONITHkEli","_score": 0.20983505,"_source": {"title": "or similarity per field. The similarity setting provides a simple way of choosing a similarity","body": "or similarity per field. The similarity setting provides a simple way of choosing a similarity"}},{"_index": "my_sim_index","_type": "_doc","_id": "npBP5nYByj_ONITHK0mo","_score": 0.18360566,"_source": {"title": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost.","body": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost."}}]} }

The top two hits share the score 0.20983505 even though their titles differ in length.
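The listed scores can be reproduced by hand, which makes the effect of `b = 0` concrete. The sketch assumes a single shard with the three sample documents, the idf form `ln(1 + (N - n + 0.5) / (n + 0.5))`, and a `(k1 + 1)` factor in the numerator; these assumptions are not stated in the output itself, but they reproduce both listed values.

```python
import math
import re

TITLES = [
    "Elasticsearch allows you to configure a scoring algorithm or similarity per field. "
    "The similarity setting provides a simple way of choosing a similarity algorithm "
    "other than the default BM25, such as TF/IDF.",
    "A simple boolean similarity, which is used when full-text ranking is not needed and "
    "the score should only be based on whether the query terms match or not. "
    "Boolean similarity gives terms a score equal to their query boost.",
    "or similarity per field. The similarity setting provides a simple way of choosing "
    "a similarity",
]


def term_freq(text, term):
    """Occurrences of term after lowercasing and splitting on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term)


def bm25_without_length_norm(freq, doc_count, docs_with_term, k1=1.2):
    """BM25 score with b = 0, so document length drops out of the formula."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    return idf * (k1 + 1) * freq / (freq + k1)


if __name__ == "__main__":
    for title in TITLES:
        freq = term_freq(title, "similarity")
        print(freq, round(bm25_without_length_norm(freq, 3, 3), 8))
```

The first and third titles contain the term three times and the second contains it twice, so the score differences in the output come from term frequency alone, not from length.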

Searching on body:

    GET my_sim_index/_search {"query":{"match":{"body":"similarity"}} }

The term counts for similarity are the same as in the title field, but the body scores are now affected by document length.

9. Using query_string:

    {"query":{"query_string":{"query":"(title:red)^1.0 AND (brand:iphone)"}} }
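For readers more used to structured queries, the same intent can be written as a `bool` query with two required `match` clauses. The sketch below builds both request bodies as Python dictionaries; treating them as roughly equivalent is an assumption, since `query_string` goes through its own parser and analysis, and the function names are hypothetical.

```python
import json


def query_string_body(query):
    """Request body using the query_string syntax from the example above."""
    return {"query": {"query_string": {"query": query}}}


def bool_body():
    """A structured query with roughly the same intent: both clauses required."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": {"query": "red", "boost": 1.0}}},
                    {"match": {"brand": {"query": "iphone"}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(query_string_body("(title:red)^1.0 AND (brand:iphone)")))
    print(json.dumps(bool_body()))
```

The structured form avoids query-syntax errors from user input, which is one reason to prefer it when the query text is not written by hand.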

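10. 黃桃 (yellow peach) / 罐頭 (canned food) bad case: rank products that match both terms first and partial matches after them

Option 1: use constant_score
Add a query clause that ignores the TF/IDF score and assigns a custom score, so that products matching every term rank first.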

    "should": [{"constant_score": {"filter": {"query_string": {"query": "allWord:(+(黃桃) AND +(罐頭))"}},"boost": 500}}]

    Option 2
    Add a filter with a weight to the functions of the existing function_score query.

    "function_score" : {"query" : {"bool" : {"must" : [{"query_string" : {"query" : "(title:(+(黃桃 罐頭))^2.4 OR catBrand:(+(黃桃 罐頭))^0.6 OR facet:(+(黃桃 罐頭))^0.6 OR allWord:(+(黃桃 罐頭))^0.0)","fields" : [ ],"use_dis_max" : true,"tie_breaker" : 0.0,"default_operator" : "or","auto_generate_phrase_queries" : false,"max_determinized_states" : 10000,"enable_position_increments" : true,"fuzziness" : "AUTO","fuzzy_prefix_length" : 0,"fuzzy_max_expansions" : 50,"phrase_slop" : 0,"escape" : false,"split_on_whitespace" : true,"boost" : 1.0}}],"filter" : [{"term" : {"skuDocType" : {"value" : 1,"boost" : 1.0}}},{"bool" : {"must_not" : [{"term" : {"spMask" : {"value" : 1,"boost" : 1.0}}}],"disable_coord" : false,"adjust_pure_negative" : true,"boost" : 1.0}}],"disable_coord" : false,"adjust_pure_negative" : true,"boost" : 1.0}},"functions" : [{"filter": {"query_string": {"query":"allWord:(黃桃 AND 罐頭)"}},"weight":400},{"filter" : {"match_all" : {"boost" : 1.0}},"script_score" : {"script" : {"id" : "osop_score_script","lang" : "painless","params" : {"catSearch" : false,"fakeCat" : "cat16035591","weight" : true,"topSku" : {"pop8013634719" : 300.0,"1130765898" : 300.0},"hotCatIds" : {"cat16035591" : 0.9666818804198996}}}}}],"score_mode" : "sum","boost_mode" : "sum","max_boost" : 3.4028235E38,"boost" : 1.0}

Monitoring

_stats index monitoring

Elasticsearch index monitoring: the Index Stats API in detail
Request:

    GET <index name>/_stats

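Field descriptions (the sample below has the shape of a node stats response, with _nodes and nodes at the top level; the metric groups under indices carry the same names in index stats):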

    {
      "_nodes": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "cluster_name": "ELKTEST",
      "nodes": {
        "lnlHC8yERCKXCuAc_2DPCQ": {
          "timestamp": 1534242595995,
          "name": "OPS01-ES01",
          "transport_address": "10.9.125.148:9300",
          "host": "10.9.125.148",
          "ip": "10.9.125.148:9300",
          "roles": [
            "master",
            "data",
            "ingest"
          ],
          "attributes": {
            "ml.machine_memory": "8203104256",
            "xpack.installed": "true",
            "ml.max_open_jobs": "20",
            "ml.enabled": "true"
          },
          "indices": {
            "docs": {
              "count": 8111612, # number of documents on the node
              "deleted": 16604 # deleted documents not yet purged from the segments
            },
            "store": {
              "size_in_bytes": 2959876263 # physical storage consumed by the node
            },
            # indexing: how many times documents have been indexed. This is a cumulative
            # counter that does not decrease when documents are deleted. It only ever grows
            # and is incremented whenever an internal index operation happens, including updates.
            "indexing": {
              "index_total": 17703152,
              "index_time_in_millis": 2801934,
              "index_current": 0,
              "index_failed": 0,
              "delete_total": 46242,
              "delete_time_in_millis": 2130,
              "delete_current": 0,
              "noop_update_total": 0,
              "is_throttled": false,
              "throttle_time_in_millis": 0 # a high value means the disk throttling limit is set too low
            },
            "get": {
              "total": 185179,
              "time_in_millis": 22341,
              "exists_total": 185178,
              "exists_time_in_millis": 22337,
              "missing_total": 1,
              "missing_time_in_millis": 4,
              "current": 0
            },
            "search": {
              "open_contexts": 0, # number of searches currently active
              "query_total": 495447, # total number of queries
              # Total time spent on queries since the node started. The ratio
              # query_time_in_millis / query_total is a rough indicator of query efficiency:
              # the larger it is, the longer each query takes and the more you should
              # consider tuning.
              "query_time_in_millis": 298344,
              "query_current": 0,
              # The fetch statistics describe the second phase of a search (the fetch in
              # query_then_fetch). If fetch takes longer than query, the disk is slow, too
              # many documents are being fetched, or the paging parameters are too large
              # (for example size = 10000).
              "fetch_total": 130194,
              "fetch_time_in_millis": 51211,
              "fetch_current": 0,
              "scroll_total": 22,
              "scroll_time_in_millis": 2196665,
              "scroll_current": 0,
              "suggest_total": 0,
              "suggest_time_in_millis": 0,
              "suggest_current": 0
            },
            # merges: information about Lucene segment merges. It shows how many merges are
            # in progress, how many documents are involved, the total size of the segments
            # being merged, and the total time spent merging. On a write-heavy cluster these
            # numbers matter: merging consumes a lot of disk I/O and CPU, and heavy indexing
            # produces many merges.
            "merges": {
              "current": 0,
              "current_docs": 0,
              "current_size_in_bytes": 0,
              ...
