

Plain-Talk Elasticsearch 23 - Deep Dive into Search Techniques: Implementing Index-Time Search Suggestions with ngram Tokenization

Published: 2025/3/21

Contents

  • Overview
  • Official documentation
  • What is an ngram
  • What is an edge ngram
  • ngrams and how index-time search suggestions work
  • Example

Overview

This is part 23 of my notes from studying Elasticsearch (ES) with the instructor 中華石杉.

Course URL: https://www.roncoo.com/view/55


Official documentation

NGram Tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

NGram Token Filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html


Edge NGram Tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html

Edge NGram Token Filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html


What is an ngram

Take the word quick. Its ngrams at each of the five possible lengths are:

ngram length=1: q u i c k
ngram length=2: qu ui ic ck
ngram length=3: qui uic ick
ngram length=4: quic uick
ngram length=5: quick

Each of the resulting fragments is called an ngram.
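The splitting above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `ngrams` is made up for this example and is not part of Elasticsearch.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous substring of `word` that is exactly `n` characters long."""
    # A window of width n can start at indexes 0 .. len(word) - n.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


for n in range(1, 6):
    print(n, ngrams("quick", n))
# 1 ['q', 'u', 'i', 'c', 'k']
# 2 ['qu', 'ui', 'ic', 'ck']
# 3 ['qui', 'uic', 'ick']
# 4 ['quic', 'uick']
# 5 ['quick']
```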


What is an edge ngram

An edge ngram anchors every ngram at the first letter of the word. For quick:

q qu qui quic quick

This way of splitting a word is called edge ngram.
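As a rough Python sketch of the same idea (the helper `edge_ngrams` is a hypothetical name; its two parameters only mimic the `min_gram` and `max_gram` settings of the edge_ngram filter):

```python
def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `word` whose length lies between min_gram and max_gram."""
    # Never ask for a prefix longer than the word itself.
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("quick"))             # ['q', 'qu', 'qui', 'quic', 'quick']
print(edge_ngrams("helloworld", 1, 3))  # ['h', 'he', 'hel']
```

Every result starts at the first character, which is why these tokens are suited to prefix matching.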


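Use edge ngram to split every word further at index time, then use the resulting ngrams to implement prefix-based search suggestions.

For example, take two docs:
doc1 hello world
doc2 hello we

Splitting with edge ngram gives:

h
he
hel
hell
hello -------> matches doc1, doc2

w -------> matches doc1, doc2
wo
wor
worl
world
we -------> matches doc2

Search with hello w:

hello --> matches the indexed term hello (doc1, doc2)
w --> matches the indexed term w (doc1, doc2)

doc1 contains both hello and w, and their positions also line up, so doc1 (hello world) is returned. doc2 (hello we) qualifies for the same reason.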
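The walkthrough above can be modeled with a toy positional index in Python. This is a simplified sketch, not how Elasticsearch stores data internally: `build_index` and `phrase_search` are hypothetical helpers, tokenizing is reduced to lowercasing and splitting on whitespace, and every edge ngram inherits the position of the word it came from.

```python
from collections import defaultdict


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Prefixes of `word` with a length between min_gram and max_gram."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def build_index(docs):
    """Map each edge ngram to {doc_id: set of word positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            for gram in edge_ngrams(word):
                index[gram][doc_id].add(position)
    return index


def phrase_search(index, query):
    """Return ids of docs that hold the query terms at consecutive positions."""
    terms = query.lower().split()
    if not terms:
        return []
    # The query terms are looked up as-is; they are not split into ngrams.
    postings = [index.get(term, {}) for term in terms]
    hits = []
    for doc_id, starts in postings[0].items():
        for start in starts:
            if all(start + offset in postings[offset].get(doc_id, ())
                   for offset in range(1, len(terms))):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {"doc1": "hello world", "doc2": "hello we"}
index = build_index(docs)
print(phrase_search(index, "hello w"))   # ['doc1', 'doc2']
print(phrase_search(index, "hello wo"))  # ['doc1']
print(phrase_search(index, "hello we"))  # ['doc2']
```

Typing one more character narrows the suggestions, and each lookup is an exact term match against the index.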


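ngrams and how index-time search suggestions work

At search time there is no need to take a prefix and scan the whole inverted index for it. The prefix is simply looked up in the inverted index as an ordinary term, and if it is found, the document matches, exactly as with a match query in full-text search.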
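The difference between the two approaches can be sketched in Python with a toy term dictionary. This is a deliberately simplified model: the names `prefix_scan`, `build_edge_index`, and `prefix_lookup` are invented for the illustration, and a real inverted index is far more sophisticated than a list or dict.

```python
# Toy term dictionary of a plain index (assumption: the three distinct words
# from the "hello world" / "hello we" example).
terms = ["hello", "we", "world"]


def prefix_scan(terms, prefix):
    """Query-time approach: test every term in the dictionary against the prefix."""
    return [t for t in terms if t.startswith(prefix)]


def build_edge_index(terms):
    """Index-time approach: store every leading prefix of every term as its own key."""
    edge_index = {}
    for term in terms:
        for n in range(1, len(term) + 1):
            edge_index.setdefault(term[:n], set()).add(term)
    return edge_index


def prefix_lookup(edge_index, prefix):
    """A single dictionary lookup replaces the scan."""
    return sorted(edge_index.get(prefix, set()))


edge_index = build_edge_index(terms)
print(prefix_scan(terms, "w"))         # ['we', 'world']
print(prefix_lookup(edge_index, "w"))  # ['we', 'world']
print(len(terms), len(edge_index))     # 3 11
```

The last line shows the trade-off: lookups get cheaper, but the index stores many more keys (11 instead of 3 here).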


Example

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}

To see what min_gram and max_gram do, take helloworld with these settings:

min_gram = 1, max_gram = 3

With edge_ngram it is split into:

h he hel


Key point: autocomplete

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

Check which tokens the autocomplete analyzer produces:

GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "hello world"
}

Set the mapping. The field is indexed with autocomplete, while queries still use standard:

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}
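Why keep standard for the search side? A small Python sketch may help. It only approximates the two analyzers (`standard_terms` and `autocomplete_terms` are hypothetical stand-ins that lowercase, split on whitespace, and optionally emit edge ngrams), and the document "help me" is made up for the illustration.

```python
def standard_terms(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def autocomplete_terms(text, min_gram=1, max_gram=20):
    """Rough stand-in for the autocomplete analyzer: edge ngrams of every word."""
    return [word[:n]
            for word in standard_terms(text)
            for n in range(min_gram, min(max_gram, len(word)) + 1)]


# Terms stored in the index for a hypothetical document "help me".
indexed = set(autocomplete_terms("help me"))

# Query "hello" analyzed with standard: one term, no overlap with "help me".
print(sorted(indexed & set(standard_terms("hello"))))      # []

# Query "hello" analyzed with autocomplete: h, he, hel also hit "help me".
print(sorted(indexed & set(autocomplete_terms("hello"))))  # ['h', 'he', 'hel']
```

If the query were split into edge ngrams too, any document sharing a first letter with the query would match, so the ngram expansion is applied only at index time.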

Create some test data:

PUT /my_index/my_type/1
{
  "content": "hello Jack"
}

PUT /my_index/my_type/2
{
  "content": "hello John"
}

PUT /my_index/my_type/3
{
  "content": "hello Jose"
}

Query:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "content": "hello J"
    }
  }
}

Response:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "content": "hello John"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "content": "hello Jack"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "hello Jose"
        }
      }
    ]
  }
}

  • With match, documents that contain only hello are returned as well, because it is a full-text search; they just score lower.
  • match_phrase is recommended instead: it requires every term to be present and each term's position to be exactly 1 after the previous one, which is the behavior we want.
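For reference, the match_phrase variant of the request could look like the body printed by the Python sketch below. The helper `suggestion_query` is a made-up name for this example; the snippet only builds and prints the JSON body and does not send it to a cluster.

```python
import json


def suggestion_query(field: str, text: str, phrase: bool = True) -> dict:
    """Build a search body that uses match_phrase (default) or match on one field."""
    query_type = "match_phrase" if phrase else "match"
    return {"query": {query_type: {field: text}}}


# Same index, type, field, and input as the match example in this article.
print("GET /my_index/my_type/_search")
print(json.dumps(suggestion_query("content", "hello J"), indent=2))
```

Pasting the printed request into the Kibana console is one way to try it; passing `phrase=False` reproduces the plain match query for comparison.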
