Getting Started with Elasticsearch: Installing the IK Analyzer from Scratch

Published: 2025/3/14

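Background

I needed to run aggregations in ES for statistical analysis, but the field being aggregated holds Chinese text, and the default ES analyzer handles Chinese poorly: it splits a complete Chinese word into individual characters and aggregates on those, which is clearly not what I wanted. Here is an example: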

POST http://192.168.80.133:9200/my_index_name/my_type_name/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {"gte": 1513778040000, "lte": 1513848720000}
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}

Output:

{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "力",  # the complete word has been split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        },
        {
          "key": "動",
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}
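To make the effect concrete, here is a toy Python model of what happens to the buckets. It is only a sketch: the two tokenizer functions are hypothetical stand-ins (one token per character versus keeping the word whole) and do not reproduce how ES or IK tokenize internally.

```python
from collections import Counter


def per_character_tokens(text):
    """Toy stand-in for per-character splitting of Chinese text."""
    return [ch for ch in text if not ch.isspace()]


def whole_word_tokens(text):
    """Toy stand-in for an analyzer that keeps a known word intact."""
    return [text]


def terms_counts(docs, tokenize):
    """Count how many documents contain each token (like doc_count per bucket)."""
    counts = Counter()
    for keywords in docs:
        seen = set()
        for keyword in keywords:
            seen.update(tokenize(keyword))
        counts.update(seen)  # each document counts once per token
    return counts


# Two documents that both carry the keyword used in the article.
docs = [["動力"], ["動力"]]
split_counts = terms_counts(docs, per_character_tokens)
whole_counts = terms_counts(docs, whole_word_tokens)
print(dict(split_counts))
print(dict(whole_counts))
```

With per-character tokens the single keyword turns into two buckets, one per character, each with a count of 2; with whole-word tokens there is a single bucket for the keyword.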

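Since the default ES analyzer handles Chinese so poorly, is there an analyzer that supports Chinese? And if so, how is it used?
Google answered the first question: a Chinese analyzer already exists, and it is open source: IK Analysis for Elasticsearch, see https://github.com/medcl/elasticsearch-analysis-ik.
Rather than reinvent the wheel, I decided to try it out first and see how well it works. So how is the IK analyzer used? It is an ES plugin: install it and configure ES accordingly.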

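Installing the IK analyzer

My ES version is 2.4.1, so the IK version to download is 1.10.1 (note: you must download the IK version that matches your ES version, otherwise it will not work).

1. Download and build IK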

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.10.1/elasticsearch-analysis-ik-1.10.1.zip
unzip elasticsearch-analysis-ik-1.10.1.zip
cd elasticsearch-analysis-ik-1.10.1
mvn clean package

The build produces the package elasticsearch-analysis-ik-1.10.1.zip in the elasticsearch-analysis-ik-1.10.1/target/releases directory.

2. Install the IK plugin in ES

Copy the packaged IK plugin, elasticsearch-analysis-ik-1.10.1.zip, into the ES/plugins directory and unzip it there.

unzip elasticsearch-analysis-ik-1.10.1.zip
rm -rf elasticsearch-analysis-ik-1.10.1.zip  # be sure to delete the zip after unzipping, otherwise ES reports an error on startup
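The same unzip-then-delete step can be sketched in Python with the standard library. This is an illustration under assumptions: `install_plugin_zip` is a hypothetical helper, and extracting into a subfolder named after the zip is my choice, not something the steps above prescribe. The demo runs against a temporary directory with a stand-in file, not a real ES installation.

```python
import tempfile
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, plugins_dir):
    """Extract a plugin zip into its own folder under plugins_dir, then delete the zip."""
    zip_path = Path(zip_path)
    target = Path(plugins_dir) / zip_path.stem  # folder named after the zip (assumption)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    zip_path.unlink()  # the leftover zip is what makes ES fail on startup
    return target


# Demo against a throwaway directory; the zip content is a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    plugins = Path(tmp) / "plugins"
    plugins.mkdir()
    zip_file = plugins / "elasticsearch-analysis-ik-1.10.1.zip"
    with zipfile.ZipFile(zip_file, "w") as archive:
        archive.writestr("plugin-descriptor.properties", "name=demo\n")
    installed = install_plugin_zip(zip_file, plugins)
    extracted_ok = (installed / "plugin-descriptor.properties").is_file()
    zip_removed = not zip_file.exists()
```

Deleting the zip inside the helper keeps the two steps together, so the archive cannot be left behind by accident.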

Restart ES.

Using the IK analyzer

Once the IK analyzer is installed, it can be used in ES.
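Before touching any mapping, it can help to check that the analyzer is actually loaded. The sketch below builds a request for the ES `_analyze` endpoint using only the Python standard library. It is not part of the original steps: the helper names are hypothetical, the JSON request body and the `tokens` response shape are how I understand the API and should be verified against the documentation for your ES version, and `sample_response` is hand-written, not real output.

```python
import json
import urllib.request

ES_URL = "http://192.168.80.133:9200"  # host taken from the article's examples


def build_analyze_request(text, analyzer="ik_smart", base_url=ES_URL):
    """Build (but do not send) an _analyze request for the given text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response):
    """Pull the token strings out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


request = build_analyze_request("動力")
# To send it against a live node:
#   with urllib.request.urlopen(request, timeout=10) as resp:
#       print(tokens_from_response(json.load(resp)))

# Hypothetical, hand-written response used only to exercise the parser.
sample_response = {"tokens": [{"token": "動力"}]}
print(tokens_from_response(sample_response))
```

If the plugin were not installed, I would expect the live call to return an error about an unknown analyzer rather than a token list.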

Step 1: Create the index

PUT http://192.168.80.133:9200/my_index_name
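For readers who prefer a script over a REST client, the PUT above could be issued from Python roughly like this. It is a minimal sketch: `build_create_index_request` and `send` are hypothetical helper names, and the host and index name are simply the ones used in this article.

```python
import json
import urllib.request


def build_create_index_request(index_name, base_url="http://192.168.80.133:9200"):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(f"{base_url}/{index_name}", method="PUT")


def send(request, timeout=10):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_create_index_request("my_index_name")
print(request.get_method(), request.full_url)
# send(request)  # uncomment when a node is reachable at that address
```

Separating request construction from sending makes it possible to inspect the request without a running node.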

Step 2: Add a mapping for the doc field that will be aggregated
The docs I store in ES have the following format:

{
  "nagtive_kw": [],
  "is_all": false,
  "emotion": 0,
  "focuce": false,
  "keywords": ["動力", "外觀", "油耗"],  // aggregation is run on the keywords field
  "source": "汽車之家",
  "time": -1,
  "machine_emotion": 0,
  "title": "no title",
  "spider": "qczj_index",
  "content": {},
  "url": "http://xxx",
  "brand": "寶馬",
  "series": "寶馬1系",
  "model": "2017款"
}

The aggregation runs on the keywords field, so add a mapping for that field:

POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping
{
  "properties": {
    "keywords": {  # make the keywords field use the IK analyzer
      "type": "string",
      "store": "no",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "boost": 8
    }
  }
}

Note: I hit a small snag while setting the mapping. Following the IK project's documentation, I set the type of "keywords" to "text" and got an error:

POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping
{
  "properties": {
    "keywords": {
      "type": "text",  # the text type is not supported in 2.4.1
      "store": "no",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "boost": 8
    }
  }
}

Error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [text] declared on field [keywords]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "No handler for type [text] declared on field [keywords]"
  },
  "status": 400
}

This is because my ES version, 2.4.1, is fairly old: the text type was only introduced in ES 5.0, so it is not supported here. In ES 2.4.1 the string type must be used instead.
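The version rule above can be captured in a small helper that picks the field type from the ES version. This is a sketch, not an official API: `keywords_mapping` is a hypothetical function, it looks only at the major version number, and it leaves out the `store` and `boost` settings from the article's mapping for brevity.

```python
import json


def keywords_mapping(es_version, analyzer="ik_smart"):
    """Return a mapping body for the keywords field suited to the ES version."""
    major = int(es_version.split(".")[0])
    # The text type exists from ES 5.0 on; older versions need string.
    field_type = "text" if major >= 5 else "string"
    return {
        "properties": {
            "keywords": {
                "type": field_type,
                "analyzer": analyzer,
                "search_analyzer": analyzer,
            }
        }
    }


print(json.dumps(keywords_mapping("2.4.1"), indent=2))
```

Calling `keywords_mapping("2.4.1")` yields a `string` field, while `keywords_mapping("5.0.0")` yields a `text` field.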

Step 3: Add a doc

POST http://192.168.80.133:9200/my_index_name/my_type_name/
{
  "nagtive_kw": ["動力", "外觀", "油耗"],
  "is_all": false,
  "emotion": 0,
  "focuce": false,
  "keywords": ["動力", "外觀", "油耗"],  // aggregation is run on the keywords field
  "source": "汽車之家",
  "time": -1,
  "machine_emotion": 0,
  "title": "從動次打次吃大餐",
  "spider": "qczj_index",
  "content": {},
  "url": "http://xxx",
  "brand": "寶馬",
  "series": "寶馬1系",
  "model": "2017款"
}

Step 4: Run the aggregation

POST http://192.168.80.133:9200/my_index_name/my_type_name/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {"gte": 1513778040000, "lte": 1513848720000}
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}
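If the time window changes often, the search body can be generated instead of edited by hand. The sketch below rebuilds the same structure as the request above; `build_keyword_aggregation` is a hypothetical helper name, and the field names are the ones from this article.

```python
import json


def build_keyword_aggregation(gte_ms, lte_ms):
    """Build the search body: a time range filter plus nested terms aggregations."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": {"range": {"time": {"gte": gte_ms, "lte": lte_ms}}},
        "aggs": {
            "keywords": {
                "terms": {"field": "keywords"},
                "aggs": {"emotions": {"terms": {"field": "emotion"}}},
            }
        },
    }


body = build_keyword_aggregation(1513778040000, 1513848720000)
payload = json.dumps(body)  # string to send as the POST body
print(payload)
```

The nested `aggs` entry is what produces an `emotions` breakdown inside every keyword bucket.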

Output:

{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "動力",  # the complete word is no longer split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}
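A response like this is easier to work with once it is flattened into a per-keyword summary. The following sketch does that for the nested buckets; `summarize_buckets` is a hypothetical helper, and the `response` literal is a trimmed copy of the output shown above, not a fresh query result.

```python
def summarize_buckets(response):
    """Map each keyword bucket to its doc_count and per-emotion counts."""
    summary = {}
    for bucket in response["aggregations"]["keywords"]["buckets"]:
        emotions = {
            entry["key"]: entry["doc_count"]
            for entry in bucket["emotions"]["buckets"]
        }
        summary[bucket["key"]] = {
            "doc_count": bucket["doc_count"],
            "emotions": emotions,
        }
    return summary


# Trimmed copy of the aggregation output from the article.
response = {
    "aggregations": {
        "keywords": {
            "buckets": [
                {
                    "key": "動力",
                    "doc_count": 2,
                    "emotions": {
                        "buckets": [
                            {"key": -1, "doc_count": 1},
                            {"key": 0, "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

summary = summarize_buckets(response)
print(summary)
```

Each keyword then maps directly to its document count and its emotion distribution, which is the statistic the aggregation was set up to produce.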

[References]
http://www.cnblogs.com/xing901022/p/5910139.html How to install Chinese analyzers in Elasticsearch (IK + pinyin)
https://elasticsearch.cn/question/47 A question about aggregations (aggs)
https://github.com/medcl/elasticsearch-analysis-ik/issues/276 "No handler for type [text] declared on field [content]" when creating a mapping, #276
http://blog.csdn.net/guo_jia_liang/article/details/52980716 Learning Elasticsearch 2.4 (3): Elasticsearch 2.4 plugin installation in detail

Reposted from: https://www.cnblogs.com/nuccch/p/8207261.html
