Getting Started with Elasticsearch: Installing the IK Analyzer from Scratch

Background

I needed to run aggregations in ES for statistical analysis, but the aggregation field values were Chinese, and ES's default analyzer handles Chinese very poorly: it splits complete Chinese words into a series of individual characters and aggregates on those, which is clearly not what I wanted. Let's look at an example:
POST http://192.168.80.133:9200/my_index_name/my_type_name/_search

```json
{
  "size": 0,
  "query": {
    "range": {
      "time": {"gte": 1513778040000, "lte": 1513848720000}
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}
```

The output:

```json
{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "力",            # the complete word has been split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        },
        {
          "key": "動",
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}
```

Since ES's default analyzer handles Chinese so poorly, is there an analyzer that does support Chinese? And if so, how do you use it?
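The bucket fragmentation above can be reproduced offline with a short simulation (my own illustration, not from the IK project): the ES standard analyzer emits one token per CJK character, so a terms aggregation on an analyzed field buckets those single-character tokens rather than the original values. A rough sketch with made-up docs:

```python
from collections import Counter

def standard_analyze(text):
    """Rough simulation of the ES standard tokenizer on Chinese text:
    it emits one token per CJK character (non-CJK handling is ignored here)."""
    return [ch for ch in text if '\u4e00' <= ch <= '\u9fff']

# Keyword values from two hypothetical docs
docs = [["動力", "外觀"], ["動力", "油耗"]]

# Aggregating on an analyzed field buckets the tokens, not the values
buckets = Counter(tok for doc in docs
                  for kw in doc
                  for tok in standard_analyze(kw))
print(buckets.most_common(3))   # "動力" fragments into "動" and "力"
```

This is exactly the shape of the response above: the word 動力 never appears as a bucket key, only its constituent characters do.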
第一個(gè)問題,萬能的谷歌告訴了我結(jié)果,已經(jīng)有了支持中文的分詞器,而且是開源實(shí)現(xiàn):IK Analysis for Elasticsearch,詳見:https://github.com/medcl/elasticsearch-analysis-ik。
秉著“拿來主義”不重復(fù)造輪子的指導(dǎo)思想,直接先拿過來使用一下,看看效果怎么樣。那么,如何使用IK分詞器呢?其實(shí)這是一個(gè)ES插件,直接安裝并對ES進(jìn)行相應(yīng)的配置即可。
Installing the IK Analyzer

My ES version is 2.4.1, so the IK version to download is 1.10.1 (note: you must download the IK version matching your ES version, otherwise it will not work).

1. Download and build IK
```shell
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.10.1/elasticsearch-analysis-ik-1.10.1.zip
unzip elasticsearch-analysis-ik-1.10.1.zip
cd elasticsearch-analysis-ik-1.10.1
mvn clean package
```

This generates the packaged file elasticsearch-analysis-ik-1.10.1.zip under the elasticsearch-analysis-ik-1.10.1/target/releases directory.
2. Install the IK plugin in ES

Copy the packaged IK plugin elasticsearch-analysis-ik-1.10.1.zip into the ES plugins directory and unzip it there.

```shell
unzip elasticsearch-analysis-ik-1.10.1.zip
rm -rf elasticsearch-analysis-ik-1.10.1.zip  # you must delete this zip after extraction, otherwise ES reports an error on startup
```

Then restart ES.
Using the IK Analyzer

Once the IK analyzer is installed, you can use it in ES.

Step 1: Create an index

PUT http://192.168.80.133:9200/my_index_name

Step 2: Add a mapping for the doc field you will use
I need to run aggregations on the keywords field of the docs I store in ES, so I add a mapping for the keywords field:
POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping {"properties": {"keywords": { # 設(shè)置keywords字段使用ik分詞器"type": "string","store": "no","analyzer": "ik_smart","search_analyzer": "ik_smart","boost": 8}} }注意: 在設(shè)置mapping時(shí)有一個(gè)小插曲,我根據(jù)IK的官網(wǎng)設(shè)置“keywords”的type為“text”時(shí)報(bào)錯(cuò):
POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping

```json
{
  "properties": {
    "keywords": {
      "type": "text",        # the text type is not supported in 2.4.1
      "store": "no",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "boost": 8
    }
  }
}
```

The error:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [text] declared on field [keywords]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "No handler for type [text] declared on field [keywords]"
  },
  "status": 400
}
```

This happens because my ES version is fairly old (2.4.1), while the text type was only added in ES 5.0, so it is not supported here. On ES 2.4.1 you have to use the string type instead.
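For code that targets clusters on both sides of the 5.0 boundary, the field type can be chosen from the cluster version string. A small helper (my own illustration, not from the post or the IK docs):

```python
def text_type_for(es_version):
    """Return the analyzed-string field type for a given ES version:
    "string" before 5.0, "text" from 5.0 on (when it was introduced)."""
    major = int(es_version.split(".")[0])
    return "string" if major < 5 else "text"

print(text_type_for("2.4.1"))  # string
print(text_type_for("5.6.0"))  # text
```

In a real client you would read the version from the cluster root endpoint (GET /) rather than hard-coding it.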
Step 3: Add a doc

POST http://192.168.80.133:9200/my_index_name/my_type_name/

```json
{
  "nagtive_kw": ["動力", "外觀", "油耗"],
  "is_all": false,
  "emotion": 0,
  "focuce": false,
  "keywords": ["動力", "外觀", "油耗"],   // run the aggregation on the keywords field
  "source": "汽車之家",
  "time": -1,
  "machine_emotion": 0,
  "title": "從動次打次吃大餐",
  "spider": "qczj_index",
  "content": {},
  "url": "http://xxx",
  "brand": "寶馬",
  "series": "寶馬1系",
  "model": "2017款"
}
```

Step 4: Aggregation analysis
POST http://192.168.80.133:9200/my_index_name/my_type_name/_search

```json
{
  "size": 0,
  "query": {
    "range": {
      "time": {"gte": 1513778040000, "lte": 1513848720000}
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}
```

The output:

```json
{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "動力",          # the complete word is no longer split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}
```

References
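The bucket structure of this query, a terms aggregation with an emotions sub-aggregation, can be sketched offline as well: with ik_smart, each keyword survives analysis as a single token, so the buckets line up with the original values. A simulation with made-up docs (my own illustration):

```python
from collections import Counter, defaultdict

# Hypothetical docs as (keywords, emotion) pairs; with ik_smart each
# keyword stays one token, so it buckets as a whole value.
docs = [(["動力", "外觀"], -1), (["動力", "油耗"], 0)]

keyword_buckets = Counter()
emotion_sub = defaultdict(Counter)   # keyword -> emotion -> doc count
for keywords, emotion in docs:
    for kw in keywords:
        keyword_buckets[kw] += 1
        emotion_sub[kw][emotion] += 1

print(keyword_buckets["動力"])       # 2: the whole word is a single bucket
print(dict(emotion_sub["動力"]))     # {-1: 1, 0: 1}
```

This mirrors the response above: one bucket per keyword, with nested emotion buckets counting docs per emotion value.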
http://www.cnblogs.com/xing901022/p/5910139.html (How to install Chinese analyzers in Elasticsearch: IK + pinyin)
https://elasticsearch.cn/question/47 (A question about aggregations (aggs))
https://github.com/medcl/elasticsearch-analysis-ik/issues/276 (No handler for type [text] declared on field [content] when creating a mapping, #276)
http://blog.csdn.net/guo_jia_liang/article/details/52980716 (Elasticsearch 2.4 study notes, part 3: plugin installation in detail)
轉(zhuǎn)載于:https://www.cnblogs.com/nuccch/p/8207261.html