Connecting Python to ES: Elasticsearch --- 3. The IK Chinese analyzer and operating ES from Python

Published: 2024/9/19


I. The IK Chinese analyzer

1. Download and install

2. Test

# Result

{"tokens": [

{"token" : "上海","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 0},

{"token" : "自來水","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 1},

{"token" : "自來","start_offset" : 2,"end_offset" : 4,"type" : "CN_WORD","position" : 2},

{"token" : "水","start_offset" : 4,"end_offset" : 5,"type" : "CN_CHAR","position" : 3},

{"token" : "來自","start_offset" : 5,"end_offset" : 7,"type" : "CN_WORD","position" : 4},

{"token" : "海上","start_offset" : 7,"end_offset" : 9,"type" : "CN_WORD","position" : 5}

]

}
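A response like the one above is easier to read in Python once it is reduced to the token strings. The following is a minimal sketch, assuming the response has already been parsed into a dict; the helper name `token_texts` is made up for this illustration and is not part of any library.

```python
def token_texts(analyze_response):
    """Return the token strings from an _analyze response, ordered by position."""
    tokens = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


# The response shown above, as a Python dict
response = {"tokens": [
    {"token": "上海", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "自來水", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 1},
    {"token": "自來", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2},
    {"token": "水", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 3},
    {"token": "來自", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 4},
    {"token": "海上", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 5},
]}

print(token_texts(response))
```

Overlapping offsets (for example 自來水 at 2-5 and 自來 at 2-4) show that the analyzer emitted several candidate words for the same stretch of text.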

II. Basic operations with the IK analyzer

1. ik_max_word (finest-grained splitting)

# Create the index

PUT ik1
{"mappings": {"doc": {"dynamic": false,"properties": {"content": {"type": "text","analyzer": "ik_max_word"}}}}}

# Add documents

PUT ik1/doc/1
{"content":"今天是個好日子"}

PUT ik1/doc/2
{"content":"心想的事兒都能成"}

PUT ik1/doc/3
{"content":"我今天不活了"}

Run a query

GET ik1/_search
{"query": {"match": {"content": "心想"}}}

Result

{"took" : 1,"timed_out" : false,"_shards": {"total" : 5,"successful" : 5,"skipped" : 0,"failed" : 0},"hits": {"total" : 1,"max_score" : 0.2876821,"hits": [

{"_index" : "ik1","_type" : "doc","_id" : "2","_score" : 0.2876821,"_source": {"content" : "心想的事兒都能成"}}

]

}

}
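The matching documents sit under hits.hits in the search response. As a small sketch of how that structure can be unpacked in Python (the function name `hit_sources` is hypothetical, and the dict is a copy of the response shown above):

```python
def hit_sources(search_response):
    """Return (document id, _source) pairs for every hit in a search response."""
    return [(hit["_id"], hit["_source"]) for hit in search_response["hits"]["hits"]]


# The search response shown above, as a Python dict
response = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {"_index": "ik1", "_type": "doc", "_id": "2", "_score": 0.2876821,
             "_source": {"content": "心想的事兒都能成"}},
        ],
    },
}

for doc_id, source in hit_sources(response):
    print(doc_id, source["content"])
```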

2. ik_smart (coarsest-grained splitting)

① Split at the coarsest granularity

GET _analyze
{"analyzer": "ik_smart","text": "今天是個好日子"}

Result:

{"tokens": [

{"token" : "今天是","start_offset" : 0,"end_offset" : 3,"type" : "CN_WORD","position" : 0},

{"token" : "個","start_offset" : 3,"end_offset" : 4,"type" : "CN_CHAR","position" : 1},

{"token" : "好日子","start_offset" : 4,"end_offset" : 7,"type" : "CN_WORD","position" : 2}

]

}

② Split the text at the finest granularity

GET _analyze
{"analyzer": "ik_max_word","text": "今天是個好日子"}

Result:

{"tokens": [

{"token" : "今天是","start_offset" : 0,"end_offset" : 3,"type" : "CN_WORD","position" : 0},

{"token" : "今天","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 1},

{"token" : "是","start_offset" : 2,"end_offset" : 3,"type" : "CN_CHAR","position" : 2},

{"token" : "個","start_offset" : 3,"end_offset" : 4,"type" : "CN_CHAR","position" : 3},

{"token" : "好日子","start_offset" : 4,"end_offset" : 7,"type" : "CN_WORD","position" : 4},

{"token" : "日子","start_offset" : 5,"end_offset" : 7,"type" : "CN_WORD","position" : 5}

]

}
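Comparing the two results makes the difference between the analyzers concrete. In the sketch below the token lists are copied from the two responses above; `extra_tokens` is a hypothetical helper written only for this comparison.

```python
def extra_tokens(coarse, fine):
    """Return the tokens that appear only in the finer-grained result, in order."""
    coarse_set = set(coarse)
    return [token for token in fine if token not in coarse_set]


# Token lists taken from the two _analyze results above
ik_smart = ["今天是", "個", "好日子"]
ik_max_word = ["今天是", "今天", "是", "個", "好日子", "日子"]

# For this sentence every ik_smart token also appears in the ik_max_word output
print(set(ik_smart) <= set(ik_max_word))    # True
print(extra_tokens(ik_smart, ik_max_word))  # ['今天', '是', '日子']
```

For this input, ik_max_word adds the shorter words contained in the longer ones, which is why a match query for 今天 can hit a field indexed with ik_max_word.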

3. Phrase query (match_phrase)

GET ik1/_search
{"query": {"match_phrase": {"content": "今天"}}}

4. Phrase prefix query (match_phrase_prefix)

GET ik1/_search
{"query": {"match_phrase_prefix": {"content": {"query": "今天好日子","slop": 2}}}}
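When these queries are sent from Python, the bodies are plain dicts. The sketch below builds the two bodies used above; the function names are hypothetical helpers, not part of the elasticsearch package.

```python
import json


def match_phrase(field, text, slop=None):
    """Build a match_phrase search body; slop is optional."""
    if slop is None:
        return {"query": {"match_phrase": {field: text}}}
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


def match_phrase_prefix(field, text, slop=None):
    """Build a match_phrase_prefix search body; slop is optional."""
    clause = {"query": text}
    if slop is not None:
        clause["slop"] = slop
    return {"query": {"match_phrase_prefix": {field: clause}}}


print(json.dumps(match_phrase("content", "今天"), ensure_ascii=False))
print(json.dumps(match_phrase_prefix("content", "今天好日子", slop=2), ensure_ascii=False))
```

The resulting dict can be passed as the body argument of es.search, in the same way as the search bodies used later in this article.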

III. Operating Elasticsearch from Python

1. Install the elasticsearch module

pip install elasticsearch

# Using the Douban mirror

pip install -i https://pypi.doubanio.com/simple/ elasticsearch

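2. Connecting

from elasticsearch import Elasticsearch

# es = Elasticsearch()  # connects to the local Elasticsearch by default

# es = Elasticsearch(['127.0.0.1:9200'])  # connects to local port 9200

es = Elasticsearch(
    ["192.168.1.10", "192.168.1.11", "192.168.1.12"],  # connect to a cluster; the node IP addresses are given as a list
    sniff_on_start=True,  # sniff the cluster nodes before the first request
    sniff_on_connection_fail=True,  # refresh the node list when a node does not respond
    sniff_timeout=60  # timeout for sniffing
)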

Configuring response status codes to ignore

es = Elasticsearch(['127.0.0.1:9200'], ignore=400)  # ignore a returned 400 status code

es = Elasticsearch(['127.0.0.1:9200'], ignore=[400, 405, 502])  # ignore several status codes, given as a list

3. A typical way to connect

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to the local Elasticsearch by default

# Create a document

print(es.index(index='py2', doc_type='doc', id=1, body={'name': "張開", "age": 18}))

# Fetch a specific document

print(es.get(index='py2', doc_type='doc', id=1))

4. Filtering results

The filter_path parameter reduces the response that Elasticsearch returns.

It also supports the * wildcard, which can match a whole field name, any field, or part of a field name:

①

print(es.search(index='py2', filter_path=['hits.total', 'hits.hits._source']))

# The type can be omitted

print(es.search(index='w2', doc_type='doc'))  # the type can also be specified

print(es.search(index='w2', doc_type='doc', filter_path=['hits.total']))

②

print(es.search(index='py2', filter_path=['hits.*']))

print(es.search(index='py2', filter_path=['hits.hits._*']))

print(es.search(index='py2', filter_path=['hits.to*']))  # returns only total from the response

print(es.search(index='w2', doc_type='doc', filter_path=['hits.hits._*']))

# The optional type can be added
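To see what a filter_path pattern keeps without a running cluster, the filtering can be approximated locally. The sketch below is only an emulation written for this article: it handles a single dotted path with * wildcards inside one path segment, and it is not how Elasticsearch implements the feature. The names `apply_filter_path`, `_keep` and `_is_empty` are hypothetical.

```python
from fnmatch import fnmatchcase


def _is_empty(value):
    """True for results that matched nothing further down the path."""
    return value is None or value == {} or value == []


def _keep(node, parts):
    """Recursively keep only the keys matched by the remaining path segments."""
    if not parts:
        return node                      # path exhausted: keep the whole subtree
    if isinstance(node, list):
        kept = [_keep(item, parts) for item in node]
        return [item for item in kept if not _is_empty(item)]
    if not isinstance(node, dict):
        return None                      # path goes deeper than the data
    head, rest = parts[0], parts[1:]
    result = {}
    for key, value in node.items():
        if fnmatchcase(key, head):
            sub = _keep(value, rest)
            if not (rest and _is_empty(sub)):
                result[key] = sub
    return result


def apply_filter_path(data, path):
    """Local approximation of one filter_path entry such as 'hits.to*'."""
    return _keep(data, path.split("."))


response = {
    "took": 1,
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [{"_index": "ik1", "_type": "doc", "_id": "2",
                  "_score": 0.2876821, "_source": {"content": "心想的事兒都能成"}}],
    },
}

print(apply_filter_path(response, "hits.to*"))  # {'hits': {'total': 1}}
print(apply_filter_path(response, "hits.hits._source"))
```

The second call keeps the nesting (hits, then hits) and drops every per-hit key except _source, which is the shape es.search returns for the same pattern.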

5. Basic operations

① es.index adds or updates a document in the given index. If the index does not exist, it is created first and the add or update is then performed.

# print(es.index(index='w2', doc_type='doc', id='4', body={"name":"可可", "age": 18}))  # OK

# print(es.index(index='w2', doc_type='doc', id=5, body={"name":"卡卡西", "age":22}))  # OK

# print(es.index(index='w2', id=6, body={"name": "鳴人", "age": 22}))  # raises TypeError: index() missing 1 required positional argument: 'doc_type'

print(es.index(index='w2', doc_type='doc', body={"name": "鳴人", "age": 22}))  # the id can be omitted; one is generated automatically

② es.get fetches a specific document from an index.

print(es.get(index='w2', doc_type='doc', id=5))  # OK

# print(es.get(index='w2', doc_type='doc'))  # TypeError: get() missing 1 required positional argument: 'id'

# print(es.get(index='w2', id=5))  # TypeError: get() missing 1 required positional argument: 'doc_type'

③ es.search runs a search query and returns the hits that match it. This is the most frequently used method and accepts complex query conditions.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.

doc_type: comma-separated list of document types to search; leave it empty to operate on all types.

body: the search definition, written in the Query DSL (Query Domain Specific Language).

_source: true or false to return the _source field or not, or a list of the fields to return.

_source_exclude: list of fields to exclude from the returned _source.

_source_include: list of fields to extract from _source and return; similar to _source.

print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 20}}}))  # plain query

print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source=['name', 'age']))  # filter the returned fields

print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source_exclude=['age']))

print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}}, _source_include=['age']))
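The search calls above can be wrapped so that callers get back only the documents. This is a rough sketch under assumptions: `es` is expected to be a client created as shown earlier (an elasticsearch-py version that still accepts doc_type and body), and `search_sources` is a hypothetical helper name.

```python
def search_sources(es, index, query, fields=None, doc_type="doc"):
    """Run a search and return the _source dict of each hit.

    es       -- an Elasticsearch client (assumed, created as shown earlier)
    query    -- the query clause, e.g. {"match": {"age": 19}}
    fields   -- optional list of field names passed as _source
    """
    kwargs = {"index": index, "doc_type": doc_type, "body": {"query": query}}
    if fields is not None:
        kwargs["_source"] = fields  # restrict the fields that come back
    response = es.search(**kwargs)
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Usage, assuming a reachable cluster and the py3 index from the examples above:
# people = search_sources(es, "py3", {"match": {"age": 19}}, fields=["name", "age"])
```

Taking the client as a parameter keeps the helper independent of how the connection is configured and makes it easy to exercise with a stand-in object.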

④ es.get_source fetches a document's source by index, type and ID; in effect it returns just the dictionary you want.

print(es.get_source(index='py3', doc_type='doc', id='1')) # {'name': '王五', 'age': 19}

⑤ es.count runs a query and returns the number of matches, for example the documents whose age is 18.

body = {"query": {"match": {"age": 18}}}

print(es.count(index='py2', doc_type='doc', body=body))

# {'count': 1, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}

print(es.count(index='py2', doc_type='doc', body=body)['count'])

# 1

print(es.count(index='w2'))

# {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}

print(es.count(index='w2', doc_type='doc'))

# {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed':

⑥ es.delete deletes a specific document, for example the document whose id is 4. It cannot delete an index;

to delete an index, use es.indices.delete instead.

print(es.delete(index='py3', doc_type='doc', id='4'))

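⑦ es.delete_by_query deletes all documents that match a query.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.

doc_type: comma-separated list of document types to search; leave it empty to operate on all types.

body: the search definition, written in the Query DSL.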

print(es.delete_by_query(index='py3', doc_type='doc', body={"query": {"match":{"age": 20}}}))

⑧ es.exists checks whether a specific document exists in Elasticsearch and returns a boolean.

print(es.exists(index='py3', doc_type='doc', id='1'))

⑨ es.info returns basic information about the current cluster.

print(es.info())

⑩ es.ping returns True if the cluster is up, otherwise False.

print(es.ping())

6. Indices (es.indices)

① es.indices.create creates an index in Elasticsearch; this is the most frequently used call.

For example, create mappings in strict mode with 4 fields, using the ik_max_word analyzer for the title field,

and apply them to the py4 index. This is also the usual way to create a custom index.

body = {"mappings": {"doc": {"dynamic": "strict","properties": {"title": {"type": "text","analyzer": "ik_max_word"},"url": {"type": "text"},"action_type": {"type": "text"},"content": {"type": "text"}}}}}

es.indices.create('py4', body=body)
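Deeply nested mapping bodies are easy to get wrong by hand, so it can help to build them from a list of field names. The sketch below reproduces the strict, four-field mapping described above; `strict_text_mapping` is a hypothetical helper, and the type name doc and the text field type follow the example in this article.

```python
import json


def strict_text_mapping(fields, analyzers=None, doc_type="doc"):
    """Build a strict mapping body in which every field is of type text.

    analyzers maps a field name to the analyzer to use for that field.
    """
    analyzers = analyzers or {}
    properties = {}
    for name in fields:
        prop = {"type": "text"}
        if name in analyzers:
            prop["analyzer"] = analyzers[name]
        properties[name] = prop
    return {"mappings": {doc_type: {"dynamic": "strict", "properties": properties}}}


body = strict_text_mapping(
    ["title", "url", "action_type", "content"],
    analyzers={"title": "ik_max_word"},
)
print(json.dumps(body, indent=2))
```

The resulting dict has the same structure as the body passed to es.indices.create above.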

② es.indices.delete deletes an index in Elasticsearch.

print(es.indices.delete(index='py4'))

print(es.indices.delete(index='w3')) # {'acknowledged': True}

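③ es.indices.put_alias creates an alias for one or more indices; the alias can then be used to query several indices at once.

index: comma-separated list of index names the alias should point to (wildcards supported); use _all to operate on all indices.

name: the name of the alias to create or update.

body: settings for the alias, such as routing or a filter.

print(es.indices.put_alias(index='py4', name='py4_alias'))  # create an alias for a single index

print(es.indices.put_alias(index=['py3', 'py2'], name='py23_alias'))  # give several indices the same alias, for querying them together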

④ es.indices.delete_alias deletes one or more aliases. Both the index and the alias name are passed.

print(es.indices.delete_alias(index='py4', name='py4_alias'))

print(es.indices.delete_alias(index=['py3', 'py2'], name='py23_alias'))

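The following groups of APIs are covered in detail elsewhere:

Cluster (cluster-related)

Node (node-related)

Cat (a way of querying)

Snapshot (snapshot-related)

Task (task-related)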
