

Fetching Elasticsearch content with Python

Published: 2023/12/19 · 豆豆

This article, collected by 生活随笔, walks through fetching content from Elasticsearch with Python; we hope it serves as a useful reference.

Using the data added to Elasticsearch in the previous post as an example, we will fetch its content and extract the useful information from it.

First, let's look at what is stored in Elasticsearch:


{"took": 88,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 3,"max_score": 1,"hits": [{"_index": "megacorp","_type": "employee","_id": "2","_score": 1,"_source": {"first_name": "Jane","last_name": "Smith","age": 32,"about": "I like to collect rock albums","interests": ["music"]}},{"_index": "megacorp","_type": "employee","_id": "1","_score": 1,"_source": {"first_name": "John","last_name": "Smith","age": 25,"about": "I love to go rock climbing","interests": ["sports","music"]}},{"_index": "megacorp","_type": "employee","_id": "3","_score": 1,"_source": {"first_name": "Douglas","last_name": "Fir","age": 35,"about": "I like to build cabinets","interests": ["forestry"]}}]} }


1. In Python we first need the urllib package; the response will be read and parsed as JSON.

import urllib.request as request
import json

2. Next, build the request for the search endpoint and open it with urlopen:

if __name__ == '__main__':
    req = request.Request("http://localhost:9200/megacorp/employee/_search")
    resp = request.urlopen(req)

3. The response body arrives as bytes, so accumulate it into a string and parse it with json.loads:

jsonstr=""for line in resp:jsonstr+=line.decode()data=json.loads(jsonstr)print(data)

4. The result contains both metadata and content. Since we only want the content, we need to drill down through each level of the structure:

    employees = data['hits']['hits']
    for e in employees:
        _source = e['_source']
        full_name = _source['first_name'] + "." + _source['last_name']
        age = _source["age"]
        about = _source["about"]
        interests = _source["interests"]
        print(full_name, 'is', age, ",")
        print(full_name, "info is", about)
        print(full_name, 'likes', interests)

The output:


Jane.Smith is 32 ,
Jane.Smith info is I like to collect rock albums
Jane.Smith likes ['music']
John.Smith is 25 ,
John.Smith info is I love to go rock climbing
John.Smith likes ['sports', 'music']
Douglas.Fir is 35 ,
Douglas.Fir info is I like to build cabinets
Douglas.Fir likes ['forestry']
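The _source extraction logic can be checked offline against the sample response shown earlier, without a running cluster. A minimal sketch, trimmed to one hit for brevity:

```python
import json

# One hit from the sample search response, as a JSON string.
jsonstr = '''{"hits": {"hits": [
  {"_source": {"first_name": "Jane", "last_name": "Smith", "age": 32,
               "about": "I like to collect rock albums",
               "interests": ["music"]}}
]}}'''
data = json.loads(jsonstr)

# Drill down through hits -> hits -> _source, exactly as in the loop above.
full_names = []
for e in data['hits']['hits']:
    _source = e['_source']
    full_names.append(_source['first_name'] + "." + _source['last_name'])
print(full_names)  # -> ['Jane.Smith']
```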


對(duì)于需要聚合的內(nèi)容,我們可以通過(guò)下面的方法進(jìn)行獲取:

1. Get the URL:


url="http://localhost:9200/megacorp/employee/_search"


2. Build the aggregation query body:


data = '''
{
  "aggs": {
    "all_interests": {
      "terms": {"field": "interests"},
      "aggs": {
        "avg_age": {
          "avg": {"field": "age"}
        }
      }
    }
  }
}
'''
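Hand-written JSON strings are easy to get subtly wrong. An alternative sketch: build the same aggregation body as a Python dict and serialize it with json.dumps, letting Python guarantee the syntax:

```python
import json

# The same terms + avg aggregation, expressed as a dict.
query = {
    "aggs": {
        "all_interests": {
            "terms": {"field": "interests"},
            "aggs": {"avg_age": {"avg": {"field": "age"}}},
        }
    }
}
# json.dumps produces a valid JSON string ready for data.encode().
data = json.dumps(query)
print(data)
```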


3. Set the request headers:

headers={"Content-Type":"application/json"}

4. As before, send the request, read the response, and parse it as JSON:

req = request.Request(url=url, data=data.encode(), headers=headers, method="GET")
resp = request.urlopen(req)
jsonstr = ""
for line in resp:
    jsonstr += line.decode()
rsdata = json.loads(jsonstr)

5. The useful aggregation results are again arrays internally, so we iterate once more to print them:

agg = rsdata['aggregations']
buckets = agg['all_interests']['buckets']
for b in buckets:
    key = b['key']
    doc_count = b['doc_count']
    avg_age = b['avg_age']['value']
    print(key, 'is liked by', doc_count, 'people, average age', avg_age)

The final output:

music is liked by 2 people, average age 28.5
forestry is liked by 1 people, average age 35.0
sports is liked by 1 people, average age 25.0
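The bucket iteration can also be checked offline with a hand-made aggregation response shaped like the one Elasticsearch returns for the sample data (the values mirror the output above; no live cluster is needed):

```python
# A stand-in aggregation response with the same structure Elasticsearch
# returns: aggregations -> all_interests -> buckets.
rsdata = {
    "aggregations": {
        "all_interests": {
            "buckets": [
                {"key": "music", "doc_count": 2, "avg_age": {"value": 28.5}},
                {"key": "forestry", "doc_count": 1, "avg_age": {"value": 35.0}},
                {"key": "sports", "doc_count": 1, "avg_age": {"value": 25.0}},
            ]
        }
    }
}

# Collect (key, doc_count, avg_age) tuples, mirroring the loop above.
results = []
for b in rsdata['aggregations']['all_interests']['buckets']:
    results.append((b['key'], b['doc_count'], b['avg_age']['value']))
print(results)
```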


轉(zhuǎn)載于:https://www.cnblogs.com/qianshuixianyu/p/9287556.html


Summary

The above is everything collected here on fetching Elasticsearch content with Python; hopefully it helps you solve the problems you have run into.