Asynchronously Scraping the Zhihu Hot List with Python

Published: 2025/3/20

1. Faulty code: the excerpt and the detail URL cannot be retrieved

    import asyncio

    import aiohttp
    from bs4 import BeautifulSoup

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'referer': 'https://www.baidu.com/s?tn=02003390_43_hao_pg&isource=infinity&iname=baidu&itype=web&ie=utf-8&wd=%E7%9F%A5%E4%B9%8E%E7%83%AD%E6%A6%9C',
    }


    async def getPages(url):
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as resp:
                print(resp.status)  # print the HTTP status code
                html = await resp.text()
                soup = BeautifulSoup(html, 'lxml')
                items = soup.select('.HotList-item')
                for item in items:
                    title = item.select('.HotList-itemTitle')[0].text
                    try:
                        abstract = item.select('.HotList-itemExcerpt')[0].text
                    except IndexError:
                        abstract = 'No Abstract'
                    hot = item.select('.HotList-itemMetrics')[0].text
                    # select() returns a list, so index the single tag from select_one()
                    img_tag = item.select_one('.HotList-itemImgContainer img')
                    img = img_tag['src'] if img_tag and img_tag.has_attr('src') else 'No Img'
                    print("{}\n{}\n{}".format(title, abstract, img))


    if __name__ == '__main__':
        url = 'https://www.zhihu.com/billboard'
        asyncio.run(getPages(url))

2. Inspecting the JS code

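It turns out that the detail links, image links, and question excerpts all live inside the page's JS rather than in the rendered HTML elements (CSDN's developer assistant plugin is genuinely handy for spotting this).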


Use a regular expression to extract that information.
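As a rough offline illustration of the idea, the sketch below applies the pattern `"hotList":(.*?),"guestFeeds":` to a small hand-written string and parses the captured text with `json.loads`. `SAMPLE_HTML` is a hypothetical, heavily shortened stand-in for the real page source, and `extract_hot_list` is an illustrative helper name, not something from the original code.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the page source (assumption).
SAMPLE_HTML = (
    '<script>{"initialState":{"topstory":{"hotList":'
    '[{"target":{"titleArea":{"text":"Example question"},'
    '"link":{"url":"https://www.zhihu.com/question/1"}}}]'
    ',"guestFeeds":{}}}}</script>'
)

# Lazily capture everything between "hotList": and the following "guestFeeds" key.
HOT_LIST_RE = re.compile(r'"hotList":(.*?),"guestFeeds":', re.S)


def extract_hot_list(html):
    """Return the embedded hot list as Python objects, or [] if it is absent."""
    match = HOT_LIST_RE.search(html)
    if match is None:
        return []
    return json.loads(match.group(1))


items = extract_hot_list(SAMPLE_HTML)
print(items[0]["target"]["titleArea"]["text"])
```

Returning an empty list when the pattern does not match avoids the `AttributeError` that calling `.group(1)` on `None` would raise if the page layout changes.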


Here is the full code:

    import asyncio
    import json
    import re

    import aiohttp

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'referer': 'https://www.baidu.com/s?tn=02003390_43_hao_pg&isource=infinity&iname=baidu&itype=web&ie=utf-8&wd=%E7%9F%A5%E4%B9%8E%E7%83%AD%E6%A6%9C',
    }


    async def getPages(url):
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as resp:
                print(resp.status)  # print the HTTP status code
                html = await resp.text()
                # The hot list is embedded as JSON in the page source
                regex = re.compile('"hotList":(.*?),"guestFeeds":')
                text = regex.search(html).group(1)
                # print(json.loads(text))  # convert the JSON text into Python objects
                for item in json.loads(text):
                    title = item['target']['titleArea']['text']
                    question = item['target']['excerptArea']['text']
                    hot = item['target']['metricsArea']['text']
                    link = item['target']['link']['url']
                    img = item['target']['imageArea']['url']
                    if not img:
                        img = 'No Img'
                    if not question:
                        question = 'No Abstract'
                    print("Title:{}\nPopular:{}\nQuestion:{}\nLink:{}\nImg:{}".format(title, hot, question, link, img))


    if __name__ == '__main__':
        url = 'https://www.zhihu.com/billboard'
        asyncio.run(getPages(url))
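The chained lookups such as `item['target']['imageArea']['url']` raise `KeyError` as soon as one entry lacks a field. A minimal sketch of a more tolerant variant follows, assuming each entry keeps the `target` layout used above. The helper name `parse_hot_item` and the fallback strings `'No Title'`, `'No Metrics'`, and `'No Link'` are illustrative assumptions; only `'No Abstract'` and `'No Img'` come from the original code.

```python
def parse_hot_item(item):
    """Flatten one hot-list entry into a plain dict, tolerating missing keys."""
    target = item.get("target") or {}

    def field(area, key, default):
        # Each area is a small dict such as {"text": "..."}; a missing area,
        # a missing key, or an empty value all fall back to the default.
        return (target.get(area) or {}).get(key) or default

    return {
        "title": field("titleArea", "text", "No Title"),       # fallback is an assumption
        "excerpt": field("excerptArea", "text", "No Abstract"),
        "hot": field("metricsArea", "text", "No Metrics"),     # fallback is an assumption
        "link": field("link", "url", "No Link"),               # fallback is an assumption
        "img": field("imageArea", "url", "No Img"),
    }


# Hypothetical entry with an empty excerpt and no imageArea at all.
example = {
    "target": {
        "titleArea": {"text": "Example question"},
        "excerptArea": {"text": ""},
        "metricsArea": {"text": "100"},
        "link": {"url": "https://www.zhihu.com/question/1"},
    }
}
print(parse_hot_item(example))
```

Inside `getPages`, the loop body could then call `parse_hot_item(item)` once and print the resulting dict's values, rather than indexing five nested keys by hand.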
