Python Web Scraping: Crawling the Weibo Hot Search List

Published: 2023/12/2 python 32 豆豆


Scraping the Weibo hot search list is fairly simple; I only used two libraries, lxml and requests.

1. Analyze the page source: right-click, then choose "View page source".

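The page source shows where each piece of information lives:

(1) The hot search titles are in the <a> child nodes of <td class="td-02">.

(2) The ranks are in <td class="td-01 ranktop"> (note that the pinned entry has no rank!).

(3) The view counts are in the <span> child nodes of <td class="td-02">.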
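To make the structure above concrete, here is a minimal sketch that runs the same kind of lookups on a small hand-written table. The HTML snippet and its values are invented for illustration (they are not copied from Weibo), and the class names are assumed to match the ones described above.

```python
from lxml import etree

# Hypothetical sample that mimics the structure described above;
# the titles and numbers are invented for illustration.
SAMPLE = """
<table>
  <tr>
    <td class="td-01"></td>
    <td class="td-02"><a href="#">Pinned topic</a></td>
  </tr>
  <tr>
    <td class="td-01 ranktop">1</td>
    <td class="td-02"><a href="#">Topic A</a><span>12345</span></td>
  </tr>
  <tr>
    <td class="td-01 ranktop">2</td>
    <td class="td-02"><a href="#">Topic B</a><span>6789</span></td>
  </tr>
</table>
"""

tree = etree.HTML(SAMPLE)

# Each query returns a list of strings, one per matching text node.
titles = tree.xpath('//td[@class="td-02"]/a/text()')
ranks = tree.xpath('//td[@class="td-01 ranktop"]/text()')
views = tree.xpath('//td[@class="td-02"]/span/text()')

print(titles)
print(ranks)
print(views)
```

In this sample the pinned row contributes a title but no rank and no view count, so the title list is one item longer than the other two.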

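2. Fetch the page with requests

(1) First set the URL, then imitate a browser so the request is less likely to be recognized as coming from a crawler (this step is optional).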

# URL

url = "https://s.weibo.com/top/summary?refer=top_hot&topnav=1&wvr=6"

# Imitate a browser; this request header works on any Windows machine

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

(2) Use get() from the requests library to download the page and etree.HTML() from lxml to parse it.

# Fetch and parse the HTML page

html = etree.HTML(requests.get(url, headers=header).text)
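The one-line fetch above assumes the request always succeeds. As a hedged sketch, the download step could be wrapped so that HTTP errors and hanging connections surface early. The function name fetch_page, the 10-second timeout, and the injectable get parameter are assumptions of this example, not part of the original code.

```python
import requests


def fetch_page(url, headers=None, timeout=10, get=requests.get):
    """Return the page text, raising an exception on HTTP errors.

    `get` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    return response.text
```

With this helper the parsing line would become html = etree.HTML(fetch_page(url, header)). A successful status code does not guarantee that the returned page contains the expected table, so checking that the XPath results are non-empty is still worthwhile.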

3. Build the XPath paths

The XPath paths for the three items from step 1 are:

affair=html.xpath('//td[@class="td-02"]/a/text()')

rank=html.xpath('//td[@class="td-01 ranktop"]/text()')

view=html.xpath('//td[@class="td-02"]/span/text()')

xpath() returns a list, so affair, rank, and view are all lists of strings.

4. Formatted output

Note that affair contains one extra item, the pinned hot search, so we separate it out first.

top=affair[0]

affair=affair[1:]

This uses Python slicing: affair[1:] is a new list containing everything except the first item.
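The slicing step can be checked on plain lists. The sketch below uses made-up values in place of the XPath results and pairs the three lists with zip() rather than indexing, which is an alternative to the loop used in this article, not the author's method.

```python
# Made-up values standing in for the XPath results.
affair = ["Pinned topic", "Topic A", "Topic B"]
rank = ["1", "2"]
view = ["12345", "6789"]

# Split off the pinned entry; affair[1:] is a new list without the first item.
top, rest = affair[0], affair[1:]

# zip() stops at the shortest list, so a missing rank or view count
# at the end cannot cause an IndexError.
rows = list(zip(rank, rest, view))

print(top)
for r, title, v in rows:
    print(r, title, v)
```

zip() only guards against a length mismatch. If a value were missing in the middle of one list, the later rows would still be paired with the wrong neighbours.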

print('{0:<10}\t{1:<40}'.format("top", top))

for i in range(0, len(affair)):
    # chr(12288) is the full-width space, used here as the fill character
    print("{0:<10}\t{1:{3}<30}\t{2:{3}>20}".format(rank[i], affair[i], view[i], chr(12288)))

Even so, the columns still do not line up perfectly...
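One likely reason for the misalignment is that a format specification pads by character count, while Chinese characters usually occupy two terminal columns and ASCII characters occupy one. The following is a minimal sketch that pads by display width using the standard library's unicodedata module. The helper names display_width and pad_display and the sample rows are hypothetical, and the result still depends on the terminal using a monospaced font.

```python
import unicodedata


def display_width(text):
    """Count terminal columns: wide and full-width characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def pad_display(text, width):
    """Left-align text by appending spaces up to `width` columns."""
    return text + " " * max(0, width - display_width(text))


# Made-up rows mixing Chinese and ASCII titles.
for rank, title, view in [("1", "微博热搜", "12345"), ("2", "Python crawler", "678")]:
    print(pad_display(rank, 6) + pad_display(title, 30) + view)
```

Because the padding is computed from the column count rather than from len(), titles that mix Chinese and ASCII text end at the same column.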

5. Full code

# Import modules
import requests
from lxml import etree

# URL
url = "https://s.weibo.com/top/summary?refer=top_hot&topnav=1&wvr=6"

# Imitate a browser
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

# Main function
def main():
    # Fetch and parse the HTML page
    html = etree.HTML(requests.get(url, headers=header).text)

    # Extract ranks, titles, and view counts
    rank = html.xpath('//td[@class="td-01 ranktop"]/text()')
    affair = html.xpath('//td[@class="td-02"]/a/text()')
    view = html.xpath('//td[@class="td-02"]/span/text()')

    # Separate the pinned entry, which has no rank
    top = affair[0]
    affair = affair[1:]

    print('{0:<10}\t{1:<40}'.format("top", top))
    for i in range(0, len(affair)):
        # chr(12288) is the full-width space, used here as the fill character
        print("{0:<10}\t{1:{3}<30}\t{2:{3}>20}".format(rank[i], affair[i], view[i], chr(12288)))

main()

Result: [screenshot of the program output]

Summary

That covers scraping the Weibo hot search list with Python. I hope it is helpful.
