Scraping CSDN for the most-read article of the latest month (and the total read count of all articles written that month)

Published: 2025/4/16

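Project overview

Tool used: Python 3
In the main block at the bottom of the script, url is the starting link you need to supply.
For example, the link I used is https://blog.csdn.net/a19990412/
You can see what this page looks like by following the link above; the original post also showed a screenshot of it here.

That is roughly what the page looks like. Pass this page's URL to the function below and it retrieves the information for that blog directly (the information described in the title).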

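What it does:
It scrapes the read count of every article on the most recent month's archive page.
The crawler then computes the statistics from those counts: the most-read article and the sum of all read counts.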
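
The statistics described above come down to tracking a maximum and a running sum. Here is a minimal sketch of that step alone, assuming the titles and read-count strings have already been scraped. The function name summarize, the "reads: N" text format, and the sample data are all hypothetical and are not taken from CSDN.

```python
import re


def summarize(articles):
    """Return (best_title, best_count, total) for (title, read_text) pairs."""
    best_title, best_count, total = '', 0, 0
    for title, read_text in articles:
        match = re.search(r'\d+', read_text)
        if match is None:
            continue  # skip entries whose text contains no number
        count = int(match.group())
        total += count
        if count > best_count:
            best_title, best_count = title, count
    return best_title, best_count, total


# Hypothetical sample data, not real CSDN output.
sample = [('Post A', 'reads: 150'), ('Post B', 'reads: 95'), ('Post C', 'reads: 120')]
print(summarize(sample))  # ('Post A', 150, 365)
```

Returning the three values instead of updating globals makes the step easy to check on its own, without any network access.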

For example, when I ran this code at the time of writing, the output was:

In [date]: 2018年5月 原 Fiddler捕捉數據包得到幾乎全是加密過的tunnel to【解決方法】
Its count is 150
The Sum of all the article is 365

Code

import re

import requests
from bs4 import BeautifulSoup


def cal(soup):
    # Update the global "best article" and running-total counters
    # from one page of the article list.
    global thebestOne_count
    global thebestOne_name
    global Sum_count
    div_article = soup.find('div', attrs={'class': "article-list"})
    divs = div_article.find_all('div', attrs={'class': "article-item-box csdn-tracking-statistics"})
    for d in divs:
        # Pull the first run of digits out of the read-count text.
        co = int(re.search(r'\d+', d.find('span', attrs={'class': "read-num"}).text).group())
        if co > thebestOne_count:
            h4 = d.find('h4').text.replace('\n', '').replace(' ', '')
            thebestOne_name = h4
            thebestOne_count = co
        Sum_count += co


def main(url):
    global date
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'lxml')
    try:
        # The first link in the sidebar archive list points to the latest month.
        a = soup.find('aside').find('ul', attrs={'class': 'archive-list'}).find('a')
        date = list(filter(lambda x: x and x != '\n', a.text.split(' ')))[0]
        href = a['href']
    except Exception as e:
        print('[Error]', e.args)
    else:
        res = requests.get(href, headers=headers)
        soup = BeautifulSoup(res.text, 'lxml')
        # Collect the page numbers listed in the pagination box.
        ids = soup.find('div', attrs={'id': 'pageBox'})
        ids_string = str(ids)
        data = re.findall(r'data-page="(\d+)"', ids_string)
        cal(soup)
        if data:
            # The first page has already been counted above, so skip it.
            data = data[1:]
            for d in data:
                newhref = href + d + '?'
                res = requests.get(newhref, headers=headers)
                soup = BeautifulSoup(res.text, 'lxml')
                cal(soup)


if __name__ == '__main__':
    Sum_count = 0
    thebestOne_name = ''
    thebestOne_count = 0
    date = ''
    url = 'https://blog.csdn.net/a19990412/'
    main(url)
    print('In [date]: %s' % date, thebestOne_name, '\nIts count is %d' % thebestOne_count)
    print('The Sum of all the article is %d' % Sum_count)
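
The pagination step in the script collects page numbers from data-page attributes with a regular expression and then skips the first one. The sketch below runs that extraction on an invented HTML fragment; the real CSDN markup may differ. It is a variation, not the script's exact logic: instead of slicing off the first match, the hypothetical helper extra_pages drops page 1 explicitly and removes duplicates, on the assumption that the pagination box may list a page number more than once.

```python
import re


def extra_pages(page_box_html):
    """Return the page numbers after page 1, in order, without duplicates."""
    seen = []
    for number in re.findall(r'data-page="(\d+)"', page_box_html):
        if number != '1' and number not in seen:
            seen.append(number)
    return seen


# Invented fragment for illustration; not copied from a real CSDN page.
fragment = ('<li data-page="1">1</li><li data-page="2">2</li>'
            '<li data-page="3">3</li><li data-page="2">next</li>')
print(extra_pages(fragment))  # ['2', '3']
```

Each returned number would then be appended to the month's archive link, as the script does when it builds newhref.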
