Using Python to Search 51CTO Recommended Blogs and Save the Results to Excel

Published: 2025/4/5 python 33 豆豆

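1. Background

I have recently been learning web scraping. This post uses the Requests module to fetch pages, BeautifulSoup to extract the needed content, and the xlsxwriter module to save that content to Excel. I am recording it here so the same approach can later be adapted to scrape other content and persist it to a file, a database, and so on.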

2. Code

I wrote two modules, geturl3 and getexcel3, and call them from main.

Git source repository

The contents of geturl3.py are as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch

import sys

import requests
from bs4 import BeautifulSoup


class get_urldic:
    # Read the search keyword and page count, then build the search URLs
    def get_url(self):
        urlList = []
        first_url = 'https://blog.51cto.com/search/result?q='
        after_url = '&type=&page='
        try:
            search = input("Please input search name:")
            page = int(input("Please input page:"))
        except Exception as e:
            print('Input error:', e)
            sys.exit(1)
        for num in range(1, page + 1):
            url = first_url + search + after_url + str(num)
            urlList.append(url)
        print("Please wait....")
        return urlList, search

    # Download the raw HTML of every search result page
    def get_html(self, urlList):
        response_list = []
        for r_num in urlList:
            # A timeout keeps the script from hanging on a stalled connection
            request = requests.get(r_num, timeout=10)
            response = request.content
            response_list.append(response)
        return response_list

    # Extract blog_name and blog_url from each page
    def get_soup(self, html_doc):
        result = {}
        for g_num in html_doc:
            soup = BeautifulSoup(g_num, 'html.parser')
            context = soup.find_all('a', class_='m-1-4 fl')
            for i in context:
                title = i.get_text()
                result[title.strip()] = i['href']
        return result


if __name__ == '__main__':
    blog = get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    for k, v in result.items():
        print('search blog_name is:%s,blog_url is:%s' % (k, v))
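get_url joins the raw keyword into the query string, so a keyword containing spaces or an `&` could produce a malformed URL. The sketch below shows one way to build the same URLs with the keyword percent-encoded, using only the standard library. `build_search_urls` is a hypothetical helper that is not part of the original module; the endpoint and parameter names are the ones used in geturl3.py.

```python
from urllib.parse import urlencode

# Base search endpoint taken from geturl3.py.
SEARCH_URL = "https://blog.51cto.com/search/result"


def build_search_urls(keyword, pages):
    """Return one search URL per result page, with the keyword percent-encoded.

    build_search_urls is a hypothetical helper, not part of the original module.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    urls = []
    for page in range(1, pages + 1):
        # urlencode escapes spaces, '&', and non-ASCII characters in the keyword.
        query = urlencode({"q": keyword, "type": "", "page": page})
        urls.append(f"{SEARCH_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls("python & excel", 2):
        print(url)
```

The returned list has the same shape as urlList, so it could be passed to get_html unchanged.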

The contents of getexcel3.py are as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch

import xlsxwriter


class create_excle:
    def __init__(self):
        self.tag_list = ["blog_name", "blog_url"]

    def create_workbook(self, search=" "):
        # The Excel file is named after the search keyword
        excle_name = search + '.xlsx'
        workbook = xlsxwriter.Workbook(excle_name)
        worksheet_M = workbook.add_worksheet(search)
        print('create %s....' % excle_name)
        return workbook, worksheet_M

    def col_row(self, worksheet):
        # Title row height and the width of the two data columns
        worksheet.set_row(0, 17)
        worksheet.set_column('A:A', 58)
        worksheet.set_column('B:B', 58)

    def shell_format(self, workbook):
        # Format for the merged title row
        merge_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#FAEBD7'
        })
        # Format for the column-name row
        name_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#E0FFFF'
        })
        # Format for the body rows
        normal_format = workbook.add_format({
            'align': 'center',
        })
        return merge_format, name_format, normal_format

    # Write the title and the column names
    def write_title(self, worksheet, search, merge_format):
        title = search + " search results"
        worksheet.merge_range('A1:B1', title, merge_format)
        print('write title success')

    def write_tag(self, worksheet, name_format):
        tag_row = 1
        tag_col = 0
        for num in self.tag_list:
            worksheet.write(tag_row, tag_col, num, name_format)
            tag_col += 1
        print('write tag success')

    # Write the content
    def write_context(self, worksheet, con_dic, normal_format):
        # Rows 0 and 1 hold the title and the column names, so data starts at row 2.
        # The original `if row > len(con_dic): break` guard dropped the last entry.
        for row, (k, v) in enumerate(con_dic.items(), start=2):
            worksheet.write(row, 0, k, normal_format)
            worksheet.write(row, 1, v, normal_format)
        print('write context success')

    # Close the workbook (this is what actually writes the file)
    def workbook_close(self, workbook):
        workbook.close()


if __name__ == '__main__':
    print('This is create excel mode')
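The row arithmetic in write_context can be checked without creating a workbook. Because the title occupies row 0 and the column names occupy row 1, data rows start at index 2, and any loop bound has to account for that offset. The sketch below computes the cell positions for a small dictionary; `layout_cells` and the sample URLs are hypothetical and exist only for illustration.

```python
def layout_cells(entries, first_row=2):
    """Map a {blog_name: blog_url} dict to (row, col, value) tuples.

    layout_cells is a hypothetical helper used only to check the row arithmetic.
    """
    cells = []
    # enumerate with start=first_row replaces a manually incremented row counter.
    for row, (name, url) in enumerate(entries.items(), start=first_row):
        cells.append((row, 0, name))  # column A: blog name
        cells.append((row, 1, url))   # column B: blog URL
    return cells


if __name__ == "__main__":
    sample = {
        "post A": "https://example.com/a",
        "post B": "https://example.com/b",
    }
    for cell in layout_cells(sample):
        print(cell)
```

Each tuple corresponds to one worksheet.write(row, col, value, normal_format) call, and every entry in the dictionary gets a row.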

The contents of main.py are as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch

import geturl3
import getexcel3


# Build the {blog_name: blog_url} dictionary
def get_dic():
    blog = geturl3.get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    return result, search


# Write the dictionary to Excel
def write_excle(urldic, search):
    excle = getexcel3.create_excle()
    workbook, worksheet = excle.create_workbook(search)
    excle.col_row(worksheet)
    merge_format, name_format, normal_format = excle.shell_format(workbook)
    excle.write_title(worksheet, search, merge_format)
    excle.write_tag(worksheet, name_format)
    excle.write_context(worksheet, urldic, normal_format)
    excle.workbook_close(workbook)


def main():
    url_dic, search_name = get_dic()
    write_excle(url_dic, search_name)


if __name__ == '__main__':
    main()

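3. Results

Run the code and enter the search keyword and the number of pages to search.

An Excel file named after the search keyword is generated; open it to view the content that was written.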
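Since the keyword is reused as the worksheet name, it is worth noting that Excel limits worksheet names to 31 characters and does not allow the characters `[ ] : * ? / \` in them. The sketch below shows one way to sanitize a keyword before it is used as a name. `safe_sheet_name` and its "results" fallback are hypothetical and not part of the original code.

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LEN = 31  # Excel's worksheet-name length limit


def safe_sheet_name(keyword, fallback="results"):
    """Turn a free-form search keyword into a valid worksheet name.

    safe_sheet_name and the "results" fallback are hypothetical, for illustration only.
    """
    # Replace forbidden characters, then trim surrounding whitespace.
    name = _INVALID_SHEET_CHARS.sub("_", keyword).strip()
    # Truncate to the length limit; names may not start or end with an apostrophe.
    name = name[:_MAX_SHEET_NAME_LEN].strip("'")
    return name or fallback


if __name__ == "__main__":
    print(safe_sheet_name("python/excel: tips?"))
```

The sanitized string could be passed to create_workbook in place of the raw keyword. Whether it is also a safe file name depends on the operating system, so that part remains an assumption.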

With this you can search for and save the 51CTO recommended blogs you need, and you can run it for several keywords.

Reposted from: https://blog.51cto.com/kaliarch/2067103
