Python 3 Web Scraping: Batch-Crawling Baidu Tieba Pages
# Batch-crawl Tieba page data
# Covers percent-encoding Chinese characters and joining multiple query parameters
# Page 1: https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&ie=utf-8&pn=0
# Page 2: https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&ie=utf-8&pn=50
# Page 3: https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&ie=utf-8&pn=100
# Page 4: pn=150
# So pn = (page number - 1) * 50
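The `kw` value in the URLs above is the forum name percent-encoded as UTF-8, and `pn` grows by 50 per page. A minimal sketch of both steps with the standard library's `urllib.parse`; the helper name `page_offset` is made up for illustration and does not appear in the original script:

```python
from urllib import parse


def page_offset(page_number, per_page=50):
    """Return the pn value for a 1-based page number (hypothetical helper)."""
    return (page_number - 1) * per_page


name = '旅行青蛙'

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
encoded = parse.quote(name)
print(encoded)

# urlencode() quotes every value and joins the key=value pairs with '&'
query = parse.urlencode({'kw': name, 'ie': 'utf-8', 'pn': page_offset(3)})
print('https://tieba.baidu.com/f?' + query)

# unquote() reverses the encoding, so the round trip gives the name back
print(parse.unquote(encoded) == name)
```

Building the query from a dict keeps the encoding and the `&` joining in one call, which is why the script below uses `parse.urlencode` instead of string concatenation.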
# Basic level: build the URL of each page (nothing is downloaded yet)
# base_url = "https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&ie=utf-8&pn="
# for page in range(10):
#     new_url = base_url + str(page*50)
#     print(new_url)
# Intermediate level: crawl a single page
# Read the forum name and page number from the keyboard, then fetch that page
base_url = 'https://tieba.baidu.com/f?'
name = input("Enter the Tieba forum name: ")
page = input("Enter the number of pages: ")  # input() returns a string
from urllib import request, parse
# qs = {'kw': name,
#       'pn': (int(page) - 1) * 50}
#
# qs_data = parse.urlencode(qs)
# url = base_url + qs_data
# print(url)
#
# headers = {
#     'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
# }
# req = request.Request(url, headers=headers)
# response = request.urlopen(req)
# html = response.read()
# html = html.decode('utf-8')
#
# with open(name + '_page_' + page + '.html', 'w', encoding='utf-8') as f:
#     f.write(html)
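Because `input()` always returns a string, `int(page)` raises `ValueError` for anything that is not a whole number, and a value below 1 would give a negative `pn`. A small sketch of checking the value before it reaches the `pn` calculation; `parse_page` is a hypothetical helper, not part of the original script:

```python
def parse_page(text):
    """Convert user input to a positive page number, or raise ValueError."""
    value = int(text.strip())  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError('page must be 1 or greater, got %d' % value)
    return value


# Try a few inputs, including two that should be rejected
for raw in ['3', ' 10 ', '0', 'abc']:
    try:
        print(repr(raw), '->', parse_page(raw))
    except ValueError as exc:
        print(repr(raw), 'rejected:', exc)
```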
# Intermediate level: batch crawl
# Read the forum name and page count from the keyboard, then fetch every page up to that count
for i in range(int(page)):
    # Page i+1 starts at offset i*50
    qs = {'kw': name,
          'pn': i * 50}
    qs_data = parse.urlencode(qs)  # percent-encodes the Chinese name and joins the parameters
    url = base_url + qs_data
    print(url)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    req = request.Request(url, headers=headers)
    response = request.urlopen(req)
    html = response.read()
    html = html.decode('utf-8')
    # Save each page to its own file
    with open(name + '_page_' + str(i + 1) + '.html', 'w', encoding='utf-8') as f:
        f.write(html)
C:\Users\Apple\PycharmProjects\spider\venv\Scripts\python.exe C:/Users/Apple/PycharmProjects/spider/04tieba.py
Enter the Tieba forum name: 旅行青蛙
Enter the number of pages: 2
https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&pn=0
https://tieba.baidu.com/f?kw=%E6%97%85%E8%A1%8C%E9%9D%92%E8%9B%99&pn=50
Process finished with exit code 0
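The batch loop above stops with a traceback on the first failed request and sends its requests back to back. One possible restructuring, offered as a sketch and not as the original author's method: the helper names (`build_url`, `fetch_html`, `crawl`), the shortened user-agent, the 10-second timeout, the one-second pause, and the output file naming are all assumptions.

```python
import os
import time
from urllib import error, parse, request

BASE_URL = 'https://tieba.baidu.com/f?'
HEADERS = {'user-agent': 'Mozilla/5.0'}  # assumed minimal header; swap in a full browser string if needed


def build_url(name, index, per_page=50):
    """Return the URL of the page at 0-based position index."""
    return BASE_URL + parse.urlencode({'kw': name, 'pn': index * per_page})


def fetch_html(url, timeout=10):
    """Download url and return its body decoded as UTF-8."""
    req = request.Request(url, headers=HEADERS)
    with request.urlopen(req, timeout=timeout) as response:
        # errors='replace' keeps one bad byte from aborting the whole page
        return response.read().decode('utf-8', errors='replace')


def crawl(name, pages, out_dir='.', fetch=fetch_html, delay=1.0):
    """Save the first `pages` pages of a forum and return the paths written."""
    saved = []
    for i in range(pages):
        url = build_url(name, i)
        try:
            html = fetch(url)
        except (error.URLError, OSError) as exc:
            # HTTPError is a subclass of URLError, so HTTP failures land here too
            print('skipped', url, exc)
            continue
        path = os.path.join(out_dir, '%s_page_%d.html' % (name, i + 1))
        with open(path, 'w', encoding='utf-8') as f:
            f.write(html)
        saved.append(path)
        if i < pages - 1:
            time.sleep(delay)  # short pause between requests
    return saved
```

With these definitions, `crawl(name, int(page))` would take the place of the loop. Passing a stub function as `fetch` lets the URL building and file writing be exercised without any network access.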