Batch-scraping novels with Python (step by step, suitable for beginners)

Published: 2024/3/26 python 52 豆豆


1. Downloading a single chapter

First, open one chapter of a novel on the Shuquge (書趣閣) website.

Then request the page data:

import requests

response = requests.get('http://www.shuquge.com/txt/63542/9645082.html')
# Let requests detect the encoding from the response body so the text decodes correctly
response.encoding = response.apparent_encoding
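To see why the encoding line matters, here is a small offline sketch. It assumes, purely for illustration, a page whose bytes are GBK-encoded (the encoding the site actually uses is not stated in this article). Decoding those bytes with the wrong codec gives garbled text, while the matching codec recovers it.

```python
# Hypothetical sample: pretend these bytes are the body of a GBK-encoded page.
original = '第一章 小说'
raw = original.encode('gbk')

# Decoding with the wrong codec yields garbled text instead of the original.
wrong = raw.decode('utf-8', errors='replace')

# Decoding with the codec the bytes were written in recovers the text.
right = raw.decode('gbk')

print(wrong == original)  # False
print(right == original)  # True
```

Setting response.encoding to response.apparent_encoding asks requests to make that choice of codec by inspecting the bytes, rather than trusting the HTTP headers alone.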

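Next, parse the data with the parsel library.
There are generally three ways to extract data from a page: regular expressions, XPath expressions, and CSS selectors. Here we use CSS selectors.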

import parsel

# Build a Selector object from the HTML string
sel = parsel.Selector(response.text)
# ::text extracts the text of the matched element
title = sel.css('.content h1::text').get()  # '#wrapper>div.book.reader>div.content>h1' also works
content = sel.css('#content::text').getall()  # '.content div.showtxt' also works

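Here, ::text extracts the text of the matched element. The selector passed to sel.css() can be obtained as follows:
open the browser's developer tools, find the chapter title in the inspector, then right-click --> Copy --> CSS Selector.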

After that, we can save the chapter to a .txt file:

# Save the chapter text
with open(title + '.txt', mode='w', encoding='utf-8') as f:
    f.write(title + '\n')
    for i in content:
        f.write(i.strip() + '\n')

Here, .strip() removes the whitespace at the beginning and end of each line (spaces inside the text are kept).
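A quick check of what .strip() does: it trims whitespace from both ends of a string and leaves interior spaces alone. The sample line is invented; it assumes, for illustration, a paragraph indented with non-breaking spaces (\xa0) and ending in a line break.

```python
# Invented sample line: leading non-breaking spaces and a trailing line break.
line = '\xa0\xa0\xa0\xa0He said: hello world.\r\n'

# strip() with no arguments removes leading and trailing whitespace only.
cleaned = line.strip()

print(repr(cleaned))
```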

2. Downloading all chapters of a novel

First, wrap the single-chapter code from above in a function:

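import requests
import parsel

def download_one_chapter(url):
    # Fetch the chapter page and fix its encoding
    response = requests.get(url)
    response.encoding = response.apparent_encoding
    # Extract the chapter title and body text
    sel = parsel.Selector(response.text)
    title = sel.css('.content h1::text').get()
    content = sel.css('#content::text').getall()
    # Write the chapter to a file named after its title
    with open(title + '.txt', mode='w', encoding='utf-8') as f:
        f.write(title + '\n')
        for i in content:
            f.write(i.strip() + '\n')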
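One thing the function does not guard against: title is used directly as a file name, so a page without a matching h1 (where .get() returns None) or a title containing characters such as / or ? would make open() fail. A possible safeguard is sketched below. safe_filename is a hypothetical helper, not part of the original code, and the set of replaced characters is an assumption based on the characters Windows reserves in file names.

```python
import re

def safe_filename(title, fallback='untitled'):
    """Return a file-system-friendly name derived from a chapter title."""
    if not title:
        return fallback
    # Replace characters that Windows does not allow in file names
    # (this also covers the path separators of other systems).
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return cleaned or fallback

print(safe_filename('Chapter 1: What/Why?'))
print(safe_filename(None))
```

Inside download_one_chapter, the file could then be opened with safe_filename(title) + '.txt' instead of title + '.txt'.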

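Then go back to the novel's table-of-contents page and, using the same method, locate in the inspector the last part of each chapter's URL (the trailing digits):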

# Request the table-of-contents page and collect the links to all chapters
url = 'http://www.shuquge.com/txt/5809/index.html'
response = requests.get(url)
response.encoding = response.apparent_encoding
sel = parsel.Selector(response.text)
index = sel.css('.listmain dd a::attr(href)').getall()
# Skip the first 12 links, then download every remaining chapter
for i in index[12:]:
    download_one_chapter('http://www.shuquge.com/txt/5809/' + i)

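Here, index holds those trailing parts of the chapter URLs, and the selector passed to sel.css() is obtained the same way as before. ::attr(href) extracts the value of the href attribute.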
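Because the href values are relative, the loop has to repeat the directory part of the URL by hand. As an alternative sketch, urllib.parse.urljoin from the standard library resolves each href against the URL of the index page. The href values below are made-up examples, not links taken from the site.

```python
from urllib.parse import urljoin

index_url = 'http://www.shuquge.com/txt/5809/index.html'

# Hypothetical relative links, in the form returned by ::attr(href).
hrefs = ['1234567.html', '1234568.html']

# urljoin replaces the last path segment (index.html) with each relative href.
chapter_urls = [urljoin(index_url, h) for h in hrefs]

for u in chapter_urls:
    print(u)
```

With this approach the loop body could call download_one_chapter(urljoin(url, i)), and it would keep working if the site ever switched to absolute links.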
