
Python Web Crawler Tutorial Series

Published: 2024/7/23


Python Crawlers: An Intermediate-to-Advanced Learning Roadmap

From: https://www.cnblogs.com/Eeyhan/p/14148832.html

If the image is hard to read, save it locally and open it there.

Python Crawler Tutorial Series

From: https://cuiqingcai.com/1052.html

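I. Crawler Basics

1. Python Crawler Basics 1: Overview

2. Python Crawler Basics 2: Understanding Crawler Fundamentals

3. Python Crawler Basics 3: Basic Usage of the Urllib Library

4. Python Crawler Basics 4: Advanced Usage of the Urllib Library

5. Python Crawler Basics 5: Handling URLError Exceptions

6. Python Crawler Basics 6: Using Cookies

7. Python Crawler Basics 7: Regular Expressions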
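The basics chapters listed above cover Urllib, URLError handling, and regular expressions. The following is a minimal sketch of how those three pieces might fit together, written for Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`. The `fetch` and `extract_links` helpers, the User-Agent string, and the sample HTML are all illustrative assumptions and are not taken from the tutorials themselves.

```python
import re
import urllib.error
import urllib.request

# Hypothetical User-Agent string, for illustration only.
HEADERS = {"User-Agent": "Mozilla/5.0 (tutorial-sketch)"}

# Matches the href value of an anchor tag.
LINK_RE = re.compile(r'<a\s+[^>]*?href="([^"]+)"', re.IGNORECASE)


def fetch(url, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error:", err.code)
    except urllib.error.URLError as err:
        print("Request failed:", err.reason)
    return None


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return LINK_RE.findall(html)


# Sample markup (an assumption) so the sketch runs without network access.
sample = '<p><a href="/hot">Hot</a> <a class="next" href="/page/2">Next</a></p>'
print(extract_links(sample))
```

In a real crawler, the string returned by `fetch(url)` would be passed to `extract_links` in place of `sample`; a `None` result signals that the request failed and the page should be skipped or retried.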

II. Crawlers in Practice

1. Python Crawler in Practice 1: Scraping Jokes from Qiushibaike

# -*- coding: utf-8 -*-
import re

import requests


class QSBK(object):
    def __init__(self):
        self.__url = r'https://www.qiushibaike.com'
        self.__head = None
        self.__data = None
        self.__proxy = None

    def drop_n(self, content):
        '''
        Strip newlines and HTML comments.
        :param content: HTML page content
        :return: the page content with newlines and comments removed
        '''
        content = re.sub(r'\n', '', content)
        # HTML-comment pattern assumed from the docstring.
        content = re.sub(r'<!--.*?-->', '', content)
        return content

    def crawl(self):
        page_num_regex = re.compile(r'<li><span class="current" >(.*?)</span></li>')
        # Captures (author, link, joke text) for each article block.
        pattern = re.compile(
            r'<div class="article block untagged mb15.*?>'
            r'<div class="author clearfix">'
            r'<a .*?>.*?</a><a.*?web-list-author-text.*?><h2>(.*?)</h2></a>'
            r'.*?<a href="(.*?)".*?web-list-content.*?><div class="content"><span>(.*?)</span>')
        next_page_regex = re.compile(
            r'<ul class="pagination">.*<li><a href="(.*?)".*?><span.*?/span></a></li></ul>')

        next_page = '/hot'
        while next_page:
            r = requests.get('{0}{1}'.format(self.__url, next_page))
            print("status_code : {0}".format(r.status_code))
            if r.status_code != 200:
                break
            print(r.url)
            content = self.drop_n(r.text)
            page_num = page_num_regex.search(content)
            print('Page {0}'.format(page_num.group(1) if page_num else '?'))
            for item in pattern.findall(content):
                print(item[0], item[1], item[2])
            input('Press Enter to continue...')
            # search() returns None when there is no next-page link,
            # which ends the loop.
            match = next_page_regex.search(content)
            next_page = match.group(1) if match else None
            print(next_page)


if __name__ == "__main__":
    qsbk = QSBK()
    qsbk.crawl()
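The listing above relies on one technique throughout: flatten the page (drop newlines and HTML comments), then use a lazy regular expression with capture groups so that `findall` returns one tuple per article. As a rough offline illustration of that idea, here is a small sketch; the markup in `SAMPLE_HTML` is a simplified, hypothetical stand-in and does not reflect the real page structure, and `flatten` and `ITEM_RE` are names invented for this example.

```python
import re

# Hypothetical, simplified markup; the real page structure differs.
SAMPLE_HTML = """
<div class="article">
<h2>alice</h2>
<!-- ad slot -->
<a href="/article/1"><span>first joke</span></a>
</div>
<div class="article">
<h2>bob</h2>
<a href="/article/2"><span>second joke</span></a>
</div>
"""


def flatten(html):
    """Remove newlines and HTML comments so '.' can match across the page."""
    html = html.replace("\n", "")
    return re.sub(r"<!--.*?-->", "", html)


# Each lazy group stops at the first closing tag that follows it.
ITEM_RE = re.compile(
    r'<div class="article">.*?<h2>(.*?)</h2>'
    r'.*?<a href="(.*?)">.*?<span>(.*?)</span>'
)

items = ITEM_RE.findall(flatten(SAMPLE_HTML))
for author, link, text in items:
    print(author, link, text)
```

Compiling the pattern with `re.DOTALL` would let `.` match newlines directly, which removes the need for the newline-stripping step; flattening first simply mirrors the approach taken in the listing.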


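2. Python Crawler in Practice 2: Scraping Baidu Tieba Posts

3. Python Crawler in Practice 3: Automatic Reconnection for Shandong University's Wireless Network

4. Python Crawler in Practice 4: Scraping Taobao MM Photos

5. Python Crawler in Practice 5: Simulating a Taobao Login and Fetching All Orders

6. Python Crawler in Practice 6: Scraping Aiwen Zhishiren Questions and Saving Them to a Database

7. Python Crawler in Practice 7: Calculating This Semester's University GPA

8. Python Crawler in Practice 8: Scraping Anonymous Taobao Wangwang IDs with Selenium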

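III. Crawler Tools

1. Python Crawler Tools 1: Using the Requests Library

2. Python Crawler Tools 2: Using Beautiful Soup

3. Python Crawler Tools 3: XPath Syntax and the lxml Library

4. Python Crawler Tools 4: Using PhantomJS

5. Python Crawler Tools 5: Using Selenium

6. Python Crawler Tools 6: Using PyQuery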

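IV. Advanced Crawling

1. Advanced Python Crawling 1: Overview of Crawler Frameworks

2. Advanced Python Crawling 2: Installing and Configuring the PySpider Framework

3. Advanced Python Crawling 3: Installing and Configuring the Scrapy Crawler Framework

4. Advanced Python Crawling 4: Using PySpider

5. Advanced Python Crawling 5: Using Multithreading

6. Advanced Python Crawling 6: Using Multiprocessing

7. Advanced Python Crawling 7: Setting Up an ADSL Dial-Up Proxy Server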
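Two of the advanced chapters above deal with multithreading and multiprocessing. Because a crawler spends most of its time waiting on the network, a thread pool is a common way to overlap requests. The sketch below uses the standard library's `concurrent.futures`; `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real request, and the `example.com` URLs are placeholders, so nothing here is taken from the tutorials.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_fetch(url):
    """Stand-in for a network request: sleep briefly, then return a result."""
    time.sleep(0.1)  # simulates network latency
    return url, len(url)


# Placeholder URLs, for illustration only.
urls = ["https://example.com/page/{0}".format(n) for n in range(1, 9)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in the same order as the input URLs.
    results = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

print(results[0])
print("fetched {0} pages in {1:.2f}s".format(len(results), elapsed))
```

With four workers, the eight simulated requests overlap instead of running back to back. For CPU-bound work such as heavy parsing, `ProcessPoolExecutor` from the same module offers the same `map` interface but runs the function in separate processes.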

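"A Little Crawler"

"A Little Concurrent Crawler"

"Python and Writing a Simple Web Crawler"

"Writing a Crawler in Python: Fetching Web Pages and Parsing HTML"

"[Python] Web Crawler (1): What Fetching a Web Page Means and the Basic Structure of a URL"

"[Python] Web Crawler (2): Fetching Web Page Content from a Given URL with urllib2"

"[Python] Web Crawler (3): Exception Handling and HTTP Status Code Categories"

"[Python] Web Crawler (4): Introduction to Opener and Handler, with Examples"

"[Python] Web Crawler (5): urllib2 Usage Details and Site-Scraping Tips"

"[Python] Web Crawler (6): A Simple Baidu Tieba Crawler"

"[Python] Web Crawler (7): A Tutorial on Regular Expressions in Python"

"[Python] Web Crawler (8): The Qiushibaike Crawler (v0.2), Source Code and Walkthrough"

"[Python] Web Crawler (9): The Baidu Tieba Crawler (v0.4), Source Code and Walkthrough"

"[Python] Web Crawler (10): The Full Story of Building a Crawler (Using Shandong University GPA Calculation as the Example)"

"A Summary of Tips for Scraping Sites with Python Crawlers zz"

"Advanced Python Crawler Code"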
