
Installing and Testing Scrapy on Python 3.5

Published: 2023/12/10 | python | 豆豆


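1. Introduction

Scrapy has a clean framework structure, and its Twisted-based asynchronous architecture makes full use of the machine's resources, so it is an essential foundation for writing crawlers. This article walks through installing Scrapy and running a first test crawl.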

2. Installing lxml

2.1 Download: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted  Choose the lxml build that matches Python 3.5.
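
To pick the right file, it can help to print the tags your interpreter expects. The following is a minimal sketch, assuming the standard wheel naming scheme (name-version-pythontag-abitag-platformtag.whl) and a Windows machine; the helper names expected_wheel_tags and wheel_matches are hypothetical and not part of pip.

```python
import struct
import sys


def expected_wheel_tags(version_info=None, pointer_bits=None):
    """Return the (python_tag, platform_tag) a matching Windows wheel should carry."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 otherwise
        pointer_bits = struct.calcsize("P") * 8
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


def wheel_matches(filename, version_info=None, pointer_bits=None):
    """Check a wheel filename such as lxml-4.1.1-cp35-cp35m-win_amd64.whl."""
    if not filename.endswith(".whl"):
        return False
    python_tag, platform_tag = expected_wheel_tags(version_info, pointer_bits)
    # The last three dash-separated parts are python tag, ABI tag, platform tag
    parts = filename[:-len(".whl")].split("-")
    return len(parts) >= 5 and parts[-3] == python_tag and parts[-1] == platform_tag


# Tags for the interpreter running this script
print(expected_wheel_tags())
```

On a 64-bit Python 3.5 this points to the files tagged cp35 and win_amd64, which are the ones used in the steps below.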

2.2 If your pip version is too old, upgrade pip first:

python -m pip install -U pip

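2.3 Install lxml (first copy the downloaded wheel file into the Python installation directory, then hold Shift, right-click in that folder, and choose "Open command window here"):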

pip install lxml-4.1.1-cp35-cp35m-win_amd64.whl

If the output contains a message such as "Successfully installed", the installation succeeded.

3. Installing Twisted

3.1 Download: https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted  Choose the Twisted build that matches Python 3.5.

3.2 Install:

pip install Twisted-17.9.0-cp35-cp35m-win_amd64.whl

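If the output contains a message such as "Successfully installed", the installation succeeded.

Note: On some machines the installation may fail. If it does, try moving the Twisted-17.9.0-cp35-cp35m-win_amd64.whl file into the $python/Scripts/ directory and installing again.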

4. Installing Scrapy

Once Twisted is installed, installing Scrapy is simple. Enter the following command directly at the command prompt:

pip install scrapy

If the output contains a message such as "Successfully installed", the installation succeeded.
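
Beyond reading pip's output, you can confirm that the three packages are visible to the interpreter. This is a minimal sketch using only the standard library; check_installed is a hypothetical helper name, and the check only locates each module without importing it.

```python
import importlib.util


def check_installed(module_names):
    """Map each top-level module name to True if it can be located for import."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Import names of the packages installed in sections 2 to 4
for name, found in sorted(check_installed(["lxml", "twisted", "scrapy"]).items()):
    print("{:<8} {}".format(name, "OK" if found else "MISSING"))
```

If any entry is reported as MISSING, rerun the corresponding install step with the same Python that runs this script.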

5. Testing Scrapy

5.1 Create a project

First create a new Scrapy crawler project. Go to your Python working directory (mine is H:\PycharmProjects), hold Shift, right-click, and choose "Open command window here", then enter:

scrapy startproject allister

This creates an allister folder in that directory, with the following structure:

allister
├── allister
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
└── scrapy.cfg

A brief description of each file:

scrapy.cfg: the project's configuration file
allister/: the project's Python module; your code is imported from here
allister/items.py: the project's items file
allister/pipelines.py: the project's pipelines file
allister/settings.py: the project's settings file
allister/spiders: the directory that holds the spiders
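
To make the role of pipelines.py more concrete, here is a minimal sketch of an item pipeline. Scrapy calls process_item(item, spider) on each enabled pipeline for every scraped item; the class name StripWhitespacePipeline and its trimming behavior are assumptions for illustration and are not part of the generated project.

```python
class StripWhitespacePipeline:
    """Hypothetical pipeline: trims surrounding whitespace from string fields."""

    def process_item(self, item, spider):
        # Works for dict-like items, including plain dicts
        for key in list(item.keys()):
            value = item[key]
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item passes it on to the next pipeline or the exporter
        return item


# Stand-alone check with a plain dict; the spider argument is unused here
print(StripWhitespacePipeline().process_item({"name": "  Teacher A  "}, None))
```

A pipeline like this only runs if it is listed in the ITEM_PIPELINES setting in allister/settings.py, for example under the key "allister.pipelines.StripWhitespacePipeline" with a priority number such as 300.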

5.2 Edit allister/items.py:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class AllisterItem(scrapy.Item):
    name = scrapy.Field()   # teacher name
    level = scrapy.Field()  # teacher level
    info = scrapy.Field()   # teacher description

5.3 Write the file AllisterSpider.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File : AllisterSpider.py
# @Author: Allister.Liu
# @Date : 2018/1/18
# @Desc :

import scrapy

from allister.items import AllisterItem


class ItcastSpider(scrapy.Spider):
    name = "ic2c"
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ["www.itcast.cn"]
    start_urls = [
        "http://www.itcast.cn/channel/teacher.shtml#ac"
    ]

    def parse(self, response):
        # Each teacher entry sits in a <div class="li_txt"> block
        for site in response.xpath('//div[@class="li_txt"]'):
            item = AllisterItem()

            # default="" avoids calling strip() on None when a node is missing
            item["name"] = site.xpath('h3/text()').extract_first(default="").strip()
            item["level"] = site.xpath('h4/text()').extract_first(default="").strip()
            item["info"] = site.xpath('p/text()').extract_first(default="").strip()

            yield item
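
To see what the three relative XPath expressions in parse() pick out without hitting the network, the same extraction can be tried on a small sample. This is only a sketch: the sample markup is invented and the real page may differ, and it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) in place of Scrapy selectors.

```python
import xml.etree.ElementTree as ET

# Invented sample markup that mimics the structure the spider expects
SAMPLE = """
<html><body>
  <div class="li_txt">
    <h3> Teacher A </h3>
    <h4> Senior Lecturer </h4>
    <p> Teaches Python. </p>
  </div>
  <div class="li_txt">
    <h3>Teacher B</h3>
    <h4>Lecturer</h4>
    <p>Teaches Java.</p>
  </div>
</body></html>
"""


def extract_teachers(markup):
    """Return one dict per <div class="li_txt"> block, with stripped text."""
    root = ET.fromstring(markup.strip())
    teachers = []
    for site in root.iterfind(".//div[@class='li_txt']"):
        teachers.append({
            # findtext returns None when the child is missing, hence the "or"
            "name": (site.findtext("h3") or "").strip(),
            "level": (site.findtext("h4") or "").strip(),
            "info": (site.findtext("p") or "").strip(),
        })
    return teachers


for teacher in extract_teachers(SAMPLE):
    print(teacher)
```

ElementTree needs well-formed markup, so this approach suits small hand-written samples; for real pages, Scrapy's own selectors are the appropriate tool.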

Once written, copy the file into the project's \allister\spiders directory, open cmd in the project root, and run:

scrapy crawl ic2c -o itcast_teachers.json -t json

The scraped data is saved in JSON format to the itcast_teachers.json file.
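
A quick way to inspect the result is to load the file back in Python. The sketch below assumes the file holds a single JSON array of objects with the name, level, and info fields defined in items.py; load_items and summarize are hypothetical helper names.

```python
import json


def load_items(path):
    """Load the JSON array written by the crawl and return it as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(items):
    """Return one 'name (level)' line per scraped item."""
    return [
        "{} ({})".format(item.get("name", ""), item.get("level", ""))
        for item in items
    ]


# Example usage, run from the project root after the crawl:
# for line in summarize(load_items("itcast_teachers.json")):
#     print(line)
```

If the crawl was run more than once with -o against the same file, the output may contain several concatenated arrays and json.load will fail; deleting the file before rerunning the crawl avoids this.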

If you run into the following error, see the corresponding fix:
