
Scraping high-resolution images from 千图网 with Python

發(fā)布時(shí)間:2023/12/31 python 30 豆豆
This article, collected and compiled by 生活随笔, covers scraping high-resolution images from 千图网 with Python. The editor found it useful and shares it here as a reference.

### I. Approach to building a Scrapy image crawler
1.分析網(wǎng)站
2.選擇爬取方式與策略
3.創(chuàng)建爬蟲(chóng)項(xiàng)目 → 定義items.py
4.編寫(xiě)爬蟲(chóng)文件
5.編寫(xiě)pipelines與setting
6.調(diào)試

二、千圖網(wǎng)難點(diǎn)(http://www.58pic.com/)

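1. Crawl the images of the entire site.
2. Crawl the high-resolution versions of the images: it is enough to find the high-resolution image URLs.
3. Cope with the site's anti-scraping measures, e.g. by mimicking a browser and not keeping cookies. In the generated settings.py this only requires uncommenting the corresponding lines, such as COOKIES_ENABLED = False.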

三、散點(diǎn)知識(shí)

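1. Import Request with `from scrapy.http import Request`. To have a downloaded page handled by a callback (a spider method that Scrapy calls with the response), create `Request(url=…,callback=…)`.
2. In XPath, `//` selects all matching nodes at any depth, not only direct children: a leading `//` searches the whole document, and `.//` searches everything below the current node.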
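The difference described in item 2 can be seen in a small, dependency-free sketch. Scrapy's `response.xpath()` uses full XPath; to avoid installing anything, this example uses the XPath subset in Python's `xml.etree.ElementTree`, where `./a` matches direct children and `.//a` matches descendants at any depth. The HTML snippet is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document: one <a> is a direct child of <div>,
# the other is nested one level deeper inside <p>.
HTML = "<div><a href='/1.jpg'>one</a><p><a href='/2.jpg'>two</a></p></div>"

root = ET.fromstring(HTML)  # root is the <div> element

# "./a" only looks at the direct children of the current node.
direct = [a.get("href") for a in root.findall("./a")]

# ".//a" looks at every descendant, at any depth, in document order.
everywhere = [a.get("href") for a in root.findall(".//a")]

print(direct)      # ['/1.jpg']
print(everywhere)  # ['/1.jpg', '/2.jpg']
```

ElementTree rejects a leading `//` on an element, which is why the sketch uses `.//`; in Scrapy, `response.xpath("//a/@href")` would search the whole page.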

Code:

items.py

```python
import scrapy


class QiantuwangItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    url = scrapy.Field()
    title = scrapy.Field()
```

middlewares.py

```python
from scrapy import signals


class QiantuwangSpiderMiddleware(object):
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.
        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.
        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.
        # Should return either None or an iterable of Request, dict
        # or Item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn't have a response associated.
        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
```

pipelines.py

```python
import random
import urllib.request


class QiantuwangPipeline(object):
    def process_item(self, item, spider):
        try:
            # Use the image title as the file name; Python 3 str paths
            # can contain Chinese characters directly.
            title = item['title'][0]
            # Append a random number so images with the same title
            # are less likely to overwrite each other.
            file = "E:/tupian/" + title + str(int(random.random() * 10000)) + ".jpg"
            # Download the high-resolution image to the local file.
            urllib.request.urlretrieve(item['url'][0], filename=file)
        except Exception as e:
            # Report the error and continue with the remaining items.
            print(e)
        return item
```
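The pipeline's file name is the raw title plus a random number from 0 to 9999, so two images with the same title can still overwrite each other, and titles containing characters such as `/` or `:` are not valid Windows file names (the download then fails and only the error is printed). A possible alternative is sketched below; `image_path` is a hypothetical helper, not part of the original code. It replaces invalid characters and uses a short SHA-1 hash of the image URL instead of a random number.

```python
import hashlib
import os
import re

# Characters that Windows does not allow in file names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def image_path(folder: str, title: str, url: str) -> str:
    """Build a file path for an image from its title and URL (hypothetical helper).

    Invalid file-name characters in the title are replaced with "_", and the
    first 8 hex digits of the URL's SHA-1 hash replace the random number, so
    the same URL always maps to the same file name.
    """
    safe_title = _INVALID_CHARS.sub("_", title).strip() or "untitled"
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return os.path.join(folder, f"{safe_title}_{digest}.jpg")


if __name__ == "__main__":
    print(image_path("E:/tupian", 'poster: "spring"', "http://example.com/a.jpg"))
```

Inside process_item, the file line could then become `file = image_path("E:/tupian", item['title'][0], item['url'][0])`. Because the name depends on the URL, re-running the crawl overwrites the same file instead of creating a duplicate.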
