
Learning the Scrapy Framework (8): What the settings in settings.py mean, and how to define and use custom settings and parameters

Published: 2024/9/30

1. Explanation of the settings in settings.py

The official documentation URL for each group of settings is given in the comments below.

# -*- coding: utf-8 -*-

# Scrapy settings for tencent project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'tencent'

SPIDER_MODULES = ['tencent.spiders']
# Where newly generated spiders are placed
NEWSPIDER_MODULE = 'tencent.spiders'

# LOG_LEVEL = "WARNING"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
# The browser identification string
#USER_AGENT = 'tencent (+http://www.yourdomain.com)'

# Obey robots.txt rules
# Whether to obey the robots protocol
ROBOTSTXT_OBEY = True

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
# Download delay
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
# Maximum concurrent requests per domain
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
# Maximum concurrent requests per IP
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
   'Accept-Language': 'en',
}

# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'tencent.middlewares.TencentSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'tencent.middlewares.TencentDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'tencent.pipelines.TencentPipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
# Enabling the settings below adaptively slows the crawl down
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
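Besides the project-wide values above, Scrapy also lets a single spider override settings through its documented `custom_settings` class attribute. The sketch below illustrates the idea; the spider name and the overridden keys are only examples, and the Scrapy base class is omitted so the snippet runs without Scrapy installed (in a real project the class would subclass `scrapy.Spider`).

```python
# Per-spider override sketch: `custom_settings` takes precedence over
# the same keys in settings.py for this spider only.
class TencentSpider:  # real code: class TencentSpider(scrapy.Spider)
    name = "tencent"
    # These keys override the project-wide values from settings.py
    custom_settings = {
        "DOWNLOAD_DELAY": 1,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 8,
    }

print(TencentSpider.custom_settings["DOWNLOAD_DELAY"])  # → 1
```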

2. How do you use the settings and parameters defined in settings.py inside spiders, pipelines, and other modules?

(1) In any other .py file, import the value directly with from ... import ...

Wherever you want to use a value, just import it and use it.
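A short sketch of method (1). `MONGO_URI` is a hypothetical custom setting, not one Scrapy defines; in a real project you would write `MONGO_URI = "..."` in settings.py and then `from tencent.settings import MONGO_URI` in a pipeline. To keep the snippet self-contained, an in-memory module stands in for settings.py here:

```python
import types

# Stand-in for settings.py containing a custom constant (hypothetical name)
settings = types.ModuleType("settings")
settings.MONGO_URI = "mongodb://localhost:27017"

# Equivalent of: from tencent.settings import MONGO_URI
MONGO_URI = settings.MONGO_URI
print(MONGO_URI)  # → mongodb://localhost:27017
```

This works for plain module-level constants, but note it bypasses Scrapy's settings machinery, so per-spider overrides and command-line `-s` options are not reflected.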

(2) You can also use the .settings attribute. Because the settings object is dict-like,

you can read a value with .settings["SETTING_NAME"] or .settings.get("SETTING_NAME").

For example, inside a spider's parse(self, response) method, use

self.settings["SETTING_NAME"] or self.settings.get("SETTING_NAME")

and inside a pipeline's process_item(self, item, spider) method, use

spider.settings["SETTING_NAME"] or spider.settings.get("SETTING_NAME")
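The two access styles of method (2) can be sketched as follows. Scrapy is not imported here; a plain dict stands in for the real `Settings` object, which supports the same two accessors (index access raises `KeyError` for a missing key, while `.get()` can return a fallback). The setting names are illustrative:

```python
# Stand-in for self.settings (in a spider) or spider.settings (in a pipeline)
settings = {"LOG_LEVEL": "WARNING", "MONGO_URI": "mongodb://localhost:27017"}

# Inside a spider's parse(self, response):  self.settings["LOG_LEVEL"]
print(settings["LOG_LEVEL"])                   # → WARNING

# Inside a pipeline's process_item(...):  spider.settings.get(...)
print(settings.get("ABSENT_KEY", "default"))   # → default
```

Prefer `.get()` when a setting may be absent, since index access on a missing key raises an exception.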
