

Blocking Spam Spider User-Agents on IIS 6/IIS 7+, Nginx, and Apache to Reduce Server Load (Including How to Deny a Specific User-Agent on IIS 7.5)


Recently my site became very slow: CPU usage and the overall server load were extremely high. Opening the access logs, I found a crowd of obscure spiders endlessly crawling the site, and experience said that was the problem. I wrote blocking rules for my situation, and after applying them the load came back down. Below is a summary of how to block unknown spider User-Agents (UAs) under IIS, Nginx, and Apache.
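Before writing any rules, it helps to see which UAs are actually hammering the site. Here is a minimal Python sketch that tallies User-Agents from an access log; it assumes the common combined log format, and the log path is a placeholder you would replace with your own file:

# count_ua.py - tally User-Agents in a combined-format access log
# (LOG_PATH is a placeholder; point it at your own log file)
import re
from collections import Counter

LOG_PATH = "access.log"
# In the combined format the User-Agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1

for ua, hits in counts.most_common(20):
    print(f"{hits:8d}  {ua}")

Any unfamiliar UA near the top of this output is a candidate for the deny lists below.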

Note: adjust the UA lists to your own situation by deleting or adding entries. The rules I provide include many rarely-seen spider UAs that are almost never needed; if your site is unusual and needs certain spiders to crawl it, study the rules carefully and remove those specific UAs.

Tested OK on IIS 7.5

Deny access to UAs matching the listed patterns, returning status code 403:

<rule name="NoUserAgent" stopProcessing="true"> <match url=".*" /> <conditions> <add input="{HTTP_USER_AGENT}" pattern="|特征1|特征2|特征3" /> </conditions> <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You did not present a User-Agent header which is required for this site" /> </rule>

For example, to deny only empty UAs:

<add input="{HTTP_USER_AGENT}" pattern="^$|pattern2|pattern3" />

For example, to deny a set of named UAs plus empty UAs:

<add input="{HTTP_USER_AGENT}" pattern="^$|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot" />
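A word of warning about these patterns: a stray leading "|" (e.g. pattern="|pattern1|pattern2") creates an empty alternative that matches every User-Agent, so the rule would block all visitors. IIS evaluates the pattern as a .NET regular expression, and Python's re module behaves the same way for this case, so a quick sketch (with hypothetical bot names) shows the difference:

# empty_alt_demo.py - why a leading "|" in an alternation is a bug
import re

bad = re.compile("|SemrushBot|MJ12bot")   # leading pipe adds an empty alternative
good = re.compile("SemrushBot|MJ12bot")

ua = "Mozilla/5.0 (an ordinary browser)"
print(bool(bad.search(ua)))    # True  - the empty alternative matches everything
print(bool(good.search(ua)))   # False - only the named bots match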

Deny specific spiders (by UA, empty UA, or client IP):

<rewrite>
  <rules>
    <rule name="Block Some Ip Adresses OR Bots" stopProcessing="true">
      <match url="(.*)" />
      <conditions logicalGrouping="MatchAny">
        <add input="{HTTP_USER_AGENT}" pattern="SpiderName" ignoreCase="true" /> <!-- deny a specific spider -->
        <add input="{HTTP_USER_AGENT}" pattern="^$" /> <!-- deny empty-UA requests -->
        <add input="{REMOTE_ADDR}" pattern="a single IP, or a regex matching IP addresses" /> <!-- deny by client IP -->
      </conditions>
      <!-- You can also use <action type="AbortRequest" /> in place of the line below -->
      <action type="CustomResponse" statusCode="403" statusReason="Access is forbidden." statusDescription="Access is forbidden." />
    </rule>
  </rules>
</rewrite>

Deny access to a specific file:

<rule name="Block spider"> <match url="(^robotssss.txt$)" ignoreCase="false" negate="true" /> <!-- 禁止瀏覽某文件 --> <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" /> </rule>





1. Nginx: to deny spam spiders, put the following code into your nginx configuration (typically inside the relevant server block):
# Deny scraping tools such as Scrapy
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}

# Deny the listed UAs, and empty UAs
if ($http_user_agent ~ "opensiteexplorer|MauiBot|FeedDemon|SemrushBot|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|semrushbot|alphaseobot|semrush|Feedly|UniversalFeedParser|webmeup-crawler|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|^$") {
    return 403;
}

# Deny request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
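After reloading nginx you can verify that the rule fires. The sketch below uses Python's standard urllib to send a request with a blocked UA and expects a 403; the URL is a placeholder for your own site. (Note that the list above also blocks Python-urllib itself, so even urllib's default UA would be rejected.) The same check works for the IIS and Apache rules in the next two sections.

# check_block.py - confirm that a blacklisted UA receives a 403
# (the URL is a placeholder; point it at your own server)
import urllib.error
import urllib.request

req = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": "MJ12bot"},  # any UA from the deny list
)
try:
    with urllib.request.urlopen(req) as resp:
        print("NOT blocked, got status", resp.status)
except urllib.error.HTTPError as err:
    print("blocked, got status", err.code)  # expect 403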


2. IIS 7/IIS 8/IIS 10 and later: create a web.config file in the site root and add the code below. The negate="true" on the robots.txt match means the rule applies to every URL except robots.txt, so blocked bots can still read your crawl policy; any UA matching the pattern list has its request aborted:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block spider">
          <match url="(^robots.txt$)" ignoreCase="false" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ignoreCase="true" />
          </conditions>
          <action type="AbortRequest" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>


3. Apache: add the following rules to your .htaccess file. [NC] makes the UA match case-insensitive, [F] returns 403 Forbidden, and the negated RewriteRule again exempts robots.txt:

<IfModule mod_rewrite.c>
  RewriteEngine On
  # Block spiders
  RewriteCond %{HTTP_USER_AGENT} "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" [NC]
  RewriteRule !(^robots\.txt$) - [F]
</IfModule>


Note: by default these rules block a selection of obscure spiders; to block others, add them following the same pattern.

Appendix: UA names of the major spiders (a small helper that uses these names follows the list):

Google spider: googlebot

Baidu spider: baiduspider

Baidu mobile spider: baiduboxapp

Yahoo spider: slurp

Alexa spider: ia_archiver

MSN spider: msnbot

Bing spider: bingbot

AltaVista spider: scooter

Lycos spider: lycos_spider_(t-rex)

AllTheWeb spider: fast-webcrawler

Inktomi spider: slurp

Youdao spider: YodaoBot and OutfoxBot

热土 spider: Adminrtspider

Sogou spider: sogou spider

SOSO spider: sosospider

360 Search spider: 360spider
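If you analyze logs programmatically, a tiny helper like the sketch below (the tokens come from the list above; the function name is my own invention) can flag UAs that claim to be none of the major spiders:

# known_spiders.py - flag UAs that match none of the major spiders listed above
KNOWN_SPIDERS = (
    "googlebot", "baiduspider", "baiduboxapp", "slurp", "ia_archiver",
    "msnbot", "bingbot", "scooter", "lycos_spider_(t-rex)",
    "fast-webcrawler", "yodaobot", "outfoxbot", "adminrtspider",
    "sogou spider", "sosospider", "360spider",
)

def is_known_spider(user_agent: str) -> bool:
    """Return True if the UA contains any major-spider token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_SPIDERS)

print(is_known_spider("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(is_known_spider("SomeRandomBot/0.1"))                        # False

Bots that pass this check are candidates for the deny lists above (bearing in mind that a spoofer can claim any UA it likes).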




Common spam UAs seen around the web, by category (a small tagging helper follows the list):

Content scraping: FeedDemon, Java, Jullo, Feedly, UniversalFeedParser

SQL injection: BOT/0.1 (BOT for JCE), CrawlDaddy

Useless crawlers: EasouSpider, Swiftbot, YandexBot, AhrefsBot, jikeSpider, MJ12bot, YYSpider, oBot

CC attack tools: ApacheBench, WinHttp

TCP attack: HttpClient

Scanners: Microsoft URL Control, ZmEu (phpMyAdmin scanner), jaunty
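As with the major-spider list, these categories are easy to use from a script. A minimal sketch (the dictionary and function names are my own) that tags a UA string with its spam category:

# spam_ua_tags.py - tag a UA string with its spam category from the list above
from typing import Optional

SPAM_UA_CATEGORIES = {
    "content scraping": ("FeedDemon", "Jullo", "Feedly", "UniversalFeedParser"),
    "SQL injection": ("BOT for JCE", "CrawlDaddy"),
    "useless crawler": ("EasouSpider", "Swiftbot", "YandexBot", "AhrefsBot",
                        "jikeSpider", "MJ12bot", "YYSpider", "oBot"),
    "CC attack": ("ApacheBench", "WinHttp"),
    "TCP attack": ("HttpClient",),
    "scanner": ("Microsoft URL Control", "ZmEu", "jaunty"),
}

def tag_spam_ua(user_agent: str) -> Optional[str]:
    """Return the spam category for a UA, or None if it matches nothing."""
    ua = user_agent.lower()
    for category, tokens in SPAM_UA_CATEGORIES.items():
        if any(token.lower() in ua for token in tokens):
            return category
    return None

print(tag_spam_ua("Mozilla/4.0 (compatible; MJ12bot/v1.4.8)"))  # useless crawler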




Summary

The above is everything 生活随笔 has collected on blocking spam spider UAs under IIS 6/IIS 7+, Nginx, and Apache to reduce server load, including how to deny a specific User-Agent on IIS 7.5; hopefully it helps you solve the problem you ran into.
