02 - Overview of the requests module
What is the requests module?
Official documentation: https://requests.readthedocs.io/en/master/
- requests is a third-party Python library for making network requests (installed with pip; it is not part of the standard library). Its main job is to simulate a browser issuing requests. It is powerful, concise, and efficient, and it dominates the web-scraping field.
Why use the requests module?
The urllib module brings a number of inconveniences, in particular:
- URL encoding must be handled manually
- POST request parameters must be handled manually
- cookie and proxy handling is cumbersome
- ...
With the requests module:
- URL encoding is handled automatically
- POST request parameters are handled automatically
- cookie and proxy handling is simplified
- ...
How to use the requests module
Installation:
pip install requests
Usage flow:
- specify the URL
- issue the request with the requests module
- extract the data from the response object
- persist the data
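The four steps above map one-to-one onto a minimal script. This is a sketch: example.com stands in for a real target, and it assumes internet access.

```python
import requests

# 1. specify the URL (example.com is a placeholder target)
url = "https://example.com"

# 2. issue the request with the requests module
response = requests.get(url, timeout=10)

# 3. extract the data from the response object
page_text = response.text

# 4. persist the data
with open("page.html", "w", encoding="utf-8") as fp:
    fp.write(page_text)

print(response.status_code)
```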
Common response attributes
- response.text: response body as a str
- response.content: response body as bytes
- response.status_code: HTTP status code of the response
- response.request.headers: headers of the request that produced this response
- response.headers: response headers
- response.request._cookies: cookies attached to the request (note the underscore; PreparedRequest has no public cookies attribute)
- response.cookies: cookies set by the response (i.e. after the Set-Cookie step)
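These attributes can be inspected without touching the network by building a Response object by hand (a sketch for illustration only; in real code Response objects come back from get()/post()):

```python
from io import BytesIO
from requests.models import Response

# hand-built Response, so the example runs offline
resp = Response()
resp.status_code = 200
resp.headers['Content-Type'] = 'text/html; charset=utf-8'
resp.encoding = 'utf-8'
resp.raw = BytesIO("你好 requests".encode('utf-8'))  # the raw body stream

print(resp.status_code)   # 200
print(resp.content)       # the body as bytes
print(resp.text)          # the body as str, decoded with resp.encoding
print(resp.headers['Content-Type'])
```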
The requests library has 7 main methods
They are request(), get(), head(), post(), put(), patch(), and delete(). Among them, request() is the foundation: the other six are all implemented as calls to it.
Method: Description
- requests.request(): constructs a request; the base method that underpins the methods below
- requests.get(): the main method for fetching an HTML page; corresponds to HTTP GET (request the resource at the URL)
- requests.head(): fetches only a page's header information; corresponds to HTTP HEAD (request the headers of the resource at the URL)
- requests.post(): submits a POST request; corresponds to HTTP POST (append new data to the resource at the URL)
- requests.put(): submits a PUT request; corresponds to HTTP PUT (store a resource at the URL, replacing what was there)
- requests.patch(): submits a partial-modification request; corresponds to HTTP PATCH (partially update the resource at the URL)
- requests.delete(): submits a delete request; corresponds to HTTP DELETE (delete the resource stored at the URL)
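The claim that the other six methods are thin wrappers around request() is easy to verify by printing their source:

```python
import inspect
import requests

# requests.get and requests.post are each a short function that
# delegates to requests.request with the matching HTTP verb
print(inspect.getsource(requests.get))
print(inspect.getsource(requests.post))
```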
Of these methods, requests.get() is the most commonly used. get() takes many parameters; a few of the most common are:
Parameter: Description
- url: the target URL
- params: a dict or byte sequence appended to the URL as the query string (GET-style parameters)
- data: a dict or byte sequence sent in the request body (POST-style parameters; it is not appended to the URL)
- headers: request headers, e.g. to override User-Agent
- timeout: timeout in seconds
- proxies: proxy configuration
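How params ends up in the URL can be seen without sending anything, by preparing a request (the URL below is just an illustration):

```python
from requests import Request

# prepare() builds the final URL, query-string encoding included,
# without actually sending the request
prep = Request('GET',
               'https://movie.douban.com/j/chart/top_list',
               params={'type': '5', 'start': 0, 'limit': 20}).prepare()
print(prep.url)
# https://movie.douban.com/j/chart/top_list?type=5&start=0&limit=20
```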
# Case 1: log in to gushiwen.cn, using the Chaojiying platform to crack the
# image captcha. A requests.Session keeps cookies across the requests.
import requests
import time
from lxml import etree
from Chaojiying_Python import chaojiying

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
session = requests.Session()

# 1. fetch the login page and locate the captcha image
url = "https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx"
page_text = session.get(url=url, headers=headers).text
tree = etree.HTML(page_text)
img_src = "https://so.gushiwen.cn/" + tree.xpath('//*[@id="imgCode"]/@src')[0]

# 2. download the captcha image with the same session, so the cookie matches
img_data = session.get(img_src, headers=headers).content
img_path = 'C:/Users/gpc/Desktop/python/Chaojiying_Python/code.jpg'
with open(img_path, "wb") as fp:
    fp.write(img_data)

# 3. recognize the captcha (the original called tranformImgCode() with no
#    instance and no arguments; the argument here is an assumption -- match
#    it to your chaojiying wrapper's actual signature)
code_text = chaojiying.Chaojiying_Client.tranformImgCode(img_path)
time.sleep(2)

# 4. post the login form, including the recognized captcha text
login_url = "https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx"
data = {
    "__VIEWSTATE": "wLzVsPN64jZIa8aQJI9HzVvaaknH6pBhUG+UOMQKX8NEFV49xwtLRgU8GH4O1o+mClDbtnYiKbXMOIM6VRh7HGzM4hpMpd0qBUM3b/pXlzZ2gnbcuB+5RUBJ/i0=",
    "__VIEWSTATEGENERATOR": "C93BE1AE",
    "from": "http://so.gushiwen.cn/user/collect.aspx",
    "email": "18398149392",
    "pwd": "cheng.1023",
    "code": code_text,
    "denglu": "登錄",
}
page_text_login = session.post(url=login_url, headers=headers, data=data).text
print(page_text_login)
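The reason the login above goes through a Session rather than bare requests.get() calls: a Session stores the cookies from every response and sends them back automatically, so the captcha request and the login request belong to the same "browser". A minimal offline illustration:

```python
import requests

session = requests.Session()
# cookies set here (or received via Set-Cookie from a server)
# ride along automatically on every later request made with this session
session.cookies.set('sessionid', 'abc123')
print(session.cookies.get('sessionid'))  # abc123
```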
# Case 2: crawl the drug administration (药监局) portal: page through the
# listing, then fetch the detail record for every company ID and persist it.
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
url = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList"
for pg in range(1, 6):
    # one POST per listing page; the page number goes in the form data
    data = {
        "on": "true",
        "page": str(pg),
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": "",
    }
    response = requests.post(url=url, headers=headers, data=data)
    page_text = response.json()
    uuid = page_text["list"]
    for idd in uuid:
        # one POST per company ID for the detail record
        iidd = idd["ID"]
        url2 = "http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById"
        data2 = {"id": iidd}
        response = requests.post(url=url2, headers=headers, data=data2)
        page_text = response.json()
        qymc = page_text["epsName"]
        xkzbh = page_text["productSn"]
        xkxm = page_text["certStr"]
        qyzs = page_text["epsAddress"]
        scdz = page_text["epsProductAddress"]
        shxydm = page_text["businessLicenseNumber"]
        fddbr = page_text["businessPerson"]
        qyfzr = page_text["legalPerson"]
        zlfzr = page_text["qualityPerson"]
        fzjg = page_text["qfManagerName"]
        qfr = page_text["xkName"]
        rcjdgljg = page_text["rcManagerDepartName"]
        rcjdglry = page_text["rcManagerUser"]
        yxq = page_text["xkDate"]
        fzrq = page_text["xkDateStr"]
        xiangqing = ("企業名稱:" + qymc
                     + "\n" + "許可證編號:" + xkzbh
                     + "\n" + "許可項目:" + xkxm
                     + "\n" + "企業住所:" + qyzs
                     + "\n" + "生產地址:" + scdz
                     + "\n" + "社會信用代碼:" + shxydm
                     + "\n" + "法定代表人:" + fddbr
                     + "\n" + "企業負責人:" + qyfzr
                     + "\n" + "質量負責人:" + zlfzr
                     + "\n" + "發證機關:" + fzjg
                     + "\n" + "簽發人:" + qfr
                     + "\n" + "日常監督管理機構:" + rcjdgljg
                     + "\n" + "日常監督管理人員:" + rcjdglry
                     + "\n" + "有效期至:" + yxq
                     + "\n" + "發證日期:" + fzrq)
        print(xiangqing)
        with open("藥監局詳情爬取.txt", "a", encoding="utf-8") as fp:
            fp.write(xiangqing + "\n")
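The crawl above is an instance of a common list-to-detail pattern: one request per page for the IDs, then one request per ID for the full record. Sketched below with a stand-in fetch function (hypothetical; it replaces the two requests.post(...).json() calls) so the control flow can run without network access:

```python
def crawl(fetch_json, pages):
    """fetch_json(method, payload) stands in for requests.post(...).json()."""
    records = []
    for pg in range(1, pages + 1):
        # first request: the listing page, yielding a batch of IDs
        listing = fetch_json("getXkzsList", {"page": str(pg), "pageSize": "15"})
        for item in listing["list"]:
            # second request: the detail record for one ID
            records.append(fetch_json("getXkzsById", {"id": item["ID"]}))
    return records

# offline demo with a canned fetcher that fabricates two IDs per page
def fake_fetch(method, payload):
    if method == "getXkzsList":
        return {"list": [{"ID": payload["page"] + "-1"},
                         {"ID": payload["page"] + "-2"}]}
    return {"epsName": "company " + payload["id"]}

print(crawl(fake_fetch, 2))
```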
# Case 3: fetch Douban's movie chart. The endpoint returns JSON, and the query
# parameters go into params rather than being hard-coded in the URL (the
# original passed both, which would duplicate every parameter in the URL).
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
url = "https://movie.douban.com/j/chart/top_list"
params = {
    'type': '5',
    'interval_id': '100:90',
    'action': '',
    'start': '0',
    'limit': '20',
}
response = requests.get(url=url, headers=headers, params=params)
page_text = response.json()
for movie in page_text:
    name = movie['title']
    score = movie['score']
    print(name, score)
    with open("豆瓣.txt", "a", encoding="utf-8") as fp:
        fp.write(name + " " + score + "\n")