Python web scraping: the requests library
The requests library is currently one of the most widely used and efficient libraries for fetching web pages.
1. A simple example
import requests                           # import the requests library
r = requests.get("http://www.baidu.com")  # call get to fetch the page
print(r.status_code)                      # print the status code
print(r.text)                             # print the page content
Running the code above gives you a Response object.
2. A general code framework
import requests

def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)  # set the timeout and target URL
        r.raise_for_status()               # raises HTTPError if the status code is not 200
        r.encoding = r.apparent_encoding   # apparent_encoding detects the page's actual encoding
        return r.text
    except:
        return "An exception occurred"

if __name__ == "__main__":
    url = "http://www.baidu.com"
    print(getHtmlText(url))
3. The requests library in detail
3.1 Response attributes
How the main attributes relate:
- r.status_code : HTTP status code of the response (200 means success)
- r.text : body of the response, decoded to a string
- r.content : body of the response as raw bytes
- r.encoding : encoding guessed from the HTTP headers, used to decode r.text
- r.apparent_encoding : encoding detected from the content itself (usually more reliable)
The typical flow: check r.status_code first; if it is 200, read r.text, switching r.encoding to r.apparent_encoding when the header-based guess is wrong.
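To make these relationships concrete, here is a minimal offline sketch that fills in a Response object by hand (it uses the private _content attribute purely for illustration; in real use requests.get populates all of this, and no request is sent here):

```python
import requests

# Offline sketch: build a Response object by hand to show how the
# attributes relate. In real use, requests.get fills these fields in.
r = requests.models.Response()
r.status_code = 200                          # would be the HTTP status code
r._content = b"<html>hello requests</html>"  # raw bytes, exposed as r.content
r.headers["Content-Type"] = "text/html"

print(r.status_code)        # 200
print(r.content)            # the raw bytes
print(r.apparent_encoding)  # encoding guessed from the bytes themselves
r.encoding = "utf-8"        # tell requests how to decode r.text
print(r.text)               # <html>hello requests</html>
```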
3.2 The request methods
The requests library's methods correspond one-to-one to the HTTP protocol's operations: get, head, post, put, patch, delete, and options.
PS: these methods take roughly the same parameters (a URL plus optional keyword arguments), with minor differences between them.
3.2.1 The get method
r = requests.get('http://www.baidu.com')
print(r.text)
3.2.2 The head method
r = requests.head('http://www.baidu.com')
print(r.headers)
3.2.3 The post method
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('http://httpbin.org/post', data=payload)
print(r.text)
# output:
{...
 "form": {
   "key1": "value1",
   "key2": "value2"
 },
...}
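What actually gets sent can be inspected without any network traffic by preparing the request instead of sending it. This sketch shows how data= (form-encoded body) differs from the json= parameter (httpbin.org is only a placeholder URL here; nothing is transmitted):

```python
import requests

kv = {'key1': 'value1'}

# data= encodes the dict as an HTML form body
form = requests.Request('POST', 'http://httpbin.org/post', data=kv).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # key1=value1

# json= serializes the dict as a JSON body instead
jsn = requests.Request('POST', 'http://httpbin.org/post', json=kv).prepare()
print(jsn.headers['Content-Type'])   # application/json
print(jsn.body)
```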
3.2.4 The put method
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.put('http://httpbin.org/put', data=payload)  # a dict is auto-encoded as a form
print(r.text)
# if a string is passed instead of a dict, httpbin echoes it in the "data" field
# output:
{...
 "form": {
   "key1": "value1",
   "key2": "value2"
 },
...}
3.2.5 The request method: constructing a request
requests.request(method, url, **kwargs)
# method   : the request method, one of the seven listed below (get/put/post, etc.)
# url      : the target URL
# **kwargs : 13 optional parameters that control the request
The method argument can be any of the seven request types:
requests.request('GET', url, **kwargs)
requests.request('HEAD', url, **kwargs)
requests.request('POST', url, **kwargs)
requests.request('PUT', url, **kwargs)
requests.request('PATCH', url, **kwargs)
requests.request('DELETE', url, **kwargs)
requests.request('OPTIONS', url, **kwargs)
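As an offline check (nothing is sent; httpbin.org is just a placeholder URL), preparing a request for each method shows that requests.request accepts all seven uniformly:

```python
import requests

# Prepare (but do not send) a request for each of the seven methods.
methods = ('GET', 'HEAD', 'POST', 'PUT', 'PATCH', 'DELETE', 'OPTIONS')
prepared = [requests.Request(m, 'http://httpbin.org/anything').prepare()
            for m in methods]
for p in prepared:
    print(p.method, p.url)
```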
A closer look at **kwargs:
kv = {'key1': 'value1', 'key2': 'value2'}                            # params: appended to the URL as a query string
r = requests.request('POST', 'http://python123.io/ws', params=kv)

data1 = 'hello world'                                                # data: sent as the request body
r = requests.request('POST', 'http://python123.io/ws', data=data1)

jso = {'key1': 'value1'}                                             # json: sent as a JSON body
r = requests.request('POST', 'http://python123.io/ws', json=jso)

hd = {'key1': 'value1'}                                              # headers: custom HTTP headers
r = requests.request('POST', 'http://python123.io/ws', headers=hd)

fs = {'file': open('data.xls', 'rb')}                                # files: upload a file
r = requests.request('POST', 'http://python123.io/ws', files=fs)

r = requests.request('POST', 'http://python123.io/ws', timeout=10)   # timeout: seconds to wait before giving up

pxs = {'http': 'http://usr:pass@10.10.10:1234',                      # proxies: route requests through
       'https': 'https://10.10.10.1:4321'}                           # proxy servers
r = requests.request('GET', 'http://www.baidu.com', proxies=pxs)
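The full set of 13 control parameters is params, data, json, headers, cookies, auth, files, timeout, proxies, allow_redirects, stream, verify, and cert. As a quick offline sketch (nothing is sent), preparing a request shows how params is encoded into the URL:

```python
import requests

# params entries are urlencoded and appended to the URL as a query string
kv = {'key1': 'value1', 'key2': 'value2'}
p = requests.Request('GET', 'http://python123.io/ws', params=kv).prepare()
print(p.url)  # http://python123.io/ws?key1=value1&key2=value2
```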
3.2.6 The delete method
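delete asks the server to remove the resource at the given URL. A minimal sketch, prepared offline here so nothing is actually sent (httpbin.org/delete would echo a real request back):

```python
import requests

p = requests.Request('DELETE', 'http://httpbin.org/delete').prepare()
print(p.method, p.url)  # DELETE http://httpbin.org/delete
# to actually send it: r = requests.delete('http://httpbin.org/delete')
```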
3.2.7 The patch method
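patch submits a partial modification to a resource; like post and put, a dict passed via data= is form-encoded. A minimal offline sketch (nothing is sent):

```python
import requests

payload = {'key1': 'value1'}
p = requests.Request('PATCH', 'http://httpbin.org/patch', data=payload).prepare()
print(p.method, p.body)  # PATCH key1=value1
# to actually send it: r = requests.patch('http://httpbin.org/patch', data=payload)
```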
3.3 The difference between PATCH and PUT
PUT replaces the entire resource, so the request must carry all of its fields; PATCH submits only the fields being changed. For a large resource, PATCH therefore saves considerable bandwidth.
4. Exceptions in the requests library
The main exceptions requests can raise:
- requests.ConnectionError : network connection failure (DNS lookup failed, connection refused, etc.)
- requests.HTTPError : an invalid HTTP response (raised by raise_for_status())
- requests.URLRequired : a valid URL is required
- requests.TooManyRedirects : the redirect limit was exceeded
- requests.ConnectTimeout : connecting to the server timed out
- requests.Timeout : the request as a whole timed out
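A sketch of catching these exceptions around the usual get/raise_for_status pattern; the .invalid top-level domain is reserved and never resolves, so this example reliably hits the ConnectionError branch:

```python
import requests

def fetch(url):
    try:
        r = requests.get(url, timeout=3)
        r.raise_for_status()              # turn a non-200 status into HTTPError
        return r.text
    except requests.Timeout:              # also catches ConnectTimeout
        return "timed out"
    except requests.HTTPError as e:
        return "bad status: %s" % e
    except requests.ConnectionError:
        return "connection failed"

print(fetch("http://nonexistent.invalid/"))  # connection failed
```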
This article was put together from MOOC course material and related online resources.