Python Crawler Notes: Using urllib for Data Processing

Published: 2025/3/11 · python · 25 · 豆豆


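urllib has four submodules: request, error, parse, and robotparser.

request sends requests and receives responses.

error defines the exceptions raised by request and is normally used to catch them.

parse parses and manipulates URLs.

robotparser processes robots.txt files.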
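As a rough illustration of the three supporting submodules, the sketch below runs entirely offline. The URL path, query string, and robots.txt rules are made up for the example and are not taken from the real site.

```python
import urllib.error
import urllib.parse
import urllib.robotparser

# parse: split a URL into its components (path and query are invented here)
parts = urllib.parse.urlparse("http://blog.youhaiqun.mom/archive?page=2#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# robotparser: evaluate rules given as lines, so no network access is needed
rules = urllib.robotparser.RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])
allowed = rules.can_fetch("*", "http://blog.youhaiqun.mom/index.html")
blocked = rules.can_fetch("*", "http://blog.youhaiqun.mom/private/a.html")
print(allowed, blocked)

# error: HTTPError is a subclass of URLError, so catch HTTPError first
is_subclass = issubclass(urllib.error.HTTPError, urllib.error.URLError)
print(is_subclass)
```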

The urllib.request module

import urllib.request

response=urllib.request.urlopen("http://blog.youhaiqun.mom")

print(response.read().decode('utf-8'))

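response is an http.client.HTTPResponse object. Its main methods include read(),

getheader(name), getheaders(), and fileno().

Its main attributes are status, msg, reason, closed, and debuglevel.

For example, read response.status or call response.read() to get the information.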
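To try these methods and attributes without depending on an external site, the sketch below starts a throwaway HTTP server on localhost and fetches from it. HelloHandler and its response body are hypothetical stand-ins; only the urlopen() part mirrors the text above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server so the example needs no internet access."""

    def do_GET(self):
        body = "hello urllib".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# The with-statement closes the response when the block ends.
with urllib.request.urlopen(url, timeout=5) as response:
    status = response.status                      # numeric status code
    reason = response.reason                      # reason phrase
    content_type = response.getheader("Content-Type")
    header_names = [name for name, _ in response.getheaders()]
    text = response.read().decode("utf-8")

server.shutdown()
server.server_close()

print(status, reason)
print(content_type)
print(header_names)
print(text)
print(response.closed)
```

Using the response as a context manager is a design choice in this sketch; calling response.close() explicitly would work as well.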

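The urllib.request.urlopen() function

urllib.request.urlopen(url,data,timeout,cafile,capath,cadefault,context)

urlopen() opens the address given by url. data carries extra parameters and must be of type bytes; when data is supplied, the request is sent as a POST instead of a GET. Note that cafile, capath, and cadefault were deprecated in Python 3.6 and removed in Python 3.13; use context instead.

The urllib.parse.urlencode() function

urllib.parse.urlencode({'word':'hello'})

This converts a dictionary into a URL-encoded query string, here 'word=hello'.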
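A few more cases may make the encoding rules clearer. The keys and values below (page, q, tag) are invented for illustration.

```python
import urllib.parse

# Several key/value pairs are joined with "&"; non-string values are converted
query = urllib.parse.urlencode({"word": "hello", "page": 2})
print(query)

# Spaces become "+" and non-ASCII text is percent-encoded as UTF-8 by default
encoded = urllib.parse.urlencode({"q": "python 爬虫"})
print(encoded)

# doseq=True expands a list value into repeated keys
multi = urllib.parse.urlencode({"tag": ["a", "b"]}, doseq=True)
print(multi)

# parse_qs reverses the encoding; every value comes back as a list
decoded = urllib.parse.parse_qs(encoded)
print(decoded)
```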

Using the two functions together:

data={'word':'hello'}

data=bytes(urllib.parse.urlencode(data),encoding='utf-8')

response=urllib.request.urlopen('http://blog.youhaiqun.mom',data,timeout=9)
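The bytes requirement and the switch from GET to POST can be checked without contacting the server. The sketch below builds urllib.request.Request objects purely to inspect which method urllib would choose, and it relies on urllib rejecting str data before it opens any connection.

```python
import urllib.parse
import urllib.request

form = {"word": "hello"}
body = urllib.parse.urlencode(form).encode("utf-8")  # str -> bytes
print(body)

# Building a Request sends nothing; it only lets us inspect the chosen method.
url = "http://blog.youhaiqun.mom"
method_without_data = urllib.request.Request(url).get_method()
method_with_data = urllib.request.Request(url, data=body).get_method()
print(method_without_data, method_with_data)

# Passing str instead of bytes raises TypeError during request preparation.
error_message = None
try:
    urllib.request.urlopen(url, data="word=hello", timeout=9)
except TypeError as exc:
    error_message = str(exc)
print("rejected:", error_message)
```

Calling .encode("utf-8") on the encoded string is equivalent to the bytes(..., encoding='utf-8') form used above.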

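The urllib.request.Request() class

To add headers to a request, use urllib.request.Request(); urllib.request.urlopen() on its own can only pass extra parameters through data.

request=urllib.request.Request(url,data=data,headers=headers,method='POST')  # method is 'GET' or 'POST'

Note: the line above does not send anything to url. It only builds a request object that holds the headers, data, and so on. The request is actually sent by the following statement: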

response=urllib.request.urlopen(request)

print(response.read().decode('utf-8'))
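Sending the request is the step that can fail, which is where the error submodule comes in. The sketch below deliberately points at a local port with nothing listening, an assumption made so that the failure is reproducible offline, and catches the two exception classes in the usual order.

```python
import socket
import urllib.error
import urllib.request

# Reserve a free local port and release it again, so nothing is listening there.
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]

request = urllib.request.Request("http://127.0.0.1:%d/" % port)

outcome = None
try:
    with urllib.request.urlopen(request, timeout=3) as response:
        outcome = response.read().decode("utf-8")
except urllib.error.HTTPError as exc:
    # The server answered, but with an error status such as 404 or 500.
    outcome = "HTTP error %d" % exc.code
except urllib.error.URLError as exc:
    # No usable answer at all: refused connection, DNS failure, and so on.
    outcome = "failed: %s" % exc.reason
print(outcome)
```

HTTPError is listed first because it is a subclass of URLError; in the opposite order the URLError clause would catch both.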

Headers can also be added with add_header():

request=urllib.request.Request(url,data,method='POST')

request.add_header('User-Agent','Mozilla/4.0(compatible;MSIE 5.5;Windows NT)')
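The two ways of setting headers should produce the same request. The following sketch compares them without sending anything; the form data is the same illustrative {'word': 'hello'} used earlier.

```python
import urllib.parse
import urllib.request

url = "http://blog.youhaiqun.mom"
data = urllib.parse.urlencode({"word": "hello"}).encode("utf-8")
agent = "Mozilla/4.0(compatible;MSIE 5.5;Windows NT)"

# Option 1: pass the headers as a dict to the constructor
req_a = urllib.request.Request(
    url, data=data, headers={"User-Agent": agent}, method="POST"
)

# Option 2: build the request first, then call add_header()
req_b = urllib.request.Request(url, data=data, method="POST")
req_b.add_header("User-Agent", agent)

# Request stores header names capitalized, i.e. as "User-agent"
print(req_a.header_items())
print(req_b.header_items())
print(req_b.get_method(), req_b.full_url)
```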

Advanced features of urllib.request

Handling cookies and proxies.
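The original notes stop here, so what follows is only one common way to handle both with the standard library: combine handler objects into an opener. The proxy address is a hypothetical placeholder, and nothing is actually sent.

```python
import http.cookiejar
import urllib.request

# Cookies: the jar collects cookies set by responses and replays them later.
cookie_jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

# Proxy: hypothetical local proxy address, replace with a real one.
proxy_handler = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})

# build_opener() combines the handlers with urllib's default ones.
opener = urllib.request.build_opener(cookie_handler, proxy_handler)
opener.addheaders = [
    ("User-Agent", "Mozilla/4.0(compatible;MSIE 5.5;Windows NT)")
]

# opener.open(url) would send a request through the proxy and store any
# cookies in cookie_jar; it is not called here, so the jar stays empty.
print(len(cookie_jar))
print(proxy_handler.proxies)
```

urllib.request.install_opener(opener) would make plain urlopen() calls use the same handlers, if a global default is preferred.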
