Simulating a login to Renren (人人网) with a Python crawler
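Simulated login: scraping user information that is only available to a logged-in user.
Requirement 1: simulate a login to Renren.
- Clicking the login button sends a POST request.
- The POST request carries the login information entered beforehand (username, password, captcha, ...).
- Captcha: it changes on every request.
Requirement 2: scrape the current user's information (the user information shown on the personal home page).
A property of the HTTP/HTTPS protocols: they are stateless.
Why the expected page data is not returned:
When the second request (the one for the personal home page) is sent, the server does not know that it is being made in a logged-in state.
Cookie: lets the server record the client's state.
- Manual handling: obtain the cookie value with a packet-capture tool and put it into headers. (Not recommended)
- Automatic handling:
  - Where does the cookie value come from?
  - It is created by the server after the simulated login POST request.
Session object. What it does:
- It can send requests.
- If a cookie is produced during a request, it is automatically stored in, and carried by, the session object.
Usage:
- Create a session object: session = requests.Session()
- Use the session object to send the simulated login POST request (the cookie is then stored in the session).
- Use the session object to send the GET request for the personal home page (the cookie is carried along).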
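A minimal offline sketch of that behaviour, assuming the `requests` package is installed. Nothing is sent over the network: the cookie is placed in the session by hand (standing in for the one a login response would set), and the request is only prepared so that its headers can be inspected. The cookie name and value are borrowed from the captured cookie string in the demonstration code and serve purely as sample data.

```python
import requests

profile_url = "http://www.renren.com/974713149/profile"

session = requests.Session()
# Stand-in for the cookie that the login response would normally set.
session.cookies.set("xnsid", "364172ac", domain="www.renren.com", path="/")

# Prepared through the session: the stored cookie is attached automatically.
with_session = session.prepare_request(requests.Request("GET", profile_url))

# Prepared on its own: there is no cookie store, so no Cookie header is added.
without_session = requests.Request("GET", profile_url).prepare()

print(with_session.headers.get("Cookie"))
print(without_session.headers.get("Cookie"))
```

The first request carries a `Cookie` header built from the session's cookie store, while the standalone request has none. That is the difference between calling `session.get(...)` and calling `requests.get(...)` after a login.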
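1. Send a request to http://www.renren.com/ to get the source of the page that contains the login form.
2. Locate the captcha image in the page and read the value of the src attribute of its img tag. Then send a GET request to the URL in src and save the captcha image locally. The saved image is later recognized with the Chaojiying (超級鷹) captcha-solving platform.
3. Click the login button while capturing traffic in the browser. The browser sends a POST request to the server at http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=202112910495. Inspect the captured request and check whether its response headers contain set-cookie. If they do, this confirms that the server created a session for the client during this request, and created a cookie and returned it to the client for storage.
Sure enough, set-cookie is present, so the requests we send when simulating the login with the requests module also need to carry the cookie. How does the cookie get attached to those requests?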
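To make the role of `set-cookie` concrete, here is a small sketch using only the standard library's `http.cookies`. The two header values are made-up examples in the shape a login response might return: the names `id` and `xnsid` and their values are borrowed from the captured cookie string in the demonstration code, while the `Path`, `Domain` and `HttpOnly` attributes are assumptions. It shows how the pairs set by the server become the `Cookie` header that the client sends back.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values from a login response.
set_cookie_headers = [
    "id=974713149; Path=/; Domain=.renren.com",
    "xnsid=364172ac; Path=/; Domain=.renren.com; HttpOnly",
]

# Collect every cookie the server asked the client to store.
jar = SimpleCookie()
for header in set_cookie_headers:
    jar.load(header)

# On later requests the client sends back only the name=value pairs.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)
```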
Obtain the cookie manually from the packet-capture tool, put it into the headers of the request, and pass those headers to the request method. (Not recommended)
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Cookie': 'xxxxxxxxx'
}
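If the cookie is copied by hand anyway, one option is to split the copied string into a dict and hand it to the `cookies=` parameter of `requests.get` instead of hard-coding a `Cookie` header. A minimal sketch using only the standard library; `parse_cookie_string` is a hypothetical helper name, and the sample string is a shortened excerpt of the cookie used in the demonstration code.

```python
def parse_cookie_string(raw):
    """Split a copied cookie string such as 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, because values may contain '=' themselves.
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


raw_cookie = "id=974713149; xnsid=364172ac; loginfrom=syshome"
cookies = parse_cookie_string(raw_cookie)
print(cookies)
# The dict could then be passed as requests.get(url, headers=headers, cookies=cookies).
```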
Create a session object and send the requests through it, because the session carries and handles cookies automatically. (Recommended)
session = requests.Session()
page_text = session.get(url=url, headers=headers).text
# ......
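4. Capturing the site's login traffic shows that the requested URL is http://www.renren.com/974713149, and that its response is the home page shown after a successful login. So send a request to this URL, taking care to set the User-Agent, Referer and Cookie request headers.
5. Send a GET request to http://www.renren.com/974713149/profile to get the source of the personal home page.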
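The steps above can be sketched as one function. This is an illustration under stated assumptions, not the article's own code: `find_captcha_url` and `login_and_fetch_profile` are hypothetical names, `session` is any object with `get`/`post` methods shaped like `requests.Session`, `recognize` is any callable that turns image bytes into the captcha text, and the standard library's `html.parser` is swapped in for lxml so that the sketch has no third-party dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CaptchaSrcParser(HTMLParser):
    """Remembers the src of the <img> whose id is 'verifyPic_login'."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == "verifyPic_login":
            self.src = attrs.get("src")


def find_captcha_url(page_url, html):
    """Return the absolute URL of the captcha image found in html."""
    parser = CaptchaSrcParser()
    parser.feed(html)
    if parser.src is None:
        raise ValueError("captcha image not found in page")
    # The src may be relative, so resolve it against the page URL.
    return urljoin(page_url, parser.src)


def login_and_fetch_profile(session, recognize, home_url, login_url,
                            profile_url, form, headers):
    # Step 1: fetch the page that contains the login form.
    html = session.get(home_url, headers=headers).text
    # Step 2: locate and download the captcha image, then recognize it.
    captcha_url = find_captcha_url(home_url, html)
    image_bytes = session.get(captcha_url, headers=headers).content
    # Step 3: send the login POST; the session stores any cookie it returns.
    payload = dict(form, icode=recognize(image_bytes))
    session.post(login_url, data=payload, headers=headers)
    # Steps 4-5: request the profile page through the same session.
    return session.get(profile_url, headers=headers).text
```

Because the session and the recognizer are passed in, the flow can be exercised with stand-ins before it is pointed at the real site. With a real `requests.Session()` and a wrapper around the captcha platform client, the same calls would go over the network.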
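Code demonstration:
1. Obtain the cookie manually from the packet-capture tool, put it into the headers of the request, and pass those headers to the request method. (Not recommended)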
import requests
from lxml import etree
from hashlib import md5


class Chaojiying_Client(object):
    """Client for the Chaojiying captcha-recognition platform."""

    def __init__(self, username, password, soft_id):
        self.username = username
        password = password.encode('utf8')
        # Send the MD5 hex digest of the password instead of the plain text.
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {'codetype': codetype}
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php',
                          data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: image ID of the captcha being reported as wrongly recognized
        """
        params = {'id': im_id}
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php',
                          data=params, headers=self.headers)
        return r.json()


def getCodeText(userName, password, appId, imgUrl):
    """Recognize the captcha image stored at the local path imgUrl."""
    # The original listing nested the client class and an
    # `if __name__ == '__main__'` guard inside this function, so the function
    # returned None whenever the module was imported. Both are lifted out here.
    chaojiying = Chaojiying_Client(userName, password, appId)
    with open(imgUrl, 'rb') as fp:
        im = fp.read()
    return chaojiying.PostPic(im, 1902)


headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    'Referer': 'http://www.renren.com/SysHome.do',
    # Cookie copied by hand from the packet-capture tool.
    'Cookie': 'anonymid=klgdsqz5n7c6dn; depovince=ZGQT; _r01_=1; JSESSIONID=abcqWHDNhNOVf95ntfjFx; taihe_bi_sdk_uid=926da97ed7bdff5fc3ece47fdd554b0b; taihe_bi_sdk_session=ffa92a5a812142ba8dac302676d881cd; ick_login=426dff64-6952-4319-8c8f-96ea6f498550; first_login_flag=1; ln_uact=910456393@qq.com; ln_hurl=http://hdn.xnimg.cn/photos/hdn421/205/2035/h_main_9aN0_0c1b00037b06195a.jpg; wp_fold=0; jebecookies=c2363801-e587-4f54-8566-24b86aa22659|||||; _de=B3D043F455F38852340E4CEC836F3769696BF75400CE19CC; p=2e69883207d99e253471f621d896037d9; t=1f917c44eaa1178b8bd357e96d7346fc9; societyguester=1f917c44eaa1178b8bd357e96d7346fc9; id=974713149; xnsid=364172ac; loginfrom=syshome'
}

# Step 1: fetch the page that contains the login form.
url = 'http://www.renren.com/'
page_text = requests.get(url=url, headers=headers).text

# Step 2: locate the captcha image, download it and save it locally.
tree = etree.HTML(page_text)
img_url = tree.xpath('//*[@id="verifyPic_login"]/@src')[0]
print(img_url)
img_data = requests.get(img_url, headers=headers).content
with open('./code.jpg', 'wb') as fp:
    fp.write(img_data)

# Recognize the saved captcha. Replace the first three placeholders with your
# own Chaojiying username, password and app id.
result = getCodeText('username', 'password', 'appid', './code.jpg')
print(result['pic_str'])

# Step 4: request the home page shown after a successful login.
# (The original listing had 'http://www.renren.com/9747139' here, a typo for
# the user id 974713149 used everywhere else.)
login_url = 'http://www.renren.com/974713149'
login_page_text = requests.get(url=login_url, headers=headers).text
with open('renren.html', 'w', encoding='utf-8') as fp:
    fp.write(login_page_text)

# Step 5: request the personal home page.
detail_url = 'http://www.renren.com/974713149/profile'
detail_page_text = requests.get(url=detail_url, headers=headers).text
with open('zep.html', 'w', encoding='utf-8') as fp:
    fp.write(detail_page_text)
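Running this saves the two pages locally as renren.html and zep.html.
2. Create a session object and send the requests through it, because the session carries and handles cookies automatically. (Recommended)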
import requests
from lxml import etree
from hashlib import md5


class Chaojiying_Client(object):
    """Client for the Chaojiying captcha-recognition platform."""

    def __init__(self, username, password, soft_id):
        self.username = username
        password = password.encode('utf8')
        # Send the MD5 hex digest of the password instead of the plain text.
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {'codetype': codetype}
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php',
                          data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: image ID of the captcha being reported as wrongly recognized
        """
        params = {'id': im_id}
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php',
                          data=params, headers=self.headers)
        return r.json()


def getCodeText(userName, password, appId, imgUrl):
    """Recognize the captcha image stored at the local path imgUrl."""
    chaojiying = Chaojiying_Client(userName, password, appId)
    with open(imgUrl, 'rb') as fp:
        im = fp.read()
    return chaojiying.PostPic(im, 1902)


# Every request below goes through the same session, so cookies set by the
# server are stored and sent back automatically. No Cookie header is needed.
session = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    'Referer': 'http://www.renren.com/SysHome.do',
}

# Step 1: fetch the page that contains the login form.
url = 'http://www.renren.com/'
page_text = session.get(url=url, headers=headers).text

# Step 2: locate the captcha image, download it and save it locally.
tree = etree.HTML(page_text)
img_url = tree.xpath('//*[@id="verifyPic_login"]/@src')[0]
print(img_url)
img_data = session.get(img_url, headers=headers).content
with open('./code.jpg', 'wb') as fp:
    fp.write(img_data)

# Recognize the saved captcha. Replace the first three placeholders with your
# own Chaojiying username, password and app id.
result = getCodeText('username', 'password', 'appid', './code.jpg')
print(result['pic_str'])

# Step 3: send the login POST. The form values are copied from the captured
# login request; only the captcha text changes on each run.
login_post_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=202112910495'
data = {
    'email': '910451393@qq.com',
    'icode': result['pic_str'],
    'origURL': 'http://www.renren.com/home',
    'domain': 'renren.com',
    'key_id': '1',
    'captcha_type': 'web_login',
    'password': '346d050fe82d3cfe090210864d73b65b5608bf90173371b3c10e7df6e533',
    'rkey': '3a7cdde0b042c1ba11169c3378fd5b',
    # Given in decoded form: requests URL-encodes form values itself, so the
    # percent-encoded string from the capture would be encoded twice.
    'f': 'http://www.renren.com/974713149/newsfeed/photo'
}
response = session.post(url=login_post_url, headers=headers, data=data)
print(response.text)

# Step 4: request the home page shown after a successful login.
login_url = 'http://www.renren.com/974713149'
login_page_text = session.get(url=login_url, headers=headers).text
with open('renren.html', 'w', encoding='utf-8') as fp:
    fp.write(login_page_text)

# Step 5: request the personal home page.
detail_url = 'http://www.renren.com/974713149/profile'
detail_page_text = session.get(url=detail_url, headers=headers).text
with open('zep.html', 'w', encoding='utf-8') as fp:
    fp.write(detail_page_text)
Running this version also saves the two pages locally, as renren.html and zep.html.