

scrapy returns a 401 error

Published: 2023/12/16, by 豆豆

I'm new to scrapy and have been crawling housing data from the Jinhua Xinyiju site (金華信義居). The new-home listing pages and the project detail pages both crawl fine, but every request for a project's unit (room) list comes back with a 401, and I can't figure out why.

Searching around, I found that a 401 means the server wants the user's credentials verified, but what exactly am I supposed to do about that?
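The target is an ASP.NET site (note the ASP.NET_SessionId and __RequestVerificationToken cookies in the spider below), and one common cause of a 401 in that setup is anti-forgery validation: the server typically expects the token both as a cookie and as a matching `__RequestVerificationToken` form field, and each page embeds a fresh token in a hidden input. A minimal sketch of pulling that token out of the Referer page's HTML (the hidden-input field name is the standard ASP.NET one; the sample HTML and the helper name `extract_token` are illustrative assumptions, not from the question):

```python
import re

def extract_token(html):
    # Look for the hidden anti-forgery input that ASP.NET MVC pages
    # typically render; return its value, or None if it's absent.
    m = re.search(
        r'name="__RequestVerificationToken"[^>]*value="([^"]+)"', html)
    return m.group(1) if m else None

# Made-up sample of what the hidden input usually looks like:
sample = '<input name="__RequestVerificationToken" type="hidden" value="abc123" />'
print(extract_token(sample))  # abc123
```

If the token really is validated this way, hard-coding an old cookie value will stop working as soon as the session expires, which would also explain a sudden 401.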

Here's the code:

import datetime
import json

import pandas as pd
import scrapy

from jinhua.items import HouseItem


class ProjectSpider(scrapy.Spider):
    name = 'jhhouse'
    # allowed_domains must be bare host names; including the scheme
    # ('https://www.jhtmsf.com') makes the offsite middleware drop requests.
    allowed_domains = ['www.jhtmsf.com']
    start_urls = ['https://www.jhtmsf.com/House/GetPageForRoom']
    spiderTime = datetime.datetime.now().strftime('%Y-%m-%d')

    def start_requests(self):
        # One POST per project listed in the previously crawled spreadsheet.
        df = pd.read_excel('../采集結果/py_jhtmsf_loupan.xlsx')
        for idx, row in df.iterrows():
            project_id = row['project_id']
            referer_url = 'https://www.jhtmsf.com/House/Room/' + str(project_id)
            yield scrapy.FormRequest(
                self.start_urls[0],
                headers={'Referer': referer_url},
                cookies={
                    '__RequestVerificationToken': 'XpgK_gMXlG71JzgTKt27kPr9ZQE1Ptbm6DhRfN7Ol7OMuuS_p43T6XOKkwg48zNUlI5jSYlJA97oO_KoupElbXV5Zm-1ldmVCCjUkltPB8c1',
                    'ASP.NET_SessionId': 'vwtw2pdanxrkdnfbqimzol1u',
                    'Hm_lvt_88b265ab6b07373c61ffa7d36d6db2c3': '1634609773,1634711852,1634882161,1634883812',
                    'Hm_lpvt_88b265ab6b07373c61ffa7d36d6db2c3': '1634883832',
                },
                formdata={
                    'eid': str(project_id),
                    'bulid': '',
                    'layer': '',
                    'status': '0',
                    'pageNumber': '1',
                    'pageSize': '15',
                    'sortName': 'StartDate',
                    'sortOrder': 'desc',
                },
                callback=self.parse,
                meta={'project_id': project_id, 'referer_url': referer_url},
            )

    def parse(self, response):
        # First page of results tells us how many pages there are in total.
        jsonBody = json.loads(response.body)
        page = jsonBody["TotalPage"]
        total = jsonBody["Total"]
        project_id = response.meta['project_id']
        referer_url = response.meta['referer_url']
        for pg in range(1, int(page) + 1):
            yield scrapy.FormRequest(
                self.start_urls[0],
                headers={'Referer': str(referer_url)},
                formdata={
                    'eid': str(project_id),
                    'bulid': '',
                    'layer': '',
                    'status': '0',
                    'pageNumber': str(pg),
                    'pageSize': '15',
                    'sortName': 'StartDate',
                    'sortOrder': 'desc',
                },
                callback=self.content_parse,
                meta={'project_id': project_id, 'total': total},
            )

    def content_parse(self, response):
        jsonBody = json.loads(response.body)
        jrows = jsonBody["Rows"]
        if jrows:
            for row in jrows:
                item = HouseItem()
                item['project_id'] = response.meta['project_id']
                item['total'] = response.meta['total']
                item['area'] = row['Area']
                item['build_nb'] = row['Bulid']
                item['on_layer'] = row['Layer']
                item['price'] = row['Price']
                item['room_nb'] = row['RoomNO']
                item['start_time'] = row['StartDate']
                item['house_status'] = row['Status']
                item['spider_time'] = self.spiderTime
                yield item
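One thing worth trying: if the server validates the anti-forgery token, it usually has to travel in the POST body too, not only as a cookie. A sketch of how the payload could be built (`build_formdata` is a hypothetical helper; the field names are copied from the spider above, including the site's own spelling of 'bulid'):

```python
def build_formdata(eid, page, token):
    # Same fields the spider already sends, plus the anti-forgery token
    # as a form field (assumption: the server checks it against the cookie).
    return {
        'eid': str(eid),
        'bulid': '',          # field name as the site spells it
        'layer': '',
        'status': '0',
        'pageNumber': str(page),
        'pageSize': '15',
        'sortName': 'StartDate',
        'sortOrder': 'desc',
        '__RequestVerificationToken': token,
    }

payload = build_formdata(42, 1, 'abc123')
print(payload['pageNumber'], payload['__RequestVerificationToken'])  # 1 abc123
```

Rather than hard-coding stale cookies, it may also help to GET the Referer page first (letting scrapy's cookie middleware capture a fresh session), scrape the token out of that response, and only then yield the FormRequest with `formdata=build_formdata(project_id, pg, token)`.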
