
Python Notes - Fetching python Job Postings for Nanjing from Lagou

Published: 2025/3/15, category: python, by 豆豆

The Fiddler capture is as follows:

The program output is as follows:

The source code is as follows:

import re
import requests


class HandleLaGou(object):
    def __init__(self):
        self.laGou_session = requests.session()
        self.header = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
        }
        self.city_list = ""

    # Fetch the nationwide city list
    def handle_city(self):
        city_search = re.compile(r'zhaopin/">(.*?)</a>')
        city_url = "https://www.lagou.com/jobs/allCity.html"
        city_result = self.handle_request(method="GET", url=city_url)
        self.city_list = city_search.findall(city_result)
        self.laGou_session.cookies.clear()

    def handle_city_job(self, city):
        # Visit the search page first so the session picks up the cookies Lagou requires
        first_request_url = "https://www.lagou.com/jobs/list_python?city=%s&cl=false&fromSearch=true&labelWords=&suginput=" % city
        first_response = self.handle_request(method="GET", url=first_request_url)
        total_page_search = re.compile(r'class="span\stotalNum">(\d+)</span>')
        try:
            total_page = total_page_search.search(first_response).group(1)
        except AttributeError:
            # No page count found for this city; skip it
            return
        else:
            for i in range(1, int(total_page) + 1):
                data = {"pn": i, "kd": "python"}
                page_url = "https://www.lagou.com/jobs/positionAjax.json?city=%s&needAddtionalResult=false" % city
                referer_url = "https://www.lagou.com/jobs/list_python?city=%s&cl=false&fromSearch=true&labelWords=&suginput=" % city
                self.header['Referer'] = referer_url
                response = self.handle_request(method="POST", url=page_url, data=data)
                print(response)

    def handle_request(self, method, url, data=None, info=None):
        # Route traffic through the local Fiddler proxy so requests can be inspected;
        # verify points at Fiddler's root certificate so HTTPS decryption works
        proxies = {"http": "http://127.0.0.1:8888", "https": "http://127.0.0.1:8888"}
        if method == "GET":
            response = self.laGou_session.get(url=url, headers=self.header,
                                              proxies=proxies,
                                              verify=r"D:/Fiddler/FiddlerRoot.pem")
        elif method == "POST":
            response = self.laGou_session.post(url=url, headers=self.header, data=data,
                                               proxies=proxies,
                                               verify=r"D:/Fiddler/FiddlerRoot.pem")
        response.encoding = 'utf-8'
        return response.text


if __name__ == '__main__':
    laGou = HandleLaGou()
    laGou.handle_city()
    for city in laGou.city_list:
        laGou.handle_city_job(city)
        break
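The two regular expressions above do the heavy lifting: one pulls city names out of the allCity.html page, the other pulls the total page count out of the search results page. A minimal offline sketch of both, run against illustrative HTML snippets (the snippets are made-up samples, not real Lagou responses):

```python
import re

# Same patterns the crawler uses
city_search = re.compile(r'zhaopin/">(.*?)</a>')
total_page_search = re.compile(r'class="span\stotalNum">(\d+)</span>')

# Illustrative sample of a city link as it appears in allCity.html
sample_city_html = '<a href="https://www.lagou.com/nanjing-zhaopin/">南京</a>'
cities = city_search.findall(sample_city_html)
print(cities)  # ['南京']

# Illustrative sample of the total-page-count span on the search page
sample_page_html = '<span class="span totalNum">17</span>'
total_page = total_page_search.search(sample_page_html).group(1)
print(total_page)  # '17'
```

Note that `search` returns `None` when the pattern is absent, which is why the crawler wraps `.group(1)` in a `try` block: calling `.group` on `None` raises `AttributeError`.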

Here is a small tip.

I used to write crawlers in C++, which was exhausting; Python is a joy by comparison, since so much is handled for you.

Using a session matters because, when scraping, the site may first serve a page that sets cookies, and only after those cookies are in place will it let you fetch the actual data.
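This is exactly what `requests.Session` gives you: cookies received (or set) earlier are automatically sent on every later request in the same session. A minimal sketch of that persistence (the cookie name and value here are illustrative, not necessarily what Lagou actually sets):

```python
import requests

s = requests.Session()

# Simulate the server setting a cookie when the first page is visited
# (in the crawler this happens implicitly on the GET to the search page)
s.cookies.set("user_trace_token", "abc123", domain="www.lagou.com")

# The cookie is still in the jar and will be attached to later requests
# made through this same session
print(s.cookies.get("user_trace_token"))  # abc123
```

This is also why `handle_city` calls `self.laGou_session.cookies.clear()` after fetching the city list: it discards the cookies from that page so the later per-city requests start from a clean jar.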


Summary

That covers fetching python job postings for Nanjing from Lagou; I hope it helps you solve the problems you ran into.
