Scraping 51job data
This article, collected and compiled by 生活随笔, covers scraping job postings from 51job.
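1. Install the packages first. The script uses requests and lxml, which are third-party packages, plus json, which ships with Python and needs no installation. I use PyCharm: if a package is missing, PyCharm flags the import and offers to install it with one click (PyCharm installation guides are easy to find online). Without PyCharm, run: pip install requests lxml
2. The code is as follows: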
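```python
import json

import requests
from lxml import etree

BASE_DOMAIN = 'https://search.51job.com'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
}

# Postings collected from every page parsed so far
Recruitments = []


def parse_page(url):
    # headers= must be a keyword argument: the second positional
    # parameter of requests.get() is params (the query string)
    # timeout keeps a stalled connection from hanging the whole crawl
    resp = requests.get(url, headers=HEADERS, timeout=10)
    # The list pages are GBK-encoded
    text = resp.content.decode('gbk')
    tree = etree.HTML(text)

    # Each result row is a <div class="el"> (page layout at the time of writing).
    # Its linked titles alternate: position, company, position, company, ...
    position_and_company = tree.xpath("//div[@class='el']//span/a/@title")
    positions = position_and_company[::2]
    companies = position_and_company[1::2]
    workplaces = tree.xpath("//div[@class='el']//span[@class='t3']/text()")
    payrolls = tree.xpath("//div[@class='el']//span[@class='t4']/text()")
    release_times = tree.xpath("//div[@class='el']//span[@class='t5']/text()")

    # Note: an empty <span> yields no text() node, so one missing field shifts
    # that column against the others; zip then stops at the shortest list
    for position, company, workplace, payroll, release_time in zip(
            positions, companies, workplaces, payrolls, release_times):
        Recruitments.append({
            '职位': position,          # job title
            '公司': company,           # company
            '工作地点': workplace,     # work location
            '薪资': payroll,           # salary
            '发布时间': release_time,  # posting date
        })

    # Rewrite the file after every page so results so far survive a later failure;
    # ensure_ascii=False keeps the Chinese text readable instead of \u escapes
    with open('51job.json', 'w', encoding='utf-8') as fp:
        json.dump(Recruitments, fp, ensure_ascii=False)


def spider():
    # 51job search results for the keyword "python"; {} is replaced by the page number
    base_urls = 'https://search.51job.com/list/120200%252C010000%252C020000%252C030200%252C040000,000000,0000,00,9,99,python,2,{}.html'
    for x in range(1, 51):  # pages 1 to 50
        page_url = base_urls.format(x)
        parse_page(page_url)
        print('Page %s done' % x)


def main():
    spider()


if __name__ == '__main__':
    main()
```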
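The ensure_ascii=False argument in the script's json.dump call is what keeps the Chinese field names and values readable in 51job.json; with the default (ensure_ascii=True), every non-ASCII character is written as a \uXXXX escape. A small sketch comparing the two, using a made-up record (the values are illustrative, not scraped data), which also shows why the file is opened with encoding='utf-8':

```python
import json
import os
import tempfile

# A made-up record using the same field names as the scraper
record = {'职位': 'Python开发工程师', '公司': '示例公司'}

escaped = json.dumps(record)                       # default ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # characters kept as-is
print(escaped)   # non-ASCII text appears as \uXXXX escapes
print(readable)  # Chinese text appears as-is

# With ensure_ascii=False the file holds raw Chinese characters, so open it with
# an explicit encoding: the platform default may not be able to represent them
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.json')
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump([record], fp, ensure_ascii=False)
    with open(path, encoding='utf-8') as fp:
        loaded = json.load(fp)

print(loaded == [record])  # True
```

Both forms decode to the same data; the difference only matters for a human reading the file.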
Run results
Each finished page prints a progress message, and the collected postings are written to 51job.json as a JSON list with one object per posting.
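To check a run, you can load 51job.json back and look at what was collected. A minimal sketch, assuming the file is the JSON list of dicts the script writes, with the work location stored under the key 工作地点; summarize is a hypothetical helper, and counting postings per location is only one example of a sanity check:

```python
import json
from collections import Counter


def summarize(path='51job.json', top=5):
    """Load scraped postings; return (total count, most common work locations)."""
    with open(path, encoding='utf-8') as fp:
        postings = json.load(fp)
    # '工作地点' is the work-location field written by the scraper
    locations = Counter(p.get('工作地点', '') for p in postings)
    return len(postings), locations.most_common(top)
```

After the scraper finishes, calling summarize() reads 51job.json from the working directory. A total of 0 means no rows were matched, for example because the page layout no longer fits the XPath expressions.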