Python web scraper: 智联招聘 (Zhaopin), Part 1

Published: 2023/12/14 | python | 38 | 豆豆

Development environment

Windows 7 or later, Python 3.4 or later

pymysql library; install with: pip3 install pymysql

selenium library, Firefox 56.0, geckodriver.exe, and basic selenium knowledge

MySQL 5.5 database, with Navicat as the graphical client


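Scraping steps

1. Analyze the Zhaopin site and load the search results

Open "https://www.zhaopin.com/", select the city "北京" (Beijing), enter "GIS", and click "搜工作" (Search Jobs). The page then lists GIS-related job postings in the Beijing area.

Press F12 to open the developer tools. The HTML elements for the city field, the job keyword field, and the search button are "id=JobLocation", "id=KeyWord_kw2", and "class=doSearch" respectively (see the selenium basics). With these, the script can fill in the form and move on to the results page automatically:

Code 1: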

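import time
from selenium import webdriver


def get_main_page(keyword, city):
    fox = webdriver.Firefox()
    url = 'https://www.zhaopin.com/'
    fox.get(url)
    time.sleep(1)
    # Fill in the city field
    jl = fox.find_element_by_id('JobLocation')
    jl.clear()
    jl.send_keys(city)
    # Fill in the job keyword field
    zl = fox.find_element_by_id('KeyWord_kw2')
    zl.clear()
    zl.send_keys(keyword)
    # Click the search button and give the results time to load
    fox.find_element_by_class_name('doSearch').click()
    time.sleep(3)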


2. Analyze the job listings and extract the fields

Inspect the page source to locate each piece of information. The details are as follows:

def get_everypage_info(fox, keyword, city):
    # Switch to the most recently opened window (the search results)
    fox.switch_to_window(fox.window_handles[-1])
    tables = fox.find_elements_by_tag_name('table')
    # Skip tables[0]; the disabled code here once added this header row:
    # ['职位名称', '公司名称', '工作地点', '公司规模', '工作经验', '平均月薪', '学历要求', '职位描述']
    for table in tables[1:]:
        address, develop, jingyan, graduate, require = " ", " ", " ", " ", " "
        job = table.find_element_by_tag_name('a').text
        company = table.find_element_by_css_selector('.gsmc a').text
        salary = table.find_element_by_css_selector('.zwyx').text
        spans = table.find_elements_by_css_selector('.newlist_deatil_two span')
        for span in spans:
            text = span.get_attribute('textContent')
            # Each span starts with a label; slice the label off to keep the value
            if "地点" in text:
                address = text[3:]
            elif "公司规模" in text:
                develop = text[5:]
            elif "经验" in text:
                jingyan = text[3:]
            elif "学历" in text:
                graduate = text[3:]
        # Job description text, with its leading label removed
        require = (table.find_element_by_css_selector('.newlist_deatil_last').get_attribute('textContent'))[8:]

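The code above extracts, for every posting on each page: 职位名称 (job title), 公司名称 (company name), 工作地点 (location), 公司规模 (company size), 工作经验 (work experience), 平均月薪 (average monthly salary), 学历要求 (education requirement), and 职位描述 (job description).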
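The label handling above relies on fixed slice offsets such as [3:] and [5:], which only work while each label keeps its exact length. A minimal sketch of a more tolerant variant is shown here: it splits each span's text on the first colon instead. The function name parse_detail_spans, the FIELD_LABELS mapping, and the sample strings are illustrative assumptions, not part of the original script.

```python
# Hypothetical helper: maps the label found in each span to the variable
# names used in this article. The labels are assumed to be followed by a
# full-width or ASCII colon.
FIELD_LABELS = {
    "地点": "address",
    "公司规模": "develop",
    "经验": "jingyan",
    "学历": "graduate",
}


def parse_detail_spans(texts):
    """Turn 'label:value' strings into a dict of field values."""
    # Start from the same single-space defaults the article uses.
    fields = {name: " " for name in FIELD_LABELS.values()}
    for text in texts:
        # Normalize the full-width colon, then split on the first colon only.
        label, sep, value = text.replace(":", ":").partition(":")
        if not sep:
            continue  # no separator, so this is not a labelled field
        for key, name in FIELD_LABELS.items():
            if key in label:
                fields[name] = value.strip()
                break
    return fields


# Made-up sample input, for illustration only.
sample = ["地点:北京", "公司规模:100-499人", "经验:1-3年", "学历:本科"]
parsed = parse_detail_spans(sample)
```

With Selenium, the list passed in could be built from span.get_attribute('textContent') for each span, so the parsing itself can be tested without opening a browser.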


3. Store the information in a MySQL database

Connect to MySQL and create a new table, write the data to the database row by row, and also write the "职位描述" (job description) field to a txt file.

Connect to MySQL:

table_name = city + '_' + keyword
conn = pymysql.Connect(host='127.0.0.1', port=3306, user='root', passwd='', db='python', charset='utf8')
cursor = conn.cursor()

Create the new table:

sql = """CREATE TABLE IF NOT EXISTS %s(
    职位名称 CHAR(100),
    公司名称 CHAR(100),
    工作地点 CHAR(100),
    公司规模 CHAR(100),
    工作经验 CHAR(100),
    平均月薪 CHAR(100),
    学历要求 CHAR(100)
    )default charset=UTF8""" % (table_name)
cursor.execute(sql)
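Because the table name is built from the city and keyword and then formatted straight into the SQL text, a keyword containing a space or other special character would break the statement. The sketch below shows one way to guard against that, using MySQL's backtick quoting for identifiers (an embedded backtick is escaped by doubling it). The helper names quote_identifier and build_create_table_sql are assumptions for illustration; the original script formats the name in unquoted.

```python
COLUMNS = ["职位名称", "公司名称", "工作地点", "公司规模",
           "工作经验", "平均月薪", "学历要求"]


def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    # MySQL limits table and column names to 64 characters.
    if not name or len(name) > 64:
        raise ValueError("invalid identifier: %r" % (name,))
    return "`" + name.replace("`", "``") + "`"


def build_create_table_sql(city, keyword):
    """Build the CREATE TABLE statement with every identifier quoted."""
    table = quote_identifier(city + "_" + keyword)
    column_sql = ", ".join(
        "%s CHAR(100)" % quote_identifier(column) for column in COLUMNS
    )
    return "CREATE TABLE IF NOT EXISTS %s (%s) DEFAULT CHARSET=utf8" % (
        table, column_sql)


create_sql = build_create_table_sql("北京", "GIS")
```

The resulting string could then be passed to cursor.execute() in place of the hand-formatted statement.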

Write the information to MySQL and to the txt file:

# Format the table name into the statement first; the values are bound by the driver
insert_row = ('insert into {0}(职位名称,公司名称,工作地点,公司规模,工作经验,平均月薪,学历要求) VALUES(%s,%s,%s,%s,%s,%s,%s)'.format(table_name))
insert_data = (job, company, address, develop, jingyan, salary, graduate)
cursor.execute(insert_row, insert_data)
conn.commit()
# Append the job description to a per-table text file
with open('%s职位描述.txt' % (table_name), 'a', encoding='utf-8') as f:
    f.write(require)
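The split between formatting the table name into the statement and passing the values separately can be tried without a MySQL server. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in, so the placeholder is ? rather than pymysql's %s and identifiers are quoted with double quotes rather than MySQL backticks. The function insert_job and the sample row are made up for illustration.

```python
import sqlite3

COLUMNS = ["职位名称", "公司名称", "工作地点", "公司规模",
           "工作经验", "平均月薪", "学历要求"]


def insert_job(conn, table_name, row):
    """Insert one posting; identifiers are formatted in, values are bound."""
    # Table and column names cannot be bound parameters, so they are
    # formatted into the SQL text.
    column_sql = ", ".join('"%s"' % column for column in COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = 'INSERT INTO "%s" (%s) VALUES (%s)' % (
        table_name, column_sql, placeholders)
    # The values are passed separately, so the driver handles their quoting.
    conn.execute(sql, row)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "北京_GIS" (%s)' % ", ".join(
    '"%s" TEXT' % column for column in COLUMNS))

# Made-up sample row; the apostrophe shows that bound values need no escaping.
insert_job(conn, "北京_GIS",
           ("GIS工程师", "O'Reilly 公司", "北京", "100-499人",
            "1-3年", "8001-10000", "本科"))
conn.commit()
rows = conn.execute('SELECT "公司名称", "学历要求" FROM "北京_GIS"').fetchall()
```

The same division of labour applies to cursor.execute(insert_row, insert_data) in the article: only the values travel through the second argument.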


4. Moving between result pages

The "下一页" (next page) button is located and clicked with the code below:

count = 0
while count <= 10:
    try:
        fox.find_element_by_class_name('pagesDown-pos').click()
        break
    except Exception:
        # Button not found yet: wait and try again
        time.sleep(8)
        count += 1
        continue
if count > 10:
    fox.close()
else:
    time.sleep(1)
    get_everypage_info(fox, keyword, city)

Note: this part is very important. The while loop decides whether the last page has been reached. If fox.find_element_by_class_name('pagesDown-pos').click() keeps failing until count exceeds 10, the loop ends and the if branch closes the browser. If the click succeeds, break also leaves the loop, and the else branch goes on to scrape the next page.
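Since get_everypage_info calls itself once per page, a search with very many result pages keeps adding call frames (Python's default recursion limit is 1000). As a sketch of an alternative, the same retry-then-stop logic can be written as a plain loop. The names click_next_with_retry and scrape_all_pages, and the callables passed to them, are hypothetical and do not appear in the original script.

```python
import time


def click_next_with_retry(click_next, max_attempts=10, delay=8.0,
                          sleep=time.sleep):
    """Call click_next() until it succeeds; False once attempts run out."""
    for _ in range(max_attempts):
        try:
            click_next()
            return True
        except Exception:
            # Button missing or not clickable yet: wait, then retry.
            sleep(delay)
    return False


def scrape_all_pages(scrape_page, click_next, **retry_options):
    """Scrape the current page, then follow the next-page button in a loop."""
    pages = 0
    while True:
        scrape_page()
        pages += 1
        if not click_next_with_retry(click_next, **retry_options):
            return pages  # no next page could be reached: last page


# Stand-in callables that simulate a result list with three pages.
clicks = {"count": 0}
scraped = []


def fake_click_next():
    clicks["count"] += 1
    if clicks["count"] > 2:
        raise RuntimeError("no next-page button")


total_pages = scrape_all_pages(lambda: scraped.append("page"), fake_click_next,
                               max_attempts=3, delay=0, sleep=lambda s: None)
```

With Selenium, click_next could be a small function wrapping fox.find_element_by_class_name('pagesDown-pos').click(), and scrape_page the per-page extraction without its trailing recursive call.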


5. Looping over cities in "main"

if __name__ == "__main__":
    citys = ['上海', '深圳', '广州', '武汉', '杭州', '南京', '成都', '青岛']  # '北京' has already been scraped
    job = '数据挖掘分析'
    for city in citys:
        print(" ")
        get_main_page(job, city)

Once the job postings for one city have been scraped, the script automatically moves on to the next city in the list.


6. Notes and problems

(1) Creating the MySQL table: declare the table encoding with "default charset=UTF8"; otherwise writing the data raises an error.

(2) Writing data to the MySQL table: in 'insert into {0}(职位名称,公司名称,工作地点,公司规模,工作经验,平均月薪,学历要求) VALUES(%s,%s,%s,%s,%s,%s,%s)'.format(table_name), the table name has to be substituted into the statement first. The table name and column names in an insert statement must not be wrapped in single or double quotes, and formatting the name in beforehand avoids that. If the table name were passed in together with the values, it would be quoted by default like a string.

insert_row = ('insert into {0}(职位名称,公司名称,工作地点,公司规模,工作经验,平均月薪,学历要求) VALUES(%s,%s,%s,%s,%s,%s,%s)'.format(table_name))
insert_data = (job, company, address, develop, jingyan, salary, graduate)
cursor.execute(insert_row, insert_data)

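(3) The time.sleep() values depend on network speed and machine performance. With a fast connection and machine they can be short; otherwise lengthen them, or the code will fail to find the HTML elements.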
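Fixed sleeps are either too long on a fast machine or too short on a slow one. A common alternative is to poll until the element appears, up to a timeout. Below is a minimal sketch of that idea in plain Python; the name wait_until and its parameters are assumptions for illustration. Selenium itself ships WebDriverWait for the same purpose, which is worth preferring in real code.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # Treat errors such as "element not found yet" as "not ready".
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Stand-in condition that becomes ready on the third poll.
state = {"polls": 0}


def element_ready():
    state["polls"] += 1
    return "element" if state["polls"] >= 3 else None


found = wait_until(element_ready, timeout=2.0, interval=0.01)
```

In the scraper, the condition could be a lambda that calls fox.find_element_by_id('JobLocation'), so the script continues as soon as the field exists instead of always sleeping for a fixed time.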


Complete code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pymysql


def get_main_page(keyword, city):
    fox = webdriver.Firefox()
    url = 'https://www.zhaopin.com/'
    fox.get(url)
    time.sleep(1)
    # Fill in the city and the job keyword, then start the search
    jl = fox.find_element_by_id('JobLocation')
    jl.clear()
    jl.send_keys(city)
    zl = fox.find_element_by_id('KeyWord_kw2')
    zl.clear()
    zl.send_keys(keyword)
    fox.find_element_by_class_name('doSearch').click()
    time.sleep(3)
    get_everypage_info(fox, keyword, city)


def get_everypage_info(fox, keyword, city):
    # Switch to the most recently opened window (the search results)
    fox.switch_to_window(fox.window_handles[-1])
    tables = fox.find_elements_by_tag_name('table')

    # One table per city and keyword
    table_name = city + '_' + keyword
    conn = pymysql.Connect(host='127.0.0.1', port=3306, user='root', passwd='', db='python', charset='utf8')
    cursor = conn.cursor()
    sql = """CREATE TABLE IF NOT EXISTS %s(
        职位名称 CHAR(100),
        公司名称 CHAR(100),
        工作地点 CHAR(100),
        公司规模 CHAR(100),
        工作经验 CHAR(100),
        平均月薪 CHAR(100),
        学历要求 CHAR(100)
        )default charset=UTF8""" % (table_name)
    cursor.execute(sql)

    # Skip tables[0]; the disabled code here once added this header row:
    # ['职位名称', '公司名称', '工作地点', '公司规模', '工作经验', '平均月薪', '学历要求', '职位描述']
    for table in tables[1:]:
        address, develop, jingyan, graduate, require = " ", " ", " ", " ", " "
        job = table.find_element_by_tag_name('a').text
        company = table.find_element_by_css_selector('.gsmc a').text
        salary = table.find_element_by_css_selector('.zwyx').text
        spans = table.find_elements_by_css_selector('.newlist_deatil_two span')
        for span in spans:
            text = span.get_attribute('textContent')
            # Each span starts with a label; slice the label off to keep the value
            if "地点" in text:
                address = text[3:]
            elif "公司规模" in text:
                develop = text[5:]
            elif "经验" in text:
                jingyan = text[3:]
            elif "学历" in text:
                graduate = text[3:]
        require = (table.find_element_by_css_selector('.newlist_deatil_last').get_attribute('textContent'))[8:]
        row = [job, company, address, develop, jingyan, salary, graduate, require]

        # Write the row to MySQL and the job description to the txt file
        insert_row = ('insert into {0}(职位名称,公司名称,工作地点,公司规模,工作经验,平均月薪,学历要求) VALUES(%s,%s,%s,%s,%s,%s,%s)'.format(table_name))
        insert_data = (job, company, address, develop, jingyan, salary, graduate)
        cursor.execute(insert_row, insert_data)
        conn.commit()
        with open('%s职位描述.txt' % (table_name), 'a', encoding='utf-8') as f:
            f.write(require)

    print('此页已抓取···')  # "this page has been scraped"
    conn.close()

    # Try to click the next-page button; give up once count exceeds 10
    count = 0
    while count <= 10:
        try:
            fox.find_element_by_class_name('pagesDown-pos').click()
            break
        except Exception:
            time.sleep(8)
            count += 1
            continue
    if count > 10:
        fox.close()
    else:
        time.sleep(1)
        get_everypage_info(fox, keyword, city)


if __name__ == "__main__":
    citys = ['上海', '深圳', '广州', '武汉', '杭州', '南京', '成都', '青岛']  # '北京' has already been scraped
    job = '数据挖掘分析'
    for city in citys:
        print(" ")
        get_main_page(job, city)

The data finally obtained is shown in the figure.



