㊙️ [Earn Pocket Money with Python] Automating Résumé Referrals: My Junior Schoolmate Was Blown Away!!

Published: 2024/7/23

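Recently, while processing résumés, I found myself opening a large number of files one by one and copying out key information such as name, email, phone number, and education. This was very inefficient, and some files could not be copied from directly. So I wrote a résumé parsing script. Supported file formats: doc, docx, and pdf.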

P.S. Last month's results: 400+ candidates referred, 8 successful referrals, 5 hires, for an income of 8000*2 + 5000*3 = 31000 yuan.

{'Thank you for applying': 331, 'Résumé being processed': 19, 'Initial résumé screening': 5, 'Passed this round': 6, 'Offer issued': 1, 'In progress': 2, 'Offer declined': 3, 'Offer accepted': 5}

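1. Preparation

Script function: analyze résumé text for one-click referrals

Input: the path of the files to parse

Output: the parsed content, including but not limited to name, email, phone number, and education.

Environment: Python 3.6 on macOS (the doc-to-docx conversion below is written for macOS; on Windows it is simpler, since you can just import the win32 package)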

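Packages to import: os, docx (from python-docx), and the pdfminer modules. The full import list appears in the complete code in section 2.4.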
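The environment note above says that Windows can handle the doc-to-docx conversion through the win32 package. The article does not show that code, so the following is only a rough sketch of one way it might look. It assumes pywin32 and Microsoft Word are installed; the function name `doc_to_docx_windows` is hypothetical.

```python
import os


def doc_to_docx_windows(doc_path):
    """Convert a .doc file to .docx through Word's COM interface (Windows only)."""
    # Assumption: pywin32 is installed. Imported here so that the module
    # still loads on macOS/Linux, where win32com is unavailable.
    import win32com.client

    src = os.path.abspath(doc_path)  # Word expects absolute paths
    dst = src + "x"                  # foo.doc -> foo.docx
    word = win32com.client.Dispatch("Word.Application")
    try:
        document = word.Documents.Open(src)
        try:
            # 12 is wdFormatXMLDocument, i.e. the .docx format
            document.SaveAs(dst, FileFormat=12)
        finally:
            document.Close()
    finally:
        word.Quit()
    return dst
```

The output name mirrors the macOS approach used later (append `x` to the original path), so the rest of the script would not need to change.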

2. Parsing

2.1 Getting the résumé files

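    def get_files(path):
        """Return the files in `path`, skipping temp files and 1.doc/1.docx duplicates."""
        res = []
        seen = set()
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            # Skip directories and temporary files (Word lock files, .DS_Store)
            if not os.path.isfile(full) or '~$' in name or '.DS' in name:
                continue
            # Deduplicate 1.doc and 1.docx by comparing paths without the extension
            stem = os.path.splitext(full)[0]
            if stem not in seen:
                seen.add(stem)
                res.append(full)
        return res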
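As a variation on the listing step above, here is a minimal pathlib-based sketch. It is not the author's code: the name `list_resumes`, the `SUPPORTED` set, and the choice to prefer the .docx copy when both 1.doc and 1.docx exist are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: only the formats the script supports are of interest
SUPPORTED = {".doc", ".docx", ".pdf"}
WORD = {".doc", ".docx"}


def list_resumes(folder):
    """Return supported résumé files in `folder`, treating x.doc and x.docx as one."""
    chosen = {}
    for p in sorted(Path(folder).iterdir()):
        suffix = p.suffix.lower()
        # Skip directories, Word lock files (~$...), and hidden files (.DS_Store)
        if not p.is_file() or p.name.startswith(("~$", ".")):
            continue
        if suffix not in SUPPORTED:
            continue
        # Word files share one key per base name; other formats keep their own
        key = (p.stem, ".doc" if suffix in WORD else suffix)
        current = chosen.get(key)
        # Prefer the .docx copy so an already-converted file is not converted again
        if current is None or (current.suffix.lower() == ".doc" and suffix == ".docx"):
            chosen[key] = p
    return [str(p) for p in chosen.values()]
```

Comparing real file extensions, rather than searching for a substring in the path, avoids accidental matches when a folder name happens to contain "doc" or a dot.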

2.2 Parsing PDF

Once you have the text in res, you can use regular expressions to match the email, mobile number, education, and so on.

    def pdf_reader(file):
        """Extract the text of a PDF file with pdfminer."""
        with open(file, "rb") as fp:
            # Create a parser associated with the file
            parser = PDFParser(fp)
            # PDF document object
            doc = PDFDocument(parser)
            # Link the parser and the document object
            parser.set_document(doc)
            # Create the PDF resource manager
            resource = PDFResourceManager()
            # Layout analysis parameters
            laparam = LAParams()
            # Create an aggregator
            device = PDFPageAggregator(resource, laparams=laparam)
            # Create the PDF page interpreter
            interpreter = PDFPageInterpreter(resource, device)
            res = ''
            # Use the document object to get the pages
            for page in PDFPage.create_pages(doc):
                # Use the page interpreter to read the page
                interpreter.process_page(page)
                # Use the aggregator to get the content
                layout = device.get_result()
                for out in layout:
                    if hasattr(out, "get_text"):
                        res = res + out.get_text()
        return res
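Section 2.2 mentions matching the email, mobile number, and education with regular expressions but does not show the patterns. The following is a minimal sketch of how that step might look, not the author's actual code. The function name `extract_fields`, the patterns, and the degree keyword list are all assumptions; in particular, the phone pattern assumes 11-digit mainland China mobile numbers.

```python
import re

# Assumption: a simple, permissive email pattern (not a full RFC 5322 validator)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: mainland China mobile numbers, 11 digits starting with 1[3-9],
# not embedded in a longer run of digits
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# Assumption: hypothetical keyword list, highest degree first
DEGREE_KEYWORDS = [
    ("PhD", ("博士", "PhD", "Ph.D")),
    ("Master", ("硕士", "碩士", "Master")),
    ("Bachelor", ("本科", "学士", "學士", "Bachelor")),
    ("Associate", ("大专", "大專")),
]


def extract_fields(text):
    """Pull the first email, first phone number, and highest degree keyword from text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    degree = next(
        (label for label, words in DEGREE_KEYWORDS if any(w in text for w in words)),
        None,
    )
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "degree": degree,
    }
```

Called on the text returned by a reader function, for example `extract_fields(pdf_reader(path))`, this returns a dictionary whose values are `None` for any field it could not find. Real résumés vary a lot, so the patterns would need tuning against actual samples.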

2.3 Parsing Word

To be improved: if the Word file contains an embedded Excel sheet, that content cannot be parsed.

    import subprocess

    def word_reader(file):
        """Extract the paragraph text of a .doc/.docx file; return '' on failure."""
        try:
            if not file.lower().endswith('.docx'):
                # Convert doc to docx first (textutil is macOS-only); writes file + 'x'
                subprocess.run(["textutil", "-convert", "docx", file], check=True)
                file = file + 'x'
            # docx files can be read directly
            res = ''
            f = docx.Document(file)
            for para in f.paragraphs:
                res = res + '\n' + para.text
            return res
        except Exception:
            # print(file, 'read failed')
            return ''
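The reader above only walks the document's paragraphs, so text placed in tables is skipped as well. One way to narrow that gap, sketched here under assumptions, is to also collect the text of ordinary Word tables through python-docx's `tables`, `rows`, and `cells` attributes. The helper name `table_text` is hypothetical, and this does not cover embedded Excel objects, which remain the open limitation noted in this section.

```python
def table_text(document):
    """Collect the text of every top-level table cell in an opened python-docx Document."""
    lines = []
    for table in document.tables:
        for row in table.rows:
            cells = []
            for cell in row.cells:
                text = cell.text.strip()
                # A merged cell is reported once per grid column it spans,
                # so skip immediate repeats and empty cells
                if text and (not cells or cells[-1] != text):
                    cells.append(text)
            if cells:
                lines.append("\t".join(cells))
    return "\n".join(lines)


# Usage (assumes python-docx is installed and resume.docx exists):
#   import docx
#   extra = table_text(docx.Document("resume.docx"))
```

The result could be appended to the paragraph text before the regular-expression step, since many résumé templates put contact details in a table.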

2.4 Complete code

    # encoding: utf-8
    import os
    import subprocess

    import docx
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfinterp import PDFResourceManager
    from pdfminer.pdfinterp import PDFPageInterpreter
    from pdfminer.layout import LAParams
    from pdfminer.converter import PDFPageAggregator


    def get_files(path):
        """Return the files in `path`, skipping temp files and 1.doc/1.docx duplicates."""
        res = []
        seen = set()
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            # Skip directories and temporary files (Word lock files, .DS_Store)
            if not os.path.isfile(full) or '~$' in name or '.DS' in name:
                continue
            # Deduplicate 1.doc and 1.docx by comparing paths without the extension
            stem = os.path.splitext(full)[0]
            if stem not in seen:
                seen.add(stem)
                res.append(full)
        return res


    def pdf_reader(file):
        """Extract the text of a PDF file with pdfminer."""
        with open(file, "rb") as fp:
            # Create a parser associated with the file
            parser = PDFParser(fp)
            # PDF document object
            doc = PDFDocument(parser)
            # Link the parser and the document object
            parser.set_document(doc)
            # Create the PDF resource manager
            resource = PDFResourceManager()
            # Layout analysis parameters
            laparam = LAParams()
            # Create an aggregator
            device = PDFPageAggregator(resource, laparams=laparam)
            # Create the PDF page interpreter
            interpreter = PDFPageInterpreter(resource, device)
            res = ''
            # Use the document object to get the pages
            for page in PDFPage.create_pages(doc):
                # Use the page interpreter to read the page
                interpreter.process_page(page)
                # Use the aggregator to get the content
                layout = device.get_result()
                for out in layout:
                    if hasattr(out, "get_text"):
                        res = res + out.get_text()
        return res


    def word_reader(file):
        """Extract the paragraph text of a .doc/.docx file; return '' on failure."""
        try:
            if not file.lower().endswith('.docx'):
                # Convert doc to docx first (textutil is macOS-only); writes file + 'x'
                subprocess.run(["textutil", "-convert", "docx", file], check=True)
                file = file + 'x'
            # docx files can be read directly
            res = ''
            f = docx.Document(file)
            for para in f.paragraphs:
                res = res + '\n' + para.text
            return res
        except Exception:
            # print(file, 'read failed')
            return ''


    def file_reader(file):
        """Dispatch to the right reader based on the file extension."""
        ext = os.path.splitext(file)[1].lower()
        if ext in ('.doc', '.docx'):
            res = word_reader(file)
        elif ext == '.pdf':
            res = pdf_reader(file)
        else:
            res = 'Not a doc or pdf file; this file format is not supported!'
        return res


    if __name__ == '__main__':
        path = "/Users/XXXXX/Mine/XXXXX/"
        abs_files = get_files(path)
        print(abs_files)
        for file in abs_files:
            file_text = file_reader(file)
            print(file_text)

3. Results

Name | Years of experience | Phone | Education background | Company background | Key tags | Email

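In this installment: résumés in doc, docx, or pdf format are parsed into plain text, which makes it easier to screen for strong candidates later.

Coming next: résumé analysis that routes each candidate to the most suitable position, looking at education background, job stability, company background, technical strengths, and more.

I'm Qiao Ge (橋哥), and I focus on sharing internet tech tricks. Like and bookmark this post so you can find it again!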
