Scraping Ganji.com Rental Listings with Python 2 and Python 3, with Source Code Analysis
Published: 2025/3/21
豆豆
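*A while ago I happened to watch a Tencent open-course video about scraping rental listings from Ganji.com. It came back to mind over the last few days, so I analyzed the site myself and wrote the scraper from scratch, then rewrote it in Python 3. I ran into a few pitfalls along the way, so I am writing them down here as a summary.*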
The steps: analyze the target site's URL, locate the target tags, then extract the data and write it to a CSV file.
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urlparse import urljoin
import requests
import csv

URL = 'http://jn.ganji.com/fang1/o{page}p{price}/'
# The base is jn.ganji.com/fang1, where jn is Jinan, my city and the default after logging in.
# fang1 is rental listings, fang5 is second-hand housing, zhaopin is the jobs section, and so on.
# This time we only query fang1.
# The link can be more complex, though, for example
# http://jn.ganji.com/fang1/tianqiao/h1o1p1/ or
# http://jn.ganji.com/fang1/tianqiao/b1000e1577/
# h: house type, o: page, p: price range. The numbers after h and p follow the order
# of the matching menu on the site, while b and e are the bounds of a price range you type in yourself.
# jn: Jinan, fang1: rentals (zufang), tianqiao: Tianqiao district, b: begin price (1000), e: end price (1577)

# Base for resolving relative listing links; same city subdomain as URL
ADDR = 'http://jn.ganji.com/'
start_page = 1
end_page = 5
price = 1

# Note: open the file in 'wb' mode. In Python 2, text mode ('w') can leave an extra
# blank line after every row of the CSV file on Windows.
# See this article for details: http://blog.csdn.net/pfm685757/article/details/47806469
with open('info.csv', 'wb') as f:
    csv_writer = csv.writer(f, delimiter=',')
    print 'starting'
    # Fetch pages start_page through end_page, inclusive
    while start_page <= end_page:
        # Inspecting the markup shows that the target tags are only unique
        # when matched through several classes
        print 'get {0}'.format(URL.format(page=start_page, price=price))
        response = requests.get(URL.format(page=start_page, price=price))
        html = BeautifulSoup(response.text, 'html.parser')
        house_list = html.select('.f-list > .f-list-item > .f-list-item-wrap')
        # Stop as soon as a page contains no listings
        if not house_list:
            print 'No house_list'
            break
        # Extract the fields of each listing
        for house in house_list:
            house_title = house.select('.title > a')[0].string.encode('utf-8')
            house_addr = house.select('.address > .area > a')[-1].string.encode('utf-8')
            house_price = house.select('.info > .price > .num')[0].string.encode('utf-8')
            house_url = urljoin(ADDR, house.select('.title > a')[0]['href'])
            # Write the row to the CSV file
            csv_writer.writerow([house_title, house_addr, house_price, house_url])
        start_page += 1
    print 'ending'
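The URL scheme described in the comments above can be sketched as a small helper. This is only an illustration: `build_list_url` and its parameter names are hypothetical, and the segment order (h, then o, then either p or b/e) is inferred from the example URLs in this post, so whether the site accepts every combination is an assumption.

```python
def build_list_url(city='jn', district=None, house_type=None,
                   page=None, price_index=None, begin=None, end=None):
    """Assemble a rental list URL from the pieces described in the post.

    Hypothetical helper: the segment order is inferred from the example
    URLs, not taken from any official documentation.
    """
    filters = ''
    if house_type is not None:
        filters += 'h{0}'.format(house_type)      # h: house type (menu index)
    if page is not None:
        filters += 'o{0}'.format(page)            # o: page number
    if begin is not None and end is not None:
        filters += 'b{0}e{1}'.format(begin, end)  # b/e: typed-in price range
    elif price_index is not None:
        filters += 'p{0}'.format(price_index)     # p: price range (menu index)

    parts = ['http://{0}.ganji.com/fang1'.format(city)]
    if district:
        parts.append(district)
    if filters:
        parts.append(filters)
    return '/'.join(parts) + '/'


print(build_list_url(page=2, price_index=1))
print(build_list_url(district='tianqiao', house_type=1, page=1, price_index=1))
print(build_list_url(district='tianqiao', begin=1000, end=1577))
```

The three calls rebuild the three URL shapes that appear in the comments, which makes it easy to switch between a menu price index and a typed-in price range without editing the format string by hand.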
Points to note when moving to Python 3: urlparse.urljoin becomes urllib.parse.urljoin.
# python2
from urlparse import urljoin
# Python3
from urllib.parse import urljoin
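If a single file has to run under both interpreters, one common pattern is to try the Python 3 import first and fall back to the Python 2 one. The sketch below also shows how urljoin treats relative and absolute links; the listing paths are made up for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

BASE = 'http://jn.ganji.com/'

# A relative href is resolved against the base address.
relative = urljoin(BASE, '/fang1/123.htm')

# An href that is already absolute is returned unchanged.
absolute = urljoin(BASE, 'http://bj.ganji.com/fang1/456.htm')

print(relative)
print(absolute)
```

Because absolute links pass through untouched, the base address only matters for relative hrefs, which is why it should point at the same city subdomain as the list URL.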
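Python 3's csv module strictly distinguishes bytes from str, so the mode passed to open has to change from wb to w, with the encoding set to utf8. Passing newline='' as well, as the csv documentation recommends, avoids blank lines between rows on Windows.
with open('info.csv', 'w', encoding='utf8', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',')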
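To check the Python 3 file handling in isolation, the following sketch writes a couple of rows to a temporary file and reads them back. The sample rows and the temporary path are invented for the example; only the open arguments mirror the snippet above.

```python
import csv
import os
import tempfile

# Made-up sample rows; the second title contains a comma to show quoting.
rows = [
    ['整租 两室一厅', '天桥', '1500', 'http://jn.ganji.com/fang1/example-1.htm'],
    ['Two rooms, one hall', 'Tianqiao', '1800', 'http://jn.ganji.com/fang1/example-2.htm'],
]

path = os.path.join(tempfile.mkdtemp(), 'info.csv')

# Text mode with an explicit encoding; newline='' lets the csv module
# control line endings itself.
with open(path, 'w', encoding='utf8', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',')
    csv_writer.writerows(rows)

# Read the file back with the same settings to confirm a clean round trip.
with open(path, 'r', encoding='utf8', newline='') as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```

Since the writer receives str values directly, the .encode('utf-8') calls from the Python 2 version are no longer needed.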
Python 3: scraping Ganji.com rental listings
The complete code is as follows:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
import csv

URL = 'http://jn.ganji.com/fang1/o{page}p{price}/'
# h: house type, o: page, p: price
# http://jn.ganji.com/fang1/tianqiao/b1000e1577/
# jn: Jinan, fang1: rentals (zufang), tianqiao: Tianqiao district, b: begin price (1000), e: end price (1577)
# fang5 is second-hand housing and zhaopin is jobs. Ganji's URLs are all divided up very simply,
# so with enough time you can collect a great deal of information.

# Base for resolving relative listing links; same city subdomain as URL
ADDR = 'http://jn.ganji.com/'
start_page = 1
end_page = 5
price = 1

'''
URL = 'http://jn.ganji.com/fang1/h{huxing}o{page}b{beginPrice}e{endPrice}/'
# house type options are h1-h5
# typed-in prices are begin or end
price='b1000e2000'
# house type is
'''

# Open explicitly as utf8; otherwise the file is written in the platform default encoding (GBK here).
# newline='' lets the csv module handle line endings itself.
with open('info.csv', 'w', encoding='utf8', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',')
    print('starting')
    # Fetch pages start_page through end_page, inclusive
    while start_page <= end_page:
        print('get {0}'.format(URL.format(page=start_page, price=price)))
        response = requests.get(URL.format(page=start_page, price=price))
        html = BeautifulSoup(response.text, 'html.parser')
        house_list = html.select('.f-list > .f-list-item > .f-list-item-wrap')
        # Stop as soon as a page contains no listings
        if not house_list:
            print('No house_list')
            break
        for house in house_list:
            house_title = house.select('.title > a')[0].string
            house_addr = house.select('.address > .area > a')[-1].string
            house_price = house.select('.info > .price > .num')[0].string
            house_url = urljoin(ADDR, house.select('.title > a')[0]['href'])
            csv_writer.writerow([house_title, house_addr, house_price, house_url])
        start_page += 1
    print('ending')

Finally, a look at the resulting CSV file: [screenshot of info.csv]
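The selector logic in the script can be tried offline, without hitting the site. The sketch below runs the same three selectors against a small hand-written HTML fragment. The fragment is an assumption: it only mirrors the class names used in the selectors and is not a copy of the real page. `parse_listings` is a hypothetical helper, and it skips a listing when a field is missing instead of raising IndexError.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hand-written fragment that mirrors the class names used by the selectors.
SAMPLE_HTML = """
<div class="f-list">
  <div class="f-list-item">
    <div class="f-list-item-wrap">
      <div class="title"><a href="/fang1/123.htm">Two-bedroom near the park</a></div>
      <div class="address"><span class="area"><a>Tianqiao</a><a>Wuyingshan</a></span></div>
      <div class="info"><div class="price"><span class="num">1500</span></div></div>
    </div>
  </div>
  <div class="f-list-item">
    <div class="f-list-item-wrap">
      <div class="title"><a href="/fang1/456.htm">Listing without a price</a></div>
    </div>
  </div>
</div>
"""


def parse_listings(html_text, base_url):
    """Return [title, area, price, url] rows for every complete listing."""
    soup = BeautifulSoup(html_text, 'html.parser')
    rows = []
    for house in soup.select('.f-list > .f-list-item > .f-list-item-wrap'):
        title_links = house.select('.title > a')
        area_links = house.select('.address > .area > a')
        prices = house.select('.info > .price > .num')
        # Skip incomplete listings rather than indexing into an empty list.
        if not (title_links and area_links and prices):
            continue
        rows.append([
            title_links[0].get_text(strip=True),
            area_links[-1].get_text(strip=True),  # last link is the finest-grained area
            prices[0].get_text(strip=True),
            urljoin(base_url, title_links[0]['href']),
        ])
    return rows


rows = parse_listings(SAMPLE_HTML, 'http://jn.ganji.com/')
print(rows)
```

Keeping the parsing in its own function also makes it easy to re-check the selectors against a saved copy of the page whenever the site's markup changes.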