Scraping Housing Prices with Python

Published: 2023/12/8 · python · 27 · 豆豆

```python
# ========== Imports ==========
import threading

import pandas as pd
import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below also needs the lxml package
from requests.exceptions import RequestException

# ===== Step 1: specify the URL =====
url = 'https://gz.fang.lianjia.com/ /'  # path appears garbled in the source; the listing pages below live under /loupan/pgN/

# ===== Step 2: send the request =====
# requests.get() sends a GET request and returns a Response object; `url` is the address to request
response = requests.get(url=url)

# ===== Step 3: read the response data =====
# Response.text holds the body as a string, i.e. the page's HTML source
page_text = response.text

# ===== Step 4: save it to disk =====
with open('廣州房價.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print('Finished fetching the page!')


# ========== Fetch one listing page ==========
def craw(url, page):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"
        }
        html1 = requests.request("GET", url, headers=headers, timeout=10)
        # Set the encoding explicitly (important!): .text decodes the raw bytes with it
        html1.encoding = 'utf-8'
        return html1.text
    except RequestException:  # connection errors, timeouts, etc.
        print('Failed to read page {0}'.format(page))
        return None


csv_lock = threading.Lock()  # both threads append to the same CSV file


# ========== Parse a page and append its rows to the CSV ==========
def pase_page(url, page):
    html = craw(url, page)
    if html is None:  # test before any str() call: str(None) is the non-empty string 'None'
        print('Parsing failed for page {0}'.format(page))
        return
    soup = BeautifulSoup(html, 'lxml')

    def text_of(house, selector, sep=' '):
        # Join the text of every element under `house` that matches `selector`
        return sep.join(el.get_text() for el in house.select(selector))

    # First locate the houses, i.e. the list of <li> tags
    houses = soup.select('.resblock-list-wrapper li')
    # Then extract the fields of each house
    for house in houses:
        recommend_project = text_of(house, '.resblock-name a.name')       # name, e.g. 英華天元, 斌鑫江南御府
        house_type = text_of(house, '.resblock-name span.resblock-type')  # type, e.g. 寫字樓 (office), 底商 (retail)
        sale_status = text_of(house, '.resblock-name span.sale-status')   # 在售 (on sale), 售罄 (sold out), ...
        big_address = text_of(house, '.resblock-location span', sep='')   # district
        small_address = text_of(house, '.resblock-location a')            # detailed address
        advantage = text_of(house, '.resblock-tag span')                  # selling points
        average_price = text_of(house, '.resblock-price .main-price .number')  # price per m²: 16000, 25000, 價格待定 (TBD)...
        total_price = text_of(house, '.resblock-price .second')           # total in 萬 (10,000 yuan): 總價400萬/套...

        # ========== Write to the table ==========
        information = pd.DataFrame(
            [[recommend_project, house_type, sale_status, big_address,
              small_address, advantage, average_price, total_price]],
            # name, type, sale status, district, address, selling points, average price, total price
            columns=['名稱', '類型', '銷售狀態', '大地址', '具體地址', '優勢', '均價', '總價'])
        with csv_lock:
            information.to_csv('廣州房價.csv', mode='a', index=False, header=False)  # mode='a': append
    print('Page {0} saved'.format(page))


# ========== Two threads ==========
for i in range(1, 100, 2):  # pages 1-100, two per iteration
    url1 = "https://gz.fang.lianjia.com/loupan/pg" + str(i) + "/"
    url2 = "https://gz.fang.lianjia.com/loupan/pg" + str(i + 1) + "/"
    t1 = threading.Thread(target=pase_page, args=(url1, i))      # thread 1
    t2 = threading.Thread(target=pase_page, args=(url2, i + 1))  # thread 2
    t1.start()
    t2.start()
    # Wait for this pair before starting the next; without join() all 100 threads run at once
    t1.join()
    t2.join()
```
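Why `craw` sets `html1.encoding = 'utf-8'` before reading `.text`: requests picks `Response.encoding` from the `Content-Type` header, and for a `text/*` response that declares no charset it falls back to ISO-8859-1; `.text` then decodes the body with `errors="replace"`. If a server omits the charset, a UTF-8 Chinese page comes out as mojibake. The stdlib-only sketch below reproduces that effect on an arbitrary sample string; `decode_body` is a hypothetical helper, and nothing is downloaded.

```python
# Stdlib-only illustration; the sample text is arbitrary and nothing is downloaded.


def decode_body(raw: bytes, encoding: str) -> str:
    """Decode a body the way requests' Response.text does once .encoding is known."""
    return raw.decode(encoding, errors="replace")


original = "廣州房價 16000元/平"
raw = original.encode("utf-8")            # the bytes a UTF-8 page actually sends

wrong = decode_body(raw, "ISO-8859-1")    # requests' fallback for text/* with no charset
right = decode_body(raw, "utf-8")         # what html1.encoding = 'utf-8' selects

print(wrong == original)                  # False: mojibake
print(len(wrong) == len(raw))             # True: Latin-1 maps every byte to one character
print(right == original)                  # True
# Latin-1 never drops bytes, so this particular damage can still be undone:
print(wrong.encode("ISO-8859-1").decode("utf-8") == original)  # True
```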
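The scraper stores prices as raw text. Per the comments in `pase_page`, the average price looks like `16000` or `價格待定` ("price to be determined"), and the total price like `總價400萬/套` ("total 4 million yuan per unit"), where 萬 (simplified 万) means 10,000 yuan. If you want numbers for analysis, a post-processing step could look like the sketch below. The function names are hypothetical, and only the formats shown in those comments are handled; anything else, such as a price range, returns `None` rather than a guess.

```python
import re
from typing import Optional

WAN = 10_000  # 1 萬 (simplified: 万) = 10,000 yuan

# Hypothetical helpers; patterns follow the example comments in pase_page.
_AVG = re.compile(r"\d+(?:\.\d+)?")                     # e.g. "16000"
_TOTAL = re.compile(r"\D*?(\d+(?:\.\d+)?)\s*[萬万]/套")  # e.g. "總價400萬/套"


def parse_average_price(text: str) -> Optional[float]:
    """Average price in yuan per m², or None for text such as 價格待定 (price TBD)."""
    match = _AVG.fullmatch(text.strip())
    return float(match.group()) if match else None


def parse_total_price(text: str) -> Optional[float]:
    """Total price in yuan per unit, or None unless the text reads '<prefix><n>萬/套'."""
    match = _TOTAL.match(text.strip())
    return float(match.group(1)) * WAN if match else None


print(parse_average_price("16000"))          # 16000.0
print(parse_average_price("價格待定"))        # None
print(parse_total_price("總價400萬/套"))      # 4000000.0
print(parse_total_price("总价300-500万/套"))  # None (ranges are not handled)
```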
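`pase_page` appends every row with `header=False`, so `廣州房價.csv` ends up with data rows but no header row. One way to get exactly one header is to write it only when the file is new or empty. A stdlib sketch using the `csv` module; `append_rows` and the column names in the demo are illustrative, not from the original. If several threads call it, hold a lock around the call, since two of them could otherwise both see an empty file.

```python
import csv
import os
import tempfile
from typing import Iterable, Sequence


def append_rows(path: str, header: Sequence[str], rows: Iterable[Sequence[str]]) -> None:
    """Append rows to a CSV file, writing `header` first only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.csv")
    append_rows(path, ["name", "avg_price"], [["A", "16000"]])
    append_rows(path, ["name", "avg_price"], [["B", "25000"]])  # header is not repeated
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)  # [['name', 'avg_price'], ['A', '16000'], ['B', '25000']]
```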
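The two-thread loop can also be written with `concurrent.futures.ThreadPoolExecutor`, which caps the number of pages in flight at `max_workers` and, through `map`, returns results in page order. The sketch below assumes a hypothetical `parse_rows(url, page)` that returns a page's rows instead of writing them to the CSV itself; collecting the rows in the main thread means only one thread ever touches the output file. An offline stand-in parser keeps the example runnable without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

BASE = "https://gz.fang.lianjia.com/loupan/pg{}/"  # paging pattern used by the loop above

Row = Sequence[str]


def scrape_pages(parse_rows: Callable[[str, int], List[Row]],
                 pages: range, max_workers: int = 2) -> List[Row]:
    """Run parse_rows over every page using at most `max_workers` threads.

    executor.map yields results in submission order, so rows come back sorted
    by page no matter which thread finishes first.
    """
    urls = [BASE.format(p) for p in pages]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_page = pool.map(parse_rows, urls, pages)
        return [row for rows in per_page for row in rows]


def fake_parse_rows(url: str, page: int) -> List[Row]:
    # Hypothetical offline stand-in for a real fetch-and-parse function
    return [(url, f"listing-{page}")]


all_rows = scrape_pages(fake_parse_rows, range(1, 101))  # pages 1-100
print(len(all_rows))  # 100
print(all_rows[0])    # ('https://gz.fang.lianjia.com/loupan/pg1/', 'listing-1')
```

A pool of two workers keeps at most two requests in flight and hands each remaining page to whichever worker becomes free first. If `parse_rows` raises, the exception is re-raised in the main thread when its result is reached.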
