Scraping a Game Store's Top Sellers with Python and Saving Them to Excel with pandas

Published: 2024/3/12 · python · 豆豆

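As required, the website in question is not named here and is referred to as S.

New users of S often do not know which game to play. The top-selling titles are usually their safest choice.

A certain third-party site does offer more detailed data, such as which games have the most active players and which regional store sells a game for less. That site, however, sits behind Cloudflare's browser check.

Some people suggest cloudscraper, but cloudscraper does not seem to work against the commercial version of Cloudflare (I believe so; if anyone knows a better approach, please point it out, thanks). I will try other methods later. For now I will set that aside and start with the top-seller data from S.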

1. Analyzing the top-sellers request

Open the top-sellers page:

https://that-site/search/?sort_by=_ASC&force_infinite=1&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&page=2&os=win

The link above only returns the first page of data.

The browser's developer tools reveal the link that actually serves the content:

https://that-site/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1

Here start is the offset of the first item, so changing it is how you page through the results, and count is the number of items returned per request.
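The start/count paging described above can be sketched as a small URL builder. This is only an illustration: the host is a placeholder for the anonymised site, the function name build_page_url is made up for this example, and urlencode writes the bare query parameter as query=, which is assumed to be equivalent.

```python
from urllib.parse import urlencode

# Placeholder host standing in for the anonymised site (assumption).
BASE_URL = "https://that-site/search/results/"


def build_page_url(page: int, count: int = 50) -> str:
    """Return the results URL for a zero-based page number."""
    params = {
        "query": "",
        "start": page * count,  # offset of the first item on this page
        "count": count,         # number of items per request
        "sort_by": "_ASC",
        "os": "win",
        "snr": "1_7_7_globaltopsellers_7",
        "filter": "globaltopsellers",
        "infinite": 1,
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs for the first three pages.
    for page in range(3):
        print(build_page_url(page))
```

Page 0 starts at offset 0, page 1 at offset 50, and so on, so looping over page numbers walks through the whole list.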

A plain GET request is enough. Here is the code:

def getInfo(self):
    url = 'https://that-site/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
    res = self.getRes(url, self.headers, '', '', 'GET')  # the author's own request wrapper (see section 3)
    res = res.json()['results_html']  # the JSON response carries the result rows as an HTML string
    sel = Selector(text=res)
    nodes = sel.css('.search_result_row')
    for node in nodes:
        gamedata = {}
        gamedata['url'] = node.css('a::attr(href)').extract_first()  # link
        gamedata['name'] = node.css('a .search_name .title::text').extract_first()  # game name
        gamedata['sales_date'] = node.css('a .search_released::text').extract_first()  # release date
        discount = node.css('.search_discount span::text').extract_first()  # discount label, if any
        gamedata['discount'] = discount if discount else 'no discount'
        # default='' avoids calling .strip() on None when no price node is found
        price = node.css('a .search_price::text').extract_first(default='').strip()  # list price
        discountPrice = node.css('.discounted::text').extract()  # price after discount
        discountPrice = discountPrice[-1] if discountPrice else ''
        gamedata['price'] = discountPrice if discountPrice else price  # final price
        print(gamedata)

2. Saving the data with pandas

2.1 Building a pandas DataFrame

pandas writes Excel data through the to_excel method, which puts a DataFrame object straight into an Excel sheet.

A DataFrame is a two-dimensional data table made up of an ordered collection of columns.

First, turn the scraped data into a DataFrame. Store each field in its own list: the scraped URLs go into the url list, the game names into the name list, and so on:

# class-level lists, one per Excel column
url = []
name = []
sales_date = []
discount = []
price = []

# inside the loop over the result nodes
url = node.css('a::attr(href)').extract_first()
if url not in self.url:
    self.url.append(url)
    name = node.css('a .search_name .title::text').extract_first()
    sales_date = node.css('a .search_released::text').extract_first()
    discount = node.css('.search_discount span::text').extract_first()
    discount = discount if discount else 'no discount'
    price = node.css('a .search_price::text').extract_first(default='').strip()
    discountPrice = node.css('.discounted::text').extract()
    discountPrice = discountPrice[-1] if discountPrice else ''
    price = discountPrice if discountPrice else price
    self.name.append(name)
    self.sales_date.append(sales_date)
    self.discount.append(discount)
    self.price.append(price)
else:
    print('already exists')
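As a variation on the parallel lists above, the rows can be collected as dicts and de-duplicated with a set, then converted to the column layout at the end. This is a minimal sketch rather than the author's code: collect_rows, to_columns and the sample data are hypothetical names and values made up for illustration.

```python
def collect_rows(rows, seen_urls=None):
    """Return (unique_rows, seen_urls), skipping rows whose 'url' was already seen."""
    seen_urls = set() if seen_urls is None else seen_urls
    unique_rows = []
    for row in rows:
        url = row.get("url")
        if url is None or url in seen_urls:
            continue  # missing or duplicate URL: skip the row
        seen_urls.add(url)
        unique_rows.append(row)
    return unique_rows, seen_urls


def to_columns(rows, keys):
    """Turn a list of row dicts into a dict of equal-length lists, one per column."""
    return {key: [row.get(key) for row in rows] for key in keys}


# Hypothetical sample data; the third row repeats the first URL.
sample = [
    {"url": "https://example.com/app/1", "name": "Game A", "price": "$9.99"},
    {"url": "https://example.com/app/2", "name": "Game B", "price": "$19.99"},
    {"url": "https://example.com/app/1", "name": "Game A", "price": "$9.99"},
]
rows, seen = collect_rows(sample)
columns = to_columns(rows, ["url", "name", "price"])
print(columns["name"])
```

Keeping each record in one dict means a failure halfway through a row cannot leave the column lists with different lengths, and the set lookup stays fast as the number of scraped pages grows.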

Combine the lists into a dict:

data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}

The dict keys become the Excel column names. Next, build the object with pandas' DataFrame() constructor; it is written to the Excel file afterwards.

data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
frame = pd.DataFrame(data)
xlsxFrame = pd.read_excel('./steam.xlsx')

Here pd is the alias under which pandas is imported. By convention, pd in code means pandas has been imported like this:

import pandas as pd
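The dict-of-lists construction from this section can be tried on hand-made data. The values below are invented for illustration and the column names are just example labels. One thing worth knowing is that every list must have the same length, otherwise pd.DataFrame raises a ValueError.

```python
import pandas as pd

# Hypothetical sample values; in the article these lists are filled by the scraper.
data = {
    "URL": ["https://example.com/app/1", "https://example.com/app/2"],
    "Name": ["Game A", "Game B"],
    "Release date": ["1 Jan, 2022", "15 Mar, 2023"],
    "Discount": ["-50%", "no discount"],
    "Price": ["$9.99", "$19.99"],
}

# Each key becomes a column and each list supplies that column's values.
frame = pd.DataFrame(data)

print(frame.shape)          # (rows, columns)
print(list(frame.columns))  # column names in insertion order
```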

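2.2 Appending to an Excel file with pandas

When paging through results and calling the write method repeatedly, you will find that the data in the Excel sheet does not grow, because every call to to_excel() overwrites what was written the previous time.

To keep the earlier data, first read it back from the file, merge it with the newly scraped data as DataFrame objects, and then write the combined data to Excel again: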

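# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
frame = pd.concat([frame, xlsxFrame], ignore_index=True)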
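The merge step can be tried in memory without touching a file. The sketch below is an illustration under assumptions: merge_frames is a made-up helper, the frames use example column names, and dropping repeated URLs is an addition that the original code does not perform. It also puts the old rows first, whereas the article's line puts the new rows first.

```python
import pandas as pd


def merge_frames(new_frame, old_frame=None, key="URL"):
    """Combine previously saved rows with new rows, keeping one row per key."""
    if old_frame is None or old_frame.empty:
        return new_frame.reset_index(drop=True)
    # Old rows first, new rows after; ignore_index renumbers the result.
    merged = pd.concat([old_frame, new_frame], ignore_index=True)
    # When a key appears in both frames, keep the newer row.
    return merged.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Hypothetical data: "b" appears in both frames with a changed price.
old = pd.DataFrame({"URL": ["a", "b"], "Price": ["$1", "$2"]})
new = pd.DataFrame({"URL": ["b", "c"], "Price": ["$3", "$4"]})
print(merge_frames(new, old))
```

Passing None as old_frame covers the first run, when there is nothing saved yet.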

The write method looks like this:

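def insert_info(self):
    data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
    frame = pd.DataFrame(data)
    # read_excel raises FileNotFoundError for a missing file instead of returning None,
    # so check for the file first (requires `import os`)
    if os.path.exists('./steam.xlsx'):
        xlsxFrame = pd.read_excel('./steam.xlsx')
        print(xlsxFrame)
        print('appending')
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        frame = pd.concat([frame, xlsxFrame], ignore_index=True)
    frame.to_excel('./steam.xlsx', index=False)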

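The logic:

Build a DataFrame from the data collected so far

Read the previously written Excel file and check whether anything was written before

If so, read that data, merge it with the new data, and write everything to Excel again

If there is no earlier data, write the new data directly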

3. Putting the code together

import os

import pandas as pd
import requests
from scrapy import Selector


class getSteamInfo():
    headers = {
        "Host": "that-site",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36",
    }
    # one list per Excel column
    url = []
    name = []
    sales_date = []
    discount = []
    price = []

    # get a proxy IP from the provider's API
    def getApiIp(self):
        # fetch one and only one IP
        api_url = 'api address'  # placeholder: the proxy provider's API endpoint
        try:
            res = requests.get(api_url, timeout=5)
            if res.status_code == 200:
                api_data = res.json()['data'][0]
                proxies = {
                    'http': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                    'https': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                }
                print(proxies)
                return proxies
            else:
                print('failed to get a proxy')
        except Exception:
            print('failed to get a proxy')

    def getInfo(self):
        url = 'https://that-site/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
        res = self.getRes(url, self.headers, '', '', 'GET')  # the request wrapper defined below
        if res is None:
            print('request failed')
            return
        res = res.json()['results_html']
        sel = Selector(text=res)
        nodes = sel.css('.search_result_row')
        for node in nodes:
            url = node.css('a::attr(href)').extract_first()
            if url not in self.url:
                self.url.append(url)
                name = node.css('a .search_name .title::text').extract_first()
                sales_date = node.css('a .search_released::text').extract_first()
                discount = node.css('.search_discount span::text').extract_first()
                discount = discount if discount else 'no discount'
                # default='' avoids calling .strip() on None when no price node is found
                price = node.css('a .search_price::text').extract_first(default='').strip()
                discountPrice = node.css('.discounted::text').extract()
                discountPrice = discountPrice[-1] if discountPrice else ''
                price = discountPrice if discountPrice else price
                self.name.append(name)
                self.sales_date.append(sales_date)
                self.discount.append(discount)
                self.price.append(price)
            else:
                print('already exists')
        # self.insert_info()  # left commented out as in the original; uncomment to write the Excel file

    def insert_info(self):
        data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
        frame = pd.DataFrame(data)
        # read_excel raises FileNotFoundError for a missing file instead of returning None
        if os.path.exists('./steam.xlsx'):
            xlsxFrame = pd.read_excel('./steam.xlsx')
            print(xlsxFrame)
            print('appending')
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            frame = pd.concat([frame, xlsxFrame], ignore_index=True)
        frame.to_excel('./steam.xlsx', index=False)

    # Request helper: tries up to three times through a proxy and returns None if all attempts fail
    def getRes(self, url, headers, proxies, post_data, method):
        for i in range(3):
            # use the caller's proxy if one was passed in, otherwise fetch a fresh one for each attempt
            current = proxies if proxies else self.getApiIp()
            try:
                if method == 'POST':
                    res = requests.post(url, headers=headers, data=post_data, proxies=current)
                else:
                    res = requests.get(url, headers=headers, proxies=current)
                if res:  # a Response is truthy only for status codes below 400
                    return res
            except Exception:
                print(f'attempt {i + 1} failed')
        return None


if __name__ == '__main__':
    getSteamInfo().getInfo()
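The "try three times, then give up" idea behind getRes can be isolated into a small helper that is easy to test without a network. This is a sketch under assumptions: retry and its parameters are hypothetical names introduced here, and the caller supplies the actual request as a function.

```python
import time


def retry(func, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times; return its result, or None if every call raises."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            print(f"attempt {attempt} failed: {err}")
            # Optionally wait before the next attempt, but not after the last one.
            if attempt < attempts and delay:
                time.sleep(delay)
    return None


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to imitate an unstable connection.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(retry(flaky))
```

With requests, the function passed in could be something like lambda: requests.get(url, headers=headers, proxies=proxies, timeout=10); passing a timeout keeps a stalled proxy from blocking an attempt indefinitely.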

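By the way, the data scraped here comes from the US store. Access from mainland China has been unstable lately, so if you only want the data and are not buying games, a proxy is recommended. I use ipidea's proxy here; new users get some free traffic.

Address: http://www.ipidea.net/?utm-source=csdn&utm-keyword=?wb?

One last piece of advice: play in moderation, spend sensibly, take life seriously, and support legitimate copies. (For large volumes of data, store it in a database instead; databases can export to Excel too.)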
