Using Python to Scrape a Game Site's Top Sellers and Store Them in Excel with pandas

Published: 2024/3/12

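By request, the site in question is referred to as S throughout this post.

What should a brand-new S user do when they have no idea what to play? The top sellers are usually their safest pick.

Of course, a certain third-party site has more detailed data: which games have the most active users, which regional store sells a game cheapest, and so on. But that site sits behind Cloudflare's browser verification.

Some people suggest cloudscraper, but cloudscraper does not seem to work against the commercial version of Cloudflare (I think so, at least; if anyone knows a better approach, please point it out, thanks). I will try other methods later. For now I will set that aside and start by fetching S's top-seller data.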

1. Analyzing the top-sellers request

Open the top-sellers page:

https://<that-site>/search/?sort_by=_ASC&force_infinite=1&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&page=2&os=win

The link above only returns the first page of data.

Using the browser's developer tools, the endpoint that actually serves the content turns out to be:

https://<that-site>/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1

Here start is the offset of the first result, so changing it is how you page, and count is the number of results returned per request.
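To make the paging concrete, here is a minimal sketch that builds one URL per page from start and count. The host `example.invalid` is a stand-in for the hidden site, `page_url` is a hypothetical helper, and the sketch assumes the server treats `query=` the same as the bare `query` in the original link.

```python
from urllib.parse import urlencode

# Placeholder host: the article deliberately hides the real one.
BASE = "https://example.invalid/search/results/"


def page_url(page, count=50):
    """Build the results URL for a zero-based page number."""
    params = {
        "query": "",
        "start": page * count,  # offset of the first result on this page
        "count": count,         # results per request
        "sort_by": "_ASC",
        "os": "win",
        "snr": "1_7_7_globaltopsellers_7",
        "filter": "globaltopsellers",
        "infinite": 1,
    }
    return BASE + "?" + urlencode(params)


# Print the URLs for the first three pages
for page in range(3):
    print(page_url(page))
```

Looping over `page_url(page)` and feeding each URL to the request code is then enough to walk through the list.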

A plain GET request is enough. Here is the code:

def getInfo(self):
    url = 'https://<that-site>/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
    res = self.getRes(url, self.headers, '', '', 'GET')  # my own request wrapper
    res = res.json()['results_html']  # the JSON payload carries the rows as an HTML fragment
    sel = Selector(text=res)
    nodes = sel.css('.search_result_row')
    for node in nodes:
        gamedata = {}
        gamedata['url'] = node.css('a::attr(href)').extract_first()  # link
        gamedata['name'] = node.css('a .search_name .title::text').extract_first()  # game name
        gamedata['sales_date'] = node.css('a .search_released::text').extract_first()  # release date
        discount = node.css('.search_discount span::text').extract_first()  # discount, if any
        gamedata['discount'] = discount if discount else 'no discount'
        # default='' avoids calling .strip() on None when a row has no price
        price = node.css('a .search_price::text').extract_first(default='').strip()  # list price
        discountPrice = node.css('.discounted::text').extract()  # price after discount
        discountPrice = discountPrice[-1] if discountPrice else ''
        gamedata['price'] = discountPrice if discountPrice else price  # final price
        print(gamedata)
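The price handling above can be isolated into a small pure function, which makes the rule "use the discounted price if there is one, otherwise the list price" easy to check without hitting the network. This is a sketch only: `final_price` is a hypothetical helper, and unlike the code above it also strips whitespace and skips empty text nodes.

```python
def final_price(price_text, discounted_texts):
    """Return the discounted price if one exists, else the regular price."""
    regular = (price_text or "").strip()
    # Keep only text nodes that contain something besides whitespace
    discounted = [t.strip() for t in discounted_texts if t and t.strip()]
    return discounted[-1] if discounted else regular


print(final_price("  $59.99  ", []))         # no discount: list price is used
print(final_price(None, ["", " $19.99 "]))   # discounted: last non-empty text is used
```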

2. Saving the data with pandas

2.1 Building a pandas DataFrame

pandas writes Excel data through the DataFrame's to_excel method, which writes a DataFrame straight into an Excel sheet.

A DataFrame is a rectangular, matrix-like table of data made up of an ordered collection of columns.

First, build a DataFrame from the scraped data. Store each field in its own list: URLs go into the url list, game names into the name list, and so on:

# Class-level lists, one per Excel column
url = []
name = []
sales_date = []
discount = []
price = []

# Inside the loop over the result rows
url = node.css('a::attr(href)').extract_first()
if url not in self.url:
    self.url.append(url)
    name = node.css('a .search_name .title::text').extract_first()
    sales_date = node.css('a .search_released::text').extract_first()
    discount = node.css('.search_discount span::text').extract_first()
    discount = discount if discount else 'no discount'
    # default='' avoids calling .strip() on None when a row has no price
    price = node.css('a .search_price::text').extract_first(default='').strip()
    discountPrice = node.css('.discounted::text').extract()
    discountPrice = discountPrice[-1] if discountPrice else ''
    price = discountPrice if discountPrice else price
    self.name.append(name)
    self.sales_date.append(sales_date)
    self.discount.append(discount)
    self.price.append(price)
else:
    print('already exists')

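Then assemble the lists into a dict:

data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}

The dict keys become the Excel column names. Next, build the DataFrame with pandas' DataFrame() constructor, ready to be written to the Excel file.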

data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
frame = pd.DataFrame(data)
xlsxFrame = pd.read_excel('./steam.xlsx')  # rows already saved by an earlier run

Here pd is the alias under which pandas is imported; by convention, wherever you see pd, pandas has been imported.

import pandas as pd
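As a small illustration of how a dict of lists becomes a table, the sketch below builds the same DataFrame two ways: column by column, as in this post, and from a list of per-game dicts. The sample rows and the `example.invalid` URLs are made up. The row-oriented form is offered only as an alternative; it avoids the parallel lists drifting out of step, which would make pd.DataFrame raise a ValueError about array lengths.

```python
import pandas as pd

# Made-up sample rows standing in for scraped results
records = [
    {"URL": "https://example.invalid/app/1", "Name": "Game A", "Price": "$9.99"},
    {"URL": "https://example.invalid/app/2", "Name": "Game B", "Price": "$19.99"},
]

# Column-oriented input, as in the article: one list per column
by_column = {
    "URL": [r["URL"] for r in records],
    "Name": [r["Name"] for r in records],
    "Price": [r["Price"] for r in records],
}
frame_from_columns = pd.DataFrame(by_column)

# Row-oriented input: a list of dicts, one dict per game
frame_from_rows = pd.DataFrame(records)

print(frame_from_rows)
```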

2.2 Appending to Excel with pandas

When paging, you will find that calling the Excel-writing method repeatedly does not make the sheet grow, because every to_excel() call overwrites whatever was written before.

So to keep the earlier data, first read back what was already written, merge it with the newly scraped data as DataFrame objects, and then write the combined data to Excel again.

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
frame = pd.concat([frame, xlsxFrame], ignore_index=True)

The write method looks like this:

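def insert_info(self):
    data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
    frame = pd.DataFrame(data)
    # pd.read_excel raises FileNotFoundError on the first run, before the file exists
    try:
        xlsxFrame = pd.read_excel('./steam.xlsx')
    except FileNotFoundError:
        xlsxFrame = None
    print(xlsxFrame)
    if xlsxFrame is not None and not xlsxFrame.empty:
        print('appending')
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        frame = pd.concat([frame, xlsxFrame], ignore_index=True)
    frame.to_excel('./steam.xlsx', index=False)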

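The logic:

  • Build a DataFrame from the data collected so far
  • Read the previously written Excel file and check whether it already holds data
  • If it does, read that data, merge it with the new data, and write everything back to Excel
  • If the file is empty, write the new data directly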
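The steps above can be sketched in memory, without touching a file. Here `merge_frames` is a hypothetical helper, the sample data is made up, and dropping duplicate URLs is an addition this post does not perform; it is included on the assumption that the same game can turn up in both the saved rows and a later scrape.

```python
import pandas as pd


def merge_frames(existing, new, key="URL"):
    """Combine previously saved rows with new rows, keeping one row per key."""
    if existing is None or existing.empty:
        return new.reset_index(drop=True)
    combined = pd.concat([existing, new], ignore_index=True)
    # keep="last" lets a freshly scraped row replace an older one with the same URL
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


old = pd.DataFrame({"URL": ["a", "b"], "Price": ["$10", "$20"]})
new = pd.DataFrame({"URL": ["b", "c"], "Price": ["$15", "$30"]})
merged = merge_frames(old, new)
print(merged)
```

In the real flow, `existing` would come from pd.read_excel and the merged result would go to to_excel(..., index=False).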
3. Putting the code together

import requests
from scrapy import Selector
import pandas as pd


class getSteamInfo():
    headers = {
        "Host": "<that-site>",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "zh-CN,zh;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36",
    }
    # One list per Excel column
    url = []
    name = []
    sales_date = []
    discount = []
    price = []

    # Fetch a proxy IP from the provider's API
    def getApiIp(self):
        # Fetch one and only one IP
        api_url = 'API address'
        try:
            res = requests.get(api_url, timeout=5)
            if res.status_code == 200:
                api_data = res.json()['data'][0]
                proxies = {
                    'http': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                    'https': 'http://{}:{}'.format(api_data['ip'], api_data['port']),
                }
                print(proxies)
                return proxies
            else:
                print('failed to get a proxy')
        except Exception:
            print('failed to get a proxy')

    def getInfo(self):
        url = 'https://<that-site>/search/results/?query&start=0&count=50&sort_by=_ASC&os=win&snr=1_7_7_globaltopsellers_7&filter=globaltopsellers&infinite=1'
        res = self.getRes(url, self.headers, '', '', 'GET')  # my own request wrapper
        if res is None:
            # All three attempts failed, so there is nothing to parse
            print('request failed')
            return
        res = res.json()['results_html']
        sel = Selector(text=res)
        nodes = sel.css('.search_result_row')
        for node in nodes:
            url = node.css('a::attr(href)').extract_first()
            if url not in self.url:
                self.url.append(url)
                name = node.css('a .search_name .title::text').extract_first()
                sales_date = node.css('a .search_released::text').extract_first()
                discount = node.css('.search_discount span::text').extract_first()
                discount = discount if discount else 'no discount'
                # default='' avoids calling .strip() on None when a row has no price
                price = node.css('a .search_price::text').extract_first(default='').strip()
                discountPrice = node.css('.discounted::text').extract()
                discountPrice = discountPrice[-1] if discountPrice else ''
                price = discountPrice if discountPrice else price
                self.name.append(name)
                self.sales_date.append(sales_date)
                self.discount.append(discount)
                self.price.append(price)
            else:
                print('already exists')
        # self.insert_info()

    def insert_info(self):
        data = {'URL': self.url, 'Name': self.name, 'Release date': self.sales_date, 'Discount': self.discount, 'Price': self.price}
        frame = pd.DataFrame(data)
        # pd.read_excel raises FileNotFoundError on the first run, before the file exists
        try:
            xlsxFrame = pd.read_excel('./steam.xlsx')
        except FileNotFoundError:
            xlsxFrame = None
        print(xlsxFrame)
        if xlsxFrame is not None and not xlsxFrame.empty:
            print('appending')
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            frame = pd.concat([frame, xlsxFrame], ignore_index=True)
        frame.to_excel('./steam.xlsx', index=False)

    # Dedicated request method: try up to three times, return None if all three fail
    def getRes(self, url, headers, proxies, post_data, method):
        if proxies:
            for i in range(3):
                try:
                    # POST through the proxy passed in
                    if method == 'POST':
                        res = requests.post(url, headers=headers, data=post_data, proxies=proxies)
                    # GET through the proxy passed in
                    else:
                        res = requests.get(url, headers=headers, proxies=proxies)
                    if res:
                        return res
                except Exception:
                    print(f'request {i+1} failed')
            return None
        else:
            for i in range(3):
                proxies = self.getApiIp()
                try:
                    # POST through a freshly fetched proxy
                    if method == 'POST':
                        res = requests.post(url, headers=headers, data=post_data, proxies=proxies)
                    # GET through a freshly fetched proxy
                    else:
                        res = requests.get(url, headers=headers, proxies=proxies)
                    if res:
                        return res
                except Exception:
                    print(f'request {i+1} failed')
            return None


if __name__ == '__main__':
    getSteamInfo().getInfo()
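The retry pattern in getRes can also be written once and reused for both the GET and POST branches. The sketch below is one way to do that under stated assumptions: `fetch_with_retry` and `flaky` are hypothetical names, and the fetcher is passed in as a callable so the example runs without any network access.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times; return its result, or None if all fail."""
    for i in range(attempts):
        try:
            result = fetch()
            if result is not None:
                return result
        except Exception as exc:
            print(f"attempt {i + 1} failed: {exc}")
        # Optional pause between attempts, skipped after the last one
        if delay and i < attempts - 1:
            time.sleep(delay)
    return None


# Demo with a fake fetcher that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network error")
    return "ok"


print(fetch_with_retry(flaky))
```

With requests, the callable could be something like `lambda: requests.get(url, headers=headers, proxies=proxies, timeout=10)`.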

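By the way, the data fetched here is from the US store. Access from within China has been unstable lately, so if you only want the data and are not buying games, using a proxy is recommended. I use ipidea's proxies; new users get some free traffic.

Address: http://www.ipidea.net/?utm-source=csdn&utm-keyword=?wb?

Finally, a word of advice: game in moderation, spend sensibly, live earnestly, and support legitimate copies. (For large volumes of data, use a database instead; databases can export to Excel too.)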
