
Scraping Tmall Store Data with Python and Selenium Automated Testing

Published: 2023/12/14 · python · 23 · 豆豆

This article, collected and compiled by 生活随笔, explains how to scrape Tmall store data with Python and Selenium automated testing. It is shared here for reference.

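The environment is Windows 10, and the editor is VS Code. When you scrape a Tmall store, you are usually asked to pass a verification step after logging in. My workaround is to skip the Tmall login entirely by reusing cookies obtained with a Chrome extension.
First, download chromedriver.exe (its version must match your installed Chrome) and put it in the Python installation directory, or any other folder on PATH. I won't go into the details here; it is easy to look up online. With Selenium 4.6 or later, the bundled Selenium Manager can also download a matching driver automatically. You also need the selenium module (pip install selenium), plus beautifulsoup4 for the bs4 import.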

```python
from selenium import webdriver as wb
from bs4 import BeautifulSoup  # only used by the commented-out second method
import csv
import time
import json


class Taobao:
    def __init__(self):
        self.url = 'https://chenguang.tmall.com/search.htm'
        # The next three lines are required
        self.options = wb.ChromeOptions()
        # Switch to developer mode: drop Chrome's --enable-automation switch
        self.options.add_experimental_option('excludeSwitches', ['enable-automation'])
        self.browser = wb.Chrome(options=self.options)
        self.data = []  # scraped rows: [name, price, sales, picture URL]
        self.doc = {}
```

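Now for the key part: this code skips the login verification. In a browser session where you are already logged in to Tmall, use a Chrome extension to export the cookies, paste them (a JSON array of cookie objects) into a text file named cookie.txt, and put that file in the workspace, i.e. the script's working directory.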

Here is the code:

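```python
    def get_data(self):
        # Maximize the window so that screen coordinates are correct
        self.browser.maximize_window()
        # Open the store first: Selenium can only add cookies for the domain
        # of the page that is currently loaded
        self.browser.get(self.url)
        # The key part: the code below skips the login
        self.browser.delete_all_cookies()
        # Read the cookies exported with the extension
        with open('cookie.txt', encoding='utf-8') as f1:
            cookie = json.loads(f1.read())
        for i in cookie:
            self.browser.add_cookie(i)  # inject each cookie
        print('Cookies injected')
        time.sleep(2)  # wait two seconds
        self.get_shop()
        self.get_fanye()
```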
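A common snag at this step: cookie-export extensions generally write the field names of Chrome's extension cookie API (a float `expirationDate`, and `sameSite` values such as "no_restriction" or "unspecified"), while Selenium's `add_cookie` expects WebDriver fields (an integer `expiry`, and `sameSite` of "Strict", "Lax" or "None"); the Python bindings reject any other `sameSite` value. The helper below is a hypothetical sketch, assuming cookie.txt holds such an export. It is not part of the original script, and it has not been checked against the specific extension the author used.

```python
from typing import Any, Dict

# Chrome's extension cookie API sameSite values -> WebDriver values.
# "unspecified" has no WebDriver equivalent, so it is dropped.
_SAME_SITE = {"no_restriction": "None", "none": "None", "lax": "Lax", "strict": "Strict"}


def to_webdriver_cookie(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only fields add_cookie understands, renaming where needed."""
    cookie: Dict[str, Any] = {"name": raw["name"], "value": raw["value"]}
    for key in ("path", "domain", "secure", "httpOnly"):
        if key in raw:
            cookie[key] = raw[key]
    # expirationDate is a float in seconds; WebDriver wants an integer "expiry".
    # Session cookies have neither field and are left without an expiry.
    expiry = raw.get("expirationDate", raw.get("expiry"))
    if expiry is not None:
        cookie["expiry"] = int(expiry)
    same_site = _SAME_SITE.get(str(raw.get("sameSite", "")).lower())
    if same_site is not None:
        cookie["sameSite"] = same_site
    return cookie
```

To use it, the loop in get_data would call `self.browser.add_cookie(to_webdriver_cookie(i))` instead of passing each exported dict through unchanged. Extra export fields such as `hostOnly` or `storeId` are simply dropped.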

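Once the cookies are injected, we can start scraping. I use XPath here, though BeautifulSoup works as well. One thing to note: with BeautifulSoup you first have to get the HTML from the browser's page_source. On to the code: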

```python
    def get_shop(self):
        # Open the store's search page, then sort the items by sales
        self.browser.get(self.url)
        time.sleep(5)
        # Click the "sort by sales" link
        self.browser.find_element(
            'xpath', '//a[@atpanel=",d,,,shopsearch,3,shopfilter,682114580"]'
        ).click()
        time.sleep(5)
        self.get_pictures()

    def get_pictures(self):
        # Collect the product entries on the current page.
        # 'xpath' is the value of By.XPATH, so this works in Selenium 3 and 4
        # (find_element_by_xpath was removed in Selenium 4.3).
        lists = self.browser.find_elements(
            'xpath',
            '//div[@class="item5line1"]/dl[contains(@class,"item") or @class="item last"]',
        )
        time.sleep(4)
        print(lists)
        # Append each product to the list created in __init__;
        # the last 10 entries are skipped, as in the original
        for li in lists[:-10]:
            # Relative XPaths must start with "." so they search inside li
            picture = li.find_element('xpath', './/dt[@class="photo"]/a[@class="J_TGoldData"]/img').get_attribute('src')
            name = li.find_element('xpath', './/dd[@class="detail"]/a[@class="item-name J_TGoldData"]').text
            price = li.find_element('xpath', './/dd[@class="detail"]/div/div[@class="cprice-area"]').text
            # xiaoliang = sales volume
            xiaoliang = li.find_element('xpath', './/dd[@class="detail"]/div/div[@class="sale-area"]/span[@class="sale-num"]').text
            time.sleep(2)
            print(picture, name, price, xiaoliang)
            self.data.append([name, price, xiaoliang, picture])
        print(self.data)

        # Second approach: parse page_source with BeautifulSoup
        # html = self.browser.page_source
        # soup = BeautifulSoup(html, 'html.parser')
        # merchandise_news = soup.find_all('div', class_="item5line1")
        # print(merchandise_news)
```
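get_pictures stores price and sales as the raw text of each element. If you want numbers for sorting or analysis, a small helper could convert them afterwards. This is a hypothetical sketch, not from the original; it assumes the first number in the text is the value wanted and that a trailing 万 means ×10,000. Neither assumption has been checked against the live page, and the sample strings in the usage below are made up.

```python
import re
from typing import Optional

# First number in the text (digits, optional thousands commas and decimals),
# optionally followed by 万 (ten thousand) -- an assumption about the page format
_NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(万)?")


def extract_number(text: str) -> Optional[float]:
    """Hypothetical helper: return the first number in text as a float, or None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    if match.group(2):  # treat "万" as a multiplier of 10,000
        value *= 10000
    return value
```

For example, `extract_number("¥39.90")` gives 39.9 and `extract_number("3万")` gives 30000.0; text without digits gives None.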

Next comes pagination. I had already checked the total number of pages, so I simply used a for loop (a bit lazy of me, haha). The code:

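```python
    def get_fanye(self):
        # Pages 2 to 9 (range's end is exclusive); page 1 was already scraped
        # by get_shop(). The page count was checked by hand beforehand.
        for i in range(2, 10):
            # This is the store's search URL already sorted by sales
            self.browser.get('https://chenguang.tmall.com/search.htm?spm=a1z10.3-b-s.w4011-14939493465.411.446b3a16dEtqdy&search=y&orderType=hotsell_desc&pageNo={}&tsearch=y#anchor'.format(i))
            time.sleep(5)
            self.get_pictures()
            print(f'Page {i} scraped')
```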
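Rather than hard-coding the long sorted URL, the page URLs could be built from their query parameters. A minimal sketch, assuming (unverified) that the `spm` value and `#anchor` fragment are only tracking and scrolling aids, so the store responds to `search`, `orderType`, `pageNo` and `tsearch` alone; `build_page_url` and `page_urls` are hypothetical names.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://chenguang.tmall.com/search.htm"


def build_page_url(page_no: int, order_type: str = "hotsell_desc") -> str:
    """Hypothetical helper: sorted-by-sales search URL for one result page."""
    params = {
        "search": "y",
        "orderType": order_type,  # hotsell_desc = sorted by sales, as in the original URL
        "pageNo": page_no,
        "tsearch": "y",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(total_pages: int) -> List[str]:
    """URLs for pages 2..total_pages; page 1 is loaded by get_shop()."""
    return [build_page_url(i) for i in range(2, total_pages + 1)]
```

`page_urls(9)` covers the same pages 2 to 9 as `range(2, 10)`, so the loop in get_fanye could iterate over it and call `self.browser.get(url)` for each entry.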

I won't cover storing the data here.
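Since the script already imports csv, one possible way to store `self.data` is a CSV file. This is a sketch of my own, not the author's code; `save_csv`, the default file name and the column headers are hypothetical choices.

```python
import csv
from typing import Sequence


def save_csv(rows: Sequence[Sequence[str]], path: str = "tmall_items.csv") -> None:
    """Hypothetical helper: write [name, price, sales, picture] rows to a CSV file."""
    # utf-8-sig writes a BOM so Excel on Windows displays the Chinese text correctly;
    # newline="" is what the csv module expects for files it writes
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "price", "sales", "picture"])
        writer.writerows(rows)
```

For example, after `spider = Taobao()` and `spider.get_data()`, calling `save_csv(spider.data)` writes everything collected across all pages.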
