Scraping E-commerce Platform Data with Python / Collecting Product Reviews / Visualizing Them as a Word Cloud...

Published: 2024/1/18 python 40 豆豆

Preface

Good morning, good afternoon, and good evening, everyone ~

Highlights

Using the selenium tool

Parsing structured data

Saving data to CSV

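Environment:

python 3.8
pycharm
Chrome browser and the matching Chrome driver (chromedriver)

selenium controls the Chrome driver, which in turn controls the browser, simulating the way a person operates it.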

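Modules used:

selenium

pip install selenium==3.141.0 (installs this specific version)

If installation is slow, switch to a different mirror (package index).

(Simulates human actions to operate the browser.) Note that the code below relies on the find_element_by_* methods, which are available in this version but were removed in later Selenium 4 releases.
csv

Built-in module, no installation needed; saves the data to a CSV file.
time

Built-in module, no installation needed; used for delays and waiting.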

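Installing third-party Python modules:

Press Win + R, type cmd, and click OK; then type the install command pip install <module name> (for example, pip install requests) and press Enter.

Alternatively, in PyCharm, click Terminal and type the install command there.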
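After installing, one way to confirm that the modules are visible to the interpreter is a quick check from Python itself. This is a minimal sketch, not part of the original script; the helper name missing_modules is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # csv and time ship with Python; selenium has to be installed with pip.
    required = ["selenium", "csv", "time"]
    missing = missing_modules(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All modules are available.")
```

If selenium shows up as not installed, the pip command was probably run for a different Python interpreter than the one PyCharm is using.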

How selenium simulates a person operating the browser:

Open the browser

Enter the URL

Enter the name of the product you want

Click search to view the product data

Extract the data we want

Save the data

Code

"""
Scrape product data
"""

Import modules

import pprint
from selenium import webdriver  # import webdriver from selenium
import time  # time module
import csv

word = input('Enter the product you want to search for: ')

Create a file to save the data. If the CSV file shows garbled characters when saved as utf-8, change the encoding to utf-8-sig.

f = open(f'{word}.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['title', 'price', 'comment', 'shop_name', 'href'])

Write the header row

csv_writer.writeheader()
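As a side sketch of the CSV step, the snippet below writes one made-up row with utf-8-sig and then inspects the first bytes of the file. The helper save_rows, the sample row, and the temporary file path are all assumptions for illustration; only the field names come from the script.

```python
import csv
import os
import tempfile

FIELDNAMES = ["title", "price", "comment", "shop_name", "href"]


def save_rows(path, rows, encoding="utf-8-sig"):
    """Write rows (dicts keyed by FIELDNAMES) to a CSV file with a header."""
    with open(path, mode="w", encoding=encoding, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample row, for illustration only.
sample = [{
    "title": "示例商品",
    "price": "99.00",
    "comment": "100+",
    "shop_name": "示例店铺",
    "href": "https://example.com/item",
}]

path = os.path.join(tempfile.mkdtemp(), "demo.csv")
save_rows(path, sample)

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])  # the first three bytes are the UTF-8 byte order mark
```

The byte order mark that utf-8-sig writes at the start of the file is what lets spreadsheet programs such as Excel detect the encoding. The sketch opens the file with mode "w" inside a with block; the script above uses mode "a", so running it twice for the same product name appends a second header row to the existing file.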

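If the browser driver is placed in the Python installation directory, you do not need to specify the driver path. Otherwise, pass the path when creating the driver, for example:

# driver = webdriver.Chrome(executable_path=r'C:\01-Software-installation\Miniconda3\chromedriver.exe')

1. Open the browser

driver = webdriver.Chrome()

This instantiates a browser object and opens a browser window. selenium needs the Chrome driver in order to do this.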

2. Enter the URL

3. Enter the name of the product you want

driver.find_element_by_css_selector('#key').send_keys(word)

4. Click search to view the product data

driver.find_element_by_css_selector('#search > div > div.form > button > i').click()  # click action

5. Scroll down the page so that all the product data loads

def drop_down():
    """Scroll the page (using JavaScript)."""
    for x in range(1, 12, 2):  # 1 3 5 7 9 11; the page height also changes as you keep scrolling
        time.sleep(1)  # fixed delay (blocking wait)
        j = x / 9  # 1/9 3/9 5/9 7/9 9/9 11/9
        # document.documentElement.scrollTop sets the position of the scrollbar
        # document.documentElement.scrollHeight gets the maximum height of the page
        js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight * %f' % j
        driver.execute_script(js)


def get_shop_info():
    driver.implicitly_wait(10)  # implicit wait: element lookups wait up to 10 seconds and continue as soon as the element is found
    drop_down()

6. Get all the product li tags

    # In CSS syntax a dot stands for class; a dot plus the class name locates the tag directly.
    lis = driver.find_elements_by_css_selector('.gl-item')
    # find_elements returns multiple tags; find_element returns a single tag.
    # Extract the elements of the list one by one with a for loop.
    for li in lis:
        try:
            title = li.find_element_by_css_selector('a em').text.replace('\n', '')  # title
            price = li.find_element_by_css_selector('.p-price strong i').text  # price
            comment = li.find_element_by_css_selector('.p-commit strong a').text  # number of reviews
            shop_name = li.find_element_by_css_selector('.p-shop span a').text  # shop name
            href = li.find_element_by_css_selector('.p-name a').get_attribute('href')  # detail page link
            # 7. Save the data
            dit = {
                'title': title,
                'price': price,
                'comment': comment,
                'shop_name': shop_name,
                'href': href,
            }
            csv_writer.writerow(dit)
            print(title, price, comment, shop_name, href)
            # pprint.pprint(title)  # pretty-printing module
        except Exception:
            # skip any item that is missing one of the fields
            pass
    driver.find_element_by_css_selector('.pn-next').click()  # click "next page"


for page in range(1, 11):
    print(f'===========================Collecting data from page {page}===========================')
    get_shop_info()
driver.quit()  # close the browser automatically once the data has been collected
f.close()  # close the CSV file so all rows are flushed to disk
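To make the arithmetic of the scrolling step (step 5) easier to see, the sketch below builds the same JavaScript strings without touching a browser. The function name scroll_scripts is hypothetical, and the clamp to 1.0 is an addition of this sketch: the script above passes 11/9 on the last step, which asks for a position past the bottom of the page.

```python
def scroll_scripts(steps=range(1, 12, 2), divisor=9):
    """Build one JavaScript snippet per scroll step.

    Each snippet sets scrollTop to a fraction of the current page height.
    """
    scripts = []
    for x in steps:
        # Fractions above 1 are clamped to 1.0 (an assumption of this sketch).
        fraction = min(x / divisor, 1.0)
        scripts.append(
            "document.documentElement.scrollTop = "
            "document.documentElement.scrollHeight * %f" % fraction
        )
    return scripts


for js in scroll_scripts():
    print(js)
```

Each string could then be passed to driver.execute_script(js) with a time.sleep(1) in between, as drop_down does. Because scrollHeight is evaluated in the browser at the moment each snippet runs, the target position keeps up with a page that grows as more products load.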

"""
Scrape product review data
"""

import requests
import time

for page in range(10):
    time.sleep(2)

    response = requests.get(url=url, headers=headers)
    comments = '\n'.join([index['content'] for index in response.json()['comments']])

    # The line above is equivalent to the following step-by-step version:
    # comments = []  # create an empty list
    # for index in response.json()['comments']:  # loop over the list and take each element
    #     a = index['content']  # look up the review text by its key
    #     comments.append(a)  # add the review text to the list
    # comments = '\n'.join(comments)  # join the elements of the comments list into one string, separated by \n

    print(comments)
    with open('評論.txt', mode='a', encoding='utf-8') as f:
        f.write(comments)
        f.write('\n')
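The extraction step can be tried offline on a small payload. The sketch below assumes only what the code above implies, namely a JSON object with a comments list whose entries carry a content key; the sample data and the helper name extract_contents are made up. Unlike the one-line version, it skips entries that have no content key rather than raising KeyError.

```python
import json


def extract_contents(payload):
    """Return the 'content' text of every entry under payload['comments']."""
    return [
        item["content"]
        for item in payload.get("comments", [])
        if "content" in item
    ]


# Made-up payload shaped like the structure the code above expects.
sample_text = '{"comments": [{"content": "质量很好"}, {"content": "物流很快"}, {"id": 3}]}'
payload = json.loads(sample_text)

comments = "\n".join(extract_contents(payload))
print(comments)
```

With a real response, payload would be response.json() instead of the parsed sample string.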

"""
Build a word cloud from the reviews
"""

Import modules

jieba (Chinese word segmentation)

import jieba

Word cloud module

import wordcloud

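Open the file; this returns a file object

f = open('評論.txt', encoding='utf-8')

Read the text content; this returns a string

text = f.read()

Split the text into words with jieba; this returns a list

text_list = jieba.lcut(text)
print(text_list)

Join the list of words into a single string with the join method

string = ' '.join(text_list)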

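Word cloud configuration

wc = wordcloud.WordCloud(
    width=800,
    height=800,
    background_color='white',
    scale=15,
    font_path='msyh.ttc'  # Microsoft YaHei font, so that Chinese characters render
)

Load the words

wc.generate(string)

Output the word cloud image

wc.to_file('1.png')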
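A word cloud draws more frequent words larger, so it can help to look at the counts directly before rendering the image. The sketch below uses only the standard library; the token list is a hypothetical stand-in for the output of jieba.lcut(text), and the rule of dropping tokens shorter than two characters is an assumption of this sketch, not something the script above does.

```python
from collections import Counter


def top_words(tokens, n=3, min_length=2):
    """Count tokens, ignoring whitespace and tokens shorter than min_length."""
    cleaned = [t.strip() for t in tokens]
    counts = Counter(t for t in cleaned if len(t) >= min_length)
    return counts.most_common(n)


# Hypothetical token list standing in for the output of jieba.lcut(text).
tokens = ["质量", "很", "好", "，", "物流", "很", "快", "，",
          "质量", "不错", "\n", "物流", "质量"]
print(top_words(tokens))
```

Dropping single-character tokens removes punctuation and particles, which would otherwise tend to dominate the picture.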

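Closing words 💝

If you have more suggestions or questions, leave a comment or send me a private message! Let's keep working hard together.

If you like this, follow the blogger, or like, bookmark, and comment on the article!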
