Scraping Taobao Products with Python

Published: 2023/12/8 python 32 豆豆


This article scrapes the women's short skirt listings on Taobao and stores the product information in MySQL.

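Analysis approach

1. Page analysis
Search for "短裙" (short skirt) on the Taobao home page to reach the product list page:

Inspect the page source:

The source exposes several key fields for each product: image URL, product name, price, shipping fee, number of buyers, shop name, shop location, comment count, and the product link. Each of them can be extracted with a regular expression: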

# Product image
img_pat='"pic_url":"(//.*?)"'
# Product name
name_pat='"raw_title":"(.*?)"'
# Shop name
nick_pat='"nick":"(.*?)"'
# Price
price_pat='"view_price":"(.*?)"'
# Shipping fee
fee_pat='"view_fee":"(.*?)"'
# Number of buyers
sales_pat='"view_sales":"(.*?)"'
# Comment count
comment_pat='"comment_count":"(.*?)"'
# Shop location
city_pat='"item_loc":"(.*?)"'
# Product link
detail_url_pat='detail_url":"(.*?)"'
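As a quick check of how these patterns behave, the sketch below runs three of them against a small hand-written fragment. The sample string is made up for illustration and only imitates the "key":"value" shape that the patterns target; it is not real Taobao output.

```python
import re

# Hypothetical fragment imitating the "key":"value" pairs the patterns target.
sample = (
    '{"raw_title":"Skirt A","pic_url":"//img.example.com/a.jpg",'
    '"view_price":"59.00","comment_count":""},'
    '{"raw_title":"Skirt B","pic_url":"//img.example.com/b.jpg",'
    '"view_price":"128.00","comment_count":"42"}'
)

name_pat = '"raw_title":"(.*?)"'
price_pat = '"view_price":"(.*?)"'
comment_pat = '"comment_count":"(.*?)"'

# findall returns the captured group of every match, in document order.
names = re.findall(name_pat, sample)
prices = re.findall(price_pat, sample)
comments = re.findall(comment_pat, sample)

print(names)
print(prices)
print(comments)
```

The non-greedy `(.*?)` stops at the first closing quote, so an empty comment count comes back as an empty string rather than being skipped. That keeps the lists the same length here, which matters because the fields are later matched up by index.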

2. Analyzing the product list page URL

Page 2:
https://s.taobao.com/search?q=%E7%9F%AD%E8%A3%99&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170706&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=44

Page 3:
https://s.taobao.com/search?q=%E7%9F%AD%E8%A3%99&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170706&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=88

Page 4:
https://s.taobao.com/search?q=%E7%9F%AD%E8%A3%99&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170706&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=132

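Across pages, the final parameter s is a multiple of 44 (44 for page 2, 88 for page 3, 132 for page 4). Testing shows that the parameter &initiative_id=tbindexz_20170706 can be dropped, so the URL of page N is: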

url="https://s.taobao.com/search?q="+keywords+"&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s="+str((N-1)*44)
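The same rule can be wrapped in a small helper. This is a minimal sketch: the function name `build_search_url` and the constant `PAGE_SIZE` are illustrative names that do not appear in the article, and the query string is simply the one observed above, which Taobao may have changed since.

```python
from urllib.parse import quote

PAGE_SIZE = 44  # step of the s parameter between consecutive pages


def build_search_url(keyword, page):
    """Return the list-page URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    return (
        "https://s.taobao.com/search?q=" + quote(keyword)
        + "&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index"
        + "&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&bcoffset=4&ntoffset=4"
        + "&p4ppushleft=1%2C48&s=" + str((page - 1) * PAGE_SIZE)
    )


for page in (1, 2, 3):
    print(build_search_url("短裙", page))
```

`quote` percent-encodes the UTF-8 bytes of the keyword, which is how "短裙" becomes the `%E7%9F%AD%E8%A3%99` seen in the URLs above.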

Complete code

# Scrape Taobao products
import re
import urllib.parse
import urllib.request

import pymysql


# Open a page and return its content as text
def url_open(url):
    headers = ("user-agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0")
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    urllib.request.install_opener(opener)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    return data


# Store one row in MySQL. The values are passed separately from the SQL text,
# so a quote character inside a product name cannot break the statement.
def data_Import(sql, params):
    conn = pymysql.connect(host='127.0.0.1', user='test', password='123456', database='python', charset='utf8')
    try:
        with conn.cursor() as cursor:
            cursor.execute(sql, params)
        conn.commit()
    finally:
        conn.close()


if __name__ == '__main__':
    try:
        # Product keyword to search for
        keywd = "短裙"
        keywords = urllib.parse.quote(keywd)
        # Number of pages to scrape
        num = 100
        for i in range(num):
            url = "https://s.taobao.com/search?q=" + keywords + "&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.50862.201856-taobao-item.1&ie=utf8&bcoffset=4&ntoffset=4&p4ppushleft=1%2C48&s=" + str(i * 44)
            data = url_open(url)
            # Regular expression for each field
            img_pat = '"pic_url":"(//.*?)"'
            name_pat = '"raw_title":"(.*?)"'
            nick_pat = '"nick":"(.*?)"'
            price_pat = '"view_price":"(.*?)"'
            fee_pat = '"view_fee":"(.*?)"'
            sales_pat = '"view_sales":"(.*?)"'
            comment_pat = '"comment_count":"(.*?)"'
            city_pat = '"item_loc":"(.*?)"'
            detail_url_pat = 'detail_url":"(.*?)"'
            # Collect every match for each field into a list
            imgL = re.compile(img_pat).findall(data)
            nameL = re.compile(name_pat).findall(data)
            nickL = re.compile(nick_pat).findall(data)
            priceL = re.compile(price_pat).findall(data)
            feeL = re.compile(fee_pat).findall(data)
            salesL = re.compile(sales_pat).findall(data)
            commentL = re.compile(comment_pat).findall(data)
            cityL = re.compile(city_pat).findall(data)
            detail_urlL = re.compile(detail_url_pat).findall(data)
            for j in range(len(imgL)):
                img = "http:" + imgL[j]  # product image link
                name = nameL[j]  # product name
                nick = nickL[j]  # Taobao shop name
                price = priceL[j]  # product price
                fee = feeL[j]  # shipping fee
                sales = salesL[j]  # number of buyers
                detail_url = detail_urlL[j]  # product link
                comment = commentL[j]  # comment count; may be empty
                if comment == "":
                    comment = 0
                city = cityL[j]  # city where the shop is located
                print('Scraping page ' + str(i + 1) + ', item ' + str(j + 1) + '...')
                sql = "insert into taobao(name,price,fee,sales,comment,city,nick,img,detail_url) values(%s,%s,%s,%s,%s,%s,%s,%s,%s)"
                data_Import(sql, (name, price, fee, sales, comment, city, nick, img, detail_url))
        print("Scraping finished and the data has been stored in the database")
    except Exception as e:
        print(str(e))
    print("Task finished")
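Product titles can contain quote characters, so rows like these are safest to write with a parameterized INSERT rather than by formatting values into the SQL string. The sketch below shows the idea in a form that runs without a MySQL server: it swaps in Python's built-in sqlite3 with an in-memory database as a stand-in. The table layout is an assumption, since the article does not show how the taobao table was created, and the sample row is invented. With pymysql the placeholder is `%s` instead of `?`.

```python
import sqlite3

# Hypothetical schema: the article does not show the real table definition.
SCHEMA = """
CREATE TABLE taobao (
    name TEXT, price TEXT, fee TEXT, sales TEXT, comment TEXT,
    city TEXT, nick TEXT, img TEXT, detail_url TEXT
)
"""

INSERT_SQL = (
    "INSERT INTO taobao "
    "(name, price, fee, sales, comment, city, nick, img, detail_url) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"
)


def save_rows(conn, rows):
    """Insert all rows in one transaction; the driver handles value quoting."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Invented sample row; note the apostrophe in the product name.
rows = [
    ("It's a skirt", "59.00", "0.00", "120", "0", "Hangzhou", "shop-a",
     "http://img.example.com/a.jpg", "//detail.example.com/1"),
]
save_rows(conn, rows)

stored = conn.execute("SELECT name, comment FROM taobao").fetchall()
print(stored)
```

Writing a whole page of rows through one connection also avoids opening and closing a connection for every product.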

Scraping process

Scraping results
