

Scraping Taobao product listings with Selenium + PyQuery + MongoDB

Published: 2023/11/30
This article, collected and organized by 生活随笔, introduces scraping Taobao product listings with Selenium + PyQuery + MongoDB. We hope it serves as a useful reference.
'''
Scrape Taobao product listings: Selenium fetches the rendered page source,
PyQuery parses the HTML, and MongoDB stores the results.
'''
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.wait import WebDriverWait
from urllib.parse import quote
from pyquery import PyQuery as pq
import pymongo

BASEURL = 'https://s.taobao.com/search?q='
KEYWORD = 'python'

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
client = pymongo.MongoClient('mongodb://admin:admin123@localhost:27017/')
db = client.taobao
collection = db.products


def get_page(page):
    '''Navigate to the given results page, grab the rendered source, and hand it to the product parser.'''
    try:
        driver.get(BASEURL + quote(KEYWORD))
        print('Currently visiting page %d' % page)
        if page > 1:
            # Type the target page number into the pager and submit
            J_input = wait.until(EC.presence_of_element_located(
                (By.CSS_SELECTOR, '#mainsrp-pager div.form > input')))
            J_submit = wait.until(EC.element_to_be_clickable(
                (By.CSS_SELECTOR, '#mainsrp-pager div.form > span.btn.J_Submit')))
            J_input.clear()
            J_input.send_keys(page)
            J_submit.click()
            # Wait until the highlighted pager item shows the requested page number
            wait.until(EC.text_to_be_present_in_element(
                (By.CSS_SELECTOR, '#mainsrp-pager li.item.active > span'), str(page)))
        # Wait for the item list to render before reading the page source
        wait.until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, '.m-itemlist .items .item')))
        html = driver.page_source
        get_products(html)
    except TimeoutException:
        print('try again')
        get_page(page)


def get_products(html):
    '''Parse every product out of the page source and store each one.'''
    doc = pq(html)
    items = doc('#mainsrp-itemlist .items .item').items()
    for item in items:
        product = {
            'image': item.find('.img').attr('src'),
            'price': item.find('.price').text(),
            'payment': item.find('.deal-cnt').text(),
            'title': item.find('.title').text(),
            'location': item.find('.location').text(),
            'shop': item.find('.shopname').text(),
            'shop-link': item.find('.shopname').attr('href'),
        }
        print(product)
        save_to_mongo(product)


def save_to_mongo(product):
    '''Insert one product document into MongoDB.'''
    try:
        # insert_one replaces the deprecated Collection.insert
        if collection.insert_one(product):
            print('saved')
    except Exception as e:
        print('save failed', e.__class__)


if __name__ == '__main__':
    for i in range(1, 3):
        get_page(i)
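The search URL the script requests is just the base URL plus a URL-encoded keyword, which is what makes non-ASCII (e.g. Chinese) queries safe. A minimal stdlib-only sketch of that construction, with no Selenium required (`search_url` is a hypothetical helper name; `BASEURL` mirrors the constant in the script above):

```python
from urllib.parse import quote

BASEURL = 'https://s.taobao.com/search?q='

def search_url(keyword):
    # quote() percent-encodes the UTF-8 bytes of the keyword,
    # leaving unreserved ASCII characters untouched
    return BASEURL + quote(keyword)

print(search_url('python'))  # https://s.taobao.com/search?q=python
print(search_url('爬虫'))    # https://s.taobao.com/search?q=%E7%88%AC%E8%99%AB
```

This is also why the script calls `quote(KEYWORD)` rather than concatenating the raw keyword: a raw Chinese keyword would produce an invalid URL.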

Reposted from: https://www.cnblogs.com/Wang-Y/p/9401128.html

Summary

The above is the full content of this article on scraping Taobao product listings with Selenium + PyQuery + MongoDB, collected by 生活随笔. We hope it helps you solve the problems you encounter.
