

Crawler Series: Scraping 1688

Published: 2024/1/8

This article, collected and organized by 生活随笔, introduces a crawler-series entry on scraping 1688 and is shared here as a reference.

Project repository: GitHub - Carmenliukang/1688_crawler-image_search_products: upload an image through the 1688 PC-site endpoints and search for similar products.

For study purposes only; commercial use is prohibited.


1688

lib/alibaba_lib contains the concrete implementation.

The flow, in brief:

1. Fill in a logged-in cookie
2. Upload the image
3. Get back the search link
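The link in step 3 is assembled from the upload response. A minimal, self-contained sketch of that assembly (the response shape `{"object": {"url": ...}}` and the query parameters mirror what the library code in this article uses; the sample URL is made up):

```python
# Sketch: turn an upload response into the image-search URL.
# The response shape and the search URL parameters are taken from
# lib/alibaba_lib; treat them as assumptions, not a stable API.

def extract_key(upload_response: dict) -> str:
    """The image key is the last path segment of the returned URL."""
    return upload_response["object"]["url"].rsplit("/", 1)[-1]

def build_search_url(key: str) -> str:
    """Build the 1688 image-search page URL for an uploaded image key."""
    return f"https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageAddress={key}&spm="

resp = {"object": {"url": "https://cbu01.alicdn.com/img/ibank/abc123.jpg"}}
print(build_search_url(extract_key(resp)))
# https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageAddress=abc123.jpg&spm=
```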

Partial code:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from lib.alibaba_lib import Alibaba

if __name__ == '__main__':
    filename = 'data/下載.jpeg'
    cookie = """paste the cookie from a logged-in session here"""
    url = Alibaba(cookie).run(filename)
    print(url)
```


```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io
import os
import re
import json
import requests
from lib.func_txy import request_post
from lib.func_txy import request_get_content
from lib.func_txy import get_random_str
from urllib.parse import urlparse


class Alibaba(object):
    """Fetch similar products through the 1688 PC-site image-search endpoints."""

    def __init__(self, cookie):
        # Image-upload endpoint
        self.upload_url = ("https://stream-upload.taobao.com/api/upload.api"
                           "?appkey=1688search&folderId=0&_input_charset=utf-8"
                           "&useGtrSessionFilter=false")
        self.imageSearch_service_url = "https://open-s.1688.com/openservice/imageSearchOfferResultViewService"
        self._headers(cookie=cookie)
        self.search_page_size = 40

    def setSearchPageSize(self, pageSize):
        self.search_page_size = pageSize

    def _headers(self, cookie):
        headers = {
            "Origin": "https://www.1688.com",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0",
            "Accept": "*/*",
            "Cache-Control": "no-cache",
            "Referer": "https://www.1688.com/",
            "cookie": cookie,
        }
        self.headers = headers

    def upload_img(self, filename):
        """Upload an image from a local path or a URL; return (status, key)."""
        name = get_random_str(5) + ".jpeg"
        if os.path.exists(filename):
            bytestream = open(filename, "rb").read()
        else:
            us = urlparse(filename)
            if not us.scheme:  # neither a local file nor a valid URL
                return 'fail', None
            r = requests.get(filename)
            bytestream = io.BytesIO(r.content)
        files = {
            "name": (None, name),
            # "ua": (None, ""),
            "file": (name, bytestream),
        }
        status, res = request_post(self.upload_url, data=None, files=files, headers=self.headers)
        key = ""
        if status == "succ":
            data = json.loads(res)
            url = data["object"]["url"]
            # The image key is the last path segment of the returned URL
            key = url.split("/")[-1]
        return status, key

    def img_search(self, url):
        """Search the product list by uploaded image.

        Uses a JSONP endpoint scraped from the 1688 image-search page.
        :return: ('succ', data) or ('fail', None)
        """
        status_desc, data = request_get_content(url, headers=self.headers)
        if status_desc == "succ":
            return 'succ', data
        else:
            return 'fail', None

    def check_goods(self, html):
        """TODO: match the offer data embedded in the page source.

        :param html: page source of the search-result page
        """
        return re.findall(r"window.data.offerresultData = successDataCheck\(.*?\)", html)

    def run(self, filename, need_products=False):
        # Upload the image file
        status, key = self.upload_img(filename)
        # After a successful upload, build the search URL
        if status == "succ":
            url_res = f"https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageAddress={key}&spm="
            if need_products == False:
                return url_res
            else:
                status_desc, data = self.img_search(url_res)
                if status_desc == 'succ':
                    return data
                return None
        else:
            return ""
```
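The `check_goods` method hints at how the result page embeds its data: a `successDataCheck(...)` call assigned to `window.data.offerresultData`. A hedged sketch of pulling the JSON payload out of such a page (the regex extends the pattern in `check_goods` with a capture group, and the sample HTML is synthetic, not a real 1688 response):

```python
import json
import re

# Sketch: extract the JSON argument of successDataCheck(...) from the
# page source. The pattern mirrors check_goods(); the sample is made up.
PATTERN = re.compile(
    r"window\.data\.offerresultData\s*=\s*successDataCheck\((.*?)\);",
    re.S,
)

def parse_offer_data(html):
    """Return the embedded offer data as a dict, or None if absent."""
    m = PATTERN.search(html)
    if not m:
        return None
    return json.loads(m.group(1))

sample = 'var x; window.data.offerresultData = successDataCheck({"offers": [{"id": 1}]});'
print(parse_offer_data(sample))
# {'offers': [{'id': 1}]}
```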

Summary

That is the full content of Crawler Series: Scraping 1688 as collected by 生活随笔; hopefully it helps you solve the problem you ran into.
