Crawler Series: Scraping 1688
Project: GitHub - Carmenliukang/1688_crawler-image_search_products: upload an image through the 1688 PC website to find similar products
For learning purposes only; commercial use is prohibited.
1688
lib/alibaba_lib contains the actual implementation.
The workflow in brief:
1. Fill in the cookie.
2. Upload the image.
3. Get back the search link.
Part of the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from lib.alibaba_lib import Alibaba

if __name__ == '__main__':
    filename = 'data/下載.jpeg'
    cookie = """Fill in the cookie from a logged-in session"""
    url = Alibaba(cookie).run(filename)
    print(url)
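The link printed by the script above is built from the last path segment of the uploaded image's URL. The following is a minimal sketch of that step in isolation, assuming the upload response is JSON with the image URL under `object.url` (the shape read in lib/alibaba_lib). The helper names `extract_image_key` and `build_search_url` and the sample response body are hypothetical and not part of the project.

```python
import json

# Search page URL used by the project; only the image key varies.
SEARCH_URL_TEMPLATE = (
    "https://s.1688.com/youyuan/index.htm"
    "?tab=imageSearch&imageAddress={key}&spm="
)


def extract_image_key(upload_response_text):
    """Return the last path segment of the uploaded image URL, or None.

    Assumes the response body is JSON shaped like {"object": {"url": ...}}.
    """
    try:
        data = json.loads(upload_response_text)
        url = data["object"]["url"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(url, str):
        return None
    return url.rstrip("/").split("/")[-1] or None


def build_search_url(key):
    """Insert the image key into the image-search page URL."""
    return SEARCH_URL_TEMPLATE.format(key=key)


# Made-up response body, for illustration only.
sample = json.dumps(
    {"object": {"url": "https://example.invalid/imgextra/abc123.jpeg"}}
)
key = extract_image_key(sample)
print(build_search_url(key))
```

Returning None for a malformed response keeps the failure visible to the caller instead of raising from deep inside the parsing step.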
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io
import os
import re
import json
from urllib.parse import urlparse

import requests

from lib.func_txy import request_post
from lib.func_txy import request_get_content
from lib.func_txy import get_random_str


class Alibaba(object):
    """Fetches similar products through the 1688 PC-site interface."""

    def __init__(self, cookie):
        # Image upload endpoint
        self.upload_url = "https://stream-upload.taobao.com/api/upload.api?appkey=1688search&folderId=0&_input_charset=utf-8&useGtrSessionFilter=false"
        self.imageSearch_service_url = "https://open-s.1688.com/openservice/imageSearchOfferResultViewService"
        self._headers(cookie=cookie)
        self.search_page_size = 40

    def setSearchPageSize(self, pageSize):
        self.search_page_size = pageSize

    def _headers(self, cookie):
        headers = {
            "Origin": "https://www.1688.com",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0",
            "Accept": "*/*",
            "Cache-Control": "no-cache",
            # The standard header name is "Referer" (the original used "refer")
            "Referer": "https://www.1688.com/",
            "cookie": cookie,
        }
        self.headers = headers

    def upload_img(self, filename):
        """
        Upload an image given as a local path or a URL.
        :return: (status, key)
        """
        name = get_random_str(5) + ".jpeg"
        if os.path.exists(filename):
            with open(filename, "rb") as f:
                bytestream = f.read()
        else:
            us = urlparse(filename)
            # urlparse always returns a truthy result, so check its parts
            if not (us.scheme and us.netloc):
                return 'fail', None
            r = requests.get(filename)
            bytestream = io.BytesIO(r.content)
        files = {
            "name": (None, name),
            # "ua": (None, ""),
            "file": (name, bytestream),
        }
        status, res = request_post(self.upload_url, data=None, files=files, headers=self.headers)
        key = ""
        if status == "succ":
            data = json.loads(res)
            url = data["object"]["url"]
            # The key is the last path segment of the uploaded image URL
            key = url.split("/")[-1]
        return status, key

    def img_search(self, url):
        """
        Search the product list for an uploaded image.
        JSONP interface taken from the image-search page of the 1688 site.
        :return: (status, data); data is None on failure
        """
        status_desc, data = request_get_content(url, headers=self.headers)
        if status_desc == "succ":
            return 'succ', data
        else:
            return 'fail', None

    def check_goods(self, html):
        """
        todo: the matching still needs to be implemented here
        :param html:
        :return: list of raw matches
        """
        return re.findall(r"window.data.offerresultData = successDataCheck\(.*?\)", html)

    def run(self, filename, need_products=False):
        # Upload the image file
        status, key = self.upload_img(filename)
        # After a successful upload, build the search URL
        if status == "succ":
            url_res = f"https://s.1688.com/youyuan/index.htm?tab=imageSearch&imageAddress={key}&spm="
            if not need_products:
                return url_res
            status_desc, data = self.img_search(url_res)
            if status_desc == 'succ':
                return data
            return None
        else:
            return ""
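The check_goods method above is left as a to-do, and its non-greedy pattern would stop at the first closing parenthesis inside the data. One possible way to pull the embedded data out is sketched below, assuming the page contains `window.data.offerresultData = successDataCheck(` followed directly by a valid JSON value. That assumption is not verified against the live site, and the function name `extract_offer_data` and the sample HTML are hypothetical.

```python
import json
import re

# Marker taken from the pattern in check_goods; whitespace is matched loosely.
_MARKER = re.compile(
    r"window\.data\.offerresultData\s*=\s*successDataCheck\(\s*"
)


def extract_offer_data(html):
    """Return the JSON value passed to successDataCheck(...), or None.

    Assumption: the argument is valid JSON. raw_decode parses exactly one
    JSON value starting at the given index, so parentheses inside string
    values do not cut the match short.
    """
    match = _MARKER.search(html)
    if match is None:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    except ValueError:
        return None
    return obj


# Made-up page fragment, for illustration only.
sample_html = (
    "<script>window.data.offerresultData = "
    'successDataCheck({"note": "a (b)", "offers": [1, 2]});</script>'
)
print(extract_offer_data(sample_html))
```

If the real page wraps the data differently, only the marker pattern should need to change.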