
Scraping Data from Dangdang (当当网)

Published: 2024/3/7 · Programming Q&A · by 豆豆

This article, collected and organized by 生活随笔, walks through scraping book listings from Dangdang (当当网). It is shared here as a reference.
## Scrape book information from Dangdang (当当网)
import requests
from bs4 import BeautifulSoup
import time
import json

header = {
    "Referer": "http://search.dangdang.com/?key=python&%253Bact=input&%253Bpage_index=%7B%7D&_ddclickunion=P-295132-199857_64_0_ZGljdHNfZ29vZ2xl_1%7Cad_type&page_index=3",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36 EastBrowser/2.1",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Accept-Ranges": "bytes",
    "Accept": "*/*",
}

def get_links(url, results):
    """Collect every product link on a search-results page and scrape each one."""
    wb_data = requests.get(url, headers=header)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    links = soup.select('p.name > a')  # gather all product links on this page
    for link in links:
        href = link.get("href")  # the URL of each product's detail page
        get_info(href, results)

def get_info(url, results):
    """Fetch a product detail page and extract title, author, and publisher."""
    wb_data = requests.get(url, headers=header)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    titles = soup.select('#product_info > div.name_info > h1')
    authors = soup.select('#author > a')  # alternative: #author > a:nth-child(1)
    publishers = soup.select('#product_info > div.messbox_info > span:nth-of-type(2) > a')
    for title, author, publisher in zip(titles, authors, publishers):
        data = {
            "title": title.get_text().strip(),
            "author": author.get_text().strip(),
            "publisher": publisher.get_text().strip(),
        }
        print(data)
        results.append(data)

if __name__ == "__main__":
    urls = ["http://search.dangdang.com/?key=python&act=input&page_index={}".format(x)
            for x in range(1, 3)]
    results = []  # renamed from `list` to avoid shadowing the built-in
    for i in urls:
        get_links(i, results)
        time.sleep(1)  # pause between pages to avoid hammering the server
    print(len(results))

    filename = r"C:\Users\dell\Desktop\大三上學期\數據爬取\爬蟲\程序\pythondata"
    with open(filename, "w") as f_name:
        # ensure_ascii=True escapes non-ASCII so the file never garbles; indent=2 pretty-prints
        json.dump(results, f_name, ensure_ascii=True, indent=2, sort_keys=False)

## Newer anti-scraping workaround: the plain requests call below may now be blocked,
## so switch to httpx with HTTP/2 enabled
url = 'http://search.dangdang.com/?key=python&act=input&page_index=1'
wb_data = requests.get(url, headers=header)

import httpx
client = httpx.Client(http2=True)
response = client.get('http://search.dangdang.com/?key=python&act=input&page_index=1')
print(response.text)
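The parsing step in get_info depends on live network access, which makes it hard to verify. A minimal offline sketch of the same BeautifulSoup selection logic, run against a hypothetical HTML snippet that mimics Dangdang's assumed product-detail markup, would be:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating Dangdang's product-detail structure (assumed, not real site markup)
sample_html = """
<div id="product_info">
  <div class="name_info"><h1> Python Crash Course </h1></div>
  <div class="messbox_info">
    <span id="author"><a>Eric Matthes</a></span>
    <span><a>No Starch Press</a></span>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# The same CSS selectors used by get_info above
title = soup.select_one("#product_info > div.name_info > h1").get_text().strip()
author = soup.select_one("#author > a").get_text().strip()
publisher = soup.select_one("#product_info > div.messbox_info > span:nth-of-type(2) > a").get_text().strip()
print(title)      # Python Crash Course
print(author)     # Eric Matthes
print(publisher)  # No Starch Press
```

Testing selectors against a static snippet like this is a quick way to confirm they still match when the site changes its layout.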

Summary

The above is the full content of "Scraping Data from Dangdang" as collected by 生活随笔; we hope it helps you solve the problems you run into.
