CNN Zodiac Animal Recognition Project (Part 1): Downloading the Data

Published: 2024/3/12 | Convolutional Neural Networks | 99 | 豆豆

Contents

  • Preface
  • I. Prerequisites
  • II. Code
    • 1. Importing the libraries
    • 2. Sending requests, parsing the data, and saving it locally
    • 3. Full code
  • Summary


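Preface

After working with deep learning for a while, we are going to use a CNN (convolutional neural network) to build a small project that recognizes images of the twelve Chinese zodiac animals. Training a model usually takes a large amount of data, so this post gives a brief introduction to the data collection step.

We use Python to combine the Selenium automation framework with BeautifulSoup, download the images in several threads to speed things up, and finally save the data into our dataset.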


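I. Prerequisites

Before starting, prepare the following:
1. Python 3.10 (any Python 3 release should work; I used 3.10, the newest release when I wrote this)
2. requests
3. BeautifulSoup, for parsing web pages
4. The Selenium automation framework (with the matching browser driver installed and the environment variables configured)
5. threading, for multithreading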
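
Before running anything, it can help to confirm that the third-party packages in the list are importable. The following is a minimal sketch rather than part of the original project: the helper name `missing_modules` and the `REQUIRED` list are assumptions made for illustration, and `bs4` is the import name of the BeautifulSoup package used later in this post.

```python
import importlib.util

# Import names of the third-party packages from the list above.
# threading ships with Python, so it is not checked here.
REQUIRED = ["requests", "bs4", "selenium"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not installed yet: " + ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks the Python packages. The browser driver that Selenium needs has to be verified separately.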

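II. Code

Disclaimer: this code is for technical exchange only. If it infringes on your rights, please contact me and I will remove it.

1. Importing the libraries

First, import the libraries we need: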

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import threading  # used to start multiple threads
import requests
import time
import random
import os

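2. Sending requests, parsing the data, and saving it locally

First, find the position of the search input box on the page:

We use Google Chrome here. The code first sets up Chrome through Selenium and then locates the input box: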

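def parser_content(self):
    # Start Chrome through Selenium
    driver = webdriver.Chrome()
    # Maximize the window
    driver.maximize_window()
    # Load the page
    driver.get(self.url)
    # Set an implicit wait of 10 s
    driver.implicitly_wait(10)
    # Force a 2 s pause
    time.sleep(2)
    # Locate the input box and type the search keyword
    # (`name` is the module-level keyword read with input() in the full script)
    driver.find_element(By.XPATH, "//input[@class='s_ipt']").send_keys(name)
    time.sleep(1)
    # Simulate pressing Enter to open the image results page
    driver.find_element(By.XPATH, "//input[@class='s_ipt']").send_keys(Keys.ENTER)
    driver.implicitly_wait(10)
    time.sleep(2)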
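
The fixed `time.sleep` pauses above wait the full time even when the page is ready sooner. One alternative is to poll until the element shows up. The sketch below is an illustration under assumptions, not the author's code: `wait_for_element` is a hypothetical helper that only relies on the driver having a `find_elements(by, value)` method, as Selenium 4 drivers do. Selenium also provides its own `WebDriverWait` class for this purpose, which is the more standard choice in real code.

```python
import time


def wait_for_element(driver, by, value, timeout=10.0, interval=0.2):
    """Poll driver.find_elements(by, value) until it returns a match.

    Returns the first matching element, or raises TimeoutError once
    `timeout` seconds have passed without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(by, value)
        if found:
            return found[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no element matched {!r} within {} s".format(value, timeout))
        time.sleep(interval)
```

With a real driver this could be called as `wait_for_element(driver, By.XPATH, "//input[@class='s_ipt']")` before sending keys to the input box.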

Next we arrive at the image list page:

Here we need to scroll down the page to load more images, and we save the image links we collect into a list:

start = time.time()
# Vertical scrolling ------------------------------------------------------
temp_height = 0
for ii in range(1, 1000000, 8):
    js1 = "document.documentElement.scrollTop={}".format(ii)
    driver.execute_script(js1)
    time.sleep(0.01)
    # Check whether the scroll bar has reached the bottom of the page
    check_height = driver.execute_script(
        "return document.documentElement.scrollTop || window.pageYOffset || document.body.scrollTop;")
    if check_height == temp_height:
        break
    temp_height = check_height
    # Time limit: stop automatically after 45 s
    if time.time() - start > 45:
        break
# -------------------------------------------------------------------------
# Collect all image page links
url_lst = driver.find_elements(By.XPATH, "//div[@class='imgbox-border']/a")
for item in url_lst[1:201]:  # change the numbers to collect more image links
    new_url = item.get_attribute("href")
    # print(new_url)
    self.lst1.append(new_url)
print("Collected " + str(len(self.lst1)) + " image links in this run!")
# Close the browser
driver.quit()
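
The loop above moves the scroll position 8 pixels at a time. A different approach, shown here only as a sketch, is to jump to the bottom of the page repeatedly and stop once the page height no longer grows. This assumes the results page loads more images whenever the bottom is reached, which is not verified here. `scroll_until_stable` is a hypothetical helper, and it only needs a driver object with an `execute_script` method.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_seconds=45.0):
    """Jump to the page bottom until the page height stops growing.

    Returns the last page height that was observed. Gives up after
    `max_seconds`, mirroring the 45 s limit used in the loop above.
    """
    start = time.monotonic()
    last_height = driver.execute_script("return document.body.scrollHeight")
    while time.monotonic() - start < max_seconds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded images time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the real bottom
        last_height = new_height
    return last_height
```

The `pause` value is a trade-off: too short and the function may stop before new images have loaded, too long and scrolling takes more time than needed.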

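After running this, we can see that we have the links we wanted. Next we visit the detail page behind each image link.

We parse each page with BeautifulSoup to get the image URL and start four threads to download the images: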

def _download(self, urls):
    # Shared worker: open each detail page, extract the image URL, save the image
    for i in urls:
        try:
            resp = requests.get(i, headers=self.headers)
            resp.encoding = "utf-8"
            html = resp.text
            # print(html)
            Be = BeautifulSoup(html, 'html.parser')
            wrapper = Be.find('div', class_='img-wrapper')
            img = wrapper.find('img')['src']
            res = requests.get(img)
            # Build the random 8-digit file name once, so the message below
            # reports the path that was actually written
            filename = "".join(str(random.randint(1, 9)) for _ in range(8)) + ".jpg"
            file_path = os.path.join(self.path, self.save_path, filename)
            with open(file_path, "wb") as file:
                file.write(res.content)
            print("Download finished, saved to: " + file_path)
        except Exception as e:
            # Skip this link, but report why instead of failing silently
            print("Skipped " + str(i) + ": " + str(e))

def download1(self):
    # First quarter of the links
    self._download(self.lst1[0:int(len(self.lst1) / 4)])

def download2(self):
    # Second quarter
    self._download(self.lst1[int(len(self.lst1) / 4):int(len(self.lst1) / 2)])

def download3(self):
    # Third quarter
    self._download(self.lst1[int(len(self.lst1) / 2):int(len(self.lst1) / 2) + int(len(self.lst1) / 4)])

def download4(self):
    # Everything that is left
    self._download(self.lst1[int(len(self.lst1) / 2) + int(len(self.lst1) / 4):len(self.lst1)])

def multi_thread(self):
    # Pass the methods themselves (no parentheses). Writing self.download1()
    # would run the download right here in the main thread and hand its
    # return value, None, to Thread.
    t1 = threading.Thread(target=self.download1)
    t2 = threading.Thread(target=self.download2)
    t3 = threading.Thread(target=self.download3)
    t4 = threading.Thread(target=self.download4)
    t1.start()
    t2.start()
    t3.start()
    t4.start()
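
The idea of splitting the link list into four parts and handing each part to its own thread can be written more generally. The sketch below is an illustration, not the project's code: `split_evenly` and `run_in_threads` are hypothetical helper names. It relies on two facts about `threading.Thread`: `target` must be a callable (a function object, not the result of calling it), and arguments are passed through `args`.

```python
import threading


def split_evenly(items, parts):
    """Split `items` into `parts` contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_in_threads(worker, items, parts=4):
    """Run worker(chunk) in its own thread for every chunk, then wait for all of them."""
    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in split_evenly(items, parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every thread has finished
```

For example, `split_evenly(list(range(10)), 4)` gives `[[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]`. Calling `join()` at the end means the caller knows all downloads are done when `run_in_threads` returns.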

3. Full code

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import threading
import requests
import time
import random
import os


class Spider():
    def __init__(self):
        self.url = "https://image.baidu.com/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3883.400 QQBrowser/10.8.4559.400",
            "Referer": "https://www.baidu.com/"
        }
        self.lst1 = []
        self.path = './data/train/'  # save directory
        # `xname` (folder name) and `name` (search keyword) are module-level
        # variables set in the __main__ block at the bottom
        self.save_path = xname
        # Create ./data/train/<xname>, including missing parent folders
        os.makedirs(os.path.join(self.path, self.save_path), exist_ok=True)

    def parser_content(self):
        # Start Chrome through Selenium
        driver = webdriver.Chrome()
        # Maximize the window
        driver.maximize_window()
        # Load the page
        driver.get(self.url)
        # Set an implicit wait of 10 s
        driver.implicitly_wait(10)
        # Force a 2 s pause
        time.sleep(2)
        # Locate the input box and type the search keyword
        driver.find_element(By.XPATH, "//input[@class='s_ipt']").send_keys(name)
        time.sleep(1)
        # Simulate pressing Enter to open the image results page
        driver.find_element(By.XPATH, "//input[@class='s_ipt']").send_keys(Keys.ENTER)
        driver.implicitly_wait(10)
        time.sleep(2)
        start = time.time()
        # Vertical scrolling ----------------------------------------------
        temp_height = 0
        for ii in range(1, 1000000, 8):
            js1 = "document.documentElement.scrollTop={}".format(ii)
            driver.execute_script(js1)
            time.sleep(0.01)
            # Check whether the scroll bar has reached the bottom of the page
            check_height = driver.execute_script(
                "return document.documentElement.scrollTop || window.pageYOffset || document.body.scrollTop;")
            if check_height == temp_height:
                break
            temp_height = check_height
            # Time limit: stop automatically after 45 s
            if time.time() - start > 45:
                break
        # -----------------------------------------------------------------
        # Collect all image page links
        url_lst = driver.find_elements(By.XPATH, "//div[@class='imgbox-border']/a")
        for item in url_lst[1:201]:
            new_url = item.get_attribute("href")
            print(new_url)
            self.lst1.append(new_url)
        print("Collected " + str(len(self.lst1)) + " image links in this run!")
        # Close the browser
        driver.quit()

    def _download(self, urls):
        # Shared worker: open each detail page, extract the image URL, save the image
        for i in urls:
            try:
                resp = requests.get(i, headers=self.headers)
                resp.encoding = "utf-8"
                html = resp.text
                # print(html)
                Be = BeautifulSoup(html, 'html.parser')
                wrapper = Be.find('div', class_='img-wrapper')
                img = wrapper.find('img')['src']
                res = requests.get(img)
                # Build the random 8-digit file name once, so the message
                # below reports the path that was actually written
                filename = "".join(str(random.randint(1, 9)) for _ in range(8)) + ".jpg"
                file_path = os.path.join(self.path, self.save_path, filename)
                with open(file_path, "wb") as file:
                    file.write(res.content)
                print("Download finished, saved to: " + file_path)
            except Exception as e:
                # Skip this link, but report why instead of failing silently
                print("Skipped " + str(i) + ": " + str(e))

    def download1(self):
        # First quarter of the links
        self._download(self.lst1[0:int(len(self.lst1) / 4)])

    def download2(self):
        # Second quarter
        self._download(self.lst1[int(len(self.lst1) / 4):int(len(self.lst1) / 2)])

    def download3(self):
        # Third quarter
        self._download(self.lst1[int(len(self.lst1) / 2):int(len(self.lst1) / 2) + int(len(self.lst1) / 4)])

    def download4(self):
        # Everything that is left
        self._download(self.lst1[int(len(self.lst1) / 2) + int(len(self.lst1) / 4):len(self.lst1)])

    def multi_thread(self):
        # Pass the methods themselves (no parentheses) so that each one
        # runs inside its own thread
        t1 = threading.Thread(target=self.download1)
        t2 = threading.Thread(target=self.download2)
        t3 = threading.Thread(target=self.download3)
        t4 = threading.Thread(target=self.download4)
        t1.start()
        t2.start()
        t3.start()
        t4.start()


if __name__ == '__main__':
    name = input("Enter the type of image to download: ")
    xname = input("Enter the name of the folder to create (in English): ")
    spider = Spider()
    spider.parser_content()
    spider.multi_thread()
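
The file names in the code above are eight random digits from 1 to 9, which allows 9^8 = 43,046,721 different names. Two downloads can draw the same name, and the later file then overwrites the earlier one. One possible alternative, sketched here under the assumption that identical images should be stored only once, is to name each file after a hash of its content. `content_filename` and `save_unique` are hypothetical helpers, not part of the original project.

```python
import hashlib
import os


def content_filename(data, ext=".jpg"):
    """Name a file after the SHA-256 of its bytes (first 16 hex characters)."""
    return hashlib.sha256(data).hexdigest()[:16] + ext


def save_unique(data, folder):
    """Write `data` into `folder` unless the same content is already there.

    Returns (path, written). `written` is False when the file already existed.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, content_filename(data))
    if os.path.exists(path):
        return path, False
    with open(path, "wb") as f:
        f.write(data)
    return path, True
```

In the download worker, `save_unique(res.content, folder)` could replace the `open(...)` block. A side effect is that an image returned by several search results is written only once, which keeps exact duplicates out of the training set.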

That completes the code. Finally, let's take a look at the result.


Opening the folder we named at the start, we can see that the images have been downloaded and saved there, and the download is quick.
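
The script saves every response with a `.jpg` extension, whatever the server actually returned. Before using the folder for training, it may be worth checking what the files really are. The sketch below looks at the first bytes of each file and compares them with the well-known signatures of JPEG, PNG, GIF, and WebP. `sniff_image_type` and `summarize_folder` are hypothetical helpers added for illustration.

```python
import os


def sniff_image_type(data):
    """Guess the image format from the leading bytes; None if unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def summarize_folder(folder):
    """Count the files in `folder` by detected type ("unknown" if unrecognized)."""
    counts = {}
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            kind = sniff_image_type(f.read(12)) or "unknown"
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling `summarize_folder` on the folder created by the script returns a dictionary of counts per detected type. Files counted as "unknown" are often HTML error pages or empty responses and are candidates for removal.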

Summary

Interest is the best teacher.
That's all for today's code.
