
Day 04: Classic Convolutional Neural Networks Explained

Published: 2025/3/21 67 豆豆



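Contents

Day 04: Classic Convolutional Neural Networks Explained
- Assignment description
- Example code
  - 1. Environment setup
  - 2. Data preparation
  - 3. Model configuration
  - 4. Model training
  - 5. Model validation
  - 6. Model prediction
- Completing the assignment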

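Assignment description

Today's hands-on project is "mask classification" based on VGG, a classic convolutional neural network.

Mask recognition means detecting every face in a crowded area, masked or unmasked, and deciding for each one whether a mask is being worn. Such a system usually consists of two functional units: one detects the faces, and the other classifies each detected face as masked or unmasked.

Compared with the production version of the problem, this exercise is simplified. It implements only the classification model, which decides whether a given face is wearing a mask. The goal is to use a mask-recognition example to help you understand and practice building a classic convolutional neural network with PaddlePaddle (飛槳) dynamic graphs.

Note: the datasets used in this exercise all come from the Internet. Do not use them for commercial purposes.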

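Assignment requirements:

1. Based on what was covered in class, build a VGGNet network and get it to run. Once that works, you can try building other networks.
2. Think about and experiment with hyperparameter tuning and optimization to raise accuracy on the test set.

Links to the courseware and datasets are in the introduction post, "Introduction before formal study" (正式學習前的緒論).

The day04 folder contains all the data we need. characterData.zip is the dataset we will use, and CarID.png is the image used at the end to test the result.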

Example code

1. Environment setup

# Import the required packages
import os
import zipfile
import random
import json
import paddle
import sys
import numpy as np
from PIL import Image
from PIL import ImageEnhance
import paddle.fluid as fluid
from multiprocessing import cpu_count
import matplotlib.pyplot as plt

# Parameter configuration
train_parameters = {
    "input_size": [3, 224, 224],                         # shape of the input image
    "class_dim": -1,                                     # number of classes (set by get_data_list)
    "src_path": "/home/aistudio/work/maskDetect.zip",    # path of the original dataset
    "target_path": "/home/aistudio/data/",               # directory to unzip into
    "train_list_path": "/home/aistudio/data/train.txt",  # path of train.txt
    "eval_list_path": "/home/aistudio/data/eval.txt",    # path of eval.txt
    "readme_path": "/home/aistudio/data/readme.json",    # path of readme.json
    "label_dict": {},                                    # label dictionary (set by get_data_list)
    "num_epochs": 1,                                     # number of training epochs
    "train_batch_size": 8,                               # batch size during training
    "learning_strategy": {                               # optimizer-related settings
        "lr": 0.001                                      # learning rate
    }
}

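2. Data preparation

Unzip the original dataset

Split it into a training set and a validation set at a fixed ratio

Shuffle and generate the data lists

Build the training data reader and the validation data reader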

def unzip_data(src_path, target_path):
    '''
    Unzip the original dataset: extract the zip file at src_path into the data directory
    '''
    if not os.path.isdir(target_path + "maskDetect"):
        z = zipfile.ZipFile(src_path, 'r')
        z.extractall(path=target_path)
        z.close()

def get_data_list(target_path, train_list_path, eval_list_path):
    '''
    Generate the data lists
    '''
    # information about every class
    class_detail = []
    # names of the folders that hold each class
    data_list_path = target_path + "maskDetect/"
    class_dirs = os.listdir(data_list_path)
    # total number of images
    all_class_images = 0
    # current class label
    class_label = 0
    # number of classes
    class_dim = 0
    # contents to be written to eval.txt and train.txt
    trainer_list = []
    eval_list = []
    # read each class, ['maskimages', 'nomaskimages']
    for class_dir in class_dirs:
        if class_dir != ".DS_Store":
            class_dim += 1
            # information about this class
            class_detail_list = {}
            eval_sum = 0
            trainer_sum = 0
            # number of images in this class
            class_sum = 0
            # path of this class
            path = data_list_path + class_dir
            # all images of this class
            img_paths = os.listdir(path)
            for img_path in img_paths:                # iterate over the images in the folder
                name_path = path + '/' + img_path     # path of each image
                if class_sum % 10 == 0:               # one image out of every 10 goes to validation
                    eval_sum += 1                     # eval_sum counts validation samples
                    eval_list.append(name_path + "\t%d" % class_label + "\n")
                else:
                    trainer_sum += 1                  # trainer_sum counts training samples
                    trainer_list.append(name_path + "\t%d" % class_label + "\n")
                class_sum += 1                        # images in this class
                all_class_images += 1                 # images in all classes

            # class_detail entry for the descriptive json file
            class_detail_list['class_name'] = class_dir              # class name
            class_detail_list['class_label'] = class_label           # class label
            class_detail_list['class_eval_images'] = eval_sum        # validation images in this class
            class_detail_list['class_trainer_images'] = trainer_sum  # training images in this class
            class_detail.append(class_detail_list)
            # fill in the label dictionary
            train_parameters['label_dict'][str(class_label)] = class_dir
            class_label += 1

    # set the number of classes
    train_parameters['class_dim'] = class_dim

    # shuffle
    random.shuffle(eval_list)
    with open(eval_list_path, 'a') as f:
        for eval_image in eval_list:
            f.write(eval_image)

    random.shuffle(trainer_list)
    with open(train_list_path, 'a') as f2:
        for train_image in trainer_list:
            f2.write(train_image)

    # contents of the descriptive json file
    readjson = {}
    readjson['all_class_name'] = data_list_path    # parent directory
    readjson['all_class_images'] = all_class_images
    readjson['class_detail'] = class_detail
    jsons = json.dumps(readjson, sort_keys=True, indent=4, separators=(',', ': '))
    with open(train_parameters['readme_path'], 'w') as f:
        f.write(jsons)
    print('Data lists generated!')

def custom_reader(file_list):
    '''
    Custom reader
    '''
    def reader():
        with open(file_list, 'r') as f:
            lines = [line.strip() for line in f]
            for line in lines:
                img_path, lab = line.strip().split('\t')
                img = Image.open(img_path)
                if img.mode != 'RGB':
                    img = img.convert('RGB')
                img = img.resize((224, 224), Image.BILINEAR)
                img = np.array(img).astype('float32')
                img = img.transpose((2, 0, 1))  # HWC to CHW
                img = img / 255                 # scale pixel values to [0, 1]
                yield img, int(lab)
    return reader

# Read the parameters
src_path = train_parameters['src_path']
target_path = train_parameters['target_path']
train_list_path = train_parameters['train_list_path']
eval_list_path = train_parameters['eval_list_path']
batch_size = train_parameters['train_batch_size']

# Unzip the original data to the target path
unzip_data(src_path, target_path)

# Split into training and validation sets, shuffle, generate the data lists.
# Empty train.txt and eval.txt before each run.
with open(train_list_path, 'w') as f:
    f.seek(0)
    f.truncate()
with open(eval_list_path, 'w') as f:
    f.seek(0)
    f.truncate()

# Generate the data lists
get_data_list(target_path, train_list_path, eval_list_path)

# Build the data readers
train_reader = paddle.batch(custom_reader(train_list_path),
                            batch_size=batch_size,
                            drop_last=True)
eval_reader = paddle.batch(custom_reader(eval_list_path),
                           batch_size=batch_size,
                           drop_last=True)
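The `class_sum % 10 == 0` rule in `get_data_list` sends the first image of every ten to the validation list, so roughly 10% of each class ends up in eval.txt. A minimal standalone sketch of that rule follows; the helper `split_every_tenth` and the file names are hypothetical and only there for illustration.

```python
def split_every_tenth(paths):
    """Mirror the rule in get_data_list: items at index 0, 10, 20, ... go to eval."""
    train, evals = [], []
    for index, path in enumerate(paths):
        if index % 10 == 0:
            evals.append(path)
        else:
            train.append(path)
    return train, evals


# Hypothetical file names, used only for illustration.
paths = ["img_%d.jpg" % i for i in range(25)]
train, evals = split_every_tenth(paths)
print(len(train), len(evals))  # 22 3
```

Both readers are built with `drop_last=True`, so a trailing partial batch is discarded. With a small validation list, this means a few validation images may never be evaluated.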

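3. Model configuration

The core of VGG is five groups of convolutions, with max pooling between consecutive groups to reduce the spatial size. Each group stacks several 3x3 convolutions. The number of filters grows from 64 in the shallowest group to 512 in the deepest, and all convolutions within a group use the same number of filters. The convolutions are followed by two fully connected layers and then a classification layer. Because the number of convolutions per group varies, there are 11-, 13-, 16- and 19-layer variants; the figure in the original post shows the 16-layer structure.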
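The layer counts in the variant names follow directly from the number of convolutions in each group. As a quick check, the sketch below tabulates the per-group convolution counts of the standard VGG configurations and adds the three fully connected layers; the names `VGG_CONV_GROUPS` and `weight_layers` are illustrative and not part of the course code.

```python
# Number of 3x3 convolutions in each of the five groups, per VGG variant.
VGG_CONV_GROUPS = {
    "VGG11": [1, 1, 2, 2, 2],
    "VGG13": [2, 2, 2, 2, 2],
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],
}
# Filters used by every convolution in groups 1 to 5.
GROUP_FILTERS = [64, 128, 256, 512, 512]
# Two hidden fully connected layers plus the classification layer.
NUM_FC_LAYERS = 3


def weight_layers(name):
    """Count the layers with weights: all convolutions plus the FC layers."""
    return sum(VGG_CONV_GROUPS[name]) + NUM_FC_LAYERS


for name in VGG_CONV_GROUPS:
    print(name, weight_layers(name))
```

Pooling layers have no weights, which is why they are not counted.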

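class ConvPool(fluid.dygraph.Layer):
    '''Convolution + pooling'''
    # conv_padding=1 keeps each 3x3 convolution size-preserving and pool_padding=0
    # lets each pooling halve the feature map, so a 224x224 input reaches 7x7
    # after five groups, matching the 512*7*7 used by the VGG network.
    def __init__(self,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 groups,
                 pool_padding=0,
                 pool_type='max',
                 conv_stride=1,
                 conv_padding=1,
                 act=None):
        super(ConvPool, self).__init__()

        self._conv2d_list = []

        for i in range(groups):
            # add_sublayer registers the layer under the given name and returns it
            conv2d = self.add_sublayer(
                'bb_%d' % i,
                fluid.dygraph.Conv2D(
                    num_channels=num_channels,  # number of input channels
                    num_filters=num_filters,    # number of filters
                    filter_size=filter_size,    # filter size
                    stride=conv_stride,         # stride
                    padding=conv_padding,       # padding size
                    act=act))
            # later convolutions in the group take the previous output as input
            num_channels = num_filters
            self._conv2d_list.append(conv2d)

        self._pool2d = fluid.dygraph.Pool2D(
            pool_size=pool_size,        # pooling window size
            pool_type=pool_type,        # pooling type, max pooling by default
            pool_stride=pool_stride,    # pooling stride
            pool_padding=pool_padding)  # padding size

    def forward(self, inputs):
        x = inputs
        for conv in self._conv2d_list:
            x = conv(x)
        x = self._pool2d(x)
        return x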
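The padding values passed to the convolution and pooling layers determine the size of the final feature map, and therefore the input size of the first fully connected layer. The sketch below applies the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, to five VGG16-style groups. It is plain arithmetic, independent of Paddle; the helper names are illustrative, and floor rounding for pooling is an assumption.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def vgg_feature_size(size, convs_per_group, conv_padding, pool_padding):
    """Spatial size after the conv groups: 3x3 convs, then 2x2 pooling with stride 2."""
    for n_convs in convs_per_group:
        for _ in range(n_convs):
            size = conv_out(size, kernel=3, stride=1, padding=conv_padding)
        size = conv_out(size, kernel=2, stride=2, padding=pool_padding)
    return size


vgg16_groups = [2, 2, 3, 3, 3]
# Padded convolutions, unpadded pooling: 224 -> 112 -> 56 -> 28 -> 14 -> 7
print(vgg_feature_size(224, vgg16_groups, conv_padding=1, pool_padding=0))  # 7
# Unpadded convolutions, padded pooling: the map shrinks to 3x3 instead
print(vgg_feature_size(224, vgg16_groups, conv_padding=0, pool_padding=1))  # 3
```

Only the first setting yields the 7x7 map that a `512*7*7` fully connected input expects, so the padding arguments are worth checking whenever the reshape before the first fully connected layer fails.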

Please complete the definition of the VGG network:

class VGGNet(fluid.dygraph.Layer):
    '''VGG network'''
    def __init__(self):
        super(VGGNet, self).__init__()

    def forward(self, inputs, label=None):
        """Forward pass"""

4. Model training

all_train_iter = 0
all_train_iters = []
all_train_costs = []
all_train_accs = []

def draw_train_process(title, iters, costs, accs, label_cost, label_acc):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel("cost/acc", fontsize=20)
    plt.plot(iters, costs, color='red', label=label_cost)
    plt.plot(iters, accs, color='green', label=label_acc)
    plt.legend()
    plt.grid()
    plt.show()

def draw_process(title, color, iters, data, label):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel(label, fontsize=20)
    plt.plot(iters, data, color=color, label=label)
    plt.legend()
    plt.grid()
    plt.show()

# Model training
# with fluid.dygraph.guard(place = fluid.CUDAPlace(0)):
with fluid.dygraph.guard():
    print(train_parameters['class_dim'])
    print(train_parameters['label_dict'])
    vgg = VGGNet()
    optimizer = fluid.optimizer.AdamOptimizer(
        learning_rate=train_parameters['learning_strategy']['lr'],
        parameter_list=vgg.parameters())
    for epoch_num in range(train_parameters['num_epochs']):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0] for x in data]).astype('float32')
            y_data = np.array([x[1] for x in data]).astype('int64')
            y_data = y_data[:, np.newaxis]

            # convert the NumPy arrays into inputs accepted by DyGraph
            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)

            out, acc = vgg(img, label)
            loss = fluid.layers.cross_entropy(out, label)
            avg_loss = fluid.layers.mean(loss)

            # backward() runs the backward pass
            avg_loss.backward()
            optimizer.minimize(avg_loss)
            # clear the gradients so the next step starts clean
            vgg.clear_gradients()

            all_train_iter = all_train_iter + train_parameters['train_batch_size']
            all_train_iters.append(all_train_iter)
            # record the batch mean loss rather than the first sample's loss
            all_train_costs.append(avg_loss.numpy()[0])
            all_train_accs.append(acc.numpy()[0])

            if batch_id % 1 == 0:
                print("Loss at epoch {} step {}: {}, acc: {}".format(
                    epoch_num, batch_id, avg_loss.numpy(), acc.numpy()))

    draw_train_process("training", all_train_iters, all_train_costs, all_train_accs,
                       "training cost", "training acc")
    draw_process("training loss", "red", all_train_iters, all_train_costs, "training loss")
    draw_process("training acc", "green", all_train_iters, all_train_accs, "training acc")

    # save the model parameters
    fluid.save_dygraph(vgg.state_dict(), "vgg")
    print("Final loss: {}".format(avg_loss.numpy()))

5. Model validation

# Model validation
with fluid.dygraph.guard():
    model, _ = fluid.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()
    accs = []
    for batch_id, data in enumerate(eval_reader()):
        dy_x_data = np.array([x[0] for x in data]).astype('float32')
        # the label dtype matches the int64 used during training
        y_data = np.array([x[1] for x in data]).astype('int64')
        y_data = y_data[:, np.newaxis]

        img = fluid.dygraph.to_variable(dy_x_data)
        label = fluid.dygraph.to_variable(y_data)

        out, acc = vgg(img, label)
        accs.append(acc.numpy()[0])
    # mean of the per-batch accuracies
    print(np.mean(accs))

6. Model prediction

def load_image(img_path):
    '''
    Preprocess an image for prediction
    '''
    img = Image.open(img_path)
    if img.mode != 'RGB':
        img = img.convert('RGB')
    img = img.resize((224, 224), Image.BILINEAR)
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1))  # HWC to CHW
    img = img / 255                 # scale pixel values to [0, 1]
    return img

label_dic = train_parameters['label_dict']

# Model prediction
with fluid.dygraph.guard():
    model, _ = fluid.dygraph.load_dygraph("vgg")
    vgg = VGGNet()
    vgg.load_dict(model)
    vgg.eval()

    # show the image to predict
    infer_path = '/home/aistudio/data/data23615/infer_mask01.jpg'
    img = Image.open(infer_path)
    plt.imshow(img)  # draw the image from the array
    plt.show()       # display the image

    # preprocess the image to predict
    infer_imgs = []
    infer_imgs.append(load_image(infer_path))
    infer_imgs = np.array(infer_imgs)

    for i in range(len(infer_imgs)):
        data = infer_imgs[i]
        dy_x_data = np.array(data).astype('float32')
        dy_x_data = dy_x_data[np.newaxis, :, :, :]  # add the batch dimension
        img = fluid.dygraph.to_variable(dy_x_data)
        out = vgg(img)
        lab = np.argmax(out.numpy())  # argmax() returns the index of the largest value
        print("Sample {} is predicted as: {}".format(i + 1, label_dic[str(lab)]))

print("Done")
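The last step of the prediction code picks the index of the largest softmax output and looks it up in the label dictionary. A minimal sketch of that lookup in plain Python follows. The helper `decode_prediction` is hypothetical, and the label order below is an assumption, since the real mapping depends on the order in which `os.listdir` returns the class folders.

```python
def decode_prediction(probs, label_dict):
    """Return (class name, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return label_dict[str(best)], probs[best]


# Assumed label order, for illustration only.
label_dict = {"0": "maskimages", "1": "nomaskimages"}
print(decode_prediction([0.91, 0.09], label_dict))
```

Returning the probability along with the class name makes it easier to spot predictions the model is unsure about.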

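Completing the assignment

Defining the VGG network:

The code is similar to that of the previous days, but today's example code gathers the model's many parameters in one place and adds a ConvPool class that combines convolution and pooling. This is more convenient, so we use it as is.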

class VGGNet(fluid.dygraph.Layer):
    '''VGG network'''
    def __init__(self):
        super(VGGNet, self).__init__()
        # arguments: input channels, filters, filter size, pool size, pool stride,
        # number of consecutive convolutions
        self.convpool01 = ConvPool(3, 64, 3, 2, 2, 2, act='relu')
        self.convpool02 = ConvPool(64, 128, 3, 2, 2, 2, act='relu')
        self.convpool03 = ConvPool(128, 256, 3, 2, 2, 3, act='relu')
        self.convpool04 = ConvPool(256, 512, 3, 2, 2, 3, act='relu')
        self.convpool05 = ConvPool(512, 512, 3, 2, 2, 3, act='relu')
        self.pool_5_shape = 512 * 7 * 7
        self.fc01 = fluid.dygraph.Linear(self.pool_5_shape, 4096, act='relu')
        self.fc02 = fluid.dygraph.Linear(4096, 4096, act='relu')
        self.fc03 = fluid.dygraph.Linear(4096, 2, act='softmax')

    def forward(self, inputs, label=None):
        """Forward pass"""
        out = self.convpool01(inputs)
        out = self.convpool02(out)
        out = self.convpool03(out)
        out = self.convpool04(out)
        out = self.convpool05(out)

        # flatten the feature map for the fully connected layers
        out = fluid.layers.reshape(out, shape=[-1, 512 * 7 * 7])
        out = self.fc01(out)
        out = self.fc02(out)
        out = self.fc03(out)

        if label is not None:
            acc = fluid.layers.accuracy(input=out, label=label)
            return out, acc
        else:
            return out

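We started with 10 training epochs, and the training accuracy jumped up and down.

The plotted curves show the same thing: the training accuracy keeps oscillating.

Accuracy on the test set is around 0.6.

Mask recognition is a binary classification problem with only two outcomes, masked and unmasked. When we predicted on an image of a masked face, the model only barely got it right.

An accuracy of about 0.6 clearly calls for optimization, so we tuned the following three hyperparameters.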

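Number of training epochs (num_epochs)
Learning rate (lr)
Batch size during training (train_batch_size)

We raised the number of epochs to 20, lowered the learning rate to 0.0001, and raised the training batch size to 16. All of these can be changed in step 1, the environment setup.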
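Expressed against the `train_parameters` dictionary from step 1, the three changes look roughly like this. The dictionary below is trimmed to the relevant keys so that the sketch runs on its own.

```python
# Trimmed copy of the configuration from step 1 (only the tuned keys).
train_parameters = {
    "num_epochs": 1,
    "train_batch_size": 8,
    "learning_strategy": {"lr": 0.001},
}

# The tuned values described above.
train_parameters["num_epochs"] = 20
train_parameters["train_batch_size"] = 16
train_parameters["learning_strategy"]["lr"] = 0.0001

print(train_parameters)
```

The batch size is read from `train_parameters` in step 2, when the readers are built, so the data preparation code has to be rerun after changing it.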

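One point worth stressing: in our runs a slightly larger batch size improved accuracy, but batch_size should not be picked arbitrarily. It is usually set to a multiple of 8, which tends to make better use of the GPU's parallel hardware.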

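After retraining, accuracy on the training set gradually converges to 1.0.

Accuracy on the test set also reaches 1.0, which is very nice.

Because the training data here is fairly small, the model does not generalize well. People in the course group suggested using data augmentation to improve the prediction success rate in real-world scenes.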
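As a rough idea of what such augmentation could look like, here is a minimal NumPy sketch that randomly flips an image horizontally and perturbs its brightness. It assumes the same layout that `custom_reader` produces (float32, CHW, values in [0, 1]). The function `augment_chw` and its parameters are hypothetical, not part of the course code, and the choice of transforms is only an example.

```python
import numpy as np


def augment_chw(img, rng, flip_prob=0.5, max_brightness_delta=0.2):
    """Randomly flip and rescale brightness of a float32 CHW image in [0, 1]."""
    out = img.copy()
    if rng.random() < flip_prob:
        out = out[:, :, ::-1]  # reverse the width axis: horizontal flip
    # scale all pixels by a factor in [1 - delta, 1 + delta]
    factor = 1.0 + rng.uniform(-max_brightness_delta, max_brightness_delta)
    out = np.clip(out * factor, 0.0, 1.0)
    return out.astype('float32')


rng = np.random.default_rng(0)
image = rng.random((3, 4, 4)).astype('float32')  # stand-in for a real image
augmented = augment_chw(image, rng)
print(augmented.shape, augmented.dtype)
```

Both transforms keep the label unchanged, so a function like this could be applied to training samples only, just before they are yielded by the reader.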
