Building a Convolutional Neural Network (CNN) Using Only NumPy (with Python Code)
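Abstract: Existing toolkits such as Caffe and TensorFlow already implement CNN models well, but they need considerable hardware resources, which makes them less suitable for beginners who want to practice and understand the details. This article therefore shows how to build a convolutional neural network (Convolutional Neural Network, CNN) model using only NumPy. It implements a convolution layer, a ReLU activation layer, and a max pooling layer, with simple code and detailed explanations.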
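Many ready-made machine learning and deep learning toolkits are available today. In some cases, calling a prebuilt model directly is convenient and effective, as with Caffe and TensorFlow, but these toolkits need considerable hardware resources, which makes them less suitable for beginners who want to practice and understand the details. To understand and master the material, it is best to program it yourself. This article shows how to build a convolutional neural network (Convolutional Neural Network, CNN) using NumPy.
CNNs were proposed relatively early but became popular only in recent years, and they are arguably the most widely used networks in computer vision. Several toolkits already implement CNN models well: the relevant library functions are fully built, and developers only need to call existing modules to assemble a model, which avoids the complexity of implementation. In practice, however, this leaves developers unaware of the implementation details. Sometimes a data scientist has to adjust such details to improve a model's performance, and the toolkit does not expose them. In that case the only solution is to program a similar model yourself, which gives you the highest level of control over the model and a better understanding of what happens at each step.
This article implements a CNN using only NumPy and creates three layer modules: a convolution layer (Conv), a ReLU activation function, and max pooling.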
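1. Reading the input image
The following code reads a sample image that ships with the skimage Python library and converts it to grayscale:
    import skimage.data
    import skimage.color
    # Reading the image
    img = skimage.data.chelsea()
    # Converting the image into gray.
    img = skimage.color.rgb2gray(img)
Reading the image is the first step; the operations that follow depend on the size of the input image. After this step, img holds the grayscale version of the picture as a 2-D array.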
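In keeping with the NumPy-only theme, the grayscale conversion can also be sketched without skimage. The weights below are the luminance coefficients documented for skimage.color.rgb2gray; the helper name rgb_to_gray and the random stand-in image are assumptions made for illustration, and the sketch expects floating-point input already scaled to [0, 1].

```python
import numpy

def rgb_to_gray(rgb):
    """Weighted sum of the R, G and B channels (hypothetical helper)."""
    weights = numpy.array([0.2125, 0.7154, 0.0721])
    return rgb[..., :3] @ weights

# Stand-in for a real photo: random values in [0, 1) with 3 channels.
rgb_img = numpy.random.rand(4, 6, 3)
gray_img = rgb_to_gray(rgb_img)
print(gray_img.shape)  # (4, 6)
```

The depth axis disappears, which is why the filters in the next step are 2-D.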
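2. Preparing the filters
The following code prepares the filter bank for the first convolution layer (Layer 1, abbreviated l1 from here on):
    import numpy
    l1_filter = numpy.zeros((2,3,3))
A zero array is created according to the number of filters and the size of each filter. The code above creates 2 filters of size 3x3, so the shape (2,3,3) reads as 2: the number of filters (num_filters), 3: the number of rows in each filter, 3: the number of columns in each filter. Because the input image is grayscale and is read as a 2-D matrix, each filter is a 2-D array with no depth. If the image were a color image with 3 channels (RGB), each filter would need shape (3,3,3), with the last 3 being the depth, and the shape in the code above would change to (2,3,3,3).
The size of the filter bank is chosen by the user, but the array above does not yet contain meaningful values; these are usually initialized randomly. They can also be set by hand, for example to values that detect vertical and horizontal edges.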
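The concrete edge-detection values are not reproduced here. As a sketch, the two filters could be filled with Prewitt-style kernels, a common choice for vertical and horizontal edge detection; these particular numbers are an assumption and not necessarily the values used for the figures in this article.

```python
import numpy

l1_filter = numpy.zeros((2, 3, 3))
# Filter 0 responds to vertical edges (intensity changing from left to right).
l1_filter[0, :, :] = numpy.array([[-1, 0, 1],
                                  [-1, 0, 1],
                                  [-1, 0, 1]])
# Filter 1 responds to horizontal edges (intensity changing from top to bottom).
l1_filter[1, :, :] = numpy.array([[ 1,  1,  1],
                                  [ 0,  0,  0],
                                  [-1, -1, -1]])

# Tiny check: a patch that is dark on the left and bright on the right.
patch = numpy.array([[0, 0, 1],
                     [0, 0, 1],
                     [0, 0, 1]])
vertical_response = numpy.sum(patch * l1_filter[0])    # 3.0
horizontal_response = numpy.sum(patch * l1_filter[1])  # 0.0
```

The patch contains a vertical edge only, so the first filter gives a non-zero response and the second gives zero.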
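3. Convolution layer (Conv Layer)
Once the filters are ready, the next step is to convolve the input image with them. The line below uses the conv function to convolve the input image with the filter bank:
    l1_feature_map = conv(img, l1_filter)
The conv function accepts just two arguments, the input image and the filter bank: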
    import sys
    import numpy

    def conv(img, conv_filter):
        # Check that the number of image channels matches the filter depth.
        if len(img.shape) > 2 or len(conv_filter.shape) > 3:
            if img.shape[-1] != conv_filter.shape[-1]:
                print("Error: Number of channels in both image and filter must match.")
                sys.exit()
        # Check that the filter is square.
        if conv_filter.shape[1] != conv_filter.shape[2]:
            print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
            sys.exit()
        # Check that the filter size is odd.
        if conv_filter.shape[1] % 2 == 0:
            print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
            sys.exit()

        # An empty feature map to hold the output of convolving the filter(s) with the image.
        feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                                    img.shape[1]-conv_filter.shape[1]+1,
                                    conv_filter.shape[0]))

        # Convolving the image by the filter(s).
        for filter_num in range(conv_filter.shape[0]):
            print("Filter ", filter_num + 1)
            curr_filter = conv_filter[filter_num, :]  # Getting a filter from the bank.
            # Checking if there are multiple channels for the single filter.
            # If so, then each channel will convolve the image.
            # The results of all convolutions are summed to return a single feature map.
            if len(curr_filter.shape) > 2:
                conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0])  # Array holding the sum of all feature maps.
                for ch_num in range(1, curr_filter.shape[-1]):  # Convolving each channel with the image and summing the results.
                    conv_map = conv_map + conv_(img[:, :, ch_num],
                                                curr_filter[:, :, ch_num])
            else:  # There is just a single channel in the filter.
                conv_map = conv_(img, curr_filter)
            feature_maps[:, :, filter_num] = conv_map  # Holding feature map with the current filter.
        return feature_maps  # Returning all feature maps.
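The function first makes sure that the depth of each filter equals the number of image channels, using the code below. The outer if statement checks whether the image or the filter has a depth dimension; if so, the inner if checks that the two channel counts are equal, and the function reports an error and exits if they are not.
    if len(img.shape) > 2 or len(conv_filter.shape) > 3:  # Check if number of image channels matches the filter depth.
        if img.shape[-1] != conv_filter.shape[-1]:
            print("Error: Number of channels in both image and filter must match.")
            sys.exit()
In addition, each filter must be square (equal numbers of rows and columns) and its size must be odd. The two if blocks below check this; if either condition is not met, the program reports an error and exits.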
    if conv_filter.shape[1] != conv_filter.shape[2]:  # Check if filter dimensions are equal.
        print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
        sys.exit()
    if conv_filter.shape[1]%2==0:  # Check if filter dimensions are odd.
        print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
        sys.exit()
Once all of these checks pass, convolution starts by initializing an array that will hold the outputs of the convolution, that is, the feature maps:
    # An empty feature map to hold the output of convolving the filter(s) with the image.
    feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                                img.shape[1]-conv_filter.shape[1]+1,
                                conv_filter.shape[0]))
Because neither stride nor padding is specified, a stride of 1 and no padding are used. The feature maps returned by the convolution therefore have shape (img_rows-filter_rows+1, img_columns-filter_columns+1, num_filters): in each spatial dimension, the input size minus the filter size plus 1. Note that every filter produces one feature map.
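The size rule can be checked with a few lines of arithmetic. The helper name conv_output_shape is hypothetical, and the 300x451 image size is an assumed example rather than a value stated in this article.

```python
def conv_output_shape(img_shape, filter_bank_shape):
    """Feature-map shape for stride 1 and no padding (hypothetical helper)."""
    num_filters, filter_rows, filter_cols = filter_bank_shape[:3]
    return (img_shape[0] - filter_rows + 1,
            img_shape[1] - filter_cols + 1,
            num_filters)

# A 300x451 grayscale image convolved with two 3x3 filters.
print(conv_output_shape((300, 451), (2, 3, 3)))  # (298, 449, 2)
```

Each spatial dimension shrinks by filter_size - 1, so larger filters in later layers shrink the maps faster.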
The outer loop then convolves the image with each filter in turn:
    # Convolving the image by the filter(s).
    for filter_num in range(conv_filter.shape[0]):
        print("Filter ", filter_num + 1)
        curr_filter = conv_filter[filter_num, :]  # Getting a filter from the bank.
        # Checking if there are multiple channels for the single filter.
        # If so, then each channel will convolve the image.
        # The results of all convolutions are summed to return a single feature map.
        if len(curr_filter.shape) > 2:
            conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0])  # Array holding the sum of all feature maps.
            for ch_num in range(1, curr_filter.shape[-1]):  # Convolving each channel with the image and summing the results.
                conv_map = conv_map + conv_(img[:, :, ch_num],
                                            curr_filter[:, :, ch_num])
        else:  # There is just a single channel in the filter.
            conv_map = conv_(img, curr_filter)
        feature_maps[:, :, filter_num] = conv_map  # Holding feature map with the current filter.
On each iteration over the filter bank, the following line selects the current filter:
    curr_filter = conv_filter[filter_num, :]  # Getting a filter from the bank.
If the input image has more than one channel, the filter must have the same number of channels; only then can the convolution proceed. Each image channel is convolved with the matching filter channel, and the per-channel results are summed to form a single output feature map. The code below checks the number of channels; if there is only one channel, a single convolution completes the process:
    if len(curr_filter.shape) > 2:
        conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0])  # Array holding the sum of all feature maps.
        for ch_num in range(1, curr_filter.shape[-1]):  # Convolving each channel with the image and summing the results.
            conv_map = conv_map + conv_(img[:, :, ch_num],
                                        curr_filter[:, :, ch_num])
    else:  # There is just a single channel in the filter.
        conv_map = conv_(img, curr_filter)
The conv_ function used in this code is different from the conv function shown earlier. conv accepts the input image and the filter bank but does not perform the convolution itself; it only prepares each image and filter pair that conv_ then convolves. The implementation of conv_ is as follows:
    def conv_(img, conv_filter):
        filter_size = conv_filter.shape[0]
        # One output value for every position where the filter fits entirely inside the image.
        result = numpy.zeros((img.shape[0]-filter_size+1,
                              img.shape[1]-filter_size+1))
        # Looping through the image to apply the convolution operation.
        for r in range(result.shape[0]):
            for c in range(result.shape[1]):
                # Getting the current region to get multiplied with the filter.
                curr_region = img[r:r+filter_size, c:c+filter_size]
                # Element-wise multiplication between the current region and the filter.
                curr_result = curr_region * conv_filter
                conv_sum = numpy.sum(curr_result)  # Summing the result of multiplication.
                result[r, c] = conv_sum  # Saving the summation in the convolution layer feature map.
        return result
The filter slides over the image, and at each position a region of the same size as the filter is extracted with the following line:
    curr_region = img[r:r+filter_size, c:c+filter_size]
Next, the image region and the filter are multiplied element by element, and the products are summed to give a single output value:
    # Element-wise multiplication between the current region and the filter.
    curr_result = curr_region * conv_filter
    conv_sum = numpy.sum(curr_result)  # Summing the result of multiplication.
    result[r, c] = conv_sum  # Saving the summation in the convolution layer feature map.
After the input image has been convolved with every filter, conv returns the feature maps. The figure below shows the feature maps returned by the conv layer (the l1 filter bank has shape (2,3,3), that is, two 3x3 kernels, so two feature maps are produced):
Image after convolution
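The nested loops are easy to follow but slow on large images. As a sketch, the same multiply-and-sum can be written without Python loops using numpy.lib.stride_tricks.sliding_window_view, which assumes NumPy 1.20 or later; the names conv_loop and conv_vectorized are hypothetical. Note that neither version flips the filter, so strictly speaking this is cross-correlation, which is the usual convention in CNNs.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def conv_loop(img, conv_filter):
    """Loop version: slide the filter over every valid position."""
    size = conv_filter.shape[0]
    out = numpy.zeros((img.shape[0] - size + 1, img.shape[1] - size + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = numpy.sum(img[r:r + size, c:c + size] * conv_filter)
    return out

def conv_vectorized(img, conv_filter):
    """Same result without Python loops (hypothetical helper)."""
    # windows has shape (out_rows, out_cols, size, size).
    windows = sliding_window_view(img, conv_filter.shape)
    return numpy.einsum('rcij,ij->rc', windows, conv_filter)

rng = numpy.random.default_rng(0)
img = rng.random((8, 9))
conv_filter = rng.random((3, 3))
same = numpy.allclose(conv_loop(img, conv_filter),
                      conv_vectorized(img, conv_filter))
print(same)  # True
```

The loop version remains the better reference for understanding; the vectorized one is only a speed-oriented alternative.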
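A convolution layer is usually followed by an activation layer; this article uses the ReLU activation function.
4. ReLU activation layer
The ReLU layer applies the ReLU activation function to every feature map returned by the conv layer. It is called with the following line:
    l1_feature_map_relu = relu(l1_feature_map)
The ReLU activation function (relu) is implemented as follows: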
    def relu(feature_map):
        # Preparing the output of the ReLU activation function.
        relu_out = numpy.zeros(feature_map.shape)
        for map_num in range(feature_map.shape[-1]):
            for r in numpy.arange(0, feature_map.shape[0]):
                for c in numpy.arange(0, feature_map.shape[1]):
                    # Keep positive values; replace negative values with 0.
                    relu_out[r, c, map_num] = numpy.max([feature_map[r, c, map_num], 0])
        return relu_out
The idea behind ReLU is simple: each element of the feature map is compared with 0. If it is greater than 0, the original value is kept; otherwise it is set to 0. The output of the ReLU layer is shown in the figure below:
ReLU layer output image
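For comparison, the element-wise rule can be expressed in one call with numpy.maximum, which compares two inputs element by element (unlike numpy.max, which reduces an array along an axis). The helper name relu_vectorized and the sample values are assumptions for illustration.

```python
import numpy

def relu_vectorized(feature_map):
    """Element-wise max(x, 0) over the whole array (hypothetical helper)."""
    return numpy.maximum(feature_map, 0)

sample = numpy.array([[-1.5, 0.0],
                      [ 2.0, -0.1]])
# Negative entries become 0; positive entries are kept unchanged.
activated = relu_vectorized(sample)
print(activated)
```

This works for arrays of any shape, so it applies to all feature maps at once.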
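An activation layer is usually followed by a pooling layer; this article uses max pooling.
5. Max pooling layer
The output of the ReLU layer is the input to the max pooling layer. The pooling operation is called with the following line:
    l1_feature_map_relu_pool = pooling(l1_feature_map_relu, 2, 2)
The max pooling function (pooling) is implemented as follows: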
    def pooling(feature_map, size=2, stride=2):
        # Preparing the output of the pooling operation.
        pool_out = numpy.zeros(((feature_map.shape[0]-size)//stride+1,
                                (feature_map.shape[1]-size)//stride+1,
                                feature_map.shape[-1]))
        for map_num in range(feature_map.shape[-1]):
            r2 = 0
            for r in numpy.arange(0, feature_map.shape[0]-size+1, stride):
                c2 = 0
                for c in numpy.arange(0, feature_map.shape[1]-size+1, stride):
                    # Maximum of the current window in the current feature map.
                    pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])
                    c2 = c2 + 1
                r2 = r2 + 1
        return pool_out
The function accepts three arguments: the output of the ReLU layer, the size of the pooling window, and the stride. As before, it first creates an empty array to hold its output. The size of that array is determined by the size of the input feature maps, the window size, and the stride:
    pool_out = numpy.zeros(((feature_map.shape[0]-size)//stride+1,
                            (feature_map.shape[1]-size)//stride+1,
                            feature_map.shape[-1]))
Max pooling is then applied to every input feature map separately, returning the largest value in each window:
    pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])
The output of the pooling layer is shown in the figure below. The images are displayed at the same size for comparison; in fact, with a 2x2 window and a stride of 2, each pooled map has only about half as many rows and columns as its input.
Pooling layer output image
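A small worked example makes the window arithmetic concrete. The sketch below pools a single hand-written 4x4 map with a 2x2 window and stride 2; the helper name max_pool_2d and the sample values are assumptions for illustration.

```python
import numpy

def max_pool_2d(feature_map, size=2, stride=2):
    """Max pooling of a single 2-D map (hypothetical helper)."""
    out_rows = (feature_map.shape[0] - size) // stride + 1
    out_cols = (feature_map.shape[1] - size) // stride + 1
    out = numpy.zeros((out_rows, out_cols))
    for r2 in range(out_rows):
        for c2 in range(out_cols):
            r, c = r2 * stride, c2 * stride
            out[r2, c2] = numpy.max(feature_map[r:r + size, c:c + size])
    return out

fmap = numpy.array([[1, 3, 2, 0],
                    [4, 2, 1, 5],
                    [0, 1, 9, 2],
                    [7, 6, 3, 4]])
pooled = max_pool_2d(fmap)
print(pooled)
# [[4. 5.]
#  [7. 9.]]
```

Each output value is the maximum of one non-overlapping 2x2 block, so the 4x4 input shrinks to 2x2.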
6. Stacking layers
The sections above implement the basic layers of a CNN: conv, ReLU, and max pooling. They can now be stacked, as in the following code:
    # Second conv layer
    l2_filter = numpy.random.rand(3, 5, 5, l1_feature_map_relu_pool.shape[-1])
    print("\n**Working with conv layer 2**")
    l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
    print("\n**ReLU**")
    l2_feature_map_relu = relu(l2_feature_map)
    print("\n**Pooling**")
    l2_feature_map_relu_pool = pooling(l2_feature_map_relu, 2, 2)
    print("**End of conv layer 2**\n")
In this code, l2 denotes the second convolution layer. Its filter bank contains three 5x5 kernels (filters), each with a depth equal to the number of feature maps produced by the first layer. Convolving them with the output of the first layer gives 3 feature maps, which then pass through the ReLU activation function and max pooling. The result of each operation is visualized in the figure below:
Visualization of the l2 layer processing
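Because each l2 filter has a depth, conv convolves channel by channel and sums the results. The sketch below checks, for one 5x5 region, that this per-channel sum equals a single element-wise product over all channels; the random region and filter are stand-ins, not data from this article.

```python
import numpy

rng = numpy.random.default_rng(0)
# One 5x5 region taken from a 2-channel input, and one filter of matching depth.
region = rng.random((5, 5, 2))
curr_filter = rng.random((5, 5, 2))

# Per-channel products summed one channel at a time, as conv does.
per_channel = sum(numpy.sum(region[:, :, ch] * curr_filter[:, :, ch])
                  for ch in range(curr_filter.shape[-1]))
# The same value from a single element-wise product over all channels.
all_at_once = numpy.sum(region * curr_filter)

print(numpy.isclose(per_channel, all_at_once))  # True
```

This is why a filter with depth still yields exactly one feature map, no matter how many channels its input has.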
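The third convolution layer, l3, is built in the same way. Its filter bank contains one 7x7 kernel (filter), again with a depth matching the number of feature maps from the previous layer. Convolving it with the output of the second layer gives 1 feature map, which then passes through the ReLU activation function and max pooling. The result of each operation is visualized in the figure below:
Visualization of the l3 layer processing
The basic structure of a neural network is that the output of one layer becomes the input of the next: l2 receives the output of l1, and l3 receives the output of l2, as these two lines show:
    l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
    l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)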
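To see how the shapes evolve through all three layers, the whole stack can be sketched in a compact, self-contained form. This is not the article's loop-based code: conv_layer, relu_layer, and pool_layer are hypothetical vectorized stand-ins (they assume NumPy 1.20 or later for sliding_window_view), the 64x64 random image replaces the real photo, and all filters are random.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(img, filters):
    """Valid convolution, stride 1; filters is (n, k, k) or (n, k, k, depth)."""
    if img.ndim == 2:                       # give grayscale input a depth of 1
        img = img[:, :, numpy.newaxis]
    if filters.ndim == 3:
        filters = filters[:, :, :, numpy.newaxis]
    k = filters.shape[1]
    # windows: (out_rows, out_cols, depth, k, k)
    windows = sliding_window_view(img, (k, k), axis=(0, 1))
    return numpy.einsum('rcdij,nijd->rcn', windows, filters)

def relu_layer(feature_map):
    return numpy.maximum(feature_map, 0)

def pool_layer(feature_map, size=2, stride=2):
    # windows: (rows, cols, maps, size, size); keep every stride-th window.
    windows = sliding_window_view(feature_map, (size, size), axis=(0, 1))
    return windows[::stride, ::stride].max(axis=(-2, -1))

rng = numpy.random.default_rng(0)
img = rng.random((64, 64))                  # stand-in for the grayscale image

l1_out = pool_layer(relu_layer(conv_layer(img, rng.random((2, 3, 3)))))
l2_filter = rng.random((3, 5, 5, l1_out.shape[-1]))
l2_out = pool_layer(relu_layer(conv_layer(l1_out, l2_filter)))
l3_filter = rng.random((1, 7, 7, l2_out.shape[-1]))
l3_out = pool_layer(relu_layer(conv_layer(l2_out, l3_filter)))

print(l1_out.shape, l2_out.shape, l3_out.shape)
# (31, 31, 2) (13, 13, 3) (3, 3, 1)
```

The last axis of each output equals the number of filters in that layer, and it is also the depth that the next layer's filters must have.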
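7. Complete code
The complete code has been uploaded to GitHub. The visualization of each layer is done with the Matplotlib library.
This article was translated by the Alibaba Cloud Yunqi Community.
Original title: "Building Convolutional Neural Network using NumPy from Scratch"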