

How Convolution Neural Networks Interpret and Predict Images


DEEP LEARNING BASICS

The aim of this article is to provide an intuitive understanding of the inner workings of the key layers in a convolution neural network. The idea is to go beyond simply stating the facts and explore how image manipulation actually works.


目標(biāo) (The Objective)

Our aim is to design a deep learning framework capable of classifying cat and dog images like those shown below. Let us start by thinking about what challenges such an algorithm must overcome.


It should be able to detect cats and dogs of different colors, sizes, shapes, and breeds. It must be able to detect and classify animals even in pictures where the dog or the cat is not entirely visible. It must be sensitive to the presence of more than one dog in the image. Most importantly, the algorithm must be spatially invariant: it must be able to recognise dogs located in any corner of the image.

它應(yīng)該能夠檢測(cè)出不同顏色,大小,形狀和品種的貓和狗。 即使從狗或貓不完全可見(jiàn)的圖片中,它也必須能夠?qū)?dòng)物進(jìn)行檢測(cè)和分類。 它必須對(duì)圖像中有不止一只狗的情況敏感。 最重要的是,該算法必須在空間上不變-它必須能夠識(shí)別物理上位于圖像任何角落的狗。

How a computer reads images.

Image quality improves with pixel count

Images are composed of pixels with values ranging from 0 to 255 that depict brightness: 0 means black, 255 means white, and everything in between is some shade of grey. The more pixels, the better the image quality.


Three channels of an RGB image

A greyscale image is made of a single channel (i.e. a 2D array), while a color image in the RGB format is composed of three different channels stacked on top of each other.

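As a minimal sketch of the difference, using NumPy arrays (assumed available; the image dimensions are arbitrary):

```python
import numpy as np

# A greyscale image: a single 2D array of brightness values (0-255).
grey = np.zeros((350, 720), dtype=np.uint8)

# A colour image in the RGB format: three 2D channels stacked together.
rgb = np.zeros((350, 720, 3), dtype=np.uint8)

print(grey.ndim)       # 2 -- a single channel
print(rgb.shape[-1])   # 3 -- red, green, and blue channels
```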

Limitations of a multi-layered perceptron.

The contents of each pixel are fed into the perceptron separately; each neuron processes one pixel of the input layer. For an image of dimensions 350*720, the total number of parameters to be learned for the input layer alone will be 350*720*3 (three channels per pixel) * 2 (two parameters per neuron, a weight and a bias), roughly 1.5 million. This number scales linearly with the number of layers, making the MLP incredibly computationally intensive to learn. This, however, is not the only challenge with MLPs.

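The back-of-the-envelope estimate above can be written out directly (following the text's simplified counting of two parameters per neuron):

```python
# Rough parameter count for the input layer of an MLP processing a
# 350*720 RGB image, per the estimate in the text: one neuron per
# pixel-channel, two learned parameters (weight and bias) each.
height, width, channels = 350, 720, 3
params_per_neuron = 2  # a weight and a bias

input_layer_params = height * width * channels * params_per_neuron
print(input_layer_params)  # 1,512,000 -- roughly 1.5 million
```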

MLPs have no inbuilt mechanism for being spatially invariant. If an MLP has been trained to detect dogs in the top right corner of the image, it will fail when dogs are located in other positions. This is a serious drawback, and in the subsequent sections we will discuss how to overcome this challenge.


A convolution neural network aims to ameliorate these drawbacks using built-in mechanisms for (1) extracting different high-level features, (2) introducing spatial invariance, and (3) improving the network's learning ability.


圖像特征提取。 (Image feature extraction.)

Convolution (discrete convolution, to be specific) is based on the use of linear transformations to extract the key features of input images while preserving the ordering of information. The input is convolved with a kernel to generate the output, similar to the response generated by a network of neurons in the visual cortex.


Kernel


The kernel (also known as a filter or a feature detector) samples the input image matrix with a pre-determined step size (known as the stride) in both the horizontal and vertical directions. As the kernel slides over the input image, the element-wise product between each element of the kernel and the overlapping elements of the input image is calculated to obtain the output for the current location. When the input image is composed of multiple channels (which is almost always the case), the kernel has the same depth as the number of channels in the input image. The per-channel products are then summed to arrive at a feature map composed of a single channel.

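The sliding-window operation can be sketched for the single-channel case with NumPy (a naive loop for clarity, not an efficient implementation):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2D convolution of a single-channel square image."""
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Element-wise product of the kernel with the overlapping
            # image patch, summed to give one output value.
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # a 7*7 single-channel image
kernel = np.ones((3, 3)) / 9.0                    # a simple 3*3 averaging kernel
print(convolve2d(image, kernel).shape)            # (5, 5), matching (7-3)/1 + 1
```

With `stride=2` the same function yields a 3*3 output, matching the stride-two animation below.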

Convolution: The image I, represented as a tensor of dimension 7*7*1, is convolved with a 3*3 filter K to result in a 5*5 output image. Shown above is one such step of the matrix multiplication process. Source

If you are new to matrix multiplication, check out this youtube video for a more detailed explanation.


Single Stride Convolution: This animation shows how a kernel scans through an input image from left to right and from top to bottom to result in an output image. For a stride-one convolution, the kernel moves a unit distance in each direction during every step. Source

While a CNN made of a single convolution layer would only be able to extract/learn low-level features of the input image, adding successive convolution layers significantly improves the ability of the CNN to learn high-level features.


Double Stride Convolution: This animation shows how a kernel scans through an input image from left to right and from top to bottom to result in an output image. For a stride-two convolution, the kernel moves two units of distance in each direction during every step. Source

Rectifier


To introduce non-linearity into the system and improve its learning capacity, the output of the convolution operation is passed through an activation function such as the sigmoid or the rectified linear unit (ReLU). Check out this excellent article about these and several other commonly used activation functions.


Rectifier: The two most widely used rectifier functions, sigmoid and ReLU.
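Both functions are one-liners; a minimal sketch with NumPy:

```python
import numpy as np

def sigmoid(x):
    # Squashes any input into the (0, 1) range; saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through unchanged and zeroes out the rest.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(sigmoid(0.0))  # 0.5
```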

Padding


The feature map resulting from convolution is smaller than the input image. For an input image of size I*I convolved with a kernel of size K*K with a stride S, the output will be [(I-K)/S + 1]*[(I-K)/S + 1]. This can result in a substantial reduction in image size in large ConvNets made of several convolution layers. A zero padding of [(K-1)/2] all around the input image can be used to preserve the image size through the convolution. Alternatively, the padding size itself can be treated as one of the hyperparameters tuned during the training of the CNN.


For the most general case, where an input image of size I*I is convolved with a filter of size K*K with stride S and padding P, the output will have dimensions [(I+2P-K)/S + 1]*[(I+2P-K)/S + 1].

對(duì)于最一般的情況,其中輸入大小為 I * I的 圖像與大小 為 K * K 且步幅為 S 且填充為 P 的濾波器進(jìn)行卷積時(shí) ,輸出將具有 [[I + 2P-K)/ S +1 ] * [(I + 2P-K)/ S +1]

Padding: When a 5*5 image is convolved with a 3*3 kernel without padding, the resultant image is 3*3. A single layer of padding changes the input image dimensions to 7*7, which, when convolved with a 3*3 filter, results in a 5*5 output. Source

Pooling


The convolution output is pooled so as to introduce spatial invariance, i.e. the ability to detect the same feature in different images. The idea here is to retain the key information corresponding to the important features that the CNN must learn, while at the same time reducing image size by discarding insignificant information. While there are several variations, max pooling is the most commonly used strategy: the convolution output is split into non-overlapping patches of size K*K, and only the maximum value of each patch is recorded in the output.


Max-Pooling: A 4*4 input image is max-pooled with a 2*2 kernel resulting in a 2*2 output.
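A sketch of max pooling over non-overlapping patches, again with NumPy:

```python
import numpy as np

def max_pool(image, k=2):
    """Max pooling over non-overlapping k*k patches of a square image."""
    out_size = image.shape[0] // k
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Keep only the largest value in each k*k patch.
            out[i, j] = image[i*k:(i+1)*k, j*k:(j+1)*k].max()
    return out

image = np.array([[1, 3, 2, 1],
                  [4, 6, 5, 0],
                  [1, 2, 9, 8],
                  [3, 0, 7, 4]])
print(max_pool(image))  # [[6. 5.]
                        #  [3. 9.]]
```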

Other, less frequently used pooling strategies include average pooling, "mixed" max-average pooling, stochastic pooling, and spatial pyramid pooling.


Let us summarise the concepts discussed so far as they apply to the VGGNet 16 architecture. Shown below are the convolution layers of this network.


VGGNet 16: The 13 convolution layers of VGGNet16. Source

This network expects an image made of 224*224 pixels and 3 channels (corresponding to red, green, and blue) as input. The input is then processed through a series of convolution layers (shown in black), not all of which are followed by a max pooling step. Five distinct convolution blocks are depicted in the image above. All convolution steps use 3*3 kernels, and all max pooling steps use a 2*2 kernel. The number of kernels used in each convolution block gradually increases, from 64 in the first to 512 in the fourth and fifth convolution blocks. Initially two, and later three, convolution layers are used per block; this is important for increasing the receptive field, since the kernel size is maintained constant throughout this architecture. The output of block five is passed through a max pooling layer at the end, resulting in a 7*7*512 output. The output of the last convolution block is then fed into the fully connected layer discussed in the subsequent section. For a more detailed understanding of VGGNet, read the original paper.


Translated from: https://medium.com/@aseem.kash/a-comprehensive-guide-to-convolution-neural-networks-4bc10584cbac
