The Architecture and Implementation of AlexNet
In my last blog, I gave a detailed explanation of the LeNet-5 architecture. In this blog, we’ll explore its enhanced successor, AlexNet.
AlexNet, submitted by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, was the winner of the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012), beating its nearest contender by more than 10 percentage points in error rate. These visual recognition challenges help researchers monitor the progress of computer vision research across the globe.
Before we proceed further, let’s discuss the data in the ImageNet dataset. It contains images of dogs, horses, cars, etc., organized into 1000 classes, each with thousands of images. In total, there are approximately 1.2 million high-resolution images in this dataset, used by researchers for training, validating, and testing the models they design.
Let’s dive into the AlexNet Architecture
The AlexNet neural network architecture consists of 8 learned layers: 5 convolution layers and 3 fully connected layers, with max-pooling layers in between and a 1000-channel softmax output layer. The pooling used here is max pooling.
Why is a 1000-channel softmax layer used?
This is because the Imagenet dataset contains 1000 different classes of images, so at the final output layer we have one node for each of these 1000 categories and the output layer is the softmax output layer.
The basic architecture of AlexNet is as shown below:
(Image from Anh H Reynolds’ blog.)

The input to the AlexNet network is a 227 x 227 RGB image, so it has 3 different channels: red, green, and blue.
Then we have the First Convolution Layer, which has 96 different kernels, each of size 11 x 11, applied with a stride of 4. The output of this layer therefore has 96 channels or feature maps (one per kernel), each of size 55 x 55.
Calculations:
Size of input: N = 227 x 227
Size of convolution kernels: f = 11 x 11
Number of kernels: 96
Stride: S = 4
Padding: P = 0
Size of each feature map = [(N - f + 2P)/S] + 1 = (227 - 11 + 0)/4 + 1 = 55
So every feature map after the first convolution layer is of the size 55 x 55.
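The same formula governs every convolution and pooling stage that follows, so all the sizes in this post can be checked with a small Python helper (a sketch, not part of the original post):

```python
def output_size(n, f, s, p):
    """Spatial size after a convolution or pooling layer: [(N - f + 2P)/S] + 1."""
    return (n - f + 2 * p) // s + 1

size = output_size(227, 11, 4, 0)  # conv1 (96 kernels, 11x11, stride 4) -> 55
size = output_size(size, 3, 2, 0)  # max pool (3x3, stride 2)            -> 27
size = output_size(size, 5, 1, 2)  # conv2 (256 kernels, 5x5, pad 2)     -> 27
size = output_size(size, 3, 2, 0)  # max pool                            -> 13
size = output_size(size, 3, 1, 1)  # conv3 (384 kernels, 3x3, pad 1)     -> 13
size = output_size(size, 3, 1, 1)  # conv4 (384 kernels)                 -> 13
size = output_size(size, 3, 1, 1)  # conv5 (256 kernels)                 -> 13
size = output_size(size, 3, 2, 0)  # max pool                            -> 6
print(size)  # 6
```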
After this convolution we have an Overlapping Max Pool Layer, where max pooling is done over a 3 x 3 window with a stride of 2. Because the window (3 x 3) is larger than the stride (2), the pooling is done over overlapping windows. After this pooling, each feature map is reduced to 27 x 27 and the number of channels remains 96.
Calculations:
Size of input: N = 55 x 55
Size of pooling window: f = 3 x 3
Stride: S = 2
Padding: P = 0
Size of each feature map = (55 - 3 + 0)/2 + 1 = 27
So every feature map after this pooling is of the size 27 x 27.
Then we have the Second Convolution Layer, with a kernel size of 5 x 5 and padding of 2, so that the output of the layer keeps the same spatial size as its input. It uses 256 kernels, so the output has 256 channels or feature maps, each of size 27 x 27.
Calculations:
Size of input: N = 27 x 27
Size of convolution kernels: f = 5 x 5
Number of kernels: 256
Stride: S = 1
Padding: P = 2
Size of each feature map = (27 - 5 + 4)/1 + 1 = 27
So every feature map after the second convolution layer is of the size 27 x 27.
Now again we have an Overlapping Max Pool Layer, with max pooling over a 3 x 3 window and a stride of 2, so pooling is again done over overlapping windows. Its output is 256 feature maps of size 13 x 13.
Calculations:
Size of input: N = 27 x 27
Size of pooling window: f = 3 x 3
Stride: S = 2
Padding: P = 0
Size of each feature map = (27 - 3 + 0)/2 + 1 = 13
So every feature map after this pooling is of size 13 x 13.
Then we have Three Consecutive Convolution Layers. The first of these has a kernel size of 3 x 3 with padding of 1 and 384 kernels, giving 384 feature maps of size 13 x 13, which are passed to the next convolution layer.
Calculations:
Size of input: N = 13 x 13
Size of convolution kernels: f = 3 x 3
Number of kernels: 384
Stride: S = 1
Padding: P = 1
Size of each feature map = (13 - 3 + 2)/1 + 1 = 13
In the second of these convolutions, the kernel size is 3 x 3 with padding of 1 and 384 kernels, so the output again has 384 channels or feature maps, each of size 13 x 13. Because the padding of 1 matches the 3 x 3 kernel size, the feature maps at the output of this layer keep the same size as those at its input.
Calculations:
Size of input: N = 13 x 13
Size of convolution kernels: f = 3 x 3
Number of kernels: 384
Stride: S = 1
Padding: P = 1
Size of each feature map = (13 - 3 + 2)/1 + 1 = 13
The output of the second convolution is passed through one more convolution layer, again with a 3 x 3 kernel and padding of 1, so the feature maps keep their 13 x 13 size. In this case, however, AlexNet uses 256 kernels: the 384 input channels are converted to 256 output channels, i.e. 256 feature maps of size 13 x 13 are generated at the end of this convolution.
Calculations:
Size of input: N = 13 x 13
Size of convolution kernels: f = 3 x 3
Number of kernels: 256
Stride: S = 1
Padding: P = 1
Size of each feature map = (13 - 3 + 2)/1 + 1 = 13
This is followed by the next Overlapping Max Pool Layer, with max pooling again over a 3 x 3 window and a stride of 2. The number of channels stays at 256, and each feature map is reduced to 6 x 6.
Calculations:
Size of input: N = 13 x 13
Size of pooling window: f = 3 x 3
Stride: S = 2
Padding: P = 0
Size of each feature map = (13 - 3 + 0)/2 + 1 = 6
Now we have the fully connected layers, which work like a multi-layer perceptron. The first two fully connected layers have 4096 nodes each. After the last max pooling described above, we have a total of 6 * 6 * 256 = 9216 nodes or features, and each of them is connected to every node of the first fully connected layer. So the number of connections in this case is 9216 * 4096. Every node of that layer then provides input to every node of the second fully connected layer, which also has 4096 nodes, giving a further 4096 * 4096 connections.
And then, in the end, we have an output layer with 1000 softmax channels. Thus the number of connections between the second fully connected layer and the output layer is 4096*1000.
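These connection counts can be verified with a few lines of arithmetic (a quick sketch; the figures count weight connections only, ignoring bias terms):

```python
flattened = 6 * 6 * 256   # nodes after the last max pool
fc1 = flattened * 4096    # connections into the first fully connected layer
fc2 = 4096 * 4096         # connections into the second fully connected layer
out = 4096 * 1000         # connections into the 1000-way softmax output

print(flattened)          # 9216
print(fc1)                # 37748736
print(fc1 + fc2 + out)    # 58621952
```

Note that these fully connected connections alone account for the bulk of AlexNet’s roughly 60 million parameters.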
Training on multiple GPUs
(Original image published in [AlexNet-2012].)

As the figure shows, AlexNet was implemented in two parallel streams because the network, trained on 1.2 million examples, was too big to fit on one GPU. Half of the network is placed in one stream and the other half in the other, which made it possible to train the network on two different GPU cards. The GPUs used were GTX 580 3GB cards, and training took five to six days. Cross-GPU parallelization (one GPU communicating with the other) happens only at certain layers: the kernels of layer 3 take input from all kernel maps in layer 2, whereas the kernels in layer 4 take input only from those kernel maps in layer 3 which stay on the same GPU.
Vanishing Gradient Problem
If we use a saturating non-linear activation function such as the sigmoid or hyperbolic tangent (tanh), we run the risk of vanishing gradients: when training the network with gradient descent, the gradient of the error function can become so small that the updates it produces for the network parameters are almost negligible. That is the vanishing gradient problem.
Why does the Vanishing Gradient Problem arise?
As the graph of the sigmoid function shows, when the input value is very high the output saturates to 1, and when it is very low the output saturates to 0. At these points the gradient is almost 0. The same is true if our non-linear activation function is tanh.
How to prevent the Vanishing Gradient Problem?
To prevent the vanishing gradient problem we use the ReLU (Rectified Linear Unit) activation function. Since ReLU is max(x, 0), the gradient is constant for x > 0; this is the advantage of using ReLU as the non-linear activation. Training with gradient descent is also much faster with a non-saturating nonlinearity like ReLU than with saturating nonlinearities like tanh and sigmoid. This can be seen in the diagram below, where a four-layer convolutional network with ReLUs (solid line) reached a 25% training error rate on the CIFAR-10 dataset six times faster than the same network run with tanh (dashed line) as the activation function. Thus, a network with ReLU activations learns almost six times faster than one with saturating activations.
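A quick numerical check (an illustrative sketch, not from the original post) makes the contrast concrete: the sigmoid and tanh gradients collapse for large inputs, while the ReLU gradient stays at 1 for any positive input:

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)            # derivative of sigmoid: s(x) * (1 - s(x))

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # derivative of tanh: 1 - tanh(x)^2

def relu_grad(x):
    return 1.0 if x > 0 else 0.0    # derivative of max(x, 0), for x != 0

print(sigmoid_grad(0.0))   # 0.25 -- a healthy gradient near zero
print(sigmoid_grad(20.0))  # ~2e-9 -- effectively vanished
print(tanh_grad(20.0))     # ~0 -- effectively vanished
print(relu_grad(20.0))     # 1.0 -- constant, no vanishing
```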
(Original image published in [AlexNet-2012].)

Problem with using ReLU as an activation function
Unlike the sigmoid and tanh activation functions, whose outputs are limited and bounded, the output of ReLU is unbounded: as x increases, the output of ReLU increases with it. To avoid this problem, AlexNet normalizes the output of the convolution layer before applying ReLU, through a process known as Local Response Normalization (LRN).
本地響應(yīng)規(guī)范化 (Local Response Normalization)
Local Response Normalization is a type of normalization in which excited neurons are amplified while the surrounding neurons in a local neighborhood are damped at the same time. This operation is inspired by a phenomenon in neurobiology known as lateral inhibition, which refers to the capacity of a neuron to reduce the activity of its neighbors. So the output of the convolution is first normalized before the non-linear activation is applied, to limit the values produced by the unbounded ReLU. It is a non-trainable layer in the network.
This is how both the unbounded-output problem and the vanishing gradient problem are prevented.
Local Response Normalization can be done across channels or within a channel. Normalization across channels is known as Inter-channel normalization, and normalization between features of the same channel is known as Intra-channel normalization. Inter-channel normalization is the variant used in the AlexNet network. The two types of LRN, which differ in their neighborhood, are shown in the figure below:
(Image from the blog by Aqeel Anwar.)

Inter-channel normalization: here normalization is performed across the channels, so the neighborhood runs along the channel (depth) dimension. The normalized output at position (x, y) is given by the following formula:
Here, b[i, (x,y)] is the output at location (x,y) in the i-th channel and a[i, (x,y)] is the original value at location (x,y) in the i-th channel. We normalize a[i, (x,y)] by a factor given by k plus alpha times the sum of the squares of a[j, (x,y)], where j ranges over the neighboring channels: from max(0, i - n/2) to min(N - 1, i + n/2), i.e. n/2 channels before and n/2 channels after channel i. The 0 and N - 1 bounds take care of the first and last channels. After this normalization the output is bounded, as the subsequent figure shows.
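As a sketch, inter-channel LRN can be written in a few lines of NumPy. The hyperparameter values below (k = 2, n = 5, alpha = 1e-4, beta = 0.75) are those reported in the AlexNet paper, and `lrn_inter_channel` is a made-up name for illustration:

```python
import numpy as np

def lrn_inter_channel(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Inter-channel LRN: divide a[i, x, y] by
    (k + alpha * sum of squared activations over n neighboring channels) ** beta.
    `a` has shape (channels, height, width)."""
    channels = a.shape[0]
    b = np.empty_like(a, dtype=float)
    for i in range(channels):
        lo = max(0, i - n // 2)             # clamp at the first channel
        hi = min(channels - 1, i + n // 2)  # clamp at the last channel
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.ones((8, 4, 4))
a[3] = 100.0                 # one strongly excited channel
b = lrn_inter_channel(a)
# The excited channel is damped relative to its raw activation.
```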
(Image from the blog by Aqeel Anwar.)

Intra-channel normalization: here normalization occurs between neighboring neurons across the surface of the same channel.
Here, b[k, (x,y)] is the output at location (x,y) in the k-th channel and a[k, (x,y)] is the original value at location (x,y) in the k-th channel. We normalize a[k, (x,y)] by a factor given by k plus alpha times the sum of the squares of the feature values within the neighborhood of x and y. The min and max bounds take care of the features at the boundary of the feature maps. After this normalization the output is bounded, as the subsequent figure shows.
(Image from the blog by Aqeel Anwar.)

NOTE: ‘k’ appears in the normalization factor of both variants to avoid division by zero; here ‘k’ and ‘alpha’ are hyperparameters.
Problem of Overfitting
With 60 million parameters to be trained, the network is prone to overfitting: it can learn or memorize the training data very well but fail to encode the general properties or features of unseen inputs, so its performance outside the training set may not be acceptable. To reduce overfitting, additional augmented data was generated from the existing data; the augmentation (i.e. the generation of new images from the original images through variations like horizontal flipping, vertical flipping, zooming, etc.) was done by mirroring and by taking random crops of the input data. Another method used to take care of overfitting is dropout regularization.
什么是輟學(xué)正規(guī)化? (What is Dropout Regularization??)
(Image from Packt Subscription.)

In dropout, randomly selected neurons or nodes, each chosen with a probability of 0.5, are temporarily dropped from the network, so the probability of a node being removed and the probability of it being retained are both 0.5. A dropped-out node does not pass its output to the nodes in the subsequent layers downstream, and during backward propagation no update takes place for it, since removing a node also removes its connections. Dropout increases the number of iterations required to train the model, but it makes the model less vulnerable to overfitting and thus generalizes it.
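The mechanics can be sketched in a few lines of NumPy (an illustrative sketch; it uses the modern “inverted dropout” convention, which scales the surviving activations during training, whereas the original paper instead halved the activations at test time):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, p_drop=0.5, training=True):
    """Zero each activation with probability p_drop; scale the survivors by
    1 / (1 - p_drop) so the expected total activation is unchanged."""
    if not training:
        return x  # at test time every node is kept
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

x = np.ones((4, 4096))  # activations of a fully connected layer
y = dropout(x)          # roughly half the entries become 0, the rest become 2.0
```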
有關(guān)Alexnet架構(gòu)的事實和數(shù)據(jù) (Facts and figures regarding Alexnet architecture)
The weight (w) update rule was as follows:
The initial weights of all layers were drawn from a Gaussian distribution with mean 0 and standard deviation 0.01. The initial bias was set to 1 for the second, fourth, and fifth convolution layers and for all fully connected layers, and to 0 for all other layers.
AlexNet code using Keras
Here we are going to use the oxflower17 dataset prepared by Oxford, which is available through the tflearn library.
One important thing to note here is that the images in the tflearn dataset are 224 x 224, so we’ll use a 224 x 224 input instead of the 227 x 227 used in the original architecture.
導(dǎo)入所需的庫 (Importing the required libraries)
加載數(shù)據(jù)集 (Loading the dataset)
檢查X和Y的形狀 (Checking the shape of X and Y)
建立模型 (Creating the model)
模型總結(jié) (Summary of the model)
There are approximately 46 million trainable parameters here, as the model summary shows.
Compiling the model
Training the model
📌 To get the complete code of AlexNet or any other network visit my GitHub repository.
[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton — ImageNet Classification with Deep Convolutional Neural Networks (2012)
Thanks for reading. Hope this blog would have helped you with both the coding and understanding of the architecture. 😃
翻譯自: https://medium.com/analytics-vidhya/the-architecture-implementation-of-alexnet-135810a3370
alexnet 結(jié)構(gòu)
總結(jié)
以上是生活随笔為你收集整理的alexnet 结构_AlexNet的体系结构和实现的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 有什么隐藏应用的软件
- 下一篇: DeepR —训练TensorFlow模