DL之BN-Inception: Introduction to the BN-Inception algorithm (paper overview), detailed architecture, and example applications
Table of Contents
Introduction to the BN-Inception algorithm (paper overview)
Detailed architecture of the BN-Inception algorithm
1. Core components of the BN-Inception network
5. Comparison of experimental results
Example applications of the BN-Inception algorithm
Related articles
DL之InceptionV2/V3: Introduction to the InceptionV2 & InceptionV3 algorithms (paper overview), detailed architecture, and example applications
DL之BN-Inception: Detailed architecture of the BN-Inception algorithm
DL之InceptionV4/ResNet: Introduction to the InceptionV4/Inception-ResNet algorithms (paper overview), detailed architecture, and example applications
Introduction to the BN-Inception algorithm (paper overview)
BN-Inception is an improved version of Inception developed by researchers at Google.
Abstract
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
Conclusion
We have presented a novel mechanism for dramatically accelerating the training of deep networks. It is based on the premise that covariate shift, which is known to complicate the training of machine learning systems, also applies to sub-networks and layers, and removing it from internal activations of the network may aid in training. Our proposed method draws its power from normalizing activations, and from incorporating this normalization in the network architecture itself. This ensures that the normalization is appropriately handled by any optimization method that is being used to train the network. To enable stochastic optimization methods commonly used in deep network training, we perform the normalization for each mini-batch, and backpropagate the gradients through the normalization parameters. Batch Normalization adds only two extra parameters per activation, and in doing so preserves the representation ability of the network. We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and often do not require Dropout for regularization.
Merely adding Batch Normalization to a state-of-the-art image classification model yields a substantial speedup in training. By further increasing the learning rates, removing Dropout, and applying other modifications afforded by Batch Normalization, we reach the previous state of the art with only a small fraction of training steps, and then beat the state of the art in single-network image classification. Furthermore, by combining multiple models trained with Batch Normalization, we perform better than the best known system on ImageNet, by a significant margin.
Interestingly, our method bears similarity to the standardization layer of (Gülçehre & Bengio, 2013), though the two methods stem from very different goals, and perform different tasks. The goal of Batch Normalization is to achieve a stable distribution of activation values throughout training, and in our experiments we apply it before the nonlinearity since that is where matching the first and second moments is more likely to result in a stable distribution. On the contrary, (Gülçehre & Bengio, 2013) apply the standardization layer to the output of the nonlinearity, which results in sparser activations. In our large-scale image classification experiments, we have not observed the nonlinearity inputs to be sparse, neither with nor without Batch Normalization. Other notable differentiating characteristics of Batch Normalization include the learned scale and shift that allow the BN transform to represent identity (the standardization layer did not require this since it was followed by the learned linear transform that, conceptually, absorbs the necessary scale and shift), handling of convolutional layers, deterministic inference that does not depend on the mini-batch, and batch-normalizing each convolutional layer in the network.
In this work, we have not explored the full range of possibilities that Batch Normalization potentially enables. Our future work includes applications of our method to Recurrent Neural Networks (Pascanu et al., 2013), where the internal covariate shift and the vanishing or exploding gradients may be especially severe, and which would allow us to more thoroughly test the hypothesis that normalization improves gradient propagation (Sec. 3.3). We plan to investigate whether Batch Normalization can help with domain adaptation, in its traditional sense, i.e. whether the normalization performed by the network would allow it to more easily generalize to new data distributions, perhaps with just a recomputation of the population means and variances (Alg. 2). Finally, we believe that further theoretical analysis of the algorithm would allow still more improvements and applications.
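The deterministic, mini-batch-independent inference the conclusion mentions (fixed population means and variances, Alg. 2 in the paper) can be sketched as follows. This is an illustrative NumPy sketch under my own assumptions; the function names, the 0.9 momentum constant, and the toy data are not from the paper:

```python
import numpy as np

def bn_train_step(x, running_mean, running_var, momentum=0.9, eps=1e-5):
    """One training-time BN pass: normalize with the *batch* statistics
    while folding them into running (population) estimates for inference."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    x_hat = (x - mu) / np.sqrt(var + eps)
    return x_hat, running_mean, running_var

def bn_inference(x, pop_mean, pop_var, gamma, beta, eps=1e-5):
    """Inference-time BN: fixed population statistics make the output
    deterministic and independent of whatever else is in the batch."""
    return gamma * (x - pop_mean) / np.sqrt(pop_var + eps) + beta

rng = np.random.default_rng(0)
running_mean, running_var = np.zeros(3), np.ones(3)
for _ in range(100):  # simulate 100 training mini-batches
    batch = rng.normal(loc=5.0, scale=2.0, size=(32, 3))
    _, running_mean, running_var = bn_train_step(batch, running_mean, running_var)

one_example = np.array([[5.0, 5.0, 5.0]])
out = bn_inference(one_example, running_mean, running_var,
                   gamma=np.ones(3), beta=np.zeros(3))
# out depends only on the frozen statistics, never on batch composition
```

Training normalizes with the current mini-batch while accumulating exponential moving averages; at test time only the frozen averages are used, so the same input always maps to the same output.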
Paper
Sergey Ioffe, Christian Szegedy.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
https://arxiv.org/abs/1502.03167
Detailed architecture of the BN-Inception algorithm
0. How does BN speed up training and convergence?
Batch Normalization serves two purposes: it speeds up training and convergence, and it helps prevent overfitting.
In practice, BN forcibly normalizes the features toward a distribution with zero mean and unit variance. If the data distribution differs at every layer during training, a deep network becomes very hard to train and slow to converge. Transforming each layer's data to zero mean and unit variance helps in two ways: on one hand, the distributions become consistent across layers, so training converges more easily; on the other hand, with zero mean and unit variance, gradient computation produces larger gradient values, which speeds up parameter learning. More intuitively, BN pulls the data out of the saturated region of the nonlinearity and into the non-saturated region. Going further, this also helps keep exploding and vanishing gradients under control, since both phenomena are gradient-related.
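The zero-mean, unit-variance transform described above, together with the learned scale and shift from the paper, can be sketched in a few lines of NumPy (an illustrative sketch, not the paper's code):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Force each feature of the mini-batch x (shape: batch x features)
    to zero mean and unit variance, then apply the learned scale (gamma)
    and shift (beta) so the layer can still represent the identity."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta

# toy mini-batch: 4 samples, 3 features at very different scales
x = np.array([[1., 10., 100.],
              [2., 20., 200.],
              [3., 30., 300.],
              [4., 40., 400.]])
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# each column of y now has mean ~0 and variance ~1, regardless of the
# original scale of the feature
```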
1. Core components of the BN-Inception network
- Batch Normalization. Significance: BN has since become a standard technique in nearly all convolutional neural networks.
- Each 5x5 convolution is replaced by two 3x3 convolutions, which cover the same receptive field.
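A back-of-the-envelope check of that 5x5 → two-3x3 substitution (the channel count of 64 is an arbitrary choice for illustration):

```python
def conv_weights(k, c_in, c_out):
    # weight count of a k x k convolution, ignoring biases
    return k * k * c_in * c_out

def receptive_field(kernel_sizes):
    # receptive field of a stack of stride-1 convolutions
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

c = 64                               # illustrative channel count
one_5x5 = conv_weights(5, c, c)      # 25 * c * c weights
two_3x3 = 2 * conv_weights(3, c, c)  # 18 * c * c weights
saved = 100 * (one_5x5 - two_3x3) // one_5x5

print(receptive_field([5]), receptive_field([3, 3]))  # prints: 5 5
print(saved)  # prints: 28 (percent fewer weights)
```

So the stacked pair sees the same 5x5 window while using 28% fewer weights, and the extra nonlinearity between the two 3x3 layers adds representational power.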
5. Comparison of experimental results
Batch-Normalized Inception is compared with the previous state of the art on the provided validation set of 50,000 images. *As reported by the test server, the BN-Inception ensemble reaches 4.82% top-5 error on the 100,000 images of the ImageNet test set.
BN-Inception Ensemble denotes the result obtained by ensemble learning over multiple network models.
Example applications of the BN-Inception algorithm

TF之DD: Printing the shape of one convolutional layer, or of all convolutional layers, in the Inception model
TF之DD: Generating a basic Deep Dream image with the Inception model + the GD algorithm
TF之DD: Generating larger Deep Dream images with the Inception model + the GD algorithm
TF之DD: Generating higher-quality Deep Dream images with the Inception model + the GD algorithm
TF之DD: Inception model + the GD algorithm: five architecture design ideas
TF之DD: Generating large, high-quality Deep Dream images with backgrounds, using the Inception model + the GD algorithm