DL — SegNet: An Introduction to the SegNet Image/Semantic Segmentation Algorithm (Paper Overview), Architecture Explained, and Example Applications (Illustrated Guide)
Preface
SegNet, a CNN-based segmentation network, can recognize driving environments with high accuracy.
Contents
Introduction to the SegNet Image Segmentation Algorithm (Paper Overview)
0. Experimental Results
1. Key Ideas of the SegNet Algorithm
SegNet Architecture Explained
Example Applications of SegNet
Related Articles
DL — SegNet: An Introduction to the SegNet Image Segmentation Algorithm (Paper Overview), Architecture Explained, and Example Applications
DL — SegNet: SegNet Architecture Explained
Introduction to the SegNet Image Segmentation Algorithm (Paper Overview)
(To be updated…)
Abstract
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3], DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
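The index-transferring upsampling the abstract describes can be sketched in plain NumPy. This is a toy single-channel illustration of the mechanism, not the paper's Caffe implementation; the function names are my own:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """2x2 max pooling that also records the argmax position of each
    window (the 'pooling indices' SegNet transfers to its decoder)."""
    h, w = x.shape
    out = np.zeros((h // k, w // k), dtype=x.dtype)
    idx = np.zeros((h // k, w // k), dtype=np.int64)  # flat index into x
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            idx[i, j] = (i*k + r) * w + (j*k + c)
    return out, idx

def max_unpool(pooled, idx, shape):
    """SegNet-style non-linear upsampling: place each pooled value back
    at its recorded location; all other positions stay zero (sparse map)."""
    up = np.zeros(shape, dtype=pooled.dtype)
    up.flat[idx.ravel()] = pooled.ravel()
    return up

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 6., 0., 1.]])
pooled, idx = max_pool_with_indices(x)
up = max_unpool(pooled, idx, x.shape)  # sparse; SegNet then convolves it
```

In the full network the sparse `up` map is convolved with a trainable filter bank to produce a dense feature map; since the indices come for free from the encoder, no upsampling weights need to be learned.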
CONCLUSION
We presented SegNet, a deep convolutional network architecture for semantic segmentation. The main motivation behind SegNet was the need to design an efficient architecture for road and indoor scene understanding which is efficient both in terms of memory and computational time. We analysed SegNet and compared it with other important variants to reveal the practical trade-offs involved in designing architectures for segmentation, particularly training time, memory versus accuracy. Those architectures which store the encoder network feature maps in full perform best but consume more memory during inference time. SegNet on the other hand is more efficient since it only stores the max-pooling indices of the feature maps and uses them in its decoder network to achieve good performance. On large and well known datasets SegNet performs competitively, achieving high scores for road scene understanding. End-to-end learning of deep segmentation architectures is a harder challenge and we hope to see more attention paid to this important problem.
Paper
Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, Dec. 2017.
arXiv: https://arxiv.org/abs/1511.00561
PDF: https://arxiv.org/pdf/1511.00561.pdf
0. Experimental Results
1. Qualitative comparison — results on CamVid day and dusk test samples
Results on CamVid day and dusk test samples: several test images, covering both daytime and dusk scenes. The compared algorithms are SegNet, FCN, FCN (learned deconvolution), and DeconvNet; only SegNet produces cleanly delineated segmentation results.
2. Quantitative comparison — SegNet vs. traditional methods on the CamVid 11-class road scene segmentation problem
Quantitative comparisons of SegNet with traditional methods on the CamVid 11-class road scene segmentation problem: SegNet outperforms all the other methods, including those using depth, video and/or CRFs, on the majority of classes. SegNet's per-class IU scores are consistently high, and its mean IU reaches 60.1%.
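For reference, the IU (intersection-over-union) metric behind these scores is computed per class from a confusion matrix and then averaged. The sketch below uses made-up numbers for a 2-class case, not CamVid data:

```python
import numpy as np

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes.
    conf is a confusion matrix: rows = ground truth, cols = prediction."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)                 # correctly classified pixels
    fp = conf.sum(axis=0) - tp         # predicted as class c but wrong
    fn = conf.sum(axis=1) - tp         # truly class c but missed
    iou = tp / (tp + fp + fn)
    return iou, iou.mean()

conf = np.array([[50, 10],             # hypothetical pixel counts
                 [ 5, 35]])
iou, miou = mean_iou(conf)             # per-class IoU and mean IoU
```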
1. Key Ideas of the SegNet Algorithm
1. An illustration of the SegNet architecture. There are no fully connected layers and hence it is only convolutional. A decoder upsamples its input using the transferred pool indices from its encoder to produce a sparse feature map(s). It then performs convolution with a trainable filter bank to densify the feature map. The final decoder output feature maps are fed to a soft-max classifier for pixel-wise classification.
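As a rough sketch of the encoder/decoder symmetry this caption describes, the helper below traces how a CamVid-sized 360x480 input shrinks through five 2x2 max-pool stages (mirroring VGG16's 13 convolutional layers) and how the decoder mirrors those sizes back up. It ignores padding details, so odd sizes simply round down:

```python
def segnet_shapes(h, w, stages=5):
    """Trace spatial sizes through SegNet's encoder (5 max-pool stages)
    and its symmetric decoder, which unpools back to each encoder size."""
    enc = [(h, w)]
    for _ in range(stages):
        h, w = h // 2, w // 2   # each 2x2 max-pool halves both dimensions
        enc.append((h, w))
    dec = enc[-2::-1]           # decoder restores each encoder resolution
    return enc, dec

enc, dec = segnet_shapes(360, 480)
# enc: 360x480 -> 180x240 -> 90x120 -> 45x60 -> 22x30 -> 11x15
# dec: 22x30 -> ... -> 360x480 (full input resolution for classification)
```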
2. An illustration of SegNet and FCN [2] decoders. a, b, c, d correspond to values in a feature map. SegNet uses the max pooling indices to upsample (without learning) the feature map(s) and convolves with a trainable decoder filter bank. FCN upsamples by learning to deconvolve the input feature map and adds the corresponding encoder feature map to produce the decoder output. This feature map is the output of the max-pooling layer (includes sub-sampling) in the corresponding encoder. Note that there are no trainable decoder filters in FCN.
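The FCN-style decoder step in this comparison (learned deconvolution followed by adding the encoder skip map) can be sketched as follows. The 2x2 kernel and the skip tensor are toy stand-ins for what FCN would actually learn and what the encoder would actually produce:

```python
import numpy as np

def deconv2x(x, w):
    """Stride-2 transposed convolution ('deconvolution') with a 2x2
    kernel: each input value spreads a scaled copy of the kernel."""
    h, wd = x.shape
    out = np.zeros((h * 2, wd * 2))
    for i in range(h):
        for j in range(wd):
            out[i*2:i*2+2, j*2:j*2+2] += x[i, j] * w
    return out

x = np.array([[1., 2.],
              [3., 4.]])
w = np.full((2, 2), 0.25)   # a *learned* kernel in FCN; fixed here
skip = np.ones((4, 4))      # stand-in for the encoder feature map
y = deconv2x(x, w) + skip   # FCN decoder: upsample, then add the skip
```

In contrast, SegNet transfers only the pooling indices (integers, cheap to store) and learns no deconvolution kernel, which is the source of the memory/accuracy trade-off the caption highlights.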
SegNet Architecture Explained
(To be updated…)
Example Applications of SegNet
(To be updated…)