Semantic Segmentation Using U-Net
Picture by Martei Macru on Unsplash
Semantic segmentation is a computer vision problem in which we assign a class to each pixel of an image. In the classic image classification task only one class value is predicted for the whole image (assuming single-label classification); here we predict a class value for every pixel. Image segmentation has been applied predominantly in the medical field, but it is now also used in other domains, e.g. self-driving cars.
In image classification we are interested only in what is in the image. Semantic segmentation has to answer two "wh" questions: what is in the image, and where it is.
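The difference between the two outputs can be seen in a toy sketch. The scores below are made-up numbers for a hypothetical 2x2 image with 3 classes, and averaging the per-pixel scores is only a stand-in for what a real classifier does; the point is the shape of each result.

```python
# Toy class scores for a 2x2 image with 3 classes (made-up numbers).
# scores[row][col] holds the class scores for that pixel.
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.6, 0.1]],
]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Segmentation ("what" and "where"): one class per pixel, so the output
# keeps the height and width of the image.
segmentation_mask = [[argmax(pixel) for pixel in row] for row in scores]

# Classification ("what" only): one class for the whole image. Averaging
# the per-pixel scores is an illustrative assumption, not a real model.
num_classes = 3
pixels = [pixel for row in scores for pixel in row]
mean_scores = [sum(p[c] for p in pixels) / len(pixels) for c in range(num_classes)]
image_label = argmax(mean_scores)

print(segmentation_mask)  # a 2x2 grid of class indices
print(image_label)        # a single class index
```

The segmentation result is a grid with the same height and width as the input, while the classification result is a single number.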
What Is U-Net:
U-Net is one of the most popular models for semantic segmentation. Other models can accomplish this task, but U-Net is widely accepted as a de facto standard for it. A typical U-Net architecture has two parts: an encoder and a decoder.
Structure of U-Net
Encoder:
The job of the encoder is the same as that of any convolutional neural network: to answer the first "wh" question, what. However, when we downsample the image as a typical convnet does, we tend to lose information about where the segmented objects are located. The feature maps of the CNN learn what is in the image with little idea of where it is. In the implementation described here, the encoder takes a 128*128*1 image and outputs a feature map of shape 8*8*256.
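The encoder shapes can be traced with a little arithmetic. The sketch below is not taken from the post: it assumes "same"-padded convolutions, four 2x2 max-pooling steps, and filter counts that start at 16 and double at each stage. The function name `encoder_shapes` and its parameters are hypothetical. Under those assumptions a 128*128*1 input ends up as 8*8*256.

```python
def encoder_shapes(height, width, channels, base_filters=16, num_pools=4):
    """Trace (height, width, channels) through a U-Net-style encoder.

    Assumptions (not from the post): 'same'-padded convolutions leave the
    spatial size unchanged, 2x2 max pooling with stride 2 halves it, and
    the filter count doubles at every stage.
    """
    shapes = [(height, width, channels)]
    filters = base_filters
    for stage in range(num_pools + 1):
        # Convolution block: changes only the channel count.
        shapes.append((height, width, filters))
        if stage < num_pools:
            # 2x2 max pooling: halves height and width, keeps channels.
            height, width = height // 2, width // 2
            shapes.append((height, width, filters))
            filters *= 2
    return shapes


shapes = encoder_shapes(128, 128, 1)
for shape in shapes:
    print(shape)
```

Each pooling step halves the resolution (128, 64, 32, 16, 8), which is where the localization information is lost, while the growing channel count carries the "what".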
Decoder:
The decoder tries to recover the information lost while the encoder processed the image. To do so it uses skip connections, which pass encoder feature maps to the decoder and so supply the spatial information that was lost during downsampling. The decoder also uses transposed convolutions, which convert a small feature map into a larger one. In the decoder the size grows from 8*8*256 to 128*128*1.
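The decoder can be traced the same way. This is a sketch under assumptions not stated in the post: transposed convolutions with kernel 2, stride 2 and no padding, skip connections that concatenate encoder feature maps with 128, 64, 32 and 16 channels, and a final 1x1 convolution. The names `transposed_conv_size` and `decoder_shapes` are hypothetical.

```python
def transposed_conv_size(size, kernel=2, stride=2, padding=0):
    """Output length of a transposed convolution along one dimension
    (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_shapes(bottleneck, skip_channels):
    """Trace (height, width, channels) through a U-Net-style decoder.

    skip_channels lists the channel counts of the encoder feature maps
    concatenated at each step, deepest first (assumed values).
    """
    height, width, _ = bottleneck
    shapes = [bottleneck]
    for skip in skip_channels:
        # Transposed convolution: doubles height and width.
        height = transposed_conv_size(height)
        width = transposed_conv_size(width)
        shapes.append((height, width, skip))
        # Skip connection: concatenate the encoder map along channels.
        shapes.append((height, width, skip + skip))
        # Convolution block: mixes both sources back down to `skip` channels.
        shapes.append((height, width, skip))
    # Final 1x1 convolution: one output channel for a binary mask.
    shapes.append((height, width, 1))
    return shapes


shapes = decoder_shapes((8, 8, 256), skip_channels=[128, 64, 32, 16])
for shape in shapes:
    print(shape)
```

With kernel 2 and stride 2 the size formula reduces to doubling, so four upsampling steps take 8 back to 128. The concatenation step is where the encoder's higher-resolution features re-enter the network.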
Variations In The U-Net:
There are many variants of the U-Net architecture. Instead of transposed convolution we can use bilinear upsampling. Similarly, we can replace the encoder with a popular network such as ResNet or VGG-Net, and we may or may not choose to use its pretrained weights.
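Unlike a transposed convolution, bilinear upsampling has no learned weights: each output value is a weighted average of the nearest input values. A minimal pure-Python sketch follows, assuming the "align corners" convention (input corners map exactly to output corners); libraries offer other conventions, and the function name `bilinear_upsample` is hypothetical.

```python
def bilinear_upsample(image, out_h, out_w):
    """Resize a 2D grid of numbers with bilinear interpolation.

    Uses the 'align corners' convention: the corner pixels of the input
    map exactly to the corner pixels of the output.
    """
    in_h, in_w = len(image), len(image[0])
    result = []
    for i in range(out_h):
        # Fractional source row for output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for output column j.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        result.append(row)
    return result


upsampled = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 3, 3)
for row in upsampled:
    print(row)
```

In a U-Net variant of this kind, the upsampling step is typically followed by an ordinary convolution so that the network can still learn how to combine the enlarged features.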
This was a theoretical overview of semantic segmentation using the U-Net model. In the next blog post we will use this model for salt identification and walk through a practical implementation.
Translated from: https://medium.com/swlh/semantic-segmentation-using-u-net-e0f34e27724f