Semantic Segmentation Using U-Net

Published: 2023/12/15

Picture by Martei Macru on Unsplash

Semantic segmentation is a computer vision problem in which we assign a class to every pixel of an image. In classic image classification only one class value is predicted for the whole image (assuming single-label classification); here we predict a class value for each pixel. Image segmentation is used predominantly in the medical field, but it is now being applied in other domains as well, e.g. self-driving cars.
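To make the difference concrete, here is a minimal pure-Python sketch. The score values and the helper names (`argmax`, `classify_image`, `segment_image`) are made up for illustration; a real model would produce the scores. Classification turns one score vector into one label, while segmentation turns one score vector per pixel into a label map of the same height and width.

```python
def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def classify_image(class_scores):
    """Image classification: one score vector -> one label for the whole image."""
    return argmax(class_scores)


def segment_image(pixel_scores):
    """Semantic segmentation: one score vector per pixel -> one label per pixel."""
    return [[argmax(scores) for scores in row] for row in pixel_scores]


# Hypothetical scores for 2 classes (0 = background, 1 = object).
image_scores = [0.2, 0.8]
pixel_scores = [
    [[0.9, 0.1], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9]],
]

label = classify_image(image_scores)  # a single class for the image
mask = segment_image(pixel_scores)    # a 2x2 grid of classes

print(label)  # 1
print(mask)   # [[0, 1], [0, 1]]
```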

In image classification we are only interested in what is in the image. Semantic segmentation has to answer two "wh" questions: what is in the image, and where it is.

What Is U-Net:

U-Net is one of the most popular models for semantic segmentation. Although other models can accomplish this task, U-Net is widely accepted as the de facto standard. A typical U-Net architecture has two parts: an encoder and a decoder.

Structure of U-Net

Encoder:

The job of the encoder is the same as that of any convolutional neural network: to answer the first "wh" question, what. However, when we downsample the image as a typical convnet does, we tend to lose information about where the segmented objects are. The feature maps of the CNN learn what is in the image without much idea of where it is. In the implementation described here, a 128×128×1 image is taken as input and the encoder outputs a feature map of shape 8×8×256.
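One way to see how a 128×128 single-channel input could shrink to an 8×8×256 feature map is to trace the shapes level by level. The sketch below is only shape arithmetic, not a model. It assumes "same"-padded convolutions, four 2×2 max-pooling steps, and filter counts that start at 16 and double at each level; these values are hypothetical, chosen because they reproduce the 8×8×256 shape quoted above.

```python
def encoder_shapes(height, width, channels, base_filters=16, depth=4):
    """Trace (H, W, C) through an encoder of conv blocks and 2x2 max pooling.

    Assumptions (not taken from the article): convolutions use 'same'
    padding, so only pooling changes H and W, and the number of filters
    doubles at every level.
    """
    shapes = [(height, width, channels)]
    filters = base_filters
    for _ in range(depth):
        # The conv block sets the channel count; pooling halves H and W.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
        filters *= 2
    # Bottleneck conv block: spatial size unchanged, channels doubled again.
    shapes.append((height, width, filters))
    return shapes


shapes = encoder_shapes(128, 128, 1)
for shape in shapes:
    print(shape)
# (128, 128, 1)
# (64, 64, 16)
# (32, 32, 32)
# (16, 16, 64)
# (8, 8, 128)
# (8, 8, 256)
```

Each pooling step halves the resolution, so four of them take 128 down to 128 / 2^4 = 8. That is exactly the spatial detail the decoder later has to recover.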

Decoder:

The decoder tries to recover the information lost while the encoder processed the image. To do so it uses skip connections, which pass encoder feature maps to the decoder level of the same resolution and so supply the spatial information that was lost during downsampling. The decoder also uses transposed convolutions, which convert a small feature map into a larger one. In the decoder the size grows from 8×8×256 to 128×128×1.
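The decoder's growth from 8×8 to 128×128 can be checked with the standard output-size formula for a transposed convolution, `(size - 1) * stride - 2 * padding + kernel`. The sketch below assumes a 2×2 kernel with stride 2 (which doubles the size) and skip connections carrying 128, 64, 32 and 16 channels; those channel counts and the function names are illustrative assumptions, not values from the article.

```python
def transposed_conv_size(size, kernel=2, stride=2, padding=0):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_shapes(height, width, channels, skip_channels, num_classes=1):
    """Trace (H, W, C) through a decoder with skip connections.

    Assumptions: each transposed conv outputs as many channels as the
    matching skip connection, the two are concatenated along the channel
    axis, and a conv block then reduces the result to the skip's channels.
    """
    shapes = [(height, width, channels)]
    merged_channels = []
    for skip in skip_channels:
        height = transposed_conv_size(height)
        width = transposed_conv_size(width)
        merged_channels.append(skip + skip)  # upsampled + skip, concatenated
        channels = skip                      # after the conv block
        shapes.append((height, width, channels))
    # Final 1x1 convolution maps to one channel per class.
    shapes.append((height, width, num_classes))
    return shapes, merged_channels


shapes, merged = decoder_shapes(8, 8, 256, skip_channels=[128, 64, 32, 16])
print(shapes)
# [(8, 8, 256), (16, 16, 128), (32, 32, 64), (64, 64, 32), (128, 128, 16), (128, 128, 1)]
print(merged)
# [256, 128, 64, 32]
```

The `merged` list shows the channel count right after each concatenation, which is where the encoder's spatial detail re-enters the decoder.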

Variations In The U-Net:

There are many variations on the U-Net architecture. Instead of transposed convolution we can use bilinear upsampling. Similarly, we can replace the encoder with any popular convolutional network such as ResNet or VGG-Net, with or without pretrained weights.
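As a rough illustration of the bilinear alternative, the following pure-Python sketch resizes a small 2D grid by interpolating between its four nearest input values. Unlike a transposed convolution it has no learned weights. The function name is hypothetical, and the sketch aligns the corner pixels of input and output, which is only one of several conventions that libraries offer.

```python
def bilinear_upsample(image, out_h, out_w):
    """Resize a 2D grid (list of rows) with bilinear interpolation.

    Assumption: corner pixels of the input and output grids are aligned.
    """
    in_h, in_w = len(image), len(image[0])
    result = []
    for i in range(out_h):
        # Position of output row i in input coordinates.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two nearest rows, then vertically.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        result.append(row)
    return result


small = [[0.0, 2.0],
         [4.0, 6.0]]
large = bilinear_upsample(small, 3, 3)
for row in large:
    print(row)
# [0.0, 1.0, 2.0]
# [2.0, 3.0, 4.0]
# [4.0, 5.0, 6.0]
```

In a U-Net variant this fixed resize would usually be followed by an ordinary convolution, so the network can still learn how to refine the upsampled feature map.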

This was a theoretical overview of semantic segmentation using the U-Net model. In the next blog we will put the model into practice by using it for salt identification.

Translated from: https://medium.com/swlh/semantic-segmentation-using-u-net-e0f34e27724f