
Mu Li's Dive into Deep Learning V2: Semantic Segmentation and Loading the Pascal VOC2012 Dataset, with Code

Published: 2023/12/14 pytorch 25 豆豆


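I. Semantic Segmentation and the Dataset

1. Introduction

Object detection uses rectangular bounding boxes to label and predict the objects in an image. Semantic segmentation instead focuses on how to divide an image into regions that belong to different semantic classes. Unlike object detection, semantic segmentation recognizes and understands the content of every pixel in the image: its semantic regions are labeled and predicted at the pixel level. The figure below shows the dog, cat, and background labels of an image in semantic segmentation. Compared with the bounding boxes of object detection, the pixel-level borders labeled in semantic segmentation are clearly more fine-grained.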

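2. Image segmentation and instance segmentation

Computer vision has two other important tasks that resemble semantic segmentation: image segmentation and instance segmentation. We briefly distinguish them from semantic segmentation below.

Image segmentation divides an image into several constituent regions. Methods for this problem usually exploit the correlation between pixels in the image. It needs no label information about the pixels during training, and it cannot guarantee at prediction time that the segmented regions carry the semantics we hope for. Taking the image above as input, image segmentation may divide the dog into two regions: one covering the mouth and eyes, which are mainly black, and the other covering the rest of the body, which is mainly yellow.
Instance segmentation is also called simultaneous detection and segmentation. It studies how to recognize the pixel-level region of each object instance in an image. Unlike semantic segmentation, instance segmentation has to distinguish not only semantics but also different object instances. For example, if the image contains two dogs, instance segmentation has to determine which of the two dogs each pixel belongs to.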
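
To make the difference between the two kinds of labels concrete, here is a minimal sketch on a made-up one-row "image" of six pixels containing two dogs. The masks and the helper names `count_classes` and `count_instances` are hypothetical and only illustrate the idea; they are not part of the Pascal VOC tooling.

```python
# Toy 1-D "image" of 6 pixels: background, dog A, dog A, background, dog B, dog B.
# All values are invented for illustration.
BACKGROUND, DOG = 0, 1

# Semantic segmentation: one class index per pixel; both dogs share the class DOG.
semantic_mask = [BACKGROUND, DOG, DOG, BACKGROUND, DOG, DOG]

# Instance segmentation: each pixel also needs an instance id (0 = no instance),
# so the two dogs receive different ids.
instance_mask = [0, 1, 1, 0, 2, 2]


def count_classes(mask):
    """Number of distinct non-background classes in a semantic mask."""
    return len(set(mask) - {BACKGROUND})


def count_instances(mask):
    """Number of distinct object instances in an instance mask."""
    return len(set(mask) - {0})


num_classes = count_classes(semantic_mask)
num_instances = count_instances(instance_mask)
print(num_classes, num_instances)
```

The semantic mask only says "dog or not dog" for each pixel, while the instance mask additionally separates the first dog from the second.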

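3. The Pascal VOC2012 semantic segmentation dataset

3.1 One of the most important semantic segmentation datasets is Pascal VOC2012. The tar file of the dataset is about 2 GB. The extracted dataset is located at ../data/VOCdevkit/VOC2012.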

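3.2 After entering the path ../data/VOCdevkit/VOC2012, we can see the different components of the dataset. The ImageSets/Segmentation path contains text files that specify the training and test samples, while the JPEGImages and SegmentationClass paths store the input image and the label of each example, respectively. The labels are also stored as images, with the same size as the input images they annotate. In addition, pixels with the same color in a label belong to the same semantic class. The read_voc_images() function below reads all input images and labels into memory.

import os
import torch
import torchvision
import d2l.torch

# Register the dataset with d2l's data hub, then download and extract it.
d2l.torch.DATA_HUB['voc2012'] = (d2l.torch.DATA_URL + 'VOCtrainval_11-May-2012.tar',
                                 '4e443f8a2eca6b1dac8a6c57641b67dd40621a49')
voc_dir = d2l.torch.download_extract('voc2012', 'VOCdevkit/VOC2012')

def read_voc_images(voc_dir, is_train=True):
    """Read all VOC feature images and their labels."""
    fpath = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
                         'train.txt' if is_train else 'val.txt')
    with open(fpath, 'r') as f:
        images_name = f.read().split()
    print('images_name.len = ', len(images_name))
    # Labels are decoded as 3-channel RGB so their colors can be matched later.
    mode = torchvision.io.image.ImageReadMode.RGB
    images_feature, images_label = [], []
    for image_name in images_name:
        image_path = os.path.join(voc_dir, 'JPEGImages', f'{image_name}.jpg')
        label_path = os.path.join(voc_dir, 'SegmentationClass', f'{image_name}.png')
        images_feature.append(torchvision.io.read_image(image_path))
        images_label.append(torchvision.io.read_image(label_path, mode))
    return images_feature, images_label

train_images, train_labels = read_voc_images(voc_dir, True)
len(train_images), len(train_labels)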

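The path logic described in 3.2 can be tried without downloading the dataset. The sketch below builds a throwaway directory with a fake split file and lists the image and label paths that would be read. The helper name `list_voc_paths` and the sample names `sample_a` and `sample_b` are hypothetical; only the directory and file names come from the text above.

```python
import os
import tempfile


def list_voc_paths(voc_dir, is_train=True):
    """Return (image_path, label_path) pairs for the names in the split file."""
    split = 'train.txt' if is_train else 'val.txt'
    fpath = os.path.join(voc_dir, 'ImageSets', 'Segmentation', split)
    with open(fpath, 'r') as f:
        names = f.read().split()  # one image name per whitespace-separated token
    return [(os.path.join(voc_dir, 'JPEGImages', f'{name}.jpg'),
             os.path.join(voc_dir, 'SegmentationClass', f'{name}.png'))
            for name in names]


# Build a temporary directory that mimics the layout (names are invented).
with tempfile.TemporaryDirectory() as tmp_dir:
    seg_dir = os.path.join(tmp_dir, 'ImageSets', 'Segmentation')
    os.makedirs(seg_dir)
    with open(os.path.join(seg_dir, 'train.txt'), 'w') as f:
        f.write('sample_a\nsample_b\n')
    pairs = list_voc_paths(tmp_dir, is_train=True)
    # Strip the temporary prefix so the printed result is stable.
    relative_pairs = [(os.path.relpath(image, tmp_dir), os.path.relpath(label, tmp_dir))
                      for image, label in pairs]

print(relative_pairs)
```

Each name in the split file yields one .jpg under JPEGImages and one .png with the same stem under SegmentationClass.
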
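3.3 We draw the first 5 input images and their labels. In the label images, white and black represent borders and background, respectively, while the other colors correspond to different classes. The result is shown in the figure below.

n = 5
imgs = train_images[0:n] + train_labels[0:n]
# Move channels last (C, H, W) -> (H, W, C) for plotting.
imgs = [image.permute(1, 2, 0) for image in imgs]
d2l.torch.show_images(imgs, 2, n)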

3.4 Enumerate the RGB color values and the class names

VOC_COLORMAP = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]

VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

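3.5 With the two constants defined above, we can conveniently look up the class index of each pixel in a label. The voc_colormap2label() function builds the mapping from the RGB color values above to class indices, and the voc_label_indices() function maps the RGB values of a label image to their class indices in the Pascal VOC2012 dataset.

def voc_colormap2label():
    """Build the mapping from RGB to VOC class indices."""
    # One entry per possible 24-bit color; unlisted colors keep the default 0.
    colormaplabel = torch.zeros(256 ** 3, dtype=torch.long)
    for i, colormap in enumerate(VOC_COLORMAP):
        idx = (colormap[0] * 256 + colormap[1]) * 256 + colormap[2]
        colormaplabel[idx] = i
    return colormaplabel

def voc_label_indices(label, colormaplabel):
    """Map the RGB values in a VOC label to their class indices."""
    label = label.permute(1, 2, 0).numpy().astype('int32')
    # Index the lookup table directly with the (H, W) array of packed RGB values.
    idx = (label[:, :, 0] * 256 + label[:, :, 1]) * 256 + label[:, :, 2]
    return colormaplabel[idx]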

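For example, in the first sample image, the class index of the front part of the aeroplane is 1, while the background index is 0, as shown in the figure below.

y = voc_label_indices(train_labels[0], voc_colormap2label())
y[105:115, 130:140], VOC_CLASSES[1]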
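
The index arithmetic can be worked through by hand on a few colors. The sketch below uses plain Python and only the first three entries of VOC_COLORMAP and VOC_CLASSES; a dict stands in for the 256**3-entry lookup tensor, and the helper names `rgb_to_flat_index` and `class_index` are hypothetical.

```python
def rgb_to_flat_index(r, g, b):
    """Pack an RGB triple into one integer, reading it as a base-256 number."""
    return (r * 256 + g) * 256 + b


# First three entries of VOC_COLORMAP and VOC_CLASSES from the post.
colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0]]
classes = ['background', 'aeroplane', 'bicycle']

# A dict plays the role of the lookup tensor in this sketch.
lookup = {rgb_to_flat_index(*rgb): i for i, rgb in enumerate(colormap)}


def class_index(rgb):
    # Colors that are not in the colormap fall back to 0, mirroring the
    # zero-initialised lookup tensor in voc_colormap2label().
    return lookup.get(rgb_to_flat_index(*rgb), 0)


aeroplane_flat = rgb_to_flat_index(128, 0, 0)  # 128 * 256 * 256
aeroplane_class = class_index([128, 0, 0])
unknown_class = class_index([255, 255, 255])   # not listed in the colormap
print(aeroplane_flat, classes[aeroplane_class], unknown_class)
```

One consequence of the zero-initialised table is that any color missing from VOC_COLORMAP, which appears to include the border color mentioned in 3.3, is mapped to index 0, the same index as the background.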

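4. Data preprocessing

Earlier networks preprocessed images by rescaling them to fit the input shape of the model. In semantic segmentation, however, doing so requires mapping the predicted pixel classes back to the original size of the input image. Such a mapping may be inaccurate, especially along the boundaries between regions with different semantics. To avoid this problem, we crop the image to a fixed size instead of rescaling it. Specifically, we use the random cropping from image augmentation and crop the same region of the input image and the label. The output is shown in the figure below.

def voc_rand_crop(feature, label, height, width):
    """Randomly crop both the feature and the label image."""
    # Sample one crop window and apply it to both tensors so they stay aligned.
    rect = torchvision.transforms.RandomCrop.get_params(feature, (height, width))
    feature = torchvision.transforms.functional.crop(feature, *rect)
    label = torchvision.transforms.functional.crop(label, *rect)
    return feature, label

imgs = []
for _ in range(n):
    imgs += voc_rand_crop(train_images[0], train_labels[0], 200, 300)
imgs = [img.permute(1, 2, 0) for img in imgs]
d2l.torch.show_images(imgs[::2] + imgs[1::2], 2, n)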

5. A custom semantic segmentation dataset class

By inheriting the Dataset class provided by the high-level API, we define a custom semantic segmentation dataset class, VOCSegDataset. By implementing the __getitem__ function, we can access the input image at index item in the dataset together with the class index of each of its pixels. Since some images in the dataset may be smaller than the output size specified for random cropping, these examples are removed by a custom filter function. In addition, the transform_normalize function standardizes the values of the three RGB channels of the input images.

class VOCSegDataset(torch.utils.data.Dataset):
    """A custom dataset for loading the VOC dataset."""

    def __init__(self, is_train, crop_size, voc_dir):
        self.transform = torchvision.transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.crop_size = crop_size
        features, labels = read_voc_images(voc_dir, is_train)
        self.features = [self.transform_normalize(feature)
                         for feature in self.filter(features)]
        self.labels = self.filter(labels)
        self.colormap2label = voc_colormap2label()
        print('dataset.len =', len(self.features))

    def filter(self, images):
        # Keep only images that are at least as large as the crop size.
        images = [image for image in images
                  if (image.shape[1] >= self.crop_size[0] and
                      image.shape[2] >= self.crop_size[1])]
        return images

    def transform_normalize(self, image):
        # Scale to [0, 1], then standardize each RGB channel.
        return self.transform(image.float() / 255)

    def __getitem__(self, item):
        feature, label = voc_rand_crop(self.features[item], self.labels[item],
                                       *self.crop_size)
        return (feature, voc_label_indices(label, self.colormap2label))

    def __len__(self):
        return len(self.features)

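6. Reading the dataset

We use the custom VOCSegDataset class to create instances of the training set and the test set. Suppose the output shape of the randomly cropped images is 320×480. We can then check how many examples are retained in the training set and the test set.

crop_size = (320, 480)
voc_train = VOCSegDataset(is_train=True, crop_size=crop_size, voc_dir=voc_dir)
voc_val = VOCSegDataset(is_train=False, crop_size=crop_size, voc_dir=voc_dir)
# Output:
# images_name.len = 1464
# dataset.len = 1114
# images_name.len = 1449
# dataset.len = 1078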
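
The drop from 1464 to 1114 training examples comes from the size filter in VOCSegDataset. The rule, together with the per-channel standardization, can be sketched in plain Python on single values. The image shapes below are made up for illustration, and the helper names `keeps_image` and `normalize_pixel` are hypothetical; the mean and standard deviation values are the ones passed to Normalize in the class.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def keeps_image(shape, crop_size):
    """Same test as VOCSegDataset.filter: (C, H, W) must cover the crop size."""
    _, height, width = shape
    return height >= crop_size[0] and width >= crop_size[1]


def normalize_pixel(rgb):
    """Scale one 0-255 RGB pixel to [0, 1], then standardize each channel."""
    return [((value / 255) - mean) / std
            for value, mean, std in zip(rgb, MEAN, STD)]


crop_size = (320, 480)
kept = keeps_image((3, 375, 500), crop_size)     # tall and wide enough
dropped = keeps_image((3, 281, 500), crop_size)  # height 281 < 320

white = normalize_pixel([255, 255, 255])  # red channel is about 2.25
black = normalize_pixel([0, 0, 0])        # red channel is about -2.12
print(kept, dropped, white, black)
```

An image is discarded as soon as either its height or its width is smaller than the crop size, because a crop window of that size could not fit inside it.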

Setting the batch size to 64, we define the iterator for the training set. Printing the shape of the first minibatch shows that, unlike in image classification or object detection, the labels here are three-dimensional tensors.

batch_size = 64
train_iter = torch.utils.data.DataLoader(voc_train, batch_size, shuffle=True,
                                         drop_last=True,
                                         num_workers=d2l.torch.get_dataloader_workers())
for X, Y in train_iter:
    print(X.shape)
    print(Y.shape)
    break
# Output:
# torch.Size([64, 3, 320, 480])
# torch.Size([64, 320, 480])
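
The same shapes can be reproduced on synthetic tensors, which makes it easier to see where each dimension comes from. The sketch below is only an illustration under assumed sizes (10 samples of 4×6 pixels) and uses TensorDataset in place of VOCSegDataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the VOC dataset (all sizes here are invented):
# 10 samples, each a 3-channel 4x6 image with a 4x6 map of class indices.
features = torch.rand(10, 3, 4, 6)
labels = torch.randint(0, 21, (10, 4, 6))  # 21 classes, as in VOC_CLASSES

loader = DataLoader(TensorDataset(features, labels), batch_size=4,
                    shuffle=True, drop_last=True, num_workers=0)

X, Y = next(iter(loader))
feature_shape = tuple(X.shape)  # batch, channels, height, width
label_shape = tuple(Y.shape)    # batch, height, width (no channel dimension)
num_batches = len(loader)       # 10 // 4; the leftover samples are dropped
print(feature_shape, label_shape, num_batches)
```

The label has no channel dimension because each pixel stores a single class index. With drop_last=True, the 1114 retained training examples and a batch size of 64 give 1114 // 64 = 17 full minibatches per epoch.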

7. Putting it all together

We define the load_data_voc() function to download and read the Pascal VOC2012 semantic segmentation dataset. It returns data iterators for both the training set and the test set.

def load_data_voc(crop_size, batch_size):
    """Load the VOC semantic segmentation dataset."""
    voc_dir = d2l.torch.download_extract('voc2012', os.path.join('VOCdevkit', 'VOC2012'))
    num_workers = d2l.torch.get_dataloader_workers()
    voc_train = VOCSegDataset(is_train=True, crop_size=crop_size, voc_dir=voc_dir)
    voc_val = VOCSegDataset(is_train=False, crop_size=crop_size, voc_dir=voc_dir)
    train_iter = torch.utils.data.DataLoader(voc_train, batch_size, shuffle=True,
                                             drop_last=True, num_workers=num_workers)
    val_iter = torch.utils.data.DataLoader(voc_val, batch_size, shuffle=False,
                                           drop_last=True, num_workers=num_workers)
    return (train_iter, val_iter)

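8. Summary

Semantic segmentation recognizes and understands the content of an image at the pixel level by dividing the image into regions that belong to different semantic classes.

Because the input image and its label correspond pixel by pixel in semantic segmentation, the input image is randomly cropped to a fixed size rather than rescaled, and the label is cropped to exactly the same region.

One of the most important semantic segmentation datasets is Pascal VOC2012.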

9. Full code

import os
import torch
import torchvision
import d2l.torch

d2l.torch.DATA_HUB['voc2012'] = (d2l.torch.DATA_URL + 'VOCtrainval_11-May-2012.tar',
                                 '4e443f8a2eca6b1dac8a6c57641b67dd40621a49')
voc_dir = d2l.torch.download_extract('voc2012', 'VOCdevkit/VOC2012')
print('voc_dir = ', voc_dir)

def read_voc_images(voc_dir, is_train=True):
    """Read all VOC feature images and their labels."""
    fpath = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
                         'train.txt' if is_train else 'val.txt')
    with open(fpath, 'r') as f:
        images_name = f.read().split()
    print('images_name.len = ', len(images_name))
    mode = torchvision.io.image.ImageReadMode.RGB
    images_feature, images_label = [], []
    for image_name in images_name:
        image_path = os.path.join(voc_dir, 'JPEGImages', f'{image_name}.jpg')
        label_path = os.path.join(voc_dir, 'SegmentationClass', f'{image_name}.png')
        images_feature.append(torchvision.io.read_image(image_path))
        images_label.append(torchvision.io.read_image(label_path, mode))
    return images_feature, images_label

train_images, train_labels = read_voc_images(voc_dir, True)
len(train_images), len(train_labels)

n = 5
imgs = train_images[0:n] + train_labels[0:n]
imgs = [image.permute(1, 2, 0) for image in imgs]
d2l.torch.show_images(imgs, 2, n)

VOC_COLORMAP = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]

VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

def voc_colormap2label():
    """Build the mapping from RGB to VOC class indices."""
    colormaplabel = torch.zeros(256 ** 3, dtype=torch.long)
    for i, colormap in enumerate(VOC_COLORMAP):
        idx = (colormap[0] * 256 + colormap[1]) * 256 + colormap[2]
        colormaplabel[idx] = i
    return colormaplabel

def voc_label_indices(label, colormaplabel):
    """Map the RGB values in a VOC label to their class indices."""
    label = label.permute(1, 2, 0).numpy().astype('int32')
    # Index the lookup table directly with the (H, W) array of packed RGB values.
    idx = (label[:, :, 0] * 256 + label[:, :, 1]) * 256 + label[:, :, 2]
    return colormaplabel[idx]

y = voc_label_indices(train_labels[0], voc_colormap2label())
y[105:115, 130:140], VOC_CLASSES[1]

def voc_rand_crop(feature, label, height, width):
    """Randomly crop both the feature and the label image."""
    rect = torchvision.transforms.RandomCrop.get_params(feature, (height, width))
    feature = torchvision.transforms.functional.crop(feature, *rect)
    label = torchvision.transforms.functional.crop(label, *rect)
    return feature, label

imgs = []
for _ in range(n):
    imgs += voc_rand_crop(train_images[0], train_labels[0], 200, 300)
imgs = [img.permute(1, 2, 0) for img in imgs]
d2l.torch.show_images(imgs[::2] + imgs[1::2], 2, n)

class VOCSegDataset(torch.utils.data.Dataset):
    """A custom dataset for loading the VOC dataset."""

    def __init__(self, is_train, crop_size, voc_dir):
        self.transform = torchvision.transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.crop_size = crop_size
        features, labels = read_voc_images(voc_dir, is_train)
        self.features = [self.transform_normalize(feature)
                         for feature in self.filter(features)]
        self.labels = self.filter(labels)
        self.colormap2label = voc_colormap2label()
        print('dataset.len =', len(self.features))

    def filter(self, images):
        images = [image for image in images
                  if (image.shape[1] >= self.crop_size[0] and
                      image.shape[2] >= self.crop_size[1])]
        return images

    def transform_normalize(self, image):
        return self.transform(image.float() / 255)

    def __getitem__(self, item):
        feature, label = voc_rand_crop(self.features[item], self.labels[item],
                                       *self.crop_size)
        return (feature, voc_label_indices(label, self.colormap2label))

    def __len__(self):
        return len(self.features)

crop_size = (320, 480)
voc_train = VOCSegDataset(is_train=True, crop_size=crop_size, voc_dir=voc_dir)
voc_val = VOCSegDataset(is_train=False, crop_size=crop_size, voc_dir=voc_dir)

batch_size = 64
train_iter = torch.utils.data.DataLoader(voc_train, batch_size, shuffle=True,
                                         drop_last=True,
                                         num_workers=d2l.torch.get_dataloader_workers())
for X, Y in train_iter:
    print(X.shape)
    print(Y.shape)
    break

def load_data_voc(crop_size, batch_size):
    """Load the VOC semantic segmentation dataset."""
    voc_dir = d2l.torch.download_extract('voc2012', os.path.join('VOCdevkit', 'VOC2012'))
    num_workers = d2l.torch.get_dataloader_workers()
    voc_train = VOCSegDataset(is_train=True, crop_size=crop_size, voc_dir=voc_dir)
    voc_val = VOCSegDataset(is_train=False, crop_size=crop_size, voc_dir=voc_dir)
    train_iter = torch.utils.data.DataLoader(voc_train, batch_size, shuffle=True,
                                             drop_last=True, num_workers=num_workers)
    val_iter = torch.utils.data.DataLoader(voc_val, batch_size, shuffle=False,
                                           drop_last=True, num_workers=num_workers)
    return (train_iter, val_iter)

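10. Links

Semantic segmentation, part 1: Mu Li's Dive into Deep Learning V2: Semantic Segmentation and Loading the Pascal VOC2012 Dataset, with Code
Semantic segmentation, part 2: Mu Li's Dive into Deep Learning V2: Transposed Convolution, with Code
Semantic segmentation, part 3: Mu Li's Dive into Deep Learning V2: Fully Convolutional Networks (FCN) for Semantic Segmentation, with Code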
