Object Detection with YOLOv3 (a PyTorch Implementation)
1. Purpose of This Article
There are already PyTorch implementations of YOLOv3 on GitHub, but code taken straight from someone else never feels quite comfortable to use. So, in the spirit of doing it myself, this article builds the YOLOv3 network from scratch purely for learning purposes and then runs predictions with the officially provided pre-trained weights, which helps in understanding the YOLOv3 model.
2. The Object Detection Task
Object detection is a computer vision task that involves identifying the presence, location, and type of one or more objects in a given photograph. It is a challenging problem that builds on methods for object localization (where are the objects and what is their extent?) and object classification (what are they?). For the photo below, for example, the task is to recognize what is in the picture and where it is, and to mark each object with a bounding box.
Three zebras (photo by Boegh)

3. The YOLOv3 Model
Introductions to the YOLOv3 model (whose backbone the original authors call "DarkNet", an admittedly odd-sounding name) abound online, so I will not repeat them here. The network structure is shown in the figure below:

One more thing: we will run predictions with the officially pre-trained weight file, so it is necessary to download it first (in China, unless you have plenty of time to wait or an unusually well-behaved connection, a download manager helps; Xunlei is a good one):

DarkNet weights pre-trained on the MSCOCO dataset
4. Building the Model
Import the required libraries:
```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
from PIL import Image
import matplotlib.pyplot as plt
```

Define the layers of the YOLOv3 (DarkNet) network:
Each DarkNet layer consists of a convolution, an optional batch normalization (BN) layer, and an activation function. When a DarkNet layer contains BN, its convolution has only weights and no bias (the BN shift makes a separate bias redundant).

The network produces predictions at layers 82, 94 and 106, i.e. three outputs at different strides; these three output layers have neither a BN layer nor an activation function.
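As a quick check of this convention, the parameter count of a single layer can be worked out by hand. A minimal sketch (the helper below is hypothetical, not part of the model code):

```python
# Parameter count of one Darknet layer: conv (+ bias only when no BN) + BN.
def darknet_layer_params(in_c, out_c, k, bnorm=True):
    conv_w = out_c * in_c * k * k          # conv kernel weights
    conv_b = 0 if bnorm else out_c         # bias only exists when there is no BN
    bn = 4 * out_c if bnorm else 0         # gamma, beta, running mean, running var
    return conv_w + conv_b + bn

print(darknet_layer_params(32, 64, 3))            # 3x3 conv 32->64 with BN: 18688
print(darknet_layer_params(1024, 255, 1, False))  # detection head 1x1 conv: 261375
```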
```python
# DarkNet layer
class DarknetLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 bnorm=True, leaky=True):
        super().__init__()
        # with BN the conv needs no bias; without BN it does
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                              padding, bias=not bnorm)
        self.bnorm = nn.BatchNorm2d(out_channels, eps=1e-3) if bnorm else None
        self.leaky = nn.LeakyReLU(0.1) if leaky else None

    def forward(self, x):
        x = self.conv(x)
        if self.bnorm is not None:
            x = self.bnorm(x)
        if self.leaky is not None:
            x = self.leaky(x)
        return x
```

Define the blocks of the YOLOv3 network:
This borrows the residual-block idea from ResNet: a block contains one skip connection, i.e. the input passes through every layer of the block to produce an intermediate output, which is then added element-wise to the skipped tensor to give the block's output.
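The idea can be sketched in a few lines of NumPy, independent of PyTorch (the function `f` below is a stand-in for the stacked layers; the addition only works when the shapes match):

```python
import numpy as np

def residual(f, x):
    # the output of the inner layers is added element-wise to the skipped tensor
    y = f(x)
    assert y.shape == x.shape, "skip addition needs matching shapes"
    return x + y

x = np.ones((2, 4))                 # toy "feature map"
out = residual(lambda t: 2 * t, x)  # f(x) = 2x, so out = x + 2x = 3x
print(out[0, 0])  # 3.0
```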
```python
# DarkNet block
class DarknetBlock(nn.Module):
    def __init__(self, layers, skip=True):
        super().__init__()
        self.skip = skip
        self.layers = nn.ModuleDict()
        for spec in layers:
            self.layers[spec['id']] = DarknetLayer(
                spec['in_channels'], spec['out_channels'], spec['kernel_size'],
                spec['stride'], spec['padding'], spec['bnorm'], spec['leaky'])

    def forward(self, x):
        count = 0
        for _, layer in self.layers.items():
            # remember the tensor two layers before the end as the skip branch
            if count == (len(self.layers) - 2) and self.skip:
                skip_connection = x
            count += 1
            x = layer(x)
        return x + skip_connection if self.skip else x
```

The code above stacks several DarkNet layers into one block. layers is a list of dictionaries, each declaring one layer's input channels, number of kernels and other parameters; skip indicates whether the block acts as a residual block.

The if statement in forward captures the skip branch two layers before the end of the block: in a residual block whose first layer has stride 2, the skipped tensor is the output of that downsampling layer rather than the block's input; in a plain two-layer residual block it is simply the input. Now stack the blocks into the YOLOv3 network:
There are 106 layers in total, with the outputs at layers 82, 94 and 106; the structure is somewhat involved.
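The three output shapes follow directly from the input size and the five stride-2 convolutions. A quick sketch of the arithmetic for a 416x416 input, assuming the standard setup of 3 anchors and 80 COCO classes per scale:

```python
net_size = 416
num_anchors, num_classes = 3, 80

# per anchor: box (4 coords) + objectness (1) + class scores; 3 anchors per cell
channels = num_anchors * (4 + 1 + num_classes)

# the three detection layers sit at strides 32, 16 and 8
shapes = {layer: (channels, net_size // stride, net_size // stride)
          for layer, stride in [(82, 32), (94, 16), (106, 8)]}
print(shapes)  # {82: (255, 13, 13), 94: (255, 26, 26), 106: (255, 52, 52)}
```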
```python
# helper: build one layer's config dict (padding is 1 for 3x3 convs and 0 for
# 1x1 convs, exactly as in the original per-layer listing)
def cfg(layer_id, in_channels, out_channels, kernel_size, stride=1,
        bnorm=True, leaky=True):
    return {'id': layer_id, 'in_channels': in_channels, 'out_channels': out_channels,
            'kernel_size': kernel_size, 'stride': stride,
            'padding': 1 if kernel_size == 3 else 0, 'bnorm': bnorm, 'leaky': leaky}

# DarkNet network
class Yolov3(nn.Module):
    def __init__(self):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.blocks = nn.ModuleDict()
        # layer0 -> layer4, input = (3, 416, 416), flow_out = (64, 208, 208)
        self.blocks['block0_4'] = DarknetBlock([
            cfg('layer_0', 3, 32, 3), cfg('layer_1', 32, 64, 3, stride=2),
            cfg('layer_2', 64, 32, 1), cfg('layer_3', 32, 64, 3)])
        # layer5 -> layer8, input = (64, 208, 208), flow_out = (128, 104, 104)
        self.blocks['block5_8'] = DarknetBlock([
            cfg('layer_5', 64, 128, 3, stride=2), cfg('layer_6', 128, 64, 1),
            cfg('layer_7', 64, 128, 3)])
        # layer9 -> layer11, input = (128, 104, 104), flow_out = (128, 104, 104)
        self.blocks['block9_11'] = DarknetBlock([
            cfg('layer_9', 128, 64, 1), cfg('layer_10', 64, 128, 3)])
        # layer12 -> layer15, input = (128, 104, 104), flow_out = (256, 52, 52)
        self.blocks['block12_15'] = DarknetBlock([
            cfg('layer_12', 128, 256, 3, stride=2), cfg('layer_13', 256, 128, 1),
            cfg('layer_14', 128, 256, 3)])
        # layer16 -> layer36, input = (256, 52, 52), flow_out = (256, 52, 52)
        for name, first in [('block16_18', 16), ('block19_21', 19), ('block22_24', 22),
                            ('block25_27', 25), ('block28_30', 28), ('block31_33', 31),
                            ('block34_36', 34)]:
            self.blocks[name] = DarknetBlock([
                cfg('layer_%d' % first, 256, 128, 1),
                cfg('layer_%d' % (first + 1), 128, 256, 3)])
        # layer37 -> layer40, input = (256, 52, 52), flow_out = (512, 26, 26)
        self.blocks['block37_40'] = DarknetBlock([
            cfg('layer_37', 256, 512, 3, stride=2), cfg('layer_38', 512, 256, 1),
            cfg('layer_39', 256, 512, 3)])
        # layer41 -> layer61, input = (512, 26, 26), flow_out = (512, 26, 26)
        for name, first in [('block41_43', 41), ('block44_46', 44), ('block47_49', 47),
                            ('block50_52', 50), ('block53_55', 53), ('block56_58', 56),
                            ('block59_61', 59)]:
            self.blocks[name] = DarknetBlock([
                cfg('layer_%d' % first, 512, 256, 1),
                cfg('layer_%d' % (first + 1), 256, 512, 3)])
        # layer62 -> layer65, input = (512, 26, 26), flow_out = (1024, 13, 13)
        self.blocks['block62_65'] = DarknetBlock([
            cfg('layer_62', 512, 1024, 3, stride=2), cfg('layer_63', 1024, 512, 1),
            cfg('layer_64', 512, 1024, 3)])
        # layer66 -> layer74, input = (1024, 13, 13), flow_out = (1024, 13, 13)
        for name, first in [('block66_68', 66), ('block69_71', 69), ('block72_74', 72)]:
            self.blocks[name] = DarknetBlock([
                cfg('layer_%d' % first, 1024, 512, 1),
                cfg('layer_%d' % (first + 1), 512, 1024, 3)])
        # layer75 -> layer79, input = (1024, 13, 13), flow_out = (512, 13, 13)
        self.blocks['block75_79'] = DarknetBlock([
            cfg('layer_75', 1024, 512, 1), cfg('layer_76', 512, 1024, 3),
            cfg('layer_77', 1024, 512, 1), cfg('layer_78', 512, 1024, 3),
            cfg('layer_79', 1024, 512, 1)], skip=False)
        # layer80 -> layer82, input = (512, 13, 13), yolo_out = (255, 13, 13)
        self.blocks['yolo_82'] = DarknetBlock([
            cfg('layer_80', 512, 1024, 3),
            cfg('layer_81', 1024, 255, 1, bnorm=False, leaky=False)], skip=False)
        # layer83 -> layer86, input = (512, 13, 13) -> (256, 13, 13),
        # then upsample and concatenate with layer61 (512, 26, 26), flow_out = (768, 26, 26)
        self.blocks['block83_86'] = DarknetBlock([
            cfg('layer_84', 512, 256, 1)], skip=False)
        # layer87 -> layer91, input = (768, 26, 26), flow_out = (256, 26, 26)
        self.blocks['block87_91'] = DarknetBlock([
            cfg('layer_87', 768, 256, 1), cfg('layer_88', 256, 512, 3),
            cfg('layer_89', 512, 256, 1), cfg('layer_90', 256, 512, 3),
            cfg('layer_91', 512, 256, 1)], skip=False)
        # layer92 -> layer94, input = (256, 26, 26), yolo_out = (255, 26, 26)
        self.blocks['yolo_94'] = DarknetBlock([
            cfg('layer_92', 256, 512, 3),
            cfg('layer_93', 512, 255, 1, bnorm=False, leaky=False)], skip=False)
        # layer95 -> layer98, input = (256, 26, 26) -> (128, 26, 26),
        # then upsample and concatenate with layer36 (256, 52, 52), flow_out = (384, 52, 52)
        self.blocks['block95_98'] = DarknetBlock([
            cfg('layer_96', 256, 128, 1)], skip=False)
        # layer99 -> layer106, input = (384, 52, 52), yolo_out = (255, 52, 52)
        self.blocks['yolo_106'] = DarknetBlock([
            cfg('layer_99', 384, 128, 1), cfg('layer_100', 128, 256, 3),
            cfg('layer_101', 256, 128, 1), cfg('layer_102', 128, 256, 3),
            cfg('layer_103', 256, 128, 1), cfg('layer_104', 128, 256, 3),
            cfg('layer_105', 256, 255, 1, bnorm=False, leaky=False)], skip=False)

    def forward(self, x):
        x = self.blocks['block0_4'](x)
        x = self.blocks['block5_8'](x)
        x = self.blocks['block9_11'](x)
        x = self.blocks['block12_15'](x)
        for name in ['block16_18', 'block19_21', 'block22_24', 'block25_27',
                     'block28_30', 'block31_33', 'block34_36']:
            x = self.blocks[name](x)
        skip36 = x
        x = self.blocks['block37_40'](x)
        for name in ['block41_43', 'block44_46', 'block47_49', 'block50_52',
                     'block53_55', 'block56_58', 'block59_61']:
            x = self.blocks[name](x)
        skip61 = x
        x = self.blocks['block62_65'](x)
        x = self.blocks['block66_68'](x)
        x = self.blocks['block69_71'](x)
        x = self.blocks['block72_74'](x)
        x = self.blocks['block75_79'](x)
        yolo_82 = self.blocks['yolo_82'](x)    # first output, stride 32
        x = self.blocks['block83_86'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip61), dim=1)
        x = self.blocks['block87_91'](x)
        yolo_94 = self.blocks['yolo_94'](x)    # second output, stride 16
        x = self.blocks['block95_98'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip36), dim=1)
        yolo_106 = self.blocks['yolo_106'](x)  # third output, stride 8
        return yolo_82, yolo_94, yolo_106
```

Define the model:

```python
model = Yolov3()
```

At this point you can use print to inspect the model structure.
Load the pre-trained weights

By now the weight file should have finished downloading, and we can load its parameters into our model with a weight-reader class:
```python
# Weight-reader class
class WeightReader():
    def __init__(self, weight_file):
        # the file is binary, so open it in 'rb' mode
        with open(weight_file, 'rb') as fp:
            header = np.fromfile(fp, dtype=np.int32, count=5)
            self.header = torch.from_numpy(header)
            self.seen = self.header[3]
            # the rest of the values are the weights; load them up
            self.weights = np.fromfile(fp, dtype=np.float32)

    # load the weight parameters into the model
    def load_weights(self, model):
        ptr = 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    # number of values per batch-norm tensor
                    num_bn_biases = bn.bias.numel()
                    # load the data: bias, weight, running mean, running variance
                    bn_biases = torch.from_numpy(self.weights[ptr:ptr + num_bn_biases])
                    ptr += num_bn_biases
                    bn_weights = torch.from_numpy(self.weights[ptr:ptr + num_bn_biases])
                    ptr += num_bn_biases
                    bn_running_mean = torch.from_numpy(self.weights[ptr:ptr + num_bn_biases])
                    ptr += num_bn_biases
                    bn_running_var = torch.from_numpy(self.weights[ptr:ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # cast the loaded weights into the dims of the model weights
                    bn_biases = bn_biases.view_as(bn.bias.data)
                    bn_weights = bn_weights.view_as(bn.weight.data)
                    bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                    bn_running_var = bn_running_var.view_as(bn.running_var)
                    # copy the data into the model
                    bn.bias.data.copy_(bn_biases)
                    bn.weight.data.copy_(bn_weights)
                    bn.running_mean.copy_(bn_running_mean)
                    bn.running_var.copy_(bn_running_var)
                else:
                    # number of biases
                    num_biases = conv.bias.numel()
                    # load the biases
                    conv_biases = torch.from_numpy(self.weights[ptr:ptr + num_biases])
                    ptr += num_biases
                    # reshape to the dims of the model weights, then copy
                    conv_biases = conv_biases.view_as(conv.bias.data)
                    conv.bias.data.copy_(conv_biases)
                # load the weights of the convolutional layer the same way
                num_weights = conv.weight.numel()
                conv_weights = torch.from_numpy(self.weights[ptr:ptr + num_weights])
                ptr += num_weights
                conv_weights = conv_weights.view_as(conv.weight.data)
                conv.weight.data.copy_(conv_weights)

    # summarize the network parameters
    def weight_summary(self, model):
        train_able, train_disable = 0, 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    train_able += (bn.bias.numel() + bn.weight.numel())
                    train_disable += (bn.running_mean.numel() + bn.running_var.numel())
                else:
                    train_able += conv.bias.numel()
                train_able += conv.weight.numel()
        print("total = %d" % (train_able + train_disable))
        print("count of train_able = %d" % train_able)
        print("count of train_disable = %d" % train_disable)
```

In the official pre-trained weight file, the first 5 values are stripped off; only the remainder can be loaded into the model. Pay attention to the order in which the file stores its parameters; here is the figure provided officially:
Parameters are stored in the forward-pass order of the layers. If a DarkNet layer has a BN layer, the file stores, in order, the BN bias, BN weight, running mean and running variance, followed by the convolution weights. If the layer has no BN, it stores the convolution bias followed by the convolution weights.

For a BN layer, the bias and weight are trainable parameters while the running mean and variance are not, but all of them must be loaded into the network. The following code loads the parameters and reports their counts.
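The layout can be checked on a tiny synthetic file: write a 5-value int32 header followed by float32 parameters in the stated BN order, then read it back the same way the reader class does. All numbers below are made up purely for illustration:

```python
import numpy as np

# write a fake ".weights" file: 5 x int32 header, then float32 parameters
header = np.array([0, 2, 0, 32013312, 0], dtype=np.int32)      # made-up header values
bn_bias = np.array([0.1, 0.2], dtype=np.float32)               # pretend out_channels = 2
bn_weight = np.array([1.0, 1.5], dtype=np.float32)
bn_mean = np.array([0.0, 0.0], dtype=np.float32)
bn_var = np.array([1.0, 1.0], dtype=np.float32)
with open('fake.weights', 'wb') as fp:
    header.tofile(fp)
    for arr in (bn_bias, bn_weight, bn_mean, bn_var):          # BN: bias, weight, mean, var
        arr.tofile(fp)

# read it back exactly the way the weight reader does
with open('fake.weights', 'rb') as fp:
    hdr = np.fromfile(fp, dtype=np.int32, count=5)
    weights = np.fromfile(fp, dtype=np.float32)

ptr, n = 0, 2
loaded_bias = weights[ptr:ptr + n]; ptr += n
loaded_weight = weights[ptr:ptr + n]; ptr += n
print(hdr[3], list(loaded_bias), list(loaded_weight))
```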
```python
# Load the model parameters and count them.
# Total parameters: 62,001,757, of which
#   trainable (weights and biases of conv and BN layers) = 61,949,149
#   non-trainable (BN running means and variances)       = 52,608
weight_reader = WeightReader('yolov3.weights')
weight_reader.load_weights(model)
weight_reader.weight_summary(model)
```

Input processing
Define an image-loading function that resizes the input picture to the network's input size (416x416), divides every pixel by 255, and converts the result into a four-dimensional tensor. It returns the image tensor together with the picture's original width and height.
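The same preprocessing can be sketched without PIL on a synthetic array (a hypothetical all-white image; the real function below resizes an actual photo first):

```python
import numpy as np

def preprocess(img_u8):
    """uint8 HWC image -> float32 array shaped (1, C, H, W) with values in [0, 1]."""
    x = img_u8.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    x = x.transpose(2, 0, 1)                # HWC -> CHW, as PyTorch expects
    return x[np.newaxis]                    # add the batch dimension

img = np.full((416, 416, 3), 255, dtype=np.uint8)  # fake all-white 416x416 image
batch = preprocess(img)
print(batch.shape, batch.max())  # (1, 3, 416, 416) 1.0
```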
```python
# Load an image
def img_loader(photo_file, input_w, input_h):
    img = Image.open(photo_file)
    img_w, img_h = img.size
    img = img.resize((input_w, input_h))
    img = torchvision.transforms.ToTensor()(img)  # scales pixels to [0, 1]
    img = torch.unsqueeze(img, 0)                 # add the batch dimension
    # return the resized image tensor and the original width and height
    return img, img_w, img_h
```

Now the model can produce its outputs from an input picture.
```python
photo_file = 'zebra.jpg'
input_w, input_h = 416, 416
img, img_w, img_h = img_loader(photo_file, input_w, input_h)
model.eval()  # inference mode: BN must use the loaded running statistics
with torch.no_grad():
    y_hat = model(img)
```

The resulting y_hat is a tuple of three four-dimensional tensors. What remains is to decode these tensors, filter overlaps by IoU with non-maximum suppression, draw the bounding boxes, and so on; the functions involved are all given below.
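Before the full listing below, the core per-cell decoding arithmetic can be checked in isolation with hand-worked numbers (the raw network outputs of zero are made up; the grid size and anchor are those of the first, 13x13, output scale):

```python
import numpy as np

def decode_box(tx, ty, tw, th, col, row, anchor_w, anchor_h, grid, net):
    # center from the cell offset plus a sigmoid, in image-relative units;
    # size from the anchor prior scaled by an exponential
    x = (col + 1 / (1 + np.exp(-tx))) / grid
    y = (row + 1 / (1 + np.exp(-ty))) / grid
    w = anchor_w * np.exp(tw) / net
    h = anchor_h * np.exp(th) / net
    return x, y, w, h

# cell (6, 6) of the 13x13 grid with the (116, 90) anchor, all raw outputs zero
x, y, w, h = decode_box(0.0, 0.0, 0.0, 0.0, 6, 6, 116, 90, 13, 416)
print(round(x, 3), round(w, 3))  # 0.5 0.279
```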
```python
# Bounding-box class
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
        return self.score

def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

# Decode the network output
def decode_netout(netout, anchors, obj_thresh, net_w, net_h):
    grid_h, grid_w = netout.shape[1:]
    nb_box = 3
    netout = netout.permute(1, 2, 0).detach().numpy().reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh
    for i in range(grid_h * grid_w):
        row = i // grid_w
        col = i % grid_w
        for b in range(nb_box):
            # 4th element is the objectness score
            objectness = netout[row][col][b][4]
            if objectness <= obj_thresh:
                continue
            # first 4 elements are x, y, w and h
            x, y, w, h = netout[row][col][b][:4]
            x = (col + x) / grid_w  # center position, unit: image width
            y = (row + y) / grid_h  # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w  # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h  # unit: image height
            # the remaining elements are class probabilities
            classes = netout[row][col][b][5:]
            box = BoundBox(x - w / 2, y - h / 2, x + w / 2, y + h / 2, objectness, classes)
            boxes.append(box)
    return boxes

# Rescale bounding-box coordinates, given the box list, the original image
# shape and the network input shape. The boxes are updated in place.
def correct_yolo_boxes(boxes, image_w, image_h, net_w, net_h):
    new_w, new_h = net_w, net_h
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w) / 2. / net_w, float(new_w) / net_w
        y_offset, y_scale = (net_h - new_h) / 2. / net_h, float(new_h) / net_h
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

# Helper for computing IoU
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        return min(x2, x4) - x1
    if x2 < x3:
        return 0
    return min(x2, x4) - x3

# IoU of two boxes
def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax - box1.xmin, box1.ymax - box1.ymin
    w2, h2 = box2.xmax - box2.xmin, box2.ymax - box2.ymin
    union = w1 * h1 + w2 * h2 - intersect
    return float(intersect) / union

# Non-maximum suppression
def do_nms(boxes, nms_thresh):
    if len(boxes) == 0:
        return
    nb_class = len(boxes[0].classes)
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0:
                continue
            for j in range(i + 1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

# Retrieve the boxes that confidently predict an object:
# their confidence must exceed the threshold
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check whether this label's confidence is high enough
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i] * 100)
            # don't break: several labels may fire for one box
    return v_boxes, v_labels, v_scores

# Draw the bounding boxes
def draw_boxes(photo_file, v_boxes, v_labels, v_scores):
    # load and plot the image
    data = plt.imread(photo_file)
    plt.imshow(data)
    # get the context for drawing boxes
    ax = plt.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates and compute width and height of the box
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        width, height = x2 - x1, y2 - y1
        # create and draw the rectangle
        rect = plt.Rectangle((x1, y1), width, height, fill=False, color='white')
        ax.add_patch(rect)
        # draw label and score in the top-left corner
        label = "%s (%.1f)" % (v_labels[i], v_scores[i])
        plt.text(x1, y1, label, color='white', bbox=dict(facecolor='red'))
    # show the plot
    plt.show()
```

Write a function that wraps the steps above.
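To see the IoU logic on hand-checkable numbers, here is a standalone re-implementation on plain coordinate tuples (independent of the BoundBox class above, but computing the same quantity as bbox_iou):

```python
def interval_overlap(a, b):
    # length of the overlap between 1-D intervals a = (a1, a2) and b = (b1, b2)
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def iou(box1, box2):
    # boxes as (xmin, ymin, xmax, ymax)
    iw = interval_overlap((box1[0], box1[2]), (box2[0], box2[2]))
    ih = interval_overlap((box1[1], box1[3]), (box2[1], box2[3]))
    inter = iw * ih
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1x1 = 1, union 4+4-1 = 7 -> ~0.143
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
```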
```python
def make_predict(photo_file):
    img, img_w, img_h = img_loader(photo_file, input_w, input_h)
    y_hat = model(img)
    boxes = []
    for i in range(len(y_hat)):
        # decode the output of the network
        boxes += decode_netout(y_hat[i][0], anchors[i], class_threshold, input_w, input_h)
    # rescale the bounding boxes to the shape of the original image
    correct_yolo_boxes(boxes, img_w, img_h, input_w, input_h)
    # suppress non-maximal boxes
    do_nms(boxes, 0.5)
    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # summarize what we found
    for i in range(len(v_boxes)):
        print(v_labels[i], v_scores[i])
    # draw what we found
    draw_boxes(photo_file, v_boxes, v_labels, v_scores)
```

We also need to map the class indices the network outputs to names we can understand; the labels the weight file can predict are:
```python
# Labels the weight file can predict
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
          "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
          "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
          "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
          "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
          "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
          "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
          "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
```

Finally, run the wrapped function.
```python
# Pre-defined anchors: one list of (width, height) pairs per output scale
anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]
# Network input width and height
input_w, input_h = 416, 416
# Confidence threshold
class_threshold = 0.75
# Read the picture and predict
photo_file = 'zebra.jpg'
make_predict(photo_file)
```

The results are as follows:
References:
The YOLOv3 paper
How to Perform Object Detection With YOLOv3 in Keras
How to implement a YOLO (v3) object detector from scratch in PyTorch
YOLOv3 network structure and analysis