
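Implementing Convolution on a Computer + What Pooling Does + Why Data Is Preprocessed + Feature Normalization + Understanding BN + Receptive Fields: Intuition and Calculation + Gradient Backpropagation + NMS/Soft NMS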

Published: 2024/7/23


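I. Implementing convolution on a computer

1. Convolution

A feature map is stored in memory as a flat array in row-major order before it is operated on.

As a result, the elements covered by one convolution window are not contiguous in memory, and reading a window means jumping around the array.

This is why the im2col operation is used: it unrolls every window of the feature map into one column of a large matrix, so the convolution becomes a single matrix multiplication that can use optimized matrix routines. The price is extra memory, because overlapping windows are duplicated.

For a 1x1 convolution, the original storage layout and the im2col layout are identical, so no im2col step is needed. The low-level implementation can therefore be faster, saving the time and memory that rearranging the data would otherwise cost.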
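The im2col idea can be sketched in a few lines of NumPy. This is a minimal illustration for a single-channel input without padding, not how any particular framework implements it; the helper names `im2col` and `conv2d_im2col` are made up for this example.

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll every k x k window of a 2-D array into one column."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, i * out_w + j] = patch.ravel()
    return cols, (out_h, out_w)

def conv2d_im2col(x, kernel, stride=1):
    """Cross-correlation (the 'convolution' of deep learning) as one matmul."""
    k = kernel.shape[0]
    cols, (out_h, out_w) = im2col(x, k, stride)
    return (kernel.ravel() @ cols).reshape(out_h, out_w)

x = np.arange(16, dtype=np.float64).reshape(4, 4)
kernel = np.ones((3, 3))
cols, _ = im2col(x, 3)      # shape (9, 4): 16 input values become 36
y = conv2d_im2col(x, kernel)
```

The 4x4 input holds 16 values while the unrolled matrix holds 36, which is the memory cost mentioned above.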

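2. Deconvolution (transposed convolution)

Uses: upsampling, approximate image reconstruction, and visualizing convolutional features.

Suppose the input image is 4*4, with elements $x_{ij}$ ($i, j = 1, \dots, 4$).

The kernel is 3*3, with elements $w_{ij}$ ($i, j = 1, 2, 3$).

With stride s = 1 and padding p = 0, the output size is (n - k + 2*p)/s + 1 = (4 - 3 + 0)/1 + 1 = 2, i.e. a 2*2 output.

Flatten the input image into a column vector X of length 16:

$$X = [x_{11}, x_{12}, x_{13}, x_{14}, x_{21}, \dots, x_{44}]^{T}$$

Flatten the output image into a column vector Y of length 4:

$$Y = [y_{11}, y_{12}, y_{21}, y_{22}]^{T}$$

With a weight matrix C, the convolution is Y = CX.

C is a sparse 4*16 matrix; each row holds the nine kernel weights at the positions of one window and zeros elsewhere:

$$
C = \begin{pmatrix}
w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 & 0 & 0 & 0 & 0 \\
0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33} & 0 \\
0 & 0 & 0 & 0 & 0 & w_{11} & w_{12} & w_{13} & 0 & w_{21} & w_{22} & w_{23} & 0 & w_{31} & w_{32} & w_{33}
\end{pmatrix}
$$

Deconvolution goes the other way, from C and Y back toward X:

$$X' = C^{T} Y$$

Since $C^{T}$ is 16*4, C and Y are enough to restore the size of X (a vector of length 16, i.e. a 4*4 image). In general this recovers only the shape, not the original values of X.

This is why deconvolution is better described as transposed convolution.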
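The matrix view above can be checked numerically. The sketch below builds C for the 4*4 input and 3*3 kernel of the example, with arbitrary kernel values chosen only for illustration; `conv_matrix` is a hypothetical helper name.

```python
import numpy as np

def conv_matrix(kernel, n):
    """Build C so that C @ x.ravel() equals the stride-1, zero-padding-free
    cross-correlation of an n x n input with `kernel`."""
    k = kernel.shape[0]
    out = n - k + 1
    C = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            for a in range(k):
                for b in range(k):
                    C[i * out + j, (i + a) * n + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)   # illustrative weights 1..9
x = np.arange(16.0).reshape(4, 4)
C = conv_matrix(kernel, 4)                    # sparse, shape (4, 16)
y = C @ x.ravel()                             # forward convolution: 4 values
x_up = (C.T @ y).reshape(4, 4)                # transposed convolution: 4 x 4 again
```

`x_up` has the shape of `x` but not its values, which matches the point that the transpose restores size rather than content.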

Example: (figure omitted)

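Drawback of transposed convolution:

Checkerboard artifacts:

Transposed convolution overlaps unevenly. When the kernel size is not divisible by the stride, some output pixels receive contributions from more kernel positions than their neighbors, and this produces a checkerboard pattern.

Ways to avoid checkerboard artifacts:

(1) Choose a kernel size that is divisible by the stride;

(2) Resize first, then apply a regular convolution.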
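The uneven overlap is easy to see in one dimension by counting how many kernel taps land on each output position. This is a small illustrative sketch; `overlap_counts` is a name invented here.

```python
def overlap_counts(n_in, kernel, stride):
    """For a 1-D transposed convolution, count how many kernel taps
    contribute to each output position."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):            # each input element stamps one kernel copy
        for tap in range(kernel):
            counts[i * stride + tap] += 1
    return counts

uneven = overlap_counts(n_in=4, kernel=3, stride=2)  # 3 is not divisible by 2
even = overlap_counts(n_in=4, kernel=4, stride=2)    # 4 is divisible by 2
print(uneven)  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(even)    # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

With kernel 3 and stride 2 the interior alternates between one and two contributions, which is the 1-D version of the checkerboard; with kernel 4 and stride 2 the interior is uniform.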

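3. Deformable convolution (DCN)

DCN learns an offset for every sampling position on the feature map, so the kernel samples from shifted locations instead of a fixed regular grid.

Pipeline: an extra convolution branch predicts the offsets from the input feature map, the input is sampled at the shifted positions with bilinear interpolation, and the convolution weights are applied to the sampled values.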

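II. What pooling does

1. It extracts features that are invariant to translation and small deformations;

2. It reduces overfitting and improves generalization;

3. It shrinks the feature map, which has a regularizing effect on the network.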

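III. Why data is preprocessed

A simple two-dimensional picture helps. Image data is highly correlated; suppose its distribution looks like panel (a) (simplified to 2-D). Parameters are usually initialized with zero mean, so the initial fit y = w*x + b passes close to the origin (because b is near zero), like the red dashed line in panel (b). The network then needs many updates before it reaches a fit like the purple solid line, so convergence is slow. If the mean is subtracted from the input first, as in panel (c), learning is clearly faster. Going one step further and decorrelating the data makes it easier to separate and speeds up training again, as in panel (d).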
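The two steps described above (mean subtraction, then decorrelation) can be sketched with NumPy. The data here is synthetic, and rotating onto the principal axes (PCA) is one common way to decorrelate; the text does not say which method the figure used, so treat that choice as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data with a non-zero mean.
mixing = np.array([[2.0, 1.5], [0.0, 0.5]])
x = rng.normal(size=(500, 2)) @ mixing + np.array([5.0, 3.0])

centered = x - x.mean(axis=0)                # step 1: subtract the mean
cov = centered.T @ centered / (len(x) - 1)   # 2 x 2 sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: cov is symmetric
decorrelated = centered @ eigvecs            # step 2: rotate onto principal axes

cov_after = np.cov(decorrelated, rowvar=False)
```

After the rotation the off-diagonal entry of `cov_after` is zero up to rounding, while the original covariance has a clearly non-zero off-diagonal term.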

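IV. Feature normalization

4.1 Purpose

Features come in different units and scales, for example height and weight, which stretches the contours of the loss function into elongated ellipses. Feature normalization removes these unit differences so that every feature dimension is treated equally.

4.2 Common normalization methods

(1) Min-max normalization maps each feature to the range 0~1: $x' = (x - \min) / (\max - \min)$. Its drawback is that max and min can change when new data arrives.

(2) Standardization rescales to mean 0 and variance 1: $x' = (x - \mu) / \sigma$, where μ is the mean of all samples and σ is the standard deviation of all samples.

(3) Length normalization divides each sample's feature vector by its length, usually measured with the Euclidean norm: $x' = x / \lVert x \rVert_2$. This maps the data onto the unit circle (the unit sphere in higher dimensions).

In short, normalization/standardization aims at some kind of invariance: to offset, to scale, to length, and so on. When the physical and geometric meaning behind a method matches what the problem needs, it helps; otherwise it hurts.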
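The three methods can be written as short NumPy functions. This is a minimal sketch on made-up height/weight data; the function names are chosen for this example only.

```python
import numpy as np

def min_max(x):
    """Per-feature rescaling to [0, 1]."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def standardize(x):
    """Per-feature zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def l2_normalize(x):
    """Per-sample division by the Euclidean length of the feature vector."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical samples: columns are height (cm) and weight (kg).
data = np.array([[150.0, 50.0], [160.0, 60.0], [170.0, 70.0], [180.0, 90.0]])
mm, zs, l2 = min_max(data), standardize(data), l2_normalize(data)
```

Note the different axes: the first two work per feature (column), the third per sample (row).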

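4.3 When it is needed

It is needed whenever the method depends on distances or on the scale of the features, for example models trained with gradient descent. Tree models such as decision trees and random forests only care about how best to split on the current feature, which does not depend on the relative scale between features, so they do not need it.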

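V. BN

BN can be viewed as doing data preprocessing at every layer of the network.

For using BN in TensorFlow, see the CSDN blog post "tensorflow中batch normalization的用法" by 智障變智能.

1. Following the paper, the advantages of the BN layer are:

1) Faster training, because a larger learning rate can be used.

2) Better generalization.

3) A BN layer is essentially a normalization layer.

4) The training samples can be shuffled more thoroughly (so that the same images do not keep landing in the same mini-batch); the paper reports about a 1% accuracy improvement from this.

Problem: while a deep network trains, the input of every layer changes as the parameters of the earlier layers change, so every layer must keep adapting to a new distribution. This is called internal covariate shift.

Besides addressing internal covariate shift, BN also acts as a regularizer.

By normalizing the mean and variance of each layer's input, BN reduces how strongly the gradients depend on the scale of the parameters and their initial values.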

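2. Why γ and β are added:

Normalizing each layer's input can change what the layer is able to represent. For example, a sigmoid used for classification needs its nonlinear region, but normalization pushes its inputs into the roughly linear region. The two learnable parameters fix this: when γ equals the standard deviation of the batch and β equals its mean, the unnormalized activations are recovered.

3. Why a mini-batch stands in for the whole training set:

Computing statistics over the entire training set at every gradient step is impractical, so the mean and variance are estimated from each mini-batch. This also lets the normalization take part in gradient backpropagation. Note that the variance is computed per dimension within a mini-batch, not as a joint covariance over all dimensions.

4. Training procedure with a mini-batch:

One mini-batch holds m samples.

Input: the values x1…xm (the values that are about to enter the activation function).
The computation has the following steps:
1. Compute the mean: $\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$;
2. Compute the variance: $\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$;
3. Normalize: $\hat{x}_i = (x_i - \mu_B) / \sqrt{\sigma_B^2 + \epsilon}$;
4. γ and β are parameters to be learned;
5. The output is a linear transform of the normalized value: $y_i = \gamma \hat{x}_i + \beta$.
In the forward pass, the learnable parameters γ and β produce the new distribution of values.

In the backward pass, the chain rule gives the gradients with respect to γ, β, and the layer input, and through it the upstream weights.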
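The forward steps listed above can be sketched in NumPy for a fully connected layer with input shape (m, features). This is an illustration of the formulas only (no backward pass, no running statistics); `batchnorm_forward` is a name made up here.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN over the batch axis of an (m, features) array."""
    mu = x.mean(axis=0)                      # step 1: per-feature mean
    var = x.var(axis=0)                      # step 2: per-feature variance (divide by m)
    x_hat = (x - mu) / np.sqrt(var + eps)    # step 3: normalize
    return gamma * x_hat + beta, mu, var     # step 5: scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
y, mu, var = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

# Setting gamma to the batch standard deviation and beta to the batch mean
# undoes the normalization, as described in item 2 above.
y_identity, _, _ = batchnorm_forward(x, gamma=np.sqrt(var + 1e-5), beta=mu)
```

The second call shows why γ and β do not reduce what the layer can represent: one particular setting of them gives back the original input.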

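5. Mean and variance at inference time:

The γ and β of each layer are the values learned during training.

Mean and variance of each layer:

The mean is the expectation of the mini-batch means, $E[x] = E_B[\mu_B]$. The variance is the unbiased estimate built from the expectation of the mini-batch variances, $Var[x] = \frac{m}{m-1} E_B[\sigma_B^2]$.

6. PyTorch implementation

In PyTorch, the relation between the BN output Y and input X in eval mode is Y = (X - running_mean) / sqrt(running_var + eps) * gamma + beta. During training, each forward pass first computes mean and var from X, normalizes with these batch statistics, and then uses them to update running_mean and running_var with the momentum. So in the training phase running_mean and running_var are updated once per forward pass. In the test phase, model.eval() freezes the running_mean and running_var of the BN layer at the values left by the last training forward pass, and they stay unchanged for the whole test phase.

running_mean = running_mean*(1-momentum) + E[x]*momentum

running_var = running_var*(1-momentum) + Var[x]*momentum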
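The update rule can be illustrated with a small NumPy class that mimics the train/eval switch. This is a simplified sketch, not PyTorch's code: it omits γ and β, and the class name `RunningBatchNorm` is invented. It follows the documented PyTorch behavior of normalizing with the biased batch variance while storing the unbiased one in the running average.

```python
import numpy as np

class RunningBatchNorm:
    """Minimal BN sketch with running statistics (no gamma/beta)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps
        self.training = True

    def __call__(self, x):
        if self.training:
            mean, var = x.mean(axis=0), x.var(axis=0)   # batch statistics
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            # the running average stores the unbiased variance (ddof=1)
            self.running_var = (1 - m) * self.running_var + m * x.var(axis=0, ddof=1)
        else:
            mean, var = self.running_mean, self.running_var  # frozen statistics
        return (x - mean) / np.sqrt(var + self.eps)

bn = RunningBatchNorm(3)
x = np.random.default_rng(0).normal(5.0, 2.0, size=(32, 3))
bn(x)                           # training forward: updates the running stats
after_train = bn.running_mean.copy()
bn.training = False             # plays the role of model.eval()
bn(x)                           # eval forward: running stats stay fixed
```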

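7. SyncBatchNorm in PyTorch

SyncBatchNorm solves the problem of synchronizing BN statistics across multiple GPUs.

1. Each GPU computes its own mean, and the means are synchronized to obtain the global mean;

2. Each GPU uses the global mean to compute its part of the variance, and a second synchronization yields the global variance. Two rounds of synchronization take a long time; the identity below reduces them to a single round.

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2 = \frac{1}{m}\sum_{i=1}^{m} x_i^2 - \left(\frac{1}{m}\sum_{i=1}^{m} x_i\right)^2$$

So only $\sum x_i$ and $\sum x_i^2$ need to be synchronized to compute the global variance.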
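The single-synchronization trick can be simulated on one machine. In this sketch four NumPy arrays stand in for the per-GPU shards and a plain Python `sum` stands in for the all-reduce; none of this is the actual SyncBatchNorm code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Four hypothetical per-GPU shards of one global batch, 3 features each.
shards = [rng.normal(size=(8, 3)) + i for i in range(4)]

# Each device computes only local quantities: count, sum, sum of squares.
local = [(len(s), s.sum(axis=0), (s ** 2).sum(axis=0)) for s in shards]

# A single all-reduce (simulated here by summing) combines them.
n = sum(count for count, _, _ in local)
total = sum(s for _, s, _ in local)
total_sq = sum(q for _, _, q in local)

global_mean = total / n
global_var = total_sq / n - global_mean ** 2   # E[x^2] - (E[x])^2

full = np.concatenate(shards)                  # reference: statistics of the whole batch
```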

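8. BN in a CNN:

If the mini-batch size is m, the input to a layer can be written as a four-dimensional tensor (m, f, w, h), where m is the mini-batch size, f is the number of feature maps, and w and h are the width and height of a feature map. In a CNN each feature map is treated as one feature (one neuron), so for Batch Normalization the effective mini-batch size is m*w*h, and each feature map has only one pair of learnable parameters, γ and β. Put simply, BN computes the mean and variance of one feature map over all samples and all spatial positions, and then normalizes that feature map with them.

This shows that BN depends heavily on the batch.

9. Derivation of BN for two-dimensional data: (figure omitted)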

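10. Other normalization methods

Because BN is sensitive to the batch, LayerNorm, InstanceNorm, GroupNorm and other methods were developed. For an input of shape (N, C, H, W):

BatchNorm: normalizes along the batch direction. The mean is computed over N*H*W elements, giving C values; that is, for each channel, sum over all samples and positions in the batch and divide by N*H*W.

LayerNorm: normalizes along the channel direction. The mean is computed over C*H*W elements, giving N values, one per sample, so it works on a single training sample. If the input features are not of similar kinds, for example color and size, LN can reduce the expressive power of the model. LN is useful where Batch Normalization is a poor fit, such as variable-length networks like RNNs.

InstanceNorm: normalizes within one channel. The mean is computed over H*W elements, giving C*N values; it uses only spatial information, i.e. the mean of a single channel of a single sample.

GroupNorm: first splits the channels into G groups, then normalizes within each group. The mean is computed over (C//G)*H*W elements, giving G*N values. With G == C it becomes IN, and with G == 1 it becomes LN. GN is better suited to small batches; it takes the mean over the C//G channels of each group of a single sample.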
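The four methods differ only in which axes the statistics are taken over, which a short NumPy sketch makes explicit. The helpers `normalize` and `group_norm` are illustrative names, and learnable scale/shift parameters are left out.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Normalize x with mean/variance taken over `axes`; also return the means."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps), mean

def group_norm(x, groups, eps=1e-5):
    """Split C channels into `groups` groups and normalize inside each group."""
    n, c, h, w = x.shape
    grouped = x.reshape(n, groups, c // groups, h, w)
    out, mean = normalize(grouped, axes=(2, 3, 4), eps=eps)
    return out.reshape(n, c, h, w), mean

x = np.random.default_rng(0).normal(size=(2, 6, 4, 4))   # (N, C, H, W)
bn_out, bn_mean = normalize(x, axes=(0, 2, 3))   # BatchNorm: C means
ln_out, ln_mean = normalize(x, axes=(1, 2, 3))   # LayerNorm: N means
in_out, in_mean = normalize(x, axes=(2, 3))      # InstanceNorm: N*C means
gn_out, gn_mean = group_norm(x, groups=3)        # GroupNorm: N*G means
```

Calling `group_norm` with `groups=6` (G == C) reproduces the InstanceNorm result, and `groups=1` reproduces LayerNorm.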

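11. Before or after ReLU?

1. Before: conv1-bn1-ReLU1-conv2-bn2-ReLU2

2. After: conv1-ReLU1-bn1-conv2-ReLU2-bn2

1. The case for "before": many networks put BN in front of the activation. When every activation is a ReLU, the negative convolution outputs are suppressed and the positive ones are kept. BN makes its output have mean 0 and variance 1, so with BN right before the ReLU roughly half of the values are suppressed and half are kept. The benefit can be understood this way: BN prevents all activations of a layer from being suppressed, which would turn every gradient flowing back through this layer into 0, so it guards against vanishing gradients (and it can also guard against exploding gradients). A further benefit is that with BN placed before the activation, the convolution weights and the BN parameters can be merged, which speeds up forward inference.

2. The case for "after": in the "before" layout, ReLU1 truncates part of the data normalized by bn1, so the data may no longer have zero mean and unit variance. In the "after" layout, the output of ReLU1 is normalized, so it still has zero mean and unit variance afterwards. It is therefore understandable that placing BN after the activation can be more effective.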

12. Fusing convolution and BN at inference

Official code:

    import copy
    import torch


    def fuse_conv_bn_eval(conv, bn):
        assert(not (conv.training or bn.training)), "Fusion only for eval!"
        fused_conv = copy.deepcopy(conv)

        fused_conv.weight, fused_conv.bias = \
            fuse_conv_bn_weights(fused_conv.weight, fused_conv.bias,
                                 bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)

        return fused_conv


    def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
        # missing parameters behave like a zero bias, unit scale and zero shift
        if conv_b is None:
            conv_b = torch.zeros_like(bn_rm)
        if bn_w is None:
            bn_w = torch.ones_like(bn_rm)
        if bn_b is None:
            bn_b = torch.zeros_like(bn_rm)
        bn_var_rsqrt = torch.rsqrt(bn_rv + bn_eps)

        # scale every output channel of the conv weight by gamma / sqrt(var + eps)
        conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))
        # fold the BN mean, scale and shift into the conv bias
        conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b

        return torch.nn.Parameter(conv_w), torch.nn.Parameter(conv_b)
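The algebra behind the fusion can be checked without PyTorch. The sketch below treats a 1x1 convolution as a matrix multiplication over channels (an assumption made to keep the example short) and verifies that folding the BN statistics into the weights gives the same output as running convolution and BN separately. All values are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
c_in, c_out, n_pix = 3, 4, 10
x = rng.normal(size=(c_in, n_pix))             # channels x pixels
w, b = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)
gamma, beta = rng.normal(size=c_out), rng.normal(size=c_out)
mean = rng.normal(size=c_out)                  # stands in for running_mean
var = rng.uniform(0.5, 2.0, size=c_out)        # stands in for running_var
eps = 1e-5

# Unfused: convolution followed by eval-mode BN.
z = w @ x + b[:, None]
y_ref = gamma[:, None] * (z - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]

# Fused: fold BN into the convolution weight and bias.
scale = gamma / np.sqrt(var + eps)
w_fused = w * scale[:, None]
b_fused = (b - mean) * scale + beta
y_fused = w_fused @ x + b_fused[:, None]
```

The two outputs agree to floating-point precision, which is why the BN layer can be removed entirely at inference time.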

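VI. Receptive field

1. Intuition

Paper: Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

The effective receptive field of a feature (the part that actually contributes) is much smaller than the theoretical receptive field.

The effective receptive field follows a roughly Gaussian distribution.

Pixels at the center of the receptive field and pixels at its edge contribute differently to the gradient.

2. Calculation

Details: (figure omitted)

Starting from the top layer, iterate layer by layer down to the input image to obtain the RF.

1. Receptive field formula:

$$RF_i = (RF_{i+1} - 1) \times s_i + k_i$$

$RF_i$: receptive field of the current layer;

$RF_{i+1}$: receptive field of the layer above (the iteration starts from 1 at the top layer);

$s_i$: stride of the i-th convolution or pooling layer;

$k_i$: kernel size of the current layer.

2. Effective kernel size of a dilated convolution: K = k + (k-1)(r-1), where k is the original kernel size and r is the dilation rate. Substituting K into the formula above gives the receptive field of a dilated convolution.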
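The top-down recursion is short enough to write out directly. The sketch below assumes the standard form RF_i = (RF_{i+1} - 1) * s_i + k_i, starting from 1 at the top layer; the layer list for ResNet-18 (main branch only, up to the end of layer2) is written out by hand from the usual architecture and is an assumption of this example.

```python
def dilated_kernel(k, r):
    """Effective kernel size of a k x k convolution with dilation rate r."""
    return k + (k - 1) * (r - 1)

def receptive_field(layers):
    """layers: list of (kernel, stride) pairs ordered from input to output.
    Iterates from the top layer back to the input image."""
    rf = 1
    for k, s in reversed(layers):
        rf = (rf - 1) * s + k
    return rf

# ResNet-18 up to the end of layer2: 7x7/2 conv, 3x3/2 max-pool,
# four 3x3/1 convs (layer1), one 3x3/2 conv and three 3x3/1 convs (layer2).
resnet18_to_layer2 = [(7, 2), (3, 2)] + [(3, 1)] * 4 + [(3, 2)] + [(3, 1)] * 3

print(receptive_field(resnet18_to_layer2))   # 99
print(dilated_kernel(3, 2))                  # 5
```

The value 99 agrees with the gradient-based measurement on ResNet-18 reported later in this article.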

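Receptive field calculation walkthrough: (figure omitted)

The figure shows the receptive fields of common classification models. As models have evolved, the receptive field has kept growing; in the more recently proposed networks it already covers the whole input image, which means every point in the final feature map uses context from the entire image.

A website for computing receptive fields: Fomoro Visual Inspection

What the receptive field is used for:

1. Object detection: SSD, RPN, YOLOv3 and others use anchors, and anchor design is based on the receptive field. If the receptive field is too small, only local features are seen, which is not enough to capture the whole object. If it is too large, too much noise and irrelevant information is brought in. Anchors that are too large or too small both hurt performance.

2. Semantic segmentation: the larger the receptive field of the finally predicted pixel the better, and the networks involved are generally better when deeper, because more context is captured and the prediction becomes more accurate.

3. Classification: in image classification the receptive field of the last convolutional layer should be larger than the input image; the deeper the network, the larger the receptive field and the better the performance.

Code for computing the receptive field: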

    from collections import namedtuple
    import math
    import torch as t
    import torch.nn as nn

    Size = namedtuple('Size', ('w', 'h'))
    Vector = namedtuple('Vector', ('x', 'y'))


    class ReceptiveField(namedtuple('ReceptiveField', ('offset', 'stride', 'rfsize', 'outputsize', 'inputsize'))):
        """Contains information of a network's receptive fields (RF).
        The RF size, stride and offset can be accessed directly,
        or used to calculate the coordinates of RF rectangles using
        the convenience methods."""

        def left(self):
            """Return left (x) coordinates of the receptive fields."""
            return t.arange(float(self.outputsize.w)) * self.stride.x + self.offset.x

        def top(self):
            """Return top (y) coordinates of the receptive fields."""
            return t.arange(float(self.outputsize.h)) * self.stride.y + self.offset.y

        def hcenter(self):
            """Return center (x) coordinates of the receptive fields."""
            return self.left() + self.rfsize.w / 2

        def vcenter(self):
            """Return center (y) coordinates of the receptive fields."""
            return self.top() + self.rfsize.h / 2

        def right(self):
            """Return right (x) coordinates of the receptive fields."""
            return self.left() + self.rfsize.w

        def bottom(self):
            """Return bottom (y) coordinates of the receptive fields."""
            return self.top() + self.rfsize.h

        def rects(self):
            """Return a list of rectangles representing the receptive fields of all output elements.
            Each rectangle is a tuple (x, y, width, height)."""
            return [(x, y, self.rfsize.w, self.rfsize.h) for x in self.left().numpy() for y in self.top().numpy()]

        def show(self, image=None, axes=None, show=True):
            """Visualize receptive fields using MatPlotLib."""
            import matplotlib.pyplot as plt
            import matplotlib.patches as patches

            if image is None:
                # create a checkerboard image for the background
                xs = t.arange(self.inputsize.w).unsqueeze(1)
                ys = t.arange(self.inputsize.h).unsqueeze(0)
                image = (xs.remainder(8) >= 4) ^ (ys.remainder(8) >= 4)
                image = image * 128 + 64

            if axes is None:
                (fig, axes) = plt.subplots(1)

            # convert image to numpy and show it
            if isinstance(image, t.Tensor):
                image = image.numpy().transpose(-1, -2)
            axes.imshow(image, cmap='gray', vmin=0, vmax=255)

            rect_density = self.stride.x * self.stride.y / (self.rfsize.w * self.rfsize.h)
            rects = self.rects()
            for (index, (x, y, w, h)) in enumerate(rects):  # iterate RFs
                # show center marker
                marker, = axes.plot(x + w / 2, y + h / 2, marker='x')

                # show rectangle with some probability, since it's too dense.
                # also, always show the first and last rectangles for reference.
                if index == 0 or index == len(rects) - 1 or t.rand(1).item() < rect_density:
                    axes.add_patch(patches.Rectangle((x, y), w, h, facecolor=marker.get_color(), edgecolor='none', alpha=0.5))

            # set axis limits correctly
            axes.set_xlim(self.left().min().item(), self.right().max().item())
            axes.set_ylim(self.top().min().item(), self.bottom().max().item())
            axes.invert_yaxis()

            if show:
                plt.show()


    (x_dim, y_dim) = (-1, -2)  # indexes of spatial dimensions in tensors


    def receptivefield(net, input_shape, device='cpu'):
        """Computes the receptive fields for the given network (nn.Module) and input shape,
        given as a tuple (images, channels, height, width).
        Returns a ReceptiveField object."""
        if len(input_shape) < 4:
            raise ValueError('Input shape must be at least 4-dimensional (N x C x H x W).')

        # make gradients of some problematic layers pass-through
        hooks = []

        def insert_hook(module):
            if isinstance(module, (nn.ReLU, nn.BatchNorm2d, nn.MaxPool2d)):
                hook = _passthrough_grad
                if isinstance(module, nn.MaxPool2d):
                    hook = _maxpool_passthrough_grad
                hooks.append(module.register_backward_hook(hook))

        net.apply(insert_hook)

        # remember whether the network was in train/eval mode and set to eval
        mode = net.training
        net.eval()

        # compute forward pass to prepare for gradient computation
        input = t.ones(input_shape, requires_grad=True, device=device)
        output = net(input)

        if output.dim() < 4:
            raise ValueError('Network is fully connected (output should have at least 4 dimensions: N x C x H x W).')

        # output feature map size
        outputsize = Size(output.shape[x_dim], output.shape[y_dim])
        if outputsize.w < 2 and outputsize.h < 2:  # note: no error if only one dim is singleton
            raise ValueError('Network output is too small along spatial dimensions (fully connected).')

        # get receptive field bounding box, to compute its size.
        # the position of the one-hot output gradient (pos) is stored for later.
        (x1, x2, y1, y2, pos) = _project_rf(input, output, return_pos=True)
        rfsize = Size(x2 - x1 + 1, y2 - y1 + 1)

        # do projection again with one-cell offsets, to calculate stride
        (x1o, _, _, _) = _project_rf(input, output, offset_x=1)
        (_, _, y1o, _) = _project_rf(input, output, offset_y=1)
        stride = Vector(x1o - x1, y1o - y1)
        if stride.x == 0 and stride.y == 0:  # note: no error if only one dim is singleton
            raise ValueError('Input tensor is too small relative to network receptive field.')

        # compute offset between the top-left corner of the receptive field in the
        # actual input (x1, y1), and the top-left corner obtained by extrapolating
        # just based on the output position and stride (the negative terms below).
        offset = Vector(x1 - pos[x_dim] * stride.x, y1 - pos[y_dim] * stride.y)

        # remove the hooks from the network, and restore training mode
        for hook in hooks:
            hook.remove()
        net.train(mode)

        # return results in a nicely packed structure
        inputsize = Size(input_shape[x_dim], input_shape[y_dim])
        return ReceptiveField(offset, stride, rfsize, outputsize, inputsize)


    def _project_rf(input, output, offset_x=0, offset_y=0, return_pos=False):
        """Project one-hot output gradient, using back-propagation, and return its bounding box at the input."""
        # create one-hot output gradient tensor, with 1 in the center (spatially)
        pos = [0] * len(output.shape)  # index 0th batch/channel/etc
        pos[x_dim] = math.ceil(output.shape[x_dim] / 2) - 1 + offset_x
        pos[y_dim] = math.ceil(output.shape[y_dim] / 2) - 1 + offset_y
        out_grad = t.zeros(output.shape)
        out_grad[tuple(pos)] = 1

        # clear gradient first
        if input.grad is not None:
            input.grad.zero_()

        # propagate gradient of one-hot cell to input tensor
        output.backward(gradient=out_grad, retain_graph=True)

        # keep only the spatial dimensions of the gradient at the input, and binarize
        in_grad = input.grad[0, 0]
        is_inside_rf = (in_grad != 0.0)

        # x and y coordinates of where input gradients are non-zero (i.e., in the receptive field)
        xs = is_inside_rf.any(dim=y_dim).nonzero()
        ys = is_inside_rf.any(dim=x_dim).nonzero()

        if xs.numel() == 0 or ys.numel() == 0:
            raise ValueError('Could not propagate gradient through network to determine receptive field.')

        # return bounds of receptive field
        bounds = (xs.min().item(), xs.max().item(), ys.min().item(), ys.max().item())
        if return_pos:  # optionally, also return position of one-hot output gradient
            return (*bounds, pos)
        return bounds


    def _passthrough_grad(self, grad_input, grad_output):
        """Hook to bypass normal gradient computation (of first input only)."""
        if isinstance(grad_input, tuple) and len(grad_input) > 1:
            # replace first input's gradient only
            return (grad_output[0], *grad_input[1:])
        else:  # single input
            return grad_output


    def _maxpool_passthrough_grad(self, grad_input, grad_output):
        """Hook to bypass normal gradient computation of nn.MaxPool2d."""
        assert isinstance(self, nn.MaxPool2d)
        if self.dilation != 1 and self.dilation != (1, 1):
            raise ValueError('Dilation != 1 in max pooling not supported.')

        # backprop through a nn.AvgPool2d with same args as nn.MaxPool2d
        with t.enable_grad():
            input = t.ones(grad_input[0].shape, requires_grad=True)
            output = nn.functional.avg_pool2d(input, self.kernel_size, self.stride, self.padding, self.ceil_mode)
            return t.autograd.grad(output, input, grad_output[0])


    def run_test():
        """Checks one convolution whose RF is known in closed form."""
        # single hard-coded case: 3x3 kernel, stride 1, no padding, 5x5 input
        kh, kw = 3, 3
        sy, sx = 1, 1
        py, px = 0, 0
        height, width = 5, 5
        net = nn.Conv2d(3, 2, (kh, kw), (sy, sx), (py, px))
        rf = receptivefield(net, (1, 3, height, width))

        print('Checking: ', rf)
        assert rf.rfsize.w == kw and rf.rfsize.h == kh
        assert rf.stride.x == sx and rf.stride.y == sy
        assert rf.offset.x == -px and rf.offset.y == -py
        rf.show()
        print('Done, all tests passed.')


    if __name__ == '__main__':
        run_test()

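The code uses a 5*5 image with a 3*3 kernel and stride 1 as the example:

The x marks are the receptive-field centers. Because the receptive fields are dense, mainly the first and last boxes are drawn.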

    import torchvision
    from receptivefield import receptivefield
    import torch

    if __name__ == '__main__':
        # get standard ResNet
        net = torchvision.models.resnet18()
        print('===net:', net)

        # ResNet block to compute receptive field for
        block = 2

        # change the forward function to output convolutional features only.
        # otherwise the output is fully-connected and the receptive field is the whole image.
        def features_only(self, x):
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
            x = self.maxpool(x)
            if block == 0: return x

            x = self.layer1(x)
            if block == 1: return x

            x = self.layer2(x)
            if block == 2: return x

            x = self.layer3(x)
            if block == 3: return x

            x = self.layer4(x)
            return x

        net.forward = features_only.__get__(net)  # bind method
        print('====net===', net)

        x = torch.rand(4, 3, 64, 64)
        y = net(x)
        # spatial size is input/4 for block = 1 and input/8 for block = 2
        print('==y.shape:', y.shape)

        # compute receptive field for this input shape
        rf = receptivefield(net, (1, 3, 480, 480))

        # print to console, and visualize
        print(rf)
        rf.show()

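The receptive field comes out as 99.

Self-attention in CNNs:

Because a convolution kernel attends to too few pixels, some papers propose treating every pixel as a random variable and using the covariance between the pixel being predicted and all other pixels. The attended target pixel is then simply a weighted sum of all pixel values, where each weight is the correlation between that pixel and the target pixel.

Self-attention mechanism: (figure omitted)

Simplified self-attention: (figure omitted)

Start from a feature map X of height H and width W. Reshape X into three flattened copies A, B and C, each indexed by the HW pixel positions. Multiplying A by B gives a covariance matrix of size HWxHW. Multiplying the covariance matrix by C gives D, which is reshaped into the output feature map Y, with a residual connection from the input X. Every entry of D is a weighted sum of the input X, and the weights are the covariances between pixels.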
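The simplified version described above can be sketched in NumPy. This follows the text literally: A, B and C are plain reshapes of X, with no learned projections and no softmax (published non-local/self-attention blocks usually add both, which is left out here). The function name is made up for the example.

```python
import numpy as np

def simple_self_attention(x):
    """x: feature map of shape (C, H, W). Returns Y = reshape(D) + X
    and the HW x HW pixel-affinity matrix."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    a = flat.T                        # A: (HW, C)
    b = flat                          # B: (C, HW)
    c_mat = flat.T                    # C: (HW, C)
    affinity = a @ b                  # (HW, HW): every pixel against every pixel
    d = affinity @ c_mat              # (HW, C): weighted sums over all pixels
    y = d.T.reshape(c, h, w) + x      # reshape back and add the residual
    return y, affinity

x = np.random.default_rng(0).normal(size=(8, 5, 6))
y, affinity = simple_self_attention(x)
```

The affinity matrix has one row per pixel and one column per pixel, so each output position mixes information from the whole feature map, unlike a small convolution kernel.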

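With self-attention, the model can therefore refer to global context during both training and prediction.

VII. Gradient backpropagation

1. Backpropagation through average pooling

The forward pass of average pooling averages the values in a patch. In the backward pass, the gradient of each output element is split into n equal parts and distributed to the n inputs of the patch in the previous layer, which keeps the sum of the gradients unchanged before and after pooling.

2. Backpropagation through max pooling

Max pooling must also keep the sum of the gradients unchanged. Its forward pass sends the largest value in the patch to the next layer and discards the other values. The backward pass therefore sends the gradient to that one pixel in the previous layer, while the other pixels receive no gradient, i.e. 0. One difference from average pooling is that max pooling has to record, during the forward pass, which pixel held the maximum.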
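Both backward rules can be sketched in NumPy for non-overlapping k x k windows on a single 2-D map. This is an illustration only (no batch or channel axes, and the input size is assumed to be divisible by k); the function names are invented here.

```python
import numpy as np

def avgpool_backward(grad_out, k):
    """Split each output gradient equally over its k x k input patch."""
    return np.kron(grad_out, np.ones((k, k))) / (k * k)

def maxpool_forward(x, k):
    """Return pooled values and, per patch, the flat index of the maximum."""
    h, w = x.shape
    patches = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    patches = patches.reshape(h // k, w // k, k * k)
    return patches.max(axis=2), patches.argmax(axis=2)

def maxpool_backward(grad_out, argmax, k):
    """Route each output gradient to the recorded max position only."""
    out_h, out_w = grad_out.shape
    grad_in = np.zeros((out_h * k, out_w * k))
    for i in range(out_h):
        for j in range(out_w):
            a, b = divmod(int(argmax[i, j]), k)
            grad_in[i * k + a, j * k + b] = grad_out[i, j]
    return grad_in

x = np.array([[1.0, 3.0, 2.0, 0.0],
              [4.0, 2.0, 1.0, 5.0],
              [0.0, 1.0, 7.0, 2.0],
              [6.0, 2.0, 3.0, 1.0]])
grad_out = np.array([[1.0, 2.0], [3.0, 4.0]])

pooled, argmax = maxpool_forward(x, 2)
grad_max = maxpool_backward(grad_out, argmax, 2)
grad_avg = avgpool_backward(grad_out, 2)
```

In both cases the gradients at the input sum to the same total as the gradients at the output.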

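3. Subgradient backpropagation

Example: the gradient of ReLU at x = 0

For the ReLU function, the derivative is 1 when x > 0 and 0 when x < 0. At x = 0 the subgradient is the interval from 0 to 1: there is more than one subgradient, and any value c in [0, 1] may be used. In practice c = 0 is chosen for convenience.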
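As a small illustration, the subgradient choice can be made an explicit parameter. The function below is a scalar sketch with a hypothetical argument `c` for the value used at x = 0; it is not taken from any framework.

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x, c=0.0):
    """Derivative of ReLU; at x == 0 return the chosen subgradient c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1] to be a valid subgradient")
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return c

print(relu_grad(2.0), relu_grad(-1.0), relu_grad(0.0), relu_grad(0.0, c=0.5))
# 1.0 0.0 0.0 0.5
```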

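VIII. NMS/soft NMS

1. NMS

Each selected bounding box (BBox) is represented as (x, y, h, w, confidence score, Pdog, Pcat). The confidence score expresses how confident the detector is that the box is foreground rather than background, and lies in [0,1]. Pdog and Pcat are the probabilities that the class is dog and cat respectively. For a detector with 100 classes, the BBox output vector has 5+100=105 entries.

NMS works iteratively: it repeatedly takes the box with the highest score, computes its IoU with the other boxes, and filters out the boxes whose IoU is large (i.e. whose overlap is large).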
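The iteration can be sketched in plain Python. This is a minimal single-class version, assuming boxes given as corner coordinates (x1, y1, x2, y2) rather than the (x, y, h, w) layout mentioned above, and an illustrative IoU threshold of 0.5.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                 # highest remaining score
        keep.append(best)
        # drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]
```

The second box overlaps the first with IoU 81/119 (about 0.68), so it is removed; the third box does not overlap and is kept.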

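In a two-stage algorithm, the selected BBoxes usually carry only the position (x,y,h,w) and the confidence score, without class probabilities. This is because the pipeline first generates BBoxes, then rescales the feature map of each selected BBox (usually with ROI pooling), and only then classifies it with a classifier. NMS is inherently sequential and is often run on the CPU, which is one reason two-stage methods are relatively slow.

In a one-stage method, the BBox carries the position (x,y,h,w), the confidence score, and the class probabilities. Compared with two-stage methods, the later rescaling and classification steps are gone, so the amount of computation is relatively small.

Drawbacks of NMS:

1. The biggest problem of NMS is that it forces the scores of neighboring detection boxes to zero (i.e. it removes every box whose overlap exceeds the threshold Nt). If a real object appears in the overlap region (for example a person holding a cat), detection of that object fails and the average precision (AP) of the algorithm drops.

2. The NMS threshold is hard to choose: set too low, true boxes are deleted by mistake; set too high, false positives increase.

3. Because each step depends on the previous one, NMS is hard to parallelize and is often computed on the CPU rather than the GPU.

2. soft NMS

NMS is somewhat crude, because it simply deletes every box whose IoU exceeds the threshold. Soft-NMS learns from this: during the algorithm it does not delete the boxes whose IoU exceeds the threshold but lowers their scores instead. The procedure is the same as NMS, except that the original confidence score is passed through a decay function; the larger the IoU, the more the score drops.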
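A plain-Python sketch of this score decay follows. The text does not say which decay function is meant; the linear and Gaussian forms below are the two commonly associated with Soft-NMS, and the parameter values (`sigma`, thresholds) are illustrative assumptions. Boxes are again assumed to be (x1, y1, x2, y2).

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def soft_nms(boxes, scores, method="gaussian", iou_threshold=0.3,
             sigma=0.5, score_threshold=0.001):
    """Return (index, decayed score) pairs in the order the boxes are selected."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if method == "linear":
                if overlap > iou_threshold:
                    scores[i] *= 1.0 - overlap                 # linear decay
            else:
                scores[i] *= math.exp(-(overlap ** 2) / sigma)  # Gaussian decay
        # boxes whose score has decayed to almost nothing are finally dropped
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

With the same boxes as in a hard-NMS setting, the overlapping second box is not deleted; it survives with a reduced score and is ranked last.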

