[Object Detection] VarifocalNet: An IoU-Aware Dense Object Detector (CVPR 2021)
Contents
- 1. Background
- 2. Motivation
- 3. Method
- 3.1 IACS: IoU-Aware Classification Score
- 3.2 Varifocal loss
- 3.3 Star-Shaped Box Feature Representation
- 3.4 Bounding-box refinement
- 3.5 VarifocalNet
- 4. Results
- 5. Code
- 5.1 Setting the dataset path
- 5.2 VFNet
Code: https://github.com/hyz-xmaster/VarifocalNet
1. Background
Most existing object detectors first generate a large number of candidate boxes and then filter them with NMS, which ranks boxes by classification score.
However, the box with the highest classification score is not necessarily the most accurately localized one, so a low-scoring but well-localized box may be filtered out. To address this, [11] predicts an additional IoU score and [9] predicts a centerness score as a measure of localization quality, and multiplies it with the classification score to form the NMS ranking key.
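As a minimal sketch (function name is illustrative, not from any specific repo), the fused ranking key these methods feed into NMS is simply the product of the two scores:

```python
import numpy as np

def nms_ranking_scores(cls_scores, loc_scores):
    """Fused NMS ranking key: classification score multiplied by a
    predicted localization-quality score (IoU in [11], centerness in [9])."""
    return np.asarray(cls_scores) * np.asarray(loc_scores)
```

A well-localized box with a mediocre classification score can then outrank a poorly localized box with a high classification score.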
2. Motivation
Although these two methods bring some benefit, multiplying two imperfect scores is suboptimal: it can produce an even worse ranking, the performance gains are limited, and the separate branch for predicting the localization score adds computation.
Could we drop the extra localization-score branch and fold that information into the classification branch instead?
To this end, the authors propose a localization-aware / IoU-aware classification score (IACS).
The authors ran a series of oracle experiments on FCOS+ATSS, replacing the predicted classification, localization, and centerness scores with their ground-truth values to see which score has the greatest impact on the final result:
In the table above, the best result, 74.4 AP, is obtained by replacing the score of the ground-truth class with the IoU between the predicted box and the ground-truth box. This shows that, for most objects, an accurately localized bounding box already exists among the large pool of candidates; the key to strong detection performance is to accurately select these high-quality detections from the pool, and the results indicate that replacing the ground-truth class's classification score with the gt IoU is the most promising selection measure.
3. Method
The authors propose VarifocalNet, built on FCOS+ATSS (with the centerness branch removed). Compared with FCOS+ATSS, the network adds three new components:
- varifocal loss
- star-shaped box feature representation
- bounding-box refinement
3.1 IACS: IoU-Aware Classification Score
IACS is a scalar element of the classification score vector:
- at the position of the ground-truth class: the IoU between the predicted box and the ground-truth box
- at all other positions: 0
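A minimal sketch of how such an IACS target vector could be built for one positive sample (the helper name and shapes are illustrative, not from the official repo):

```python
import numpy as np

def iacs_target(num_classes, gt_class, iou):
    """IoU-aware classification target: the ground-truth class slot
    holds the predicted-box/gt-box IoU; every other slot is 0."""
    t = np.zeros(num_classes, dtype=np.float32)
    t[gt_class] = iou
    return t
```

Negative samples simply get an all-zero target vector.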
3.2 Varifocal loss
To learn the IACS, the authors design the Varifocal Loss, inspired by the Focal Loss.
Focal Loss:
- p: predicted score
- y: ground-truth class
- down-weights the contribution of easy foreground/background examples to the loss
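In symbols, the binary Focal Loss the bullets above describe is:

```latex
\mathrm{FL}(p, y) =
\begin{cases}
  -\alpha \, (1 - p)^{\gamma} \log(p) & \text{if } y = 1 \\
  -(1 - \alpha) \, p^{\gamma} \log(1 - p) & \text{otherwise}
\end{cases}
```

The modulating factors $(1-p)^{\gamma}$ and $p^{\gamma}$ shrink the loss of easy examples in both the foreground and background classes.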
Varifocal Loss:
- p: predicted IACS
- q: target score (for the ground-truth class, q is the IoU between the prediction and the ground truth; for all other classes, q = 0)
It treats the foreground and background contributions to the loss asymmetrically:
- only the contribution of negatives is down-weighted
- positives are not down-weighted (they are far rarer than negatives, hence more precious)
- the authors additionally weight positives by the target q; experiments show that positives with a high gt_IoU then contribute more to the loss, i.e. training on high-quality positives raises AP more than training on low-quality ones.
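Putting these choices together, the Varifocal Loss from the paper is:

```latex
\mathrm{VFL}(p, q) =
\begin{cases}
  -q \left( q \log(p) + (1 - q) \log(1 - p) \right) & q > 0 \\
  -\alpha \, p^{\gamma} \log(1 - p) & q = 0
\end{cases}
```

Only the negative branch ($q = 0$) carries the focal modulation $p^{\gamma}$; the positive branch is instead weighted by the IoU target $q$.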
3.3 Star-Shaped Box Feature Representation
The authors define a star-shaped representation of bounding-box features, using nine sampling points and deformable convolution to represent a bounding box.
Why nine points?
- Existing keypoint-based representations are effective, but they lose the features and context inside the box.
First, at a sampling point (x, y), a 3x3 convolution regresses an initial box (l', t', r', b'): the distances to the left, top, right, and bottom edges (the red box in Figure 1). The nine selected points are shown as yellow circles. Their displacements relative to (x, y) serve as the offsets of a deformable convolution, which is then applied over these nine points to represent the box.
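A sketch of the nine star-shaped sampling points (helper name hypothetical): the point itself, the four side midpoints, and the four corners of the initial box. In the network, their displacements from (x, y) become the deformable-convolution offsets.

```python
def star_points(x, y, l, t, r, b):
    """Nine star-shaped sampling points for an initial box (l, t, r, b)
    regressed at location (x, y)."""
    return [
        (x, y),                                                    # the point itself
        (x - l, y), (x + r, y), (x, y - t), (x, y + b),            # side midpoints
        (x - l, y - t), (x + r, y - t),                            # top corners
        (x - l, y + b), (x + r, y + b),                            # bottom corners
    ]
```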
3.4 Bounding-box refinement
The authors also improve localization accuracy through a bounding-box refinement step.
Initial offsets: (l', t', r', b')
Learned scaling factors: (Δl, Δt, Δr, Δb)
Refined offsets: (l, t, r, b) = (Δl × l', Δt × t', Δr × r', Δb × b')
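The refinement step amounts to an element-wise rescaling of the initial offsets; a trivial sketch (function name hypothetical):

```python
def refine_box(initial, scales):
    """Refine initial offsets (l', t', r', b') with learned scaling
    factors (dl, dt, dr, db): (l, t, r, b) = (dl*l', dt*t', dr*r', db*b')."""
    return tuple(s * o for o, s in zip(initial, scales))
```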
3.5 VarifocalNet
Backbone:
- FCN
Heads:
- bounding-box
- IoU-aware classification score
4. Results
Best hyperparameters: α = 0.75, γ = 2
Visualizations:
5. Code
The implementation is based on mmdetection. The GitHub repository linked above includes installation instructions; after installing, download the authors' pretrained model and set the COCO data path to train and test.
Training:
```shell
./tools/dist_train.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py 8
```
Testing:
```shell
# evaluation metrics
./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --eval bbox
# visualization
./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --show-dir results/
```
Demo:
```shell
python demo/image_demo.py demo/demo.jpg configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth
```
5.1 Setting the dataset path
Before running the code, be sure to execute the command below first, otherwise the wrong paths are used:
```shell
python setup.py develop
```
The dataset paths are set in these two config files:
```shell
# 1
./configs/_base_/coco_detection.py
# 2
./configs/vfnet/vfnet_r50_fpn_1x_coco.py
```
5.2 VFNet
```python
# vfnet_r50_fpn_1x_coco.py
# model settings
model = dict(
    type='VFNet',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='VFNetHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=3,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=False,
        dcn_on_last_conv=False,
        use_atss=True,
        use_vfl=True,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.5),
        loss_bbox_refine=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100))
```
VFNet Head:
```
# /mmdet/models/dense_heads/vfnet_head.py
(bbox_head): VFNetHead(
  (loss_cls): VarifocalLoss()
  (loss_bbox): GIoULoss()
  (cls_convs): ModuleList(
    (0-2): 3 x ConvModule(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (gn): GroupNorm(32, 256, eps=1e-05, affine=True)
      (activate): ReLU(inplace=True)
    )
  )
  (reg_convs): ModuleList(
    (0-2): 3 x ConvModule(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (gn): GroupNorm(32, 256, eps=1e-05, affine=True)
      (activate): ReLU(inplace=True)
    )
  )
  (relu): ReLU(inplace=True)
  (vfnet_reg_conv): ConvModule(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (gn): GroupNorm(32, 256, eps=1e-05, affine=True)
    (activate): ReLU(inplace=True)
  )
  (vfnet_reg): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (scales): ModuleList((0-4): 5 x Scale())
  (vfnet_reg_refine_dconv): DeformConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deform_groups=1, bias=False)
  (vfnet_reg_refine): Conv2d(256, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (scales_refine): ModuleList((0-4): 5 x Scale())
  (vfnet_cls_dconv): DeformConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), groups=1, deform_groups=1, bias=False)
  (vfnet_cls): Conv2d(256, 80, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (loss_bbox_refine): GIoULoss()
)
init_cfg={'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'vfnet_cls', 'std': 0.01, 'bias_prob': 0.01}}
```
VFNet model:
The full model printout: the backbone is a standard torchvision ResNet-50, and the bbox_head repeats the VFNetHead printout shown earlier, so repeated blocks are collapsed here.

```
VFNet(
  (backbone): ResNet(                  # standard torchvision ResNet-50
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): ResLayer(3 x Bottleneck,  64 ->  256 channels)
    (layer2): ResLayer(4 x Bottleneck, 128 ->  512 channels, stride 2)
    (layer3): ResLayer(6 x Bottleneck, 256 -> 1024 channels, stride 2)
    (layer4): ResLayer(3 x Bottleneck, 512 -> 2048 channels, stride 2)
  )
  init_cfg={'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
  (neck): FPN(
    (lateral_convs): ModuleList(       # 1x1 convs: 512 / 1024 / 2048 -> 256
      (0): ConvModule((conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)))
      (1): ConvModule((conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)))
      (2): ConvModule((conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)))
    )
    (fpn_convs): ModuleList(           # 3x3 convs on P3-P5, plus two stride-2 convs for P6/P7
      (0-2): 3 x ConvModule((conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
      (3-4): 2 x ConvModule((conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))
    )
  )
  init_cfg={'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
  (bbox_head): VFNetHead(...)          # identical to the VFNetHead printout above
)
```
Varifocal loss:
```python
# mmdet/models/losses/varifocal_loss.py
import torch.nn.functional as F

from .utils import weight_reduce_loss  # mmdet loss-reduction helper


def varifocal_loss(pred,
                   target,
                   weight=None,
                   alpha=0.75,
                   gamma=2.0,
                   iou_weighted=True,
                   reduction='mean',
                   avg_factor=None):
    """`Varifocal Loss <https://arxiv.org/abs/2008.13367>`_

    Args:
        pred (torch.Tensor): The prediction with shape (N, C), C is the
            number of classes.
        target (torch.Tensor): The learning target of the iou-aware
            classification score with shape (N, C), C is the number of classes.
        weight (torch.Tensor, optional): The weight of loss for each
            prediction. Defaults to None.
        alpha (float, optional): A balance factor for the negative part of
            Varifocal Loss, which is different from the alpha of Focal Loss.
            Defaults to 0.75.
        gamma (float, optional): The gamma for calculating the modulating
            factor. Defaults to 2.0.
        iou_weighted (bool, optional): Whether to weight the loss of the
            positive example with the iou target. Defaults to True.
        reduction (str, optional): The method used to reduce the loss into
            a scalar. Defaults to 'mean'. Options are "none", "mean" and
            "sum".
        avg_factor (int, optional): Average factor that is used to average
            the loss. Defaults to None.
    """
    # pred and target should be of the same size
    assert pred.size() == target.size()
    pred_sigmoid = pred.sigmoid()
    target = target.type_as(pred)
    if iou_weighted:
        # positives (target > 0) weighted by the IoU target q;
        # negatives down-weighted by alpha * |p - q|^gamma
        focal_weight = target * (target > 0.0).float() + \
            alpha * (pred_sigmoid - target).abs().pow(gamma) * \
            (target <= 0.0).float()
    else:
        focal_weight = (target > 0.0).float() + \
            alpha * (pred_sigmoid - target).abs().pow(gamma) * \
            (target <= 0.0).float()
    loss = F.binary_cross_entropy_with_logits(
        pred, target, reduction='none') * focal_weight
    loss = weight_reduce_loss(loss, weight, reduction, avg_factor)
    return loss
```