Reproducing MicroNet: Improving Image Recognition with Extremely Low FLOPs
Abstract
MicroNet targets the severe performance degradation that occurs at extremely low computational cost (e.g., 5M FLOPs for ImageNet classification). The study finds that two factors, sparse connectivity and dynamic activation functions, are effective for improving accuracy: the former avoids a significant reduction in network width, while the latter mitigates the adverse effects of reduced network depth. Technically, the paper proposes Micro-Factorized Convolution, which factorizes a convolution matrix into low-rank matrices, integrating sparse connectivity into convolution. It also proposes a new dynamic activation function, named Dynamic Shift-Max, which improves non-linearity by taking the maximum over multiple dynamic fusions between an input feature map and its circular channel shifts. Built on these two new operators, a family of networks named MicroNet is obtained, which achieves significant performance gains over the state of the art in the low-FLOP regime. For example, under a budget of 12M FLOPs, MicroNet achieves 59.4% top-1 accuracy on ImageNet classification, outperforming MobileNetV3 by 9.6%.
1. MicroNet
1.1 Overall Architecture
The paper proposes MicroNet, a lightweight network for extremely low-computation scenarios, built around two core ideas: Micro-Factorized Convolution and Dynamic Shift-Max. Micro-Factorized Convolution uses a low-rank approximation to factorize the original convolution into several small convolutions, preserving input-output connectivity while reducing the number of connections. Dynamic Shift-Max increases node connectivity and improves non-linearity through dynamic inter-group feature fusion, compensating for the performance loss caused by reduced network depth.
2. Code Reproduction
2.1 Micro-Factorized Convolution
Micro-Factorized Convolution aims to balance the number of channels against node connectivity. Here, the connectivity of a layer is defined as the number of paths per output node, where a path connects one input node to one output node.
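As a quick sanity check on why this factorization saves compute, the sketch below counts the weights of an ordinary pointwise convolution versus a micro-factorized one (two grouped 1x1 convolutions). The channel count `C` and reduction ratio `R` here are illustrative values, not the paper's configs; only the group-count rule `G = sqrt(C / R)` is taken from the paper.

```python
def conv1x1_params(c_in, c_out, groups=1):
    """Weight count of a 1x1 convolution with `groups` groups."""
    return c_in * c_out // groups

C, R = 96, 4                 # example channel count and reduction ratio
G = int((C / R) ** 0.5)      # paper's rule: G = sqrt(C / R)

dense = conv1x1_params(C, C)                     # ordinary 1x1 conv: C^2 weights
micro = (conv1x1_params(C, C // R, groups=G)     # C -> C/R grouped 1x1 conv
         + conv1x1_params(C // R, C, groups=G))  # C/R -> C grouped 1x1 conv
print(dense, micro)  # 9216 1152, i.e. an 8x reduction at these sizes
```

With these example sizes the factorized form needs 8x fewer weights while every output channel still has a path from every input channel (via the channel shuffle between the two grouped convolutions).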
```python
import paddle
import paddle.nn as nn


class MaxGroupPooling(nn.Layer):
    def __init__(self, channel_per_group=2):
        super().__init__()
        self.channel_per_group = channel_per_group

    def forward(self, x):
        if self.channel_per_group == 1:
            return x
        # group the channels, then take the max inside each group
        b, c, h, w = x.shape
        y = x.reshape([b, c // self.channel_per_group, -1, h, w])
        # paddle.max returns only the values (no indices)
        out = paddle.max(y, axis=2)
        return out


class GroupConv(nn.Layer):
    def __init__(self, inp, oup, groups=2):
        super().__init__()
        self.inp = inp
        self.oup = oup
        self.groups = groups  # a tuple; only groups[0] is used for the conv
        self.conv = nn.Sequential(
            nn.Conv2D(inp, oup, 1, groups=self.groups[0], bias_attr=False),
            nn.BatchNorm2D(oup))

    def forward(self, x):
        return self.conv(x)


class ChannelShuffle(nn.Layer):
    def __init__(self, groups):
        super().__init__()
        self.groups = groups

    def forward(self, x):
        b, c, h, w = x.shape
        channels_per_group = c // self.groups
        # reshape -> transpose -> reshape to interleave the channel groups
        x = x.reshape([b, self.groups, channels_per_group, h, w])
        x = x.transpose([0, 2, 1, 3, 4])
        return x.reshape([b, c, h, w])


class SpatialSepConvSF(nn.Layer):
    """Spatially separated conv: a k x 1 conv followed by a grouped 1 x k conv."""

    def __init__(self, inp, oups, kernel_size, stride):
        super().__init__()
        oup1, oup2 = oups
        self.conv = nn.Sequential(
            nn.Conv2D(inp, oup1, (kernel_size, 1), (stride, 1),
                      (kernel_size // 2, 0), groups=1, bias_attr=False),
            nn.BatchNorm2D(oup1),
            nn.Conv2D(oup1, oup1 * oup2, (1, kernel_size), (1, stride),
                      (0, kernel_size // 2), groups=oup1, bias_attr=False),
            nn.BatchNorm2D(oup1 * oup2),
            ChannelShuffle(oup1))

    def forward(self, x):
        return self.conv(x)


class DepthSpatialSepConv(nn.Layer):
    """Depthwise conv factorized into k x 1 and 1 x k depthwise convs."""

    def __init__(self, inp, expand, kernel_size, stride):
        super().__init__()
        exp1, exp2 = expand
        hidden_dim = inp * exp1
        oup = inp * exp1 * exp2
        self.conv = nn.Sequential(
            nn.Conv2D(inp, hidden_dim, (kernel_size, 1), (stride, 1),
                      (kernel_size // 2, 0), groups=inp, bias_attr=False),
            nn.BatchNorm2D(hidden_dim),
            nn.Conv2D(hidden_dim, oup, (1, kernel_size), (1, stride),
                      (0, kernel_size // 2), groups=hidden_dim, bias_attr=False),
            nn.BatchNorm2D(oup))

    def forward(self, x):
        return self.conv(x)
```

2.2 Dynamic Shift-Max
Dynamic Shift-Max is a new dynamic non-linearity that strengthens the connections between the groups created by micro-factorization. It complements Micro-Factorized Convolution, which focuses on intra-group connectivity.
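Before the dynamic version, the core operation can be sketched statically: fuse the feature map with a cyclic shift of its channel groups, and take the element-wise maximum of two such fusions. In the sketch below the coefficients `a1, b1, a2, b2` are fixed scalars and the shift is a plain roll by one group; in MicroNet both are richer (the coefficients are predicted per input by a small FC head, and the shift also rotates channels within each group), so this only illustrates the shape of the operator.

```python
import numpy as np

def static_shift_max(x, a1, b1, a2, b2, groups):
    """Element-wise max of two fusions of x with its group-shifted copy."""
    c = x.shape[1]
    # cyclic shift by one group of channels (simplified shift)
    x_shift = np.roll(x, -(c // groups), axis=1)
    z1 = a1 * x + b1 * x_shift
    z2 = a2 * x + b2 * x_shift
    return np.maximum(z1, z2)

x = np.random.randn(1, 8, 4, 4)
y = static_shift_max(x, 1.0, 0.5, 0.0, 0.0, groups=2)
print(y.shape)  # (1, 8, 4, 4)
```

With `a2 = b2 = 0` the second fusion is the zero map, so this special case reduces to a ReLU applied to the first fusion; the learned, input-dependent coefficients are what make the full operator dynamic.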
```python
# Note: ChannelShuffle2 and _make_divisible are defined elsewhere in the full
# repository; only the two core operators are reproduced here.


class DYShiftMax(nn.Layer):
    def __init__(self,
                 inp,
                 oup,
                 reduction=4,
                 act_max=1.0,
                 act_relu=True,
                 init_a=[0.0, 0.0],
                 init_b=[0.0, 0.0],
                 relu_before_pool=False,
                 g=None,
                 expansion=False):
        super().__init__()
        self.oup = oup
        self.act_max = act_max * 2
        self.act_relu = act_relu
        self.avg_pool = nn.Sequential(
            nn.ReLU() if relu_before_pool else nn.Identity(),
            nn.AdaptiveAvgPool2D(1))
        # 4 coefficients (a1, b1, a2, b2) when the max branch is used,
        # otherwise only (a1, b1)
        self.exp = 4 if act_relu else 2
        self.init_a = init_a
        self.init_b = init_b

        # squeeze width of the coefficient-prediction head
        squeeze = _make_divisible(inp // reduction, 4)
        if squeeze < 4:
            squeeze = 4

        self.fc = nn.Sequential(
            nn.Linear(inp, squeeze),
            nn.ReLU(),
            nn.Linear(squeeze, oup * self.exp),
            nn.Hardsigmoid())

        if g is None:
            g = 1
        self.g = g[1]
        if self.g != 1 and expansion:
            self.g = inp // self.g
        self.gc = inp // self.g

        # build the index of the cyclic group shift: roll by one group,
        # then by one channel inside each group
        index = paddle.to_tensor(list(range(inp))).reshape([1, inp, 1, 1])
        index = index.reshape([1, self.g, self.gc, 1, 1])
        indexgs = paddle.split(index, [1, self.g - 1], axis=1)
        indexgs = paddle.concat((indexgs[1], indexgs[0]), axis=1)
        indexs = paddle.split(indexgs, [1, self.gc - 1], axis=2)
        indexs = paddle.concat((indexs[1], indexs[0]), axis=2)
        self.index = indexs.reshape([inp]).astype(paddle.int64)
        self.expansion = expansion

    def forward(self, x):
        x_in = x
        x_out = x

        # predict the fusion coefficients from globally pooled features
        b, c, _, _ = x_in.shape
        y = self.avg_pool(x_in).reshape([b, c])
        y = self.fc(y).reshape([b, self.oup * self.exp, 1, 1])
        y = (y - 0.5) * self.act_max

        # x2 is x with its channel groups cyclically shifted
        x2 = paddle.index_select(x_out, self.index, axis=1)

        if self.exp == 4:
            a1, b1, a2, b2 = paddle.split(y, 4, axis=1)
            a1 = a1 + self.init_a[0]
            a2 = a2 + self.init_a[1]
            b1 = b1 + self.init_b[0]
            b2 = b2 + self.init_b[1]
            z1 = x_out * a1 + x2 * b1
            z2 = x_out * a2 + x2 * b2
            out = paddle.maximum(z1, z2)
        elif self.exp == 2:
            a1, b1 = paddle.split(y, 2, axis=1)
            a1 = a1 + self.init_a[0]
            b1 = b1 + self.init_b[0]
            out = x_out * a1 + x2 * b1
        return out


class DYMicroBlock(nn.Layer):
    def __init__(self,
                 inp,
                 oup,
                 kernel_size=3,
                 stride=1,
                 ch_exp=(2, 2),
                 ch_per_group=4,
                 groups_1x1=(1, 1),
                 dy=[0, 0, 0],
                 ratio=1.0,
                 activation_cfg=None):
        super().__init__()

        self.identity = stride == 1 and inp == oup

        y1, y2, y3 = dy
        act_max = activation_cfg["act_max"]
        act_reduction = activation_cfg["reduction"] * ratio
        init_a = activation_cfg["init_a"]
        init_b = activation_cfg["init_b"]
        init_ab3 = activation_cfg["init_ab3"]

        t1 = ch_exp
        gs1 = ch_per_group
        hidden_fft, g1, g2 = groups_1x1
        hidden_dim1 = inp * t1[0]
        hidden_dim2 = inp * t1[0] * t1[1]

        if gs1[0] == 0:
            self.layers = nn.Sequential(
                DepthSpatialSepConv(inp, t1, kernel_size, stride),
                DYShiftMax(
                    hidden_dim2,
                    hidden_dim2,
                    act_max=act_max,
                    act_relu=(y2 == 2),
                    init_a=init_a,
                    reduction=act_reduction,
                    init_b=init_b,
                    g=gs1,
                    expansion=False) if y2 > 0 else nn.ReLU6(),
                ChannelShuffle(gs1[1]),
                ChannelShuffle2(hidden_dim2 // 2)
                if y2 != 0 else nn.Identity(),
                GroupConv(hidden_dim2, oup, (g1, g2)),
                DYShiftMax(
                    oup,
                    oup,
                    act_max=act_max,
                    act_relu=False,
                    init_a=[init_ab3[0], 0.0],
                    reduction=act_reduction // 2,
                    init_b=[init_ab3[1], 0.0],
                    g=(g1, g2),
                    expansion=False) if y3 > 0 else nn.Identity(),
                ChannelShuffle(g2),
                ChannelShuffle2(oup // 2)
                if oup % 2 == 0 and y3 != 0 else nn.Identity())
        elif g2 == 0:
            self.layers = nn.Sequential(
                GroupConv(inp, hidden_dim2, gs1),
                DYShiftMax(
                    hidden_dim2,
                    hidden_dim2,
                    act_max=act_max,
                    act_relu=False,
                    init_a=[init_ab3[0], 0.0],
                    reduction=act_reduction,
                    init_b=[init_ab3[1], 0.0],
                    g=gs1,
                    expansion=False) if y3 > 0 else nn.Identity())
        else:
            self.layers = nn.Sequential(
                GroupConv(inp, hidden_dim2, gs1),
                DYShiftMax(
                    hidden_dim2,
                    hidden_dim2,
                    act_max=act_max,
                    act_relu=(y1 == 2),
                    init_a=init_a,
                    reduction=act_reduction,
                    init_b=init_b,
                    g=gs1,
                    expansion=False) if y1 > 0 else nn.ReLU6(),
                ChannelShuffle(gs1[1]),
                DepthSpatialSepConv(hidden_dim2, (1, 1), kernel_size, stride),
                nn.Identity(),
                DYShiftMax(
                    hidden_dim2,
                    hidden_dim2,
                    act_max=act_max,
                    act_relu=(y2 == 2),
                    init_a=init_a,
                    reduction=act_reduction,
                    init_b=init_b,
                    g=gs1,
                    expansion=True) if y2 > 0 else nn.ReLU6(),
                ChannelShuffle2(hidden_dim2 // 4)
                if y1 != 0 and y2 != 0 else nn.Identity()
                if y1 == 0 and y2 == 0 else ChannelShuffle2(hidden_dim2 // 2),
                GroupConv(hidden_dim2, oup, (g1, g2)),
                DYShiftMax(
                    oup,
                    oup,
                    act_max=act_max,
                    act_relu=False,
                    init_a=[init_ab3[0], 0.0],
                    reduction=act_reduction // 2
                    if oup < hidden_dim2 else act_reduction,
                    init_b=[init_ab3[1], 0.0],
                    g=(g1, g2),
                    expansion=False) if y3 > 0 else nn.Identity(),
                ChannelShuffle(g2),
                ChannelShuffle2(oup // 2) if y3 != 0 else nn.Identity())

    def forward(self, x):
        out = self.layers(x)
        if self.identity:
            out = out + x
        return out
```

3. Dataset and Reproduced Accuracy
3.1 Dataset
The ImageNet project is a large-scale visual database for visual object recognition research, with more than 14 million hand-annotated images. ImageNet-1k is a subset of ImageNet containing 1000 categories; its training set holds 1,281,167 images and its validation set 50,000. Since 2010, the ImageNet project has run an annual image-classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which uses ImageNet-1k as its dataset. ImageNet-1k has become one of the most important datasets in computer vision, driving progress across the field; many downstream vision tasks initialize their models from weights pretrained on it.
| Dataset | Train images | Val images | Classes |
| --- | --- | --- | --- |
| ImageNet1k | 1.2M | 50k | 1000 |
3.2 Reproduced Accuracy
| Model | Epochs | Paper top-1 (%) | Reproduced top-1 (%) | Weights | Training log |
| --- | --- | --- | --- | --- | --- |
| micronet_m0 | 600 | 46.6 | 46.4 | m1_epoch_594.pdparams | m0_train.log |
| micronet_m3 | 600 | 62.5 | 62.8 | m3_epoch_591.pdparams | m3_train.log |
The weights and training logs can be downloaded from Baidu Netdisk, or found at work/best_model.pdparams.
4. Environment Setup
4.1 Install PaddlePaddle
```shell
# Install the GPU build of Paddle
pip install paddlepaddle-gpu==2.3.2
```

For more installation options, see the Paddle installation guide.
4.2 Download the Code
```shell
%cd /home/aistudio/
# !git clone https://github.com/flytocc/PaddleClas.git
# !cd PaddleClas
# !git checkout -b micronet_PR
!unzip PaddleClas-micronet_PR.zip
%cd /home/aistudio/PaddleClas-micronet_PR
!pip install -r requirements.txt
```

5. Getting Started
5.1 Model Inference
```shell
%cd /home/aistudio/PaddleClas-micronet_PR
%run tools/infer.py \
    -c ./ppcls/configs/ImageNet/MicroNet/micronet_m3.yaml \
    -o Infer.infer_imgs=./deploy/images/ImageNet/ILSVRC2012_val_00020010.jpeg \
    -o Global.pretrained_model=/home/aistudio/work/best_model
```

The final output is:
```
[{'class_ids': [178, 209, 211, 208, 236], 'scores': [0.99474, 0.00512, 8e-05, 3e-05, 2e-05], 'file_name': './deploy/images/ImageNet/ILSVRC2012_val_00020010.jpeg', 'label_names': ['Weimaraner', 'Chesapeake Bay retriever', 'vizsla, Hungarian pointer', 'Labrador retriever', 'Doberman, Doberman pinscher']}]
```

This means the predicted class is Weimaraner, with ID 178 and a confidence of 0.99474.
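If the prediction is needed programmatically, the printed result is a plain Python list of dicts and can be post-processed directly. A minimal sketch, using the field names from the output above:

```python
# Result dict as printed by the infer script above.
result = [{
    'class_ids': [178, 209, 211, 208, 236],
    'scores': [0.99474, 0.00512, 8e-05, 3e-05, 2e-05],
    'file_name': './deploy/images/ImageNet/ILSVRC2012_val_00020010.jpeg',
    'label_names': ['Weimaraner', 'Chesapeake Bay retriever',
                    'vizsla, Hungarian pointer', 'Labrador retriever',
                    'Doberman, Doberman pinscher'],
}]

# Pick the entry with the highest score (tuples compare by score first).
score, class_id, label = max(
    zip(result[0]['scores'], result[0]['class_ids'], result[0]['label_names']))
print(label, class_id, score)  # Weimaraner 178 0.99474
```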
5.2 Model Training
- Single-machine multi-GPU training
Part of the training log is shown below.
```
[2022/08/31 04:13:11] ppcls INFO: [Train][Epoch 302/600][Iter: 1550/2503]lr(LinearWarmup): 0.10046482, top1: 0.48098, top5: 0.72183, CELoss: 2.36589, loss: 2.36589, batch_cost: 0.27864s, reader_cost: 0.01763, ips: 459.37528 samples/s, eta: 2 days, 9:48:20
[2022/08/31 04:13:14] ppcls INFO: [Train][Epoch 302/600][Iter: 1560/2503]lr(LinearWarmup): 0.10046271, top1: 0.48091, top5: 0.72176, CELoss: 2.36646, loss: 2.36646, batch_cost: 0.27873s, reader_cost: 0.01755, ips: 459.22941 samples/s, eta: 2 days, 9:49:24
```

5.3 Model Evaluation
```shell
python -m paddle.distributed.launch --gpus=0,1,2,3 \
    tools/eval.py \
    -c ./ppcls/configs/ImageNet/MicroNet/micronet_m3.yaml \
    -o Global.pretrained_model=$TRAINED_MODEL
```

6. License
This project is released under the MIT License.
7. References
This article is a repost.
Original project link