

Understanding R(2+1)D and Its Implementation in the MindSpore Framework


I. The R(2+1)D Algorithm

Paper: [1711.11248] A Closer Look at Spatiotemporal Convolutions for Action Recognition (arxiv.org)

In their CVPR 2018 paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition", Tran et al. proposed R(2+1)D, showing that factorizing 3D convolution kernels into separate spatial and temporal components yields a significant gain in accuracy. The (2+1)D convolution block in R(2+1)D decomposes a 3D convolution with $N$ kernels of size $t \times d \times d$ into a 2D spatial convolution with $M$ kernels of size $1 \times d \times d$ followed by a 1D temporal convolution with $N$ kernels of size $t \times 1 \times 1$. The hyperparameter $M$ determines the dimensionality of the intermediate subspace onto which the signal is projected between the spatial and temporal convolutions; the paper sets

$$M_i = \left\lfloor \frac{t d^2 N_{i-1} N_i}{d^2 N_{i-1} + t N_i} \right\rfloor$$

where $i$ indexes the convolution blocks of the residual network. With this choice, the spatial convolution contributes $M_i d^2 N_{i-1}$ parameters and the temporal convolution $t N_i M_i$, so their sum approximately equals the $t d^2 N_{i-1} N_i$ parameters of the corresponding 3D convolution.


Compared with full 3D convolution, the (2+1)D decomposition has two advantages. First, although the number of parameters is unchanged, the number of nonlinearities in the network doubles thanks to the extra activation function between the 2D and 1D convolutions in each block, and additional nonlinearities increase the complexity of the functions the network can represent. Second, forcing the 3D convolution into separate spatial and temporal components makes optimization easier: compared with a 3D network of the same parameter count, the (2+1)D network achieves lower training error. A quick numerical check of the parameter parity is sketched below.
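The following sketch (plain Python; the helper name and values are illustrative, mirroring the `mid_channel` computation in the `Inflate3D` class later in this article) verifies that the chosen $M_i$ matches the 3D parameter count for a typical 3x3x3 block with 64 input and 64 output channels:

def mid_channels(t, d, n_in, n_out):
    # M_i from the paper: floor(t*d^2*N_{i-1}*N_i / (d^2*N_{i-1} + t*N_i))
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

t, d, n_in, n_out = 3, 3, 64, 64
m = mid_channels(t, d, n_in, n_out)                    # 144
params_3d = n_out * (n_in * t * d * d)                 # 110592
params_2plus1d = m * (n_in * d * d) + n_out * (m * t)  # 82944 + 27648 = 110592
print(m, params_3d, params_2plus1d)

Here the floor division happens to be exact, so the counts match exactly; in general the (2+1)D count only approximates the 3D one.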

The table below shows the architectures of the 18-layer and 34-layer R3D networks; replacing the 3D convolutions in R3D with (2+1)D convolutions yields the R(2+1)D network of the corresponding depth.


The experiments compare the action-recognition accuracy of different forms of convolution on Kinetics, as shown in the table below. All models are based on ResNet18 and trained from scratch on 8-frame or 16-frame clip inputs; R(2+1)D outperforms all the other models.


The comparison with state-of-the-art methods on Kinetics is shown in the table below. When trained from scratch on RGB input, R(2+1)D outperforms I3D by 4.5%, and R(2+1)D pretrained on Sports-1M also beats I3D pretrained on ImageNet by 2.2%.

II. MindSpore Implementation of R(2+1)D

功能函數(shù)說明

數(shù)據(jù)預處理

  • GeneratorDataset reads the video dataset files and outputs batches (batch_size=16) of three-channel frames with the specified sequence length.

  • 數(shù)據(jù)前處理包括混洗、歸一化。

  • 數(shù)據(jù)增強包括video_random_crop類實現(xiàn)的隨機裁剪、video_resize類實現(xiàn)的調(diào)整大小、video_random_horizontal_flip實現(xiàn)的隨機水平翻轉(zhuǎn)。

Model backbone

  • In R2Plus1d18, the input first passes through a (2+1)D convolutional stem, then through four residual blocks built from (2+1)D convolution modules, and finally through an average pooling layer, a flatten layer, and a fully connected layer.

  • The stem (2+1)D convolution module is a Conv3d with kernel size (1, 7, 7) followed by a Conv3d with kernel size (3, 1, 1), with Batch Normalization and ReLU between the convolution layers.

  • R2Plus1d18 contains four residual blocks, each stacked twice in the model. Every block consists of two (2+1)D convolution modules, and each (2+1)D convolution is a Conv3d with kernel size (1, 3, 3) followed by a Conv3d with kernel size (3, 1, 1), again with Batch Normalization and ReLU between the convolution layers; the block's input and output are joined by a residual connection. A shape trace through the network is sketched below.
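For orientation, here is a hypothetical shape trace through R2Plus1d18 (derived from the stem and stage strides in the code below, assuming an input clip of shape (N, 3, 16, 112, 112) and block expansion 1; not the output of a verified run):

# Hypothetical R2Plus1d18 shape trace, input (N, 3, 16, 112, 112):
# stem:   (1,7,7) conv, stride (1,2,2), then (3,1,1) conv -> (N,  64, 16, 56, 56)
# layer1: stride (1,1,1)                                  -> (N,  64, 16, 56, 56)
# layer2: stride (2,2,2)                                  -> (N, 128,  8, 28, 28)
# layer3: stride (2,2,2)                                  -> (N, 256,  4, 14, 14)
# layer4: stride (2,2,2)                                  -> (N, 512,  2,  7,  7)
# adaptive average pool + flatten                         -> (N, 512)
# fully connected classifier                              -> (N, num_classes)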

  • The roles of the individual classes in the model implementation are:

    • Unit3D類實現(xiàn)了輸入經(jīng)過Conv3d、BN、Relu、Pooling層的結(jié)構(gòu),其中BN層、Relu層和Pooling層是可選的。
class Unit3D(nn.Cell):
    """Conv3d fused with normalization and activation blocks definition.

    Args:
        in_channels (int): The number of channels of input frame images.
        out_channels (int): The number of channels of output frame images.
        kernel_size (tuple): The size of the conv3d kernel.
        stride (Union[int, Tuple[int]]): Stride size for the first convolutional layer.
            Default: 1.
        pad_mode (str): Specifies padding mode. The optional values are "same", "valid",
            "pad". Default: "pad".
        padding (Union[int, Tuple[int]]): Implicit paddings on both sides of the input x.
            If `pad_mode` is "pad" and `padding` is not specified by user, then the padding
            size will be `(kernel_size - 1) // 2` for C, H, W channel.
        dilation (Union[int, Tuple[int]]): Specifies the dilation rate to use for dilated
            convolution. Default: 1.
        group (int): Splits filter into groups, in_channels and out_channels must be
            divisible by the number of groups. Default: 1.
        activation (Optional[nn.Cell]): Activation function which will be stacked on top of
            the normalization layer (if not None), otherwise on top of the conv layer.
            Default: nn.ReLU.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        pooling (Optional[nn.Cell]): Pooling layer (if not None) will be stacked on top of
            all the former layers. Default: None.
        has_bias (bool): Whether to use Bias.

    Returns:
        Tensor, output tensor.

    Examples:
        Unit3D(in_channels=in_channels, out_channels=out_channels[0], kernel_size=(1, 1, 1))
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int]] = 3,
                 stride: Union[int, Tuple[int]] = 1,
                 pad_mode: str = 'pad',
                 padding: Union[int, Tuple[int]] = 0,
                 dilation: Union[int, Tuple[int]] = 1,
                 group: int = 1,
                 activation: Optional[nn.Cell] = nn.ReLU,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 pooling: Optional[nn.Cell] = None,
                 has_bias: bool = False) -> None:
        super().__init__()
        if pad_mode == 'pad' and padding == 0:
            padding = tuple((k - 1) // 2 for k in six_padding(kernel_size))
        else:
            padding = 0

        layers = [nn.Conv3d(in_channels=in_channels,
                            out_channels=out_channels,
                            kernel_size=kernel_size,
                            stride=stride,
                            pad_mode=pad_mode,
                            padding=padding,
                            dilation=dilation,
                            group=group,
                            has_bias=has_bias)]

        if norm:
            layers.append(norm(out_channels))
        if activation:
            layers.append(activation())

        self.pooling = None
        if pooling:
            self.pooling = pooling

        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        """construct unit3d"""
        output = self.features(x)
        if self.pooling:
            output = self.pooling(output)
        return output
    • The Inflate3D class uses Unit3D to implement the (2+1)D convolution module.
class Inflate3D(nn.Cell):
    """Inflate3D block definition.

    Args:
        in_channel (int): The number of channels of input frame images.
        out_channel (int): The number of channels of output frame images.
        mid_channel (int): The number of channels of inner frame images.
        kernel_size (tuple): The size of the spatial-temporal convolutional layer kernels.
        stride (Union[int, Tuple[int]]): Stride size for the second convolutional layer.
            Default: 1.
        conv2_group (int): Splits filter into groups for the second conv layer, in_channels
            and out_channels must be divisible by the number of groups. Default: 1.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        activation (List[Optional[Union[nn.Cell, str]]]): Activation function which will be
            stacked on top of the normalization layer (if not None), otherwise on top of
            the conv layer. Default: (nn.ReLU, None).
        inflate (int): Whether to inflate two conv3d layers and with different kernel size.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.blocks import Inflate3D
        >>> Inflate3D(3, 64, 64)
    """

    def __init__(self,
                 in_channel: int,
                 out_channel: int,
                 mid_channel: int = 0,
                 stride: tuple = (1, 1, 1),
                 kernel_size: tuple = (3, 3, 3),
                 conv2_group: int = 1,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 activation: List[Optional[Union[nn.Cell, str]]] = (nn.ReLU, None),
                 inflate: int = 1,
                 ):
        super(Inflate3D, self).__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.in_channel = in_channel
        if mid_channel == 0:
            # M_i from the paper: keeps the (2+1)D parameter count close to that
            # of the corresponding 3D convolution.
            self.mid_channel = (in_channel * out_channel * kernel_size[1] * kernel_size[2] * 3) // \
                (in_channel * kernel_size[1] * kernel_size[2] + 3 * out_channel)
        else:
            self.mid_channel = mid_channel
        self.inflate = inflate
        if self.inflate == 0:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 1:
            conv1_kernel_size = (kernel_size[0], 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 2:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (kernel_size[0], kernel_size[1], kernel_size[2])
        self.conv1 = Unit3D(self.in_channel,
                            self.mid_channel,
                            stride=(1, 1, 1),
                            kernel_size=conv1_kernel_size,
                            norm=norm,
                            activation=activation[0])
        self.conv2 = Unit3D(self.mid_channel,
                            self.mid_channel,
                            stride=stride,
                            kernel_size=conv2_kernel_size,
                            group=conv2_group,
                            norm=norm,
                            activation=activation[1])

    def construct(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x
    • Resnet3D類實現(xiàn)了輸入經(jīng)過Unit3D、Max Pooling再接4個residual block的結(jié)構(gòu),residual block的堆疊數(shù)量可以通過參數(shù)進行指定。
class ResNet3D(nn.Cell):
    """ResNet3D architecture.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage.
            Default: (64, 128, 256, 512).
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: ((1, 1, 1), (1, 2, 2), (1, 2, 2), (1, 2, 2)).
        group (int): The number of Group convolutions. Default: 1.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use.
            Default: None.
        down_sample (nn.Cell, optional): Residual block in every resblock, it can transfer
            the input feature into the same channel of output. Default: Unit3D.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, T_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, 2048, 7, 7, 7)`.

    Supported Platforms:
        ``GPU``

    Examples:
        >>> import numpy as np
        >>> import mindspore as ms
        >>> from mindvision.msvideo.models.backbones import ResNet3D, ResidualBlock3D
        >>> net = ResNet3D(ResidualBlock3D, [3, 4, 23, 3])
        >>> x = ms.Tensor(np.ones([1, 3, 16, 224, 224]), ms.float32)
        >>> output = net(x)
        >>> print(output.shape)
        (1, 2048, 7, 7)

    About ResNet:
        The ResNet is to ease the training of networks that are substantially deeper than
        those used previously. The model explicitly reformulates the layers as learning
        residual functions with reference to the layer inputs, instead of learning
        unreferenced functions.
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (1, 2, 2),
                                                     (1, 2, 2),
                                                     (1, 2, 2)),
                 group: int = 1,
                 base_width: int = 64,
                 norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = Unit3D,
                 **kwargs) -> None:
        super().__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.norm = norm
        self.in_channels = stage_channels[0]
        self.group = group
        self.base_with = base_width
        self.down_sample = down_sample

        self.conv1 = Unit3D(3, self.in_channels, kernel_size=7, stride=2, norm=norm)
        self.max_pool = ops.MaxPool3D(kernel_size=3, strides=2, pad_mode='same')
        self.layer1 = self._make_layer(block, stage_channels[0], layer_nums[0],
                                       stride=stage_strides[0], norm=self.norm, **kwargs)
        self.layer2 = self._make_layer(block, stage_channels[1], layer_nums[1],
                                       stride=stage_strides[1], norm=self.norm, **kwargs)
        self.layer3 = self._make_layer(block, stage_channels[2], layer_nums[2],
                                       stride=stage_strides[2], norm=self.norm, **kwargs)
        self.layer4 = self._make_layer(block, stage_channels[3], layer_nums[3],
                                       stride=stage_strides[3], norm=self.norm, **kwargs)

    def _make_layer(self,
                    block: Optional[nn.Cell],
                    channel: int,
                    block_nums: int,
                    stride: Tuple[int] = (1, 2, 2),
                    norm: Optional[nn.Cell] = nn.BatchNorm3d,
                    **kwargs):
        """Block layers."""
        down_sample = None
        if stride[1] != 1 or self.in_channels != channel * block.expansion:
            down_sample = self.down_sample(self.in_channels,
                                           channel * block.expansion,
                                           kernel_size=1,
                                           stride=stride,
                                           norm=norm,
                                           activation=None)
        self.stride = stride
        bkwargs = [{} for _ in range(block_nums)]  # block-specific keyword args
        temp_args = kwargs.copy()
        for pname, pvalue in temp_args.items():
            if isinstance(pvalue, (list, tuple)):
                Validator.check_equal_int(len(pvalue), block_nums, f'len({pname})')
                for idx, v in enumerate(pvalue):
                    bkwargs[idx][pname] = v
                kwargs.pop(pname)
        layers = []
        layers.append(block(self.in_channels,
                            channel,
                            stride=self.stride,
                            down_sample=down_sample,
                            group=self.group,
                            base_width=self.base_with,
                            norm=norm,
                            **(bkwargs[0]),
                            **kwargs))
        self.in_channels = channel * block.expansion
        for i in range(1, block_nums):
            layers.append(block(self.in_channels,
                                channel,
                                stride=(1, 1, 1),
                                group=self.group,
                                base_width=self.base_with,
                                norm=norm,
                                **(bkwargs[i]),
                                **kwargs))
        return nn.SequentialCell(layers)

    def construct(self, x):
        """Resnet3D construct."""
        x = self.conv1(x)
        x = self.max_pool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x
    • The R2Plus1dNet class inherits from ResNet3D, reusing its four residual stages: the input passes through the (2+1)D stem, then through the four residual stages, and finally through the average pooling layer, the flatten layer, and the fully connected layer.
class R2Plus1dNet(ResNet3D):
    """Generic R(2+1)d generator.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage.
            Default: (64, 128, 256, 512).
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: ((1, 1, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2)).
        conv12 (nn.Cell, optional): Conv1 and conv2 config in resblock.
            Default: Conv2Plus1D.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use.
            Default: None.
        num_classes (int): Number of categories in the action recognition dataset.
        keep_prob (float): Dropout probability in classification stage.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.backbones.r2plus1d import *
        >>> from mindvision.msvideo.models.backbones.resnet3d import ResidualBlockBase3D
        >>> data = Tensor(np.random.randn(2, 3, 16, 112, 112), dtype=mindspore.float32)
        >>> net = R2Plus1dNet(block=ResidualBlockBase3D, layer_nums=[2, 2, 2, 2])
        >>> predict = net(data)
        >>> print(predict.shape)
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (2, 2, 2),
                                                     (2, 2, 2),
                                                     (2, 2, 2)),
                 num_classes: int = 400,
                 **kwargs) -> None:
        super().__init__(block=block,
                         layer_nums=layer_nums,
                         stage_channels=stage_channels,
                         stage_strides=stage_strides,
                         conv12=Conv2Plus1d,
                         **kwargs)
        # (2+1)D stem: a (1, 7, 7) spatial conv followed by a (3, 1, 1) temporal
        # conv, with BatchNorm and ReLU in between.
        self.conv1 = nn.SequentialCell([nn.Conv3d(3, 45,
                                                  kernel_size=(1, 7, 7),
                                                  stride=(1, 2, 2),
                                                  pad_mode='pad',
                                                  padding=(0, 0, 3, 3, 3, 3),
                                                  has_bias=False),
                                        nn.BatchNorm3d(45),
                                        nn.ReLU(),
                                        nn.Conv3d(45, 64,
                                                  kernel_size=(3, 1, 1),
                                                  stride=(1, 1, 1),
                                                  pad_mode='pad',
                                                  padding=(1, 1, 0, 0, 0, 0),
                                                  has_bias=False),
                                        nn.BatchNorm3d(64),
                                        nn.ReLU()])
        self.avgpool = AdaptiveAvgPool3D((1, 1, 1))
        self.flatten = nn.Flatten()
        self.classifier = nn.Dense(stage_channels[-1] * block.expansion,
                                   num_classes)
        # init weights
        self._initialize_weights()

    def construct(self, x):
        """R2Plus1dNet construct."""
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        """Init the weight of Conv3d and Dense in the net."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv3d):
                cell.weight.set_data(init.initializer(
                    init.HeNormal(math.sqrt(5), mode='fan_out', nonlinearity='relu'),
                    cell.weight.shape, cell.weight.dtype))
                if cell.bias:
                    cell.bias.set_data(init.initializer(
                        init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer(
                    init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(
                    init.Zero(), cell.beta.shape, cell.beta.dtype))
    • The R2Plus1d18 class inherits from R2Plus1dNet; its main role is to fix the number of stacked residual blocks, stacking each block twice.
class R2Plus1d18(R2Plus1dNet):
    """The class of R2Plus1d-18 uses the registration mechanism to register,
    need to use the yaml configuration file to call."""

    def __init__(self, **kwargs):
        super(R2Plus1d18, self).__init__(block=ResidualBlockBase3D,
                                         layer_nums=(2, 2, 2, 2),
                                         **kwargs)
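With the classes above in place, a minimal usage sketch (assuming the zjut_mindvideo package from Section III is installed, so that the `msvideo.models.r2plus1d` import path used there is available; the expected output shape follows the trace sketched earlier, not a verified run):

import numpy as np
import mindspore as ms
from msvideo.models.r2plus1d import R2Plus1d18

# Build the 18-layer R(2+1)D network with a 400-class head (Kinetics400).
net = R2Plus1d18(num_classes=400)
# A random batch of two 16-frame 112x112 RGB clips, shape (N, C, T, H, W).
clip = ms.Tensor(np.random.randn(2, 3, 16, 112, 112), ms.float32)
logits = net(clip)
print(logits.shape)  # expected: (2, 400)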

III. Runnable Example

Notebook file link

    數(shù)據(jù)集準備

The code repository uses the Kinetics400 dataset for training and validation.

Pretrained model

The pretrained model was trained on the Kinetics400 dataset; download: r2plus1d18_kinetic400.ckpt

    環(huán)境準備

git clone https://gitee.com/yanlq46462828/zjut_mindvideo.git
cd zjut_mindvideo

# Please first install mindspore according to instructions on the
# official website: https://www.mindspore.cn/install

pip install -r requirements.txt
pip install -e .

Training

from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.context import ParallelMode
from mindspore.communication import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits

from msvideo.utils.check_param import Validator, Rel
    數(shù)據(jù)集加載

The Kinetic400 class, built on VideoDataset, loads the kinetics400 dataset.

from msvideo.data.kinetics400 import Kinetic400

# Data Pipeline.
dataset = Kinetic400(path='/home/publicfile/kinetics-400',
                     split="train",
                     seq=32,
                     num_parallel_workers=1,
                     shuffle=True,
                     batch_size=6,
                     repeat_num=1)
ckpt_save_dir = './r2plus1d'

/home/publicfile/kinetics-400/cls2index.json
    數(shù)據(jù)處理

VideoRescale rescales the video values, VideoResize changes the frame size, VideoRandomCrop randomly crops the resized video, VideoRandomHorizontalFlip flips it horizontally with a given probability, VideoReOrder permutes the dimensions, and VideoNormalize normalizes the values.

from msvideo.data.transforms import VideoRandomCrop, VideoRandomHorizontalFlip, VideoRescale
from msvideo.data.transforms import VideoNormalize, VideoResize, VideoReOrder

transforms = [VideoRescale(shift=0.0),
              VideoResize([128, 171]),
              VideoRandomCrop([112, 112]),
              VideoRandomHorizontalFlip(0.5),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset.transform = transforms
dataset_train = dataset.run()
Validator.check_int(dataset_train.get_dataset_size(), 0, Rel.GT)
step_size = dataset_train.get_dataset_size()

[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:30:59.929.412 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
    網(wǎng)絡構(gòu)建
from msvideo.models.r2plus1d import R2Plus1d18

# Create model.
network = R2Plus1d18(num_classes=400)

from msvideo.schedule.lr_schedule import warmup_cosine_annealing_lr_v1

# Set learning rate scheduler.
learning_rate = warmup_cosine_annealing_lr_v1(lr=0.01,
                                              steps_per_epoch=step_size,
                                              warmup_epochs=4,
                                              max_epoch=100,
                                              t_max=100,
                                              eta_min=0)

# Define optimizer.
network_opt = nn.Momentum(network.trainable_params(),
                          learning_rate=learning_rate,
                          momentum=0.9,
                          weight_decay=0.00004)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

# Set the checkpoint config for the network.
ckpt_config = CheckpointConfig(save_checkpoint_steps=step_size,
                               keep_checkpoint_max=10)
ckpt_callback = ModelCheckpoint(prefix='r2plus1d_kinetics400',
                                directory=ckpt_save_dir,
                                config=ckpt_config)

# Init the model.
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics={'acc'})

# Begin to train.
print('[Start training `{}`]'.format('r2plus1d_kinetics400'))
print("=" * 80)
model.train(1,
            dataset_train,
            callbacks=[ckpt_callback, LossMonitor()],
            dataset_sink_mode=False)
print('[End of training `{}`]'.format('r2plus1d_kinetics400'))

[Start training `r2plus1d_kinetics400`]
================================================================================
epoch: 1 step: 1, loss is 5.998835563659668
epoch: 1 step: 2, loss is 5.921803951263428
epoch: 1 step: 3, loss is 6.024421691894531
epoch: 1 step: 4, loss is 6.08278751373291
epoch: 1 step: 5, loss is 6.014780044555664
epoch: 1 step: 6, loss is 5.945815086364746
epoch: 1 step: 7, loss is 6.078174114227295
epoch: 1 step: 8, loss is 6.0565361976623535
epoch: 1 step: 9, loss is 5.952683448791504
epoch: 1 step: 10, loss is 6.033120632171631
epoch: 1 step: 11, loss is 6.05575704574585
epoch: 1 step: 12, loss is 5.9879350662231445
epoch: 1 step: 13, loss is 6.006839275360107
epoch: 1 step: 14, loss is 5.9968180656433105
epoch: 1 step: 15, loss is 5.971335411071777
epoch: 1 step: 16, loss is 6.0620856285095215
epoch: 1 step: 17, loss is 6.081112861633301
epoch: 1 step: 18, loss is 6.106649398803711
epoch: 1 step: 19, loss is 6.095144271850586
epoch: 1 step: 20, loss is 6.00246000289917
epoch: 1 step: 21, loss is 6.061524868011475
epoch: 1 step: 22, loss is 6.046009063720703
epoch: 1 step: 23, loss is 5.997835159301758
epoch: 1 step: 24, loss is 6.007784366607666
epoch: 1 step: 25, loss is 5.946590423583984
epoch: 1 step: 26, loss is 5.9461164474487305
epoch: 1 step: 27, loss is 5.9034929275512695
epoch: 1 step: 28, loss is 5.925591945648193
epoch: 1 step: 29, loss is 6.176599979400635
......

Evaluation

from mindspore import context
from msvideo.data.kinetics400 import Kinetic400

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

# Data Pipeline.
dataset_eval = Kinetic400("/home/publicfile/kinetics-400",
                          split="val",
                          seq=32,
                          seq_mode="interval",
                          num_parallel_workers=1,
                          shuffle=False,
                          batch_size=8,
                          repeat_num=1)

/home/publicfile/kinetics-400/cls2index.json

from msvideo.data.transforms import VideoCenterCrop, VideoRescale, VideoReOrder
from msvideo.data.transforms import VideoNormalize, VideoResize

transforms = [VideoResize([128, 171]),
              VideoRescale(shift=0.0),
              VideoCenterCrop([112, 112]),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset_eval.transform = transforms
dataset_eval = dataset_eval.run()

from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.train import Model
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from msvideo.utils.callbacks import EvalLossMonitor
from msvideo.models.r2plus1d import R2Plus1d18

# Create model.
network = R2Plus1d18(num_classes=400)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

param_dict = load_checkpoint('/home/zhengs/r2plus1d/r2plus1d18_kinetic400.ckpt')
load_param_into_net(network, param_dict)

# Define eval_metrics.
eval_metrics = {'Loss': nn.Loss(),
                'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

# Init the model.
model = Model(network, loss_fn=network_loss, metrics=eval_metrics)

print_cb = EvalLossMonitor(model)

# Begin to eval.
print('[Start eval `{}`]'.format('r2plus1d_kinetics400'))
result = model.eval(dataset_eval,
                    callbacks=[print_cb],
                    dataset_sink_mode=False)
print(result)

[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.745.627 [mindspore/train/model.py:1077] For EvalLossMonitor callback, {'epoch_end', 'step_end', 'epoch_begin', 'step_begin'} methods may not be supported in later version, Use methods prefixed with 'on_train' or 'on_eval' instead when using customized callbacks.

[Start eval `r2plus1d_kinetics400`]
step:[    1/ 2484], metrics:[], loss:[3.070/3.070], time:1923.473 ms
step:[    2/ 2484], metrics:['Loss: 3.0702', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.7500'], loss:[0.808/1.939], time:169.314 ms
step:[    3/ 2484], metrics:['Loss: 1.9391', 'Top_1_Accuracy: 0.5625', 'Top_5_Accuracy: 0.8750'], loss:[2.645/2.175], time:192.965 ms
step:[    4/ 2484], metrics:['Loss: 2.1745', 'Top_1_Accuracy: 0.5417', 'Top_5_Accuracy: 0.8750'], loss:[2.954/2.369], time:172.657 ms
step:[    5/ 2484], metrics:['Loss: 2.3695', 'Top_1_Accuracy: 0.5000', 'Top_5_Accuracy: 0.8438'], loss:[2.489/2.393], time:176.803 ms
step:[    6/ 2484], metrics:['Loss: 2.3934', 'Top_1_Accuracy: 0.4750', 'Top_5_Accuracy: 0.8250'], loss:[1.566/2.256], time:172.621 ms
step:[    7/ 2484], metrics:['Loss: 2.2556', 'Top_1_Accuracy: 0.4792', 'Top_5_Accuracy: 0.8333'], loss:[0.761/2.042], time:172.149 ms
step:[    8/ 2484], metrics:['Loss: 2.0420', 'Top_1_Accuracy: 0.5357', 'Top_5_Accuracy: 0.8571'], loss:[3.675/2.246], time:181.757 ms
step:[    9/ 2484], metrics:['Loss: 2.2461', 'Top_1_Accuracy: 0.4688', 'Top_5_Accuracy: 0.7969'], loss:[3.909/2.431], time:186.722 ms
step:[   10/ 2484], metrics:['Loss: 2.4309', 'Top_1_Accuracy: 0.4583', 'Top_5_Accuracy: 0.7639'], loss:[3.663/2.554], time:199.209 ms
step:[   11/ 2484], metrics:['Loss: 2.5542', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7375'], loss:[3.438/2.635], time:173.766 ms
step:[   12/ 2484], metrics:['Loss: 2.6345', 'Top_1_Accuracy: 0.4318', 'Top_5_Accuracy: 0.7159'], loss:[2.695/2.640], time:171.364 ms
step:[   13/ 2484], metrics:['Loss: 2.6395', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7292'], loss:[3.542/2.709], time:172.889 ms
step:[   14/ 2484], metrics:['Loss: 2.7090', 'Top_1_Accuracy: 0.4231', 'Top_5_Accuracy: 0.7308'], loss:[3.404/2.759], time:216.287 ms
step:[   15/ 2484], metrics:['Loss: 2.7586', 'Top_1_Accuracy: 0.4018', 'Top_5_Accuracy: 0.7232'], loss:[4.012/2.842], time:171.686 ms
step:[   16/ 2484], metrics:['Loss: 2.8422', 'Top_1_Accuracy: 0.3833', 'Top_5_Accuracy: 0.7167'], loss:[5.157/2.987], time:170.363 ms
step:[   17/ 2484], metrics:['Loss: 2.9869', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.6875'], loss:[4.667/3.086], time:171.926 ms
step:[   18/ 2484], metrics:['Loss: 3.0857', 'Top_1_Accuracy: 0.3603', 'Top_5_Accuracy: 0.6618'], loss:[5.044/3.194], time:197.028 ms
step:[   19/ 2484], metrics:['Loss: 3.1945', 'Top_1_Accuracy: 0.3403', 'Top_5_Accuracy: 0.6458'], loss:[3.625/3.217], time:222.758 ms
step:[   20/ 2484], metrics:['Loss: 3.2171', 'Top_1_Accuracy: 0.3355', 'Top_5_Accuracy: 0.6513'], loss:[1.909/3.152], time:207.416 ms
step:[   21/ 2484], metrics:['Loss: 3.1517', 'Top_1_Accuracy: 0.3563', 'Top_5_Accuracy: 0.6625'], loss:[4.591/3.220], time:171.645 ms
step:[   22/ 2484], metrics:['Loss: 3.2202', 'Top_1_Accuracy: 0.3631', 'Top_5_Accuracy: 0.6667'], loss:[3.545/3.235], time:209.975 ms
step:[   23/ 2484], metrics:['Loss: 3.2350', 'Top_1_Accuracy: 0.3693', 'Top_5_Accuracy: 0.6591'], loss:[3.350/3.240], time:185.889 ms

    Code

The code repository is available at:

Gitee repository
GitHub repository
