

Understanding R(2+1)D and Its Implementation in the MindSpore Framework


I. The R(2+1)D Algorithm

Paper: [1711.11248] A Closer Look at Spatiotemporal Convolutions for Action Recognition (arxiv.org)

In their CVPR 2018 paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition", Tran et al. proposed R(2+1)D, showing that factorizing 3D convolution kernels into separate spatial and temporal components yields a significant gain in accuracy. The (2+1)D convolution block in R(2+1)D decomposes a 3D convolution with $N$ kernels of size $t \times d \times d$ into a 2D spatial convolution with $M$ kernels of size $1 \times d \times d$ followed by a 1D temporal convolution with $N$ kernels of size $t \times 1 \times 1$. The hyperparameter $M$ determines the dimensionality of the intermediate subspace onto which the signal is projected between the spatial and temporal convolutions; the paper sets

$$M_i = \left\lfloor \frac{t d^2 N_{i-1} N_i}{d^2 N_{i-1} + t N_i} \right\rfloor$$

where $i$ indexes the convolution blocks of the residual network. With this choice, the spatial convolution contributes $M_i d^2 N_{i-1}$ parameters and the temporal convolution $t N_i M_i$, so their sum approximately equals the $t d^2 N_{i-1} N_i$ parameters of the corresponding 3D convolution.


Compared with full 3D convolution, the (2+1)D decomposition has two advantages. First, although the number of parameters is unchanged, the number of nonlinearities in the network doubles thanks to the extra activation function between the 2D and 1D convolutions in each block, and additional nonlinearities increase the complexity of the functions the network can represent. Second, forcing the 3D convolution into separate spatial and temporal components makes optimization easier: compared with a 3D network of the same parameter count, the (2+1)D network achieves lower training error. A quick numerical check of the parameter parity is sketched below.
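The following sketch (plain Python; the helper name and values are illustrative, mirroring the `mid_channel` computation in the `Inflate3D` class later in this article) verifies that the chosen $M_i$ matches the 3D parameter count for a typical 3x3x3 block with 64 input and 64 output channels:

def mid_channels(t, d, n_in, n_out):
    # M_i from the paper: floor(t*d^2*N_{i-1}*N_i / (d^2*N_{i-1} + t*N_i))
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

t, d, n_in, n_out = 3, 3, 64, 64
m = mid_channels(t, d, n_in, n_out)                    # 144
params_3d = n_out * (n_in * t * d * d)                 # 110592
params_2plus1d = m * (n_in * d * d) + n_out * (m * t)  # 82944 + 27648 = 110592
print(m, params_3d, params_2plus1d)

Here the floor division happens to be exact, so the counts match exactly; in general the (2+1)D count only approximates the 3D one.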

The table below shows the architectures of the 18-layer and 34-layer R3D networks; replacing the 3D convolutions in R3D with (2+1)D convolutions yields the R(2+1)D network of the corresponding depth.


The experiments compare the action-recognition accuracy of different forms of convolution on Kinetics, as shown in the table below. All models are based on ResNet18 and trained from scratch on 8-frame or 16-frame clip inputs; R(2+1)D outperforms all the other models.


The comparison with state-of-the-art methods on Kinetics is shown in the table below. When trained from scratch on RGB input, R(2+1)D outperforms I3D by 4.5%, and R(2+1)D pretrained on Sports-1M also beats I3D pretrained on ImageNet by 2.2%.

II. MindSpore Implementation of R(2+1)D

功能函數(shù)說明

數(shù)據(jù)預處理

  • GeneratorDataset reads the video dataset files and outputs batches (batch_size=16) of three-channel frames with the specified sequence length.

  • 數(shù)據(jù)前處理包括混洗、歸一化。

  • 數(shù)據(jù)增強包括video_random_crop類實現(xiàn)的隨機裁剪、video_resize類實現(xiàn)的調(diào)整大小、video_random_horizontal_flip實現(xiàn)的隨機水平翻轉(zhuǎn)。

Model backbone

  • In R2Plus1d18, the input first passes through a (2+1)D convolutional stem, then through four residual blocks built from (2+1)D convolution modules, and finally through an average pooling layer, a flatten layer, and a fully connected layer.

  • The stem (2+1)D convolution module is a Conv3d with kernel size (1, 7, 7) followed by a Conv3d with kernel size (3, 1, 1), with Batch Normalization and ReLU between the convolution layers.

  • R2Plus1d18 contains four residual blocks, each stacked twice in the model. Every block consists of two (2+1)D convolution modules, and each (2+1)D convolution is a Conv3d with kernel size (1, 3, 3) followed by a Conv3d with kernel size (3, 1, 1), again with Batch Normalization and ReLU between the convolution layers; the block's input and output are joined by a residual connection. A shape trace through the network is sketched below.
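For orientation, here is a hypothetical shape trace through R2Plus1d18 (derived from the stem and stage strides in the code below, assuming an input clip of shape (N, 3, 16, 112, 112) and block expansion 1; not the output of a verified run):

# Hypothetical R2Plus1d18 shape trace, input (N, 3, 16, 112, 112):
# stem:   (1,7,7) conv, stride (1,2,2), then (3,1,1) conv -> (N,  64, 16, 56, 56)
# layer1: stride (1,1,1)                                  -> (N,  64, 16, 56, 56)
# layer2: stride (2,2,2)                                  -> (N, 128,  8, 28, 28)
# layer3: stride (2,2,2)                                  -> (N, 256,  4, 14, 14)
# layer4: stride (2,2,2)                                  -> (N, 512,  2,  7,  7)
# adaptive average pool + flatten                         -> (N, 512)
# fully connected classifier                              -> (N, num_classes)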

  • The roles of the individual classes in the model implementation are:

    • Unit3D類實現(xiàn)了輸入經(jīng)過Conv3d、BN、Relu、Pooling層的結(jié)構(gòu),其中BN層、Relu層和Pooling層是可選的。
class Unit3D(nn.Cell):
    """Conv3d fused with normalization and activation blocks definition.

    Args:
        in_channels (int): The number of channels of input frame images.
        out_channels (int): The number of channels of output frame images.
        kernel_size (tuple): The size of the conv3d kernel.
        stride (Union[int, Tuple[int]]): Stride size for the first convolutional layer.
            Default: 1.
        pad_mode (str): Specifies padding mode. The optional values are "same", "valid",
            "pad". Default: "pad".
        padding (Union[int, Tuple[int]]): Implicit paddings on both sides of the input x.
            If `pad_mode` is "pad" and `padding` is not specified by user, then the padding
            size will be `(kernel_size - 1) // 2` for C, H, W channel.
        dilation (Union[int, Tuple[int]]): Specifies the dilation rate to use for dilated
            convolution. Default: 1.
        group (int): Splits filter into groups, in_channels and out_channels must be
            divisible by the number of groups. Default: 1.
        activation (Optional[nn.Cell]): Activation function which will be stacked on top of
            the normalization layer (if not None), otherwise on top of the conv layer.
            Default: nn.ReLU.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        pooling (Optional[nn.Cell]): Pooling layer (if not None) will be stacked on top of
            all the former layers. Default: None.
        has_bias (bool): Whether to use Bias.

    Returns:
        Tensor, output tensor.

    Examples:
        Unit3D(in_channels=in_channels, out_channels=out_channels[0], kernel_size=(1, 1, 1))
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int]] = 3,
                 stride: Union[int, Tuple[int]] = 1,
                 pad_mode: str = 'pad',
                 padding: Union[int, Tuple[int]] = 0,
                 dilation: Union[int, Tuple[int]] = 1,
                 group: int = 1,
                 activation: Optional[nn.Cell] = nn.ReLU,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 pooling: Optional[nn.Cell] = None,
                 has_bias: bool = False) -> None:
        super().__init__()
        if pad_mode == 'pad' and padding == 0:
            padding = tuple((k - 1) // 2 for k in six_padding(kernel_size))
        else:
            padding = 0

        layers = [nn.Conv3d(in_channels=in_channels,
                            out_channels=out_channels,
                            kernel_size=kernel_size,
                            stride=stride,
                            pad_mode=pad_mode,
                            padding=padding,
                            dilation=dilation,
                            group=group,
                            has_bias=has_bias)]

        if norm:
            layers.append(norm(out_channels))
        if activation:
            layers.append(activation())

        self.pooling = None
        if pooling:
            self.pooling = pooling

        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        """construct unit3d"""
        output = self.features(x)
        if self.pooling:
            output = self.pooling(output)
        return output
    • The Inflate3D class uses Unit3D to implement the (2+1)D convolution module.
class Inflate3D(nn.Cell):
    """Inflate3D block definition.

    Args:
        in_channel (int): The number of channels of input frame images.
        out_channel (int): The number of channels of output frame images.
        mid_channel (int): The number of channels of inner frame images.
        kernel_size (tuple): The size of the spatial-temporal convolutional layer kernels.
        stride (Union[int, Tuple[int]]): Stride size for the second convolutional layer.
            Default: 1.
        conv2_group (int): Splits filter into groups for the second conv layer, in_channels
            and out_channels must be divisible by the number of groups. Default: 1.
        norm (Optional[nn.Cell]): Norm layer that will be stacked on top of the convolution
            layer. Default: nn.BatchNorm3d.
        activation (List[Optional[Union[nn.Cell, str]]]): Activation function which will be
            stacked on top of the normalization layer (if not None), otherwise on top of
            the conv layer. Default: (nn.ReLU, None).
        inflate (int): Whether to inflate two conv3d layers and with different kernel size.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.blocks import Inflate3D
        >>> Inflate3D(3, 64, 64)
    """

    def __init__(self,
                 in_channel: int,
                 out_channel: int,
                 mid_channel: int = 0,
                 stride: tuple = (1, 1, 1),
                 kernel_size: tuple = (3, 3, 3),
                 conv2_group: int = 1,
                 norm: Optional[nn.Cell] = nn.BatchNorm3d,
                 activation: List[Optional[Union[nn.Cell, str]]] = (nn.ReLU, None),
                 inflate: int = 1,
                 ):
        super(Inflate3D, self).__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.in_channel = in_channel
        if mid_channel == 0:
            # M_i from the paper: keeps the (2+1)D parameter count close to that
            # of the corresponding 3D convolution.
            self.mid_channel = (in_channel * out_channel * kernel_size[1] * kernel_size[2] * 3) // \
                (in_channel * kernel_size[1] * kernel_size[2] + 3 * out_channel)
        else:
            self.mid_channel = mid_channel
        self.inflate = inflate
        if self.inflate == 0:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 1:
            conv1_kernel_size = (kernel_size[0], 1, 1)
            conv2_kernel_size = (1, kernel_size[1], kernel_size[2])
        elif self.inflate == 2:
            conv1_kernel_size = (1, 1, 1)
            conv2_kernel_size = (kernel_size[0], kernel_size[1], kernel_size[2])
        self.conv1 = Unit3D(self.in_channel,
                            self.mid_channel,
                            stride=(1, 1, 1),
                            kernel_size=conv1_kernel_size,
                            norm=norm,
                            activation=activation[0])
        self.conv2 = Unit3D(self.mid_channel,
                            self.mid_channel,
                            stride=stride,
                            kernel_size=conv2_kernel_size,
                            group=conv2_group,
                            norm=norm,
                            activation=activation[1])

    def construct(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x
    • Resnet3D類實現(xiàn)了輸入經(jīng)過Unit3D、Max Pooling再接4個residual block的結(jié)構(gòu),residual block的堆疊數(shù)量可以通過參數(shù)進行指定。
class ResNet3D(nn.Cell):
    """ResNet3D architecture.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage.
            Default: (64, 128, 256, 512).
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: ((1, 1, 1), (1, 2, 2), (1, 2, 2), (1, 2, 2)).
        group (int): The number of Group convolutions. Default: 1.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use.
            Default: None.
        down_sample (nn.Cell, optional): Residual block in every resblock, it can transfer
            the input feature into the same channel of output. Default: Unit3D.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Inputs:
        - **x** (Tensor) - Tensor of shape :math:`(N, C_{in}, T_{in}, H_{in}, W_{in})`.

    Outputs:
        Tensor of shape :math:`(N, 2048, 7, 7, 7)`.

    Supported Platforms:
        ``GPU``

    Examples:
        >>> import numpy as np
        >>> import mindspore as ms
        >>> from mindvision.msvideo.models.backbones import ResNet3D, ResidualBlock3D
        >>> net = ResNet3D(ResidualBlock3D, [3, 4, 23, 3])
        >>> x = ms.Tensor(np.ones([1, 3, 16, 224, 224]), ms.float32)
        >>> output = net(x)
        >>> print(output.shape)
        (1, 2048, 7, 7)

    About ResNet:
        The ResNet is to ease the training of networks that are substantially deeper than
        those used previously. The model explicitly reformulates the layers as learning
        residual functions with reference to the layer inputs, instead of learning
        unreferenced functions.
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (1, 2, 2),
                                                     (1, 2, 2),
                                                     (1, 2, 2)),
                 group: int = 1,
                 base_width: int = 64,
                 norm: Optional[nn.Cell] = None,
                 down_sample: Optional[nn.Cell] = Unit3D,
                 **kwargs) -> None:
        super().__init__()
        if not norm:
            norm = nn.BatchNorm3d
        self.norm = norm
        self.in_channels = stage_channels[0]
        self.group = group
        self.base_with = base_width
        self.down_sample = down_sample

        self.conv1 = Unit3D(3, self.in_channels, kernel_size=7, stride=2, norm=norm)
        self.max_pool = ops.MaxPool3D(kernel_size=3, strides=2, pad_mode='same')
        self.layer1 = self._make_layer(block, stage_channels[0], layer_nums[0],
                                       stride=stage_strides[0], norm=self.norm, **kwargs)
        self.layer2 = self._make_layer(block, stage_channels[1], layer_nums[1],
                                       stride=stage_strides[1], norm=self.norm, **kwargs)
        self.layer3 = self._make_layer(block, stage_channels[2], layer_nums[2],
                                       stride=stage_strides[2], norm=self.norm, **kwargs)
        self.layer4 = self._make_layer(block, stage_channels[3], layer_nums[3],
                                       stride=stage_strides[3], norm=self.norm, **kwargs)

    def _make_layer(self,
                    block: Optional[nn.Cell],
                    channel: int,
                    block_nums: int,
                    stride: Tuple[int] = (1, 2, 2),
                    norm: Optional[nn.Cell] = nn.BatchNorm3d,
                    **kwargs):
        """Block layers."""
        down_sample = None
        if stride[1] != 1 or self.in_channels != channel * block.expansion:
            down_sample = self.down_sample(self.in_channels,
                                           channel * block.expansion,
                                           kernel_size=1,
                                           stride=stride,
                                           norm=norm,
                                           activation=None)
        self.stride = stride
        bkwargs = [{} for _ in range(block_nums)]  # block-specific keyword args
        temp_args = kwargs.copy()
        for pname, pvalue in temp_args.items():
            if isinstance(pvalue, (list, tuple)):
                Validator.check_equal_int(len(pvalue), block_nums, f'len({pname})')
                for idx, v in enumerate(pvalue):
                    bkwargs[idx][pname] = v
                kwargs.pop(pname)
        layers = []
        layers.append(block(self.in_channels,
                            channel,
                            stride=self.stride,
                            down_sample=down_sample,
                            group=self.group,
                            base_width=self.base_with,
                            norm=norm,
                            **(bkwargs[0]),
                            **kwargs))
        self.in_channels = channel * block.expansion
        for i in range(1, block_nums):
            layers.append(block(self.in_channels,
                                channel,
                                stride=(1, 1, 1),
                                group=self.group,
                                base_width=self.base_with,
                                norm=norm,
                                **(bkwargs[i]),
                                **kwargs))
        return nn.SequentialCell(layers)

    def construct(self, x):
        """Resnet3D construct."""
        x = self.conv1(x)
        x = self.max_pool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        return x
    • The R2Plus1dNet class inherits from ResNet3D, reusing its four residual stages: the input passes through the (2+1)D stem, then through the four residual stages, and finally through the average pooling layer, the flatten layer, and the fully connected layer.
class R2Plus1dNet(ResNet3D):
    """Generic R(2+1)d generator.

    Args:
        block (Optional[nn.Cell]): The block for network.
        layer_nums (Tuple[int]): The numbers of block in different layers.
        stage_channels (Tuple[int]): Output channel for every res stage.
            Default: (64, 128, 256, 512).
        stage_strides (Tuple[Tuple[int]]): Strides for every res stage.
            Default: ((1, 1, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2)).
        conv12 (nn.Cell, optional): Conv1 and conv2 config in resblock.
            Default: Conv2Plus1D.
        base_width (int): The width of per group. Default: 64.
        norm (nn.Cell, optional): The module specifying the normalization layer to use.
            Default: None.
        num_classes (int): Number of categories in the action recognition dataset.
        keep_prob (float): Dropout probability in classification stage.
        kwargs (dict, optional): Key arguments for "make_res_layer" and resblocks.

    Returns:
        Tensor, output tensor.

    Examples:
        >>> from mindvision.msvideo.models.backbones.r2plus1d import *
        >>> from mindvision.msvideo.models.backbones.resnet3d import ResidualBlockBase3D
        >>> data = Tensor(np.random.randn(2, 3, 16, 112, 112), dtype=mindspore.float32)
        >>> net = R2Plus1dNet(block=ResidualBlockBase3D, layer_nums=[2, 2, 2, 2])
        >>> predict = net(data)
        >>> print(predict.shape)
    """

    def __init__(self,
                 block: Optional[nn.Cell],
                 layer_nums: Tuple[int],
                 stage_channels: Tuple[int] = (64, 128, 256, 512),
                 stage_strides: Tuple[Tuple[int]] = ((1, 1, 1),
                                                     (2, 2, 2),
                                                     (2, 2, 2),
                                                     (2, 2, 2)),
                 num_classes: int = 400,
                 **kwargs) -> None:
        super().__init__(block=block,
                         layer_nums=layer_nums,
                         stage_channels=stage_channels,
                         stage_strides=stage_strides,
                         conv12=Conv2Plus1d,
                         **kwargs)
        # (2+1)D stem: a (1, 7, 7) spatial conv followed by a (3, 1, 1) temporal
        # conv, with BatchNorm and ReLU in between.
        self.conv1 = nn.SequentialCell([nn.Conv3d(3, 45,
                                                  kernel_size=(1, 7, 7),
                                                  stride=(1, 2, 2),
                                                  pad_mode='pad',
                                                  padding=(0, 0, 3, 3, 3, 3),
                                                  has_bias=False),
                                        nn.BatchNorm3d(45),
                                        nn.ReLU(),
                                        nn.Conv3d(45, 64,
                                                  kernel_size=(3, 1, 1),
                                                  stride=(1, 1, 1),
                                                  pad_mode='pad',
                                                  padding=(1, 1, 0, 0, 0, 0),
                                                  has_bias=False),
                                        nn.BatchNorm3d(64),
                                        nn.ReLU()])
        self.avgpool = AdaptiveAvgPool3D((1, 1, 1))
        self.flatten = nn.Flatten()
        self.classifier = nn.Dense(stage_channels[-1] * block.expansion,
                                   num_classes)
        # init weights
        self._initialize_weights()

    def construct(self, x):
        """R2Plus1dNet construct."""
        x = self.conv1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        """Init the weight of Conv3d and Dense in the net."""
        for _, cell in self.cells_and_names():
            if isinstance(cell, nn.Conv3d):
                cell.weight.set_data(init.initializer(
                    init.HeNormal(math.sqrt(5), mode='fan_out', nonlinearity='relu'),
                    cell.weight.shape, cell.weight.dtype))
                if cell.bias:
                    cell.bias.set_data(init.initializer(
                        init.Zero(), cell.bias.shape, cell.bias.dtype))
            elif isinstance(cell, nn.BatchNorm2d):
                cell.gamma.set_data(init.initializer(
                    init.One(), cell.gamma.shape, cell.gamma.dtype))
                cell.beta.set_data(init.initializer(
                    init.Zero(), cell.beta.shape, cell.beta.dtype))
    • The R2Plus1d18 class inherits from R2Plus1dNet; its main role is to fix the number of stacked residual blocks, stacking each block twice.
class R2Plus1d18(R2Plus1dNet):
    """The class of R2Plus1d-18 uses the registration mechanism to register,
    need to use the yaml configuration file to call."""

    def __init__(self, **kwargs):
        super(R2Plus1d18, self).__init__(block=ResidualBlockBase3D,
                                         layer_nums=(2, 2, 2, 2),
                                         **kwargs)
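With the classes above in place, a minimal usage sketch (assuming the zjut_mindvideo package from Section III is installed, so that the `msvideo.models.r2plus1d` import path used there is available; the expected output shape follows the trace sketched earlier, not a verified run):

import numpy as np
import mindspore as ms
from msvideo.models.r2plus1d import R2Plus1d18

# Build the 18-layer R(2+1)D network with a 400-class head (Kinetics400).
net = R2Plus1d18(num_classes=400)
# A random batch of two 16-frame 112x112 RGB clips, shape (N, C, T, H, W).
clip = ms.Tensor(np.random.randn(2, 3, 16, 112, 112), ms.float32)
logits = net(clip)
print(logits.shape)  # expected: (2, 400)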

III. Runnable Example

Notebook file link

    數(shù)據(jù)集準備

The code repository uses the Kinetics400 dataset for training and validation.

Pretrained model

The pretrained model was trained on the Kinetics400 dataset; download: r2plus1d18_kinetic400.ckpt

    環(huán)境準備

git clone https://gitee.com/yanlq46462828/zjut_mindvideo.git
cd zjut_mindvideo

# Please first install mindspore according to instructions on the
# official website: https://www.mindspore.cn/install

pip install -r requirements.txt
pip install -e .

Training

from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.context import ParallelMode
from mindspore.communication import init, get_rank, get_group_size
from mindspore.train import Model
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits

from msvideo.utils.check_param import Validator, Rel
    數(shù)據(jù)集加載

The Kinetic400 class, built on VideoDataset, loads the kinetics400 dataset.

from msvideo.data.kinetics400 import Kinetic400

# Data Pipeline.
dataset = Kinetic400(path='/home/publicfile/kinetics-400',
                     split="train",
                     seq=32,
                     num_parallel_workers=1,
                     shuffle=True,
                     batch_size=6,
                     repeat_num=1)
ckpt_save_dir = './r2plus1d'

/home/publicfile/kinetics-400/cls2index.json
    數(shù)據(jù)處理

VideoRescale rescales the video values, VideoResize changes the frame size, VideoRandomCrop randomly crops the resized video, VideoRandomHorizontalFlip flips it horizontally with a given probability, VideoReOrder permutes the dimensions, and VideoNormalize normalizes the values.

from msvideo.data.transforms import VideoRandomCrop, VideoRandomHorizontalFlip, VideoRescale
from msvideo.data.transforms import VideoNormalize, VideoResize, VideoReOrder

transforms = [VideoRescale(shift=0.0),
              VideoResize([128, 171]),
              VideoRandomCrop([112, 112]),
              VideoRandomHorizontalFlip(0.5),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset.transform = transforms
dataset_train = dataset.run()
Validator.check_int(dataset_train.get_dataset_size(), 0, Rel.GT)
step_size = dataset_train.get_dataset_size()

[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-10:30:59.929.412 [mindspore/dataset/core/validator_helpers.py:804] 'Compose' from mindspore.dataset.transforms.py_transforms is deprecated from version 1.8 and will be removed in a future version. Use 'Compose' from mindspore.dataset.transforms instead.
    網(wǎng)絡構(gòu)建
from msvideo.models.r2plus1d import R2Plus1d18

# Create model.
network = R2Plus1d18(num_classes=400)

from msvideo.schedule.lr_schedule import warmup_cosine_annealing_lr_v1

# Set learning rate scheduler.
learning_rate = warmup_cosine_annealing_lr_v1(lr=0.01,
                                              steps_per_epoch=step_size,
                                              warmup_epochs=4,
                                              max_epoch=100,
                                              t_max=100,
                                              eta_min=0)

# Define optimizer.
network_opt = nn.Momentum(network.trainable_params(),
                          learning_rate=learning_rate,
                          momentum=0.9,
                          weight_decay=0.00004)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

# Set the checkpoint config for the network.
ckpt_config = CheckpointConfig(save_checkpoint_steps=step_size,
                               keep_checkpoint_max=10)
ckpt_callback = ModelCheckpoint(prefix='r2plus1d_kinetics400',
                                directory=ckpt_save_dir,
                                config=ckpt_config)

# Init the model.
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics={'acc'})

# Begin to train.
print('[Start training `{}`]'.format('r2plus1d_kinetics400'))
print("=" * 80)
model.train(1,
            dataset_train,
            callbacks=[ckpt_callback, LossMonitor()],
            dataset_sink_mode=False)
print('[End of training `{}`]'.format('r2plus1d_kinetics400'))

[Start training `r2plus1d_kinetics400`]
================================================================================
epoch: 1 step: 1, loss is 5.998835563659668
epoch: 1 step: 2, loss is 5.921803951263428
epoch: 1 step: 3, loss is 6.024421691894531
epoch: 1 step: 4, loss is 6.08278751373291
epoch: 1 step: 5, loss is 6.014780044555664
epoch: 1 step: 6, loss is 5.945815086364746
epoch: 1 step: 7, loss is 6.078174114227295
epoch: 1 step: 8, loss is 6.0565361976623535
epoch: 1 step: 9, loss is 5.952683448791504
epoch: 1 step: 10, loss is 6.033120632171631
epoch: 1 step: 11, loss is 6.05575704574585
epoch: 1 step: 12, loss is 5.9879350662231445
epoch: 1 step: 13, loss is 6.006839275360107
epoch: 1 step: 14, loss is 5.9968180656433105
epoch: 1 step: 15, loss is 5.971335411071777
epoch: 1 step: 16, loss is 6.0620856285095215
epoch: 1 step: 17, loss is 6.081112861633301
epoch: 1 step: 18, loss is 6.106649398803711
epoch: 1 step: 19, loss is 6.095144271850586
epoch: 1 step: 20, loss is 6.00246000289917
epoch: 1 step: 21, loss is 6.061524868011475
epoch: 1 step: 22, loss is 6.046009063720703
epoch: 1 step: 23, loss is 5.997835159301758
epoch: 1 step: 24, loss is 6.007784366607666
epoch: 1 step: 25, loss is 5.946590423583984
epoch: 1 step: 26, loss is 5.9461164474487305
epoch: 1 step: 27, loss is 5.9034929275512695
epoch: 1 step: 28, loss is 5.925591945648193
epoch: 1 step: 29, loss is 6.176599979400635
......

Evaluation

from mindspore import context
from msvideo.data.kinetics400 import Kinetic400

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

# Data Pipeline.
dataset_eval = Kinetic400("/home/publicfile/kinetics-400",
                          split="val",
                          seq=32,
                          seq_mode="interval",
                          num_parallel_workers=1,
                          shuffle=False,
                          batch_size=8,
                          repeat_num=1)

/home/publicfile/kinetics-400/cls2index.json

from msvideo.data.transforms import VideoCenterCrop, VideoRescale, VideoReOrder
from msvideo.data.transforms import VideoNormalize, VideoResize

transforms = [VideoResize([128, 171]),
              VideoRescale(shift=0.0),
              VideoCenterCrop([112, 112]),
              VideoReOrder([3, 0, 1, 2]),
              VideoNormalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])]
dataset_eval.transform = transforms
dataset_eval = dataset_eval.run()

from mindspore import nn
from mindspore import context, load_checkpoint, load_param_into_net
from mindspore.train import Model
from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
from msvideo.utils.callbacks import EvalLossMonitor
from msvideo.models.r2plus1d import R2Plus1d18

# Create model.
network = R2Plus1d18(num_classes=400)

# Define loss function.
network_loss = SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")

param_dict = load_checkpoint('/home/zhengs/r2plus1d/r2plus1d18_kinetic400.ckpt')
load_param_into_net(network, param_dict)

# Define eval_metrics.
eval_metrics = {'Loss': nn.Loss(),
                'Top_1_Accuracy': nn.Top1CategoricalAccuracy(),
                'Top_5_Accuracy': nn.Top5CategoricalAccuracy()}

# Init the model.
model = Model(network, loss_fn=network_loss, metrics=eval_metrics)

print_cb = EvalLossMonitor(model)

# Begin to eval.
print('[Start eval `{}`]'.format('r2plus1d_kinetics400'))
result = model.eval(dataset_eval,
                    callbacks=[print_cb],
                    dataset_sink_mode=False)
print(result)

[WARNING] ME(150956:140289176069952,MainProcess):2023-03-13-11:35:48.745.627 [mindspore/train/model.py:1077] For EvalLossMonitor callback, {'epoch_end', 'step_end', 'epoch_begin', 'step_begin'} methods may not be supported in later version, Use methods prefixed with 'on_train' or 'on_eval' instead when using customized callbacks.

[Start eval `r2plus1d_kinetics400`]
step:[    1/ 2484], metrics:[], loss:[3.070/3.070], time:1923.473 ms
step:[    2/ 2484], metrics:['Loss: 3.0702', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.7500'], loss:[0.808/1.939], time:169.314 ms
step:[    3/ 2484], metrics:['Loss: 1.9391', 'Top_1_Accuracy: 0.5625', 'Top_5_Accuracy: 0.8750'], loss:[2.645/2.175], time:192.965 ms
step:[    4/ 2484], metrics:['Loss: 2.1745', 'Top_1_Accuracy: 0.5417', 'Top_5_Accuracy: 0.8750'], loss:[2.954/2.369], time:172.657 ms
step:[    5/ 2484], metrics:['Loss: 2.3695', 'Top_1_Accuracy: 0.5000', 'Top_5_Accuracy: 0.8438'], loss:[2.489/2.393], time:176.803 ms
step:[    6/ 2484], metrics:['Loss: 2.3934', 'Top_1_Accuracy: 0.4750', 'Top_5_Accuracy: 0.8250'], loss:[1.566/2.256], time:172.621 ms
step:[    7/ 2484], metrics:['Loss: 2.2556', 'Top_1_Accuracy: 0.4792', 'Top_5_Accuracy: 0.8333'], loss:[0.761/2.042], time:172.149 ms
step:[    8/ 2484], metrics:['Loss: 2.0420', 'Top_1_Accuracy: 0.5357', 'Top_5_Accuracy: 0.8571'], loss:[3.675/2.246], time:181.757 ms
step:[    9/ 2484], metrics:['Loss: 2.2461', 'Top_1_Accuracy: 0.4688', 'Top_5_Accuracy: 0.7969'], loss:[3.909/2.431], time:186.722 ms
step:[   10/ 2484], metrics:['Loss: 2.4309', 'Top_1_Accuracy: 0.4583', 'Top_5_Accuracy: 0.7639'], loss:[3.663/2.554], time:199.209 ms
step:[   11/ 2484], metrics:['Loss: 2.5542', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7375'], loss:[3.438/2.635], time:173.766 ms
step:[   12/ 2484], metrics:['Loss: 2.6345', 'Top_1_Accuracy: 0.4318', 'Top_5_Accuracy: 0.7159'], loss:[2.695/2.640], time:171.364 ms
step:[   13/ 2484], metrics:['Loss: 2.6395', 'Top_1_Accuracy: 0.4375', 'Top_5_Accuracy: 0.7292'], loss:[3.542/2.709], time:172.889 ms
step:[   14/ 2484], metrics:['Loss: 2.7090', 'Top_1_Accuracy: 0.4231', 'Top_5_Accuracy: 0.7308'], loss:[3.404/2.759], time:216.287 ms
step:[   15/ 2484], metrics:['Loss: 2.7586', 'Top_1_Accuracy: 0.4018', 'Top_5_Accuracy: 0.7232'], loss:[4.012/2.842], time:171.686 ms
step:[   16/ 2484], metrics:['Loss: 2.8422', 'Top_1_Accuracy: 0.3833', 'Top_5_Accuracy: 0.7167'], loss:[5.157/2.987], time:170.363 ms
step:[   17/ 2484], metrics:['Loss: 2.9869', 'Top_1_Accuracy: 0.3750', 'Top_5_Accuracy: 0.6875'], loss:[4.667/3.086], time:171.926 ms
step:[   18/ 2484], metrics:['Loss: 3.0857', 'Top_1_Accuracy: 0.3603', 'Top_5_Accuracy: 0.6618'], loss:[5.044/3.194], time:197.028 ms
step:[   19/ 2484], metrics:['Loss: 3.1945', 'Top_1_Accuracy: 0.3403', 'Top_5_Accuracy: 0.6458'], loss:[3.625/3.217], time:222.758 ms
step:[   20/ 2484], metrics:['Loss: 3.2171', 'Top_1_Accuracy: 0.3355', 'Top_5_Accuracy: 0.6513'], loss:[1.909/3.152], time:207.416 ms
step:[   21/ 2484], metrics:['Loss: 3.1517', 'Top_1_Accuracy: 0.3563', 'Top_5_Accuracy: 0.6625'], loss:[4.591/3.220], time:171.645 ms
step:[   22/ 2484], metrics:['Loss: 3.2202', 'Top_1_Accuracy: 0.3631', 'Top_5_Accuracy: 0.6667'], loss:[3.545/3.235], time:209.975 ms
step:[   23/ 2484], metrics:['Loss: 3.2350', 'Top_1_Accuracy: 0.3693', 'Top_5_Accuracy: 0.6591'], loss:[3.350/3.240], time:185.889 ms

    Code

The code repository is available at:

Gitee repository
GitHub repository
