Spatial Transformer Networks (STN): Code Analysis

Published: 2025/4/16 · Programming Q&A (编程问答) · 豆豆

This is a fairly early paper on attention. It came out early, has had a big impact, and works well.

There are plenty of write-ups explaining this paper (a quick search turns up lots of them), so I won't repeat them here.

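My suggestion: first read an explanation of the paper to understand the idea, then find the code and read the two side by side. Once it makes sense, you will be able to modify the code yourself and use it wherever you need it.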

For example, the following explanation and code can serve as references:
An article explaining the paper
A code repository

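In short: before classification, the original image is first transformed with a transformation matrix to produce a new image, and it is this new image that gets classified.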

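So the core consists of three steps:
1. Obtain the transformation matrix, a 2×3 matrix that can express translation, scaling, rotation, cropping and so on.
2. From the transformation matrix, compute the mapping between coordinates before and after the affine transformation, i.e. the grid.
3. Apply the grid to the original image to obtain the new image, then run the convolutions and output the classification.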
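For reference, this is the coordinate mapping from the STN paper that steps 1 and 2 describe. For each pixel i of the output, with normalized coordinates (x_i^t, y_i^t) in [-1, 1], the 2×3 matrix θ gives the location (x_i^s, y_i^s) in the original image to sample from, so θ maps output coordinates to input coordinates rather than moving input pixels. The example matrices below are illustrative only (t_x, t_y, s and α are placeholders, not values the code learns); with s < 1 the output covers a smaller central region of the input, i.e. a zoomed-in crop.

```latex
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
= \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}
  \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}

\theta_{\text{identity}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
\theta_{\text{translate}} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}, \quad
\theta_{\text{zoom}} = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \end{bmatrix}, \quad
\theta_{\text{rotate}} = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \end{bmatrix}
```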

Here is an example network that uses the STN:

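import torch.nn as nn
import torch.nn.functional as F

import STNModule  # the file that defines SpatialTransformer (shown below)


class STNSVHNet(nn.Module):
    def __init__(self, spatial_dim, in_channels, stn_kernel_size, kernel_size,
                 num_classes=10, use_dropout=False):
        super(STNSVHNet, self).__init__()
        self._in_ch = in_channels
        self._ksize = kernel_size
        self._sksize = stn_kernel_size
        self.ncls = num_classes
        self.dropout = use_dropout
        self.drop_prob = 0.5
        self.stride = 1
        self.spatial_dim = spatial_dim

        # Spatial transformer: warps the input before it reaches the classifier
        self.stnmod = STNModule.SpatialTransformer(self._in_ch, self.spatial_dim, self._sksize)

        # Classifier. 128*4*4 matches a 32x32 input with kernel_size=5
        # (32 -> 30 -> 15 -> 13 -> 6 -> 4); other sizes need a different flatten size
        self.conv1 = nn.Conv2d(self._in_ch, 32, kernel_size=self._ksize, stride=self.stride, padding=1, bias=False)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(128 * 4 * 4, 3092)
        self.fc2 = nn.Linear(3092, self.ncls)

    def forward(self, x):
        # rois: the warped input; affine_grid: the sampling grid that produced it
        rois, affine_grid = self.stnmod(x)
        out = F.relu(self.conv1(rois))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv3(out))
        out = out.view(-1, 128 * 4 * 4)
        out = F.relu(self.fc1(out))  # non-linearity between the two fully connected layers
        if self.dropout:
            # training=self.training turns dropout off in eval mode
            out = F.dropout(out, p=self.drop_prob, training=self.training)
        out = self.fc2(out)
        return out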
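The flatten size 128*4*4 in STNSVHNet only matches fc1 for particular input and kernel sizes. The sketch below traces the spatial size through forward() with the standard Conv2d / MaxPool2d output-size formula, assuming a 32×32 input (as the size comment in the STN module suggests) and padding=1; the helper names are hypothetical. Under these assumptions kernel_size=5 ends at 4×4, while kernel_size=3 ends at 8×8, in which case view(-1, 128*4*4) would silently turn a batch of N samples into 4N rows.

```python
# Spatial-size bookkeeping for the STNSVHNet classifier (pure Python, no PyTorch needed).


def conv_out(size, kernel, stride=1, padding=1):
    """Output size of nn.Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Output size of F.max_pool2d(x, kernel): stride defaults to kernel, no padding."""
    return (size - kernel) // kernel + 1


def classifier_feature_size(input_size, kernel_size):
    """Trace conv1 -> pool -> conv2 -> pool -> conv3 as written in STNSVHNet.forward."""
    s = pool_out(conv_out(input_size, kernel_size))  # conv1 + max_pool2d
    s = pool_out(conv_out(s, kernel_size))           # conv2 + max_pool2d
    s = conv_out(s, kernel_size)                     # conv3 (no pooling after it)
    return s


if __name__ == "__main__":
    for k in (3, 5):
        s = classifier_feature_size(32, k)
        print(f"kernel_size={k}: final map {s}x{s}, flatten size {128 * s * s}")
```

Run as a script, it reports an 8x8 map (flatten size 8192) for kernel_size=3 and a 4x4 map (flatten size 2048 = 128*4*4) for kernel_size=5.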

The STN module it calls is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """
    Implements a spatial transformer as proposed in the Jaderberg paper.
    Comprises 3 parts:
    1. Localization net
    2. A grid generator
    3. A roi pooled module (sampler)
    The current implementation uses a very small convolutional net with
    4 convolutional layers and 2 fully connected layers. Backends can be
    swapped in favor of VGG, ResNets etc.

    Returns:
        A roi feature map with the same spatial dimensions as the input feature map.
    """

    def __init__(self, in_channels, spatial_dims, kernel_size, use_dropout=False):
        super(SpatialTransformer, self).__init__()
        self._h, self._w = spatial_dims
        self._in_ch = in_channels
        self._ksize = kernel_size
        self.dropout = use_dropout

        # localization net, size : [1x3x32x32]
        # 32*4*4 below matches a 32x32 input with kernel_size=3 (32 -> 16 -> 8 -> 4)
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv4 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.fc1 = nn.Linear(32 * 4 * 4, 1024)
        self.fc2 = nn.Linear(1024, 6)  # the 6 parameters of the 2x3 affine matrix

    def forward(self, x):
        """
        Forward pass of the STN module.
        x -> input feature map
        """
        batch_images = x
        # detach(): gradients from the localization branch do not flow back into x
        x = F.relu(self.conv1(x.detach()))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 32 * 4 * 4)
        x = F.relu(self.fc1(x))
        if self.dropout:
            # dropout on the hidden features only, never on the affine parameters
            x = F.dropout(x, p=0.5, training=self.training)
        x = self.fc2(x)  # params [Nx6]
        x = x.view(-1, 2, 3)  # change it to the 2x3 matrix
        affine_grid_points = F.affine_grid(
            x, torch.Size((x.size(0), self._in_ch, self._h, self._w)), align_corners=False)
        assert affine_grid_points.size(0) == batch_images.size(0), \
            "The batch sizes of the input images must be same as the generated grid."
        rois = F.grid_sample(batch_images, affine_grid_points, align_corners=False)
        return rois, affine_grid_points
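The module leaves fc2 at PyTorch's default random initialization, so training starts from an arbitrary warp. A common practice, used for instance in the official PyTorch spatial transformer tutorial, is to set the last layer's weights to zero and its bias to [1, 0, 0, 0, 1, 0], so that the STN starts out as the identity transform. The plain-Python sketch below (hypothetical helper names) shows why that works: with zero weights the layer output equals the bias whatever the input, and reading those 6 numbers row-major, as x.view(-1, 2, 3) does, gives the identity matrix. In PyTorch itself this would be applied to self.fc2.weight and self.fc2.bias.

```python
# The localization net ends in fc2 with 6 outputs; x.view(-1, 2, 3) reads them row-major.

IDENTITY_PARAMS = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # bias that encodes the identity warp


def params_to_theta(params):
    """Reshape a flat list of 6 numbers into a 2x3 matrix, row-major like Tensor.view."""
    if len(params) != 6:
        raise ValueError("expected 6 affine parameters")
    return [params[0:3], params[3:6]]


def fc2_output_at_init(hidden):
    """fc2 output when its weight is all zeros and its bias is IDENTITY_PARAMS."""
    weight = [[0.0] * len(hidden) for _ in range(6)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weight, IDENTITY_PARAMS)]


# Any hidden features give the same result: the zero weights ignore them.
theta = params_to_theta(fc2_output_at_init([0.3, -1.2, 2.5]))
print(theta)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```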

The core of it is just two lines:

affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)), align_corners=False)
rois = F.grid_sample(batch_images, affine_grid_points, align_corners=False)
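To see what affine_grid actually returns, here is a dependency-free sketch for a single sample, assuming align_corners=False (the default in current PyTorch versions). Each output pixel centre gets normalized coordinates in [-1, 1], and θ maps them to the (x, y) location in the input to sample from. The function names are hypothetical re-implementations for illustration, not PyTorch's own code.

```python
# Pure-Python sketch of what F.affine_grid computes for one sample (align_corners=False).


def normalized_coords(n):
    """Centres of n pixels in [-1, 1] coordinates, as used when align_corners=False."""
    return [(2 * i + 1) / n - 1 for i in range(n)]


def affine_grid(theta, height, width):
    """Return an H x W grid of (x_s, y_s) source locations, i.e. one sample of [N, H, W, 2]."""
    (a, b, c), (d, e, f) = theta
    ys = normalized_coords(height)
    xs = normalized_coords(width)
    # the last "dimension" is (x, y): x runs along the width, y along the height
    return [[(a * x + b * y + c, d * x + e * y + f) for x in xs] for y in ys]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grid = affine_grid(identity, 2, 4)
print(grid[0][0], grid[1][3])  # (-0.75, -0.5) (0.75, 0.5)
```

With the identity θ the grid is just the pixel centres themselves, so sampling at it reproduces the input unchanged.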

This article can help with understanding them:
Affine transformation in PyTorch (affine_grid)

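  • batch_images: the original images.
  • x: the 2×3 transformation matrix (one per sample, so shape [N, 2, 3]), computed from the original image by the localization network (the stack of convolutions and fully connected layers).
  • The size argument after x: the shape of the output of the affine transformation, in the format [N, C, H, W]; here it is chosen so that the output has the same size as the original image.
  • F.affine_grid: its result, affine_grid_points, is the mapping between coordinates before and after the affine transformation. It is a 4-D tensor of shape [N, H, W, 2], where N, H and W are the batch size, height and width of the output feature map; each of the H×W entries holds the normalized (x, y) location, in [-1, 1], in the original image from which that output pixel is sampled.
  • F.grid_sample: applies this mapping to the original image to produce the new image (with bilinear interpolation by default); the new image then goes through the convolutions and other layers that produce the output.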
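The sampling step can be sketched the same way. For each (x, y) in the grid, the sketch converts the normalized coordinate back to a pixel position and bilinearly interpolates the four neighbouring pixels, treating anything outside the image as 0, which mirrors grid_sample's defaults mode='bilinear' and padding_mode='zeros'. It is a simplified single-channel re-implementation with hypothetical names, assuming align_corners=False; the grids are written out by hand for a 1×4 image, where one pixel spans 2/4 = 0.5 in normalized units.

```python
import math

# Pure-Python sketch of F.grid_sample for one single-channel image:
# bilinear interpolation, zero padding, align_corners=False (all assumptions).


def unnormalize(coord, size):
    """Map a [-1, 1] coordinate to a pixel position (pixel centres at integers)."""
    return ((coord + 1) * size - 1) / 2


def pixel(image, row, col):
    """image[row][col], or 0.0 outside the image (zero padding)."""
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return 0.0


def grid_sample(image, grid):
    """Bilinearly sample `image` at every (x, y) location of `grid` (H_out x W_out)."""
    h, w = len(image), len(image[0])
    out = []
    for grid_row in grid:
        out_row = []
        for x, y in grid_row:
            ix, iy = unnormalize(x, w), unnormalize(y, h)
            x0, y0 = math.floor(ix), math.floor(iy)
            fx, fy = ix - x0, iy - y0  # fractional position inside the pixel cell
            out_row.append(pixel(image, y0, x0) * (1 - fx) * (1 - fy)
                           + pixel(image, y0, x0 + 1) * fx * (1 - fy)
                           + pixel(image, y0 + 1, x0) * (1 - fx) * fy
                           + pixel(image, y0 + 1, x0 + 1) * fx * fy)
        out.append(out_row)
    return out


image = [[10, 20, 30, 40]]                                               # 1 x 4 image
identity_grid = [[(-0.75, 0.0), (-0.25, 0.0), (0.25, 0.0), (0.75, 0.0)]]  # theta = identity
shifted_grid = [[(x + 0.5, y) for x, y in identity_grid[0]]]             # theta with t_x = 0.5

print(grid_sample(image, identity_grid))  # [[10.0, 20.0, 30.0, 40.0]]
print(grid_sample(image, shifted_grid))   # [[20.0, 30.0, 40.0, 0.0]]
```

With t_x = 0.5 (one pixel here), every output pixel reads its right-hand neighbour, so the content moves one pixel to the left and the vacated position is filled with 0: θ describes where to sample from, not where to move pixels to.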

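x never gets a label of its own: the whole network is trained end to end with the supervised classification loss, and because affine_grid and grid_sample are differentiable, the gradient of that loss flows back through the sampled image into the localization network. So x is learned automatically, and everything after it follows.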
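Why x can be learned without its own label: within one pixel cell, linear interpolation is linear in the sampling coordinate, so the gradient of an output value with respect to the sampling position is (right neighbour − left neighbour) × (size / 2), and for the translation entry θ13 the sampling position moves one-for-one with θ. The 1-D sketch below, with hypothetical names and a made-up 4-pixel row, checks this with a central finite difference, assuming align_corners=False. In PyTorch, autograd computes this kind of gradient through grid_sample and affine_grid automatically.

```python
import math

# 1-D version of bilinear sampling (one image row, zero padding, align_corners=False),
# used to check numerically that a sampled value is differentiable in theta.

ROW = [10.0, 20.0, 40.0, 80.0]  # made-up pixel values


def sample_row(row, x):
    """Linearly interpolate `row` at normalized coordinate x in [-1, 1]."""
    ix = ((x + 1) * len(row) - 1) / 2
    i0 = math.floor(ix)
    frac = ix - i0
    left = row[i0] if 0 <= i0 < len(row) else 0.0
    right = row[i0 + 1] if 0 <= i0 + 1 < len(row) else 0.0
    return left * (1 - frac) + right * frac


def output_pixel(theta_tx, x_t=-0.25):
    """One output pixel under a pure translation: sample the row at x_t + theta_tx."""
    return sample_row(ROW, x_t + theta_tx)


tx, eps = 0.1, 1e-6
numeric = (output_pixel(tx + eps) - output_pixel(tx - eps)) / (2 * eps)
# The sample lands between pixels 1 and 2 (values 20 and 40);
# d out / d ix = 40 - 20 and d ix / d x = len(ROW) / 2
analytic = (40.0 - 20.0) * len(ROW) / 2
print(round(numeric, 4), analytic)  # 40.0 40.0
```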
