Spatial Transformer Networks (STN) Code Analysis

Published: 2025/4/16


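This is one of the earlier papers on attention.

It came early, has been influential, and works well.

There are many write-ups explaining the paper and they are easy to find, so I will not repeat them here.

First read an explanation of the paper and understand the principle, then find an implementation and read the two side by side. Once that is clear, you can modify the code yourself and use it wherever you need it.

For example, for an explanation of the paper and for code, see:
An explanation of the paper
A code repository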

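Put simply: before classification, the original image is passed through a transformation matrix to produce a new image, and the new image is what gets classified.

So the core steps are:
1. Obtain the transformation matrix, a 2*3 matrix that can express translation, scaling, rotation, cropping and similar operations.
2. Use the transformation matrix to obtain the coordinate mapping between the image before and after the affine transformation, i.e. the grid.
3. Apply the grid to the original image to get the new image, then run the convolutions and output the classification.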
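To make the role of the 2*3 matrix concrete, here is a minimal plain-Python sketch. It assumes the convention used by PyTorch's sampling functions: coordinates are normalized so the image spans [-1, 1] on both axes, and the matrix maps each output coordinate to the input coordinate to read from. The helper names (`apply_affine`, `rotation`) and the example matrices are illustrative and do not come from the code discussed here.

```python
import math


def apply_affine(theta, x, y):
    """Map an output coordinate (x, y) to the input coordinate to sample from.

    theta is a 2x3 matrix given as two rows of three numbers.
    """
    (a, b, tx), (c, d, ty) = theta
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation(angle_rad):
    """2x3 matrix for a pure rotation about the image centre."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [[cos_a, -sin_a, 0.0], [sin_a, cos_a, 0.0]]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Reads only the central half of the input: a crop that is magnified to full size.
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
# Reads from 0.25 further right, so the content appears shifted to the left.
shift = [[1.0, 0.0, 0.25], [0.0, 1.0, 0.0]]

corner = (1.0, 1.0)  # bottom-right corner of the output
print(apply_affine(identity, *corner))  # (1.0, 1.0)
print(apply_affine(zoom_in, *corner))   # (0.5, 0.5)
print(apply_affine(shift, *corner))     # (1.25, 1.0)
```

The six numbers are all the localization network has to predict; evaluating this mapping at every output pixel is what produces the grid in step 2.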

An example of using it:

import torch.nn as nn
import torch.nn.functional as F

import STNModule  # assumed: the module that defines SpatialTransformer


class STNSVHNet(nn.Module):
    """Classifier that warps its input with a spatial transformer first."""

    def __init__(self, spatial_dim, in_channels, stn_kernel_size, kernel_size,
                 num_classes=10, use_dropout=False):
        super(STNSVHNet, self).__init__()
        self._in_ch = in_channels
        self._ksize = kernel_size
        self._sksize = stn_kernel_size
        self.ncls = num_classes
        self.dropout = use_dropout
        self.drop_prob = 0.5
        self.stride = 1
        self.spatial_dim = spatial_dim

        # Spatial transformer applied to the raw input.
        self.stnmod = STNModule.SpatialTransformer(self._in_ch, self.spatial_dim, self._sksize)
        # Classification network.
        self.conv1 = nn.Conv2d(self._in_ch, 32, kernel_size=self._ksize, stride=self.stride, padding=1, bias=False)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        # fc1 expects 128 x 4 x 4 features, e.g. a 32x32 input with kernel_size=5.
        self.fc1 = nn.Linear(128*4*4, 3092)
        self.fc2 = nn.Linear(3092, self.ncls)

    def forward(self, x):
        # Warp the input first, then classify the warped image.
        rois, affine_grid = self.stnmod(x)
        out = F.relu(self.conv1(rois))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv3(out))
        out = out.view(-1, 128*4*4)
        if self.dropout:
            # training=self.training disables dropout in eval mode.
            out = F.dropout(self.fc1(out), p=self.drop_prob, training=self.training)
        else:
            out = self.fc1(out)
        out = self.fc2(out)
        return out

The STN code it calls is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """
    Implements a spatial transformer as proposed in the Jaderberg paper.
    Comprises three parts:
    1. Localization net
    2. A grid generator
    3. A roi pooled module.

    The current implementation uses a very small convolutional net with
    2 convolutional layers and 2 fully connected layers. Backends
    can be swapped in favor of VGG, ResNets etc.

    Returns:
    A roi feature map with the same spatial dimension as the input feature map.
    """

    def __init__(self, in_channels, spatial_dims, kernel_size, use_dropout=False):
        super(SpatialTransformer, self).__init__()
        self._h, self._w = spatial_dims
        self._in_ch = in_channels
        self._ksize = kernel_size
        self.dropout = use_dropout

        # localization net
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)  # size : [1x3x32x32]
        self.conv2 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        self.conv4 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
        # fc1 expects 32 x 4 x 4 features, e.g. a 32x32 input with kernel_size=3.
        self.fc1 = nn.Linear(32*4*4, 1024)
        self.fc2 = nn.Linear(1024, 6)

    def forward(self, x):
        """
        Forward pass of the STN module.
        x -> input feature map
        """
        batch_images = x
        # detach() keeps the localization branch from back-propagating into its input.
        x = F.relu(self.conv1(x.detach()))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 32*4*4)
        if self.dropout:
            x = F.dropout(self.fc1(x), p=0.5, training=self.training)
            x = F.dropout(self.fc2(x), p=0.5, training=self.training)
        else:
            x = self.fc1(x)
            x = self.fc2(x)  # params [Nx6]

        x = x.view(-1, 2, 3)  # change it to the 2x3 matrix
        # Grid generator: one (x, y) sampling location per output pixel.
        affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
        assert affine_grid_points.size(0) == batch_images.size(0), \
            "The batch sizes of the input images must be same as the generated grid."
        # Sampler: read the input images at the grid locations.
        rois = F.grid_sample(batch_images, affine_grid_points)
        return rois, affine_grid_points

The core of the code is just two lines:

affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
rois = F.grid_sample(batch_images, affine_grid_points)

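This article helps in understanding them:
Affine transformation in PyTorch (affine_grid)

batch_images: the original images.
x: the 2*3 transformation matrix, produced by passing the original image through a series of convolutional and other layers.
The argument after x: the shape of the output of the affine transformation, in the format [N, C, H, W]. Here it is chosen so that the output has the same size as the original image.
F.affine_grid: its result, affine_grid_points, is the coordinate mapping between the image before and after the affine transformation. It returns a 4-D tensor of shape [N, H, W, 2], where N, H and W are the batch size, height and width of the output feature map of the affine transformation.
grid_sample: applies that mapping to the original image to produce the new image, which then goes through the convolutions and other layers to produce the output.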
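The two calls can be tried in isolation. The sketch below is not taken from the repository: it uses a hand-written matrix instead of a learned one, a tiny 4x4 image, and passes `align_corners=False` explicitly to both functions (an assumption made here so that the two calls agree; the code above relies on the library default). The identity matrix should reproduce the input, and negating the first entry should mirror it horizontally.

```python
import torch
import torch.nn.functional as F

# A batch of one single-channel 4x4 image with distinct pixel values.
images = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Hand-written transforms of shape [N, 2, 3]; in an STN these come from the localization net.
identity = torch.tensor([[[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]]])
mirror = torch.tensor([[[-1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]]])

# For every output pixel the grid stores the normalized (x, y) input location to read.
grid = F.affine_grid(identity, images.size(), align_corners=False)
print(grid.shape)  # torch.Size([1, 4, 4, 2])

# Sampling at the identity grid gives back the original image.
resampled = F.grid_sample(images, grid, align_corners=False)
print(torch.allclose(resampled, images, atol=1e-5))  # True

# Sampling at the mirrored grid flips the image left to right.
mirror_grid = F.affine_grid(mirror, images.size(), align_corners=False)
flipped = F.grid_sample(images, mirror_grid, align_corners=False)
```

Whichever value of `align_corners` is used, it should be the same in both calls, otherwise the grid and the sampler disagree about where pixel centres lie.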

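Because the whole network is trained with supervision, x needs no labels of its own: the layers that produce it are learned together with the classifier from the classification loss. Everything after that follows from it.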
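A small sketch can show why no separate supervision is needed for the matrix: both the grid generator and the sampler are differentiable, so the classification loss sends gradients back to whatever produced the six parameters. Everything here is a hypothetical stand-in (`loc_head` and `classifier` are single linear layers, the data is random), and the identity initialization of the last layer is a common practice assumed for this example rather than something the code above does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the localization net: one layer predicting 6 numbers.
loc_head = nn.Linear(8 * 8, 6)
# Assumption: start from the identity transform so training begins with unwarped images.
nn.init.zeros_(loc_head.weight)
with torch.no_grad():
    loc_head.bias.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

# Hypothetical stand-in for the classification network.
classifier = nn.Linear(8 * 8, 10)

images = torch.randn(4, 1, 8, 8)      # random data, for illustration only
labels = torch.tensor([0, 1, 2, 3])

theta = loc_head(images.flatten(1)).view(-1, 2, 3)
grid = F.affine_grid(theta, images.size(), align_corners=False)
warped = F.grid_sample(images, grid, align_corners=False)

# Only class labels are supplied; there is no target for theta.
loss = F.cross_entropy(classifier(warped.flatten(1)), labels)
loss.backward()

# The layer that predicts theta still receives a gradient.
print(loc_head.bias.grad.shape)  # torch.Size([6])
```

An optimizer step over all parameters would then update the localization layers and the classifier together.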
