Transposed Convolution
The layers we introduced so far for convolutional neural networks, including convolutional layers and pooling layers, typically reduce the input width and height, or keep them unchanged. However, applications such as semantic segmentation and generative adversarial networks require predicting values for each pixel, and therefore need to increase the input width and height. Transposed convolution, also named fractionally-strided convolution or deconvolution, serves this purpose.
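As a quick sanity check of this size relationship, the output length of a transposed convolution along one dimension can be sketched as follows (a minimal helper; the name trans_conv_out_size is ours, but the formula is the one frameworks such as Gluon use):

```python
def trans_conv_out_size(n_in, kernel_size, padding=0, stride=1):
    # Inverse of the usual convolution size formula:
    # a convolution maps n -> (n + 2*padding - kernel_size) // stride + 1,
    # a transposed convolution maps n -> (n - 1) * stride - 2*padding + kernel_size.
    return (n_in - 1) * stride - 2 * padding + kernel_size

print(trans_conv_out_size(2, 2))                       # 2 -> 3
print(trans_conv_out_size(6, 5, padding=2, stride=3))  # 6 -> 16
```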
from mxnet import np, npx, init
from mxnet.gluon import nn
from d2l import mxnet as d2l
npx.set_np()
- Basic 2D Transposed Convolution
Let us consider a basic case: both input and output channels are 1, with padding 0 and stride 1. Fig. 1 illustrates how a transposed convolution with a 2×2 kernel is computed on a 2×2 input matrix.
Fig. 1. Transposed convolution layer with a 2×2 kernel.
We can implement this operation by giving the matrix kernel K and the matrix input X.
def trans_conv(X, K):
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i: i + h, j: j + w] += X[i, j] * K
    return Y
A convolution computes its result by Y[i, j] = (X[i: i + h, j: j + w] * K).sum(), which summarizes input values through the kernel. A transposed convolution instead broadcasts input values through the kernel, which results in a larger output.
X = np.array([[0, 1], [2, 3]])
K = np.array([[0, 1], [2, 3]])
trans_conv(X, K)
array([[ 0.,  0.,  1.],
       [ 0.,  4.,  6.],
       [ 4., 12.,  9.]])
Alternatively, we can obtain the same result with nn.Conv2DTranspose. As with nn.Conv2D, both input and kernel should be four-dimensional tensors.
X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
tconv = nn.Conv2DTranspose(1, kernel_size=2)
tconv.initialize(init.Constant(K))
tconv(X)
array([[[[ 0.,  0.,  1.],
         [ 0.,  4.,  6.],
         [ 4., 12.,  9.]]]])
- Padding, Strides, and Channels
In a convolution, we apply padding elements to the input, while in a transposed convolution they are applied to the output. A 1×1 padding means we first compute the output as normal, then remove the first/last rows and columns.
tconv = nn.Conv2DTranspose(1, kernel_size=2, padding=1)
tconv.initialize(init.Constant(K))
tconv(X)
array([[[[4.]]]])
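To see that padding indeed trims the full output, we can reproduce the check in plain NumPy (a sketch that restates the trans_conv definition above so the snippet runs standalone):

```python
import numpy as np

def trans_conv(X, K):
    h, w = K.shape
    Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
full = trans_conv(X, K)      # the 3x3 output without padding
cropped = full[1:-1, 1:-1]   # padding=1 removes the first/last rows and columns
print(cropped)               # [[4.]]
```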
Similarly, strides are applied to the output as well.
tconv = nn.Conv2DTranspose(1, kernel_size=2, strides=2)
tconv.initialize(init.Constant(K))
tconv(X)
array([[[[0., 0., 0., 1.],
         [0., 0., 2., 3.],
         [0., 2., 0., 3.],
         [4., 6., 6., 9.]]]])
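The strided case can be mimicked in plain NumPy as well: each input element is scattered to an output window whose offset is scaled by the stride (a hypothetical trans_conv_stride helper, not part of the book's code):

```python
import numpy as np

def trans_conv_stride(X, K, stride=1):
    h, w = K.shape
    Y = np.zeros(((X.shape[0] - 1) * stride + h,
                  (X.shape[1] - 1) * stride + w))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Scatter X[i, j] * K into a window offset by the stride
            Y[i * stride:i * stride + h, j * stride:j * stride + w] += X[i, j] * K
    return Y

X = np.array([[0.0, 1.0], [2.0, 3.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
Y = trans_conv_stride(X, K, stride=2)
print(Y)
```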
X = np.random.uniform(size=(1, 10, 16, 16))
conv = nn.Conv2D(20, kernel_size=5, padding=2, strides=3)
tconv = nn.Conv2DTranspose(10, kernel_size=5, padding=2, strides=3)
conv.initialize()
tconv.initialize()
tconv(conv(X)).shape == X.shape
True
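Why the shapes round-trip can be seen from the size formulas; the round-trip is exact when (n + 2p - k) is divisible by s, as it is here:

```python
# Convolution and transposed convolution size formulas for one dimension,
# with kernel k, padding p, stride s (the shapes used in the example above).
n, k, p, s = 16, 5, 2, 3
conv_out = (n + 2 * p - k) // s + 1         # convolution: 16 -> 6
tconv_out = (conv_out - 1) * s - 2 * p + k  # transposed convolution: 6 -> 16
print(conv_out, tconv_out)                  # 6 16
```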
- Analogy to Matrix Transposition
Transposed convolution takes its name from matrix transposition. In fact, convolution can also be implemented through matrix multiplication. In the example below, we define a 3×3 input X and a 2×2 kernel K, then use corr2d to compute the convolution output.
X = np.arange(9).reshape(3, 3)
K = np.array([[0, 1], [2, 3]])
Y = d2l.corr2d(X, K)
Y
array([[19., 25.],
       [37., 43.]])
Next, we rewrite the convolution kernel K as a matrix W. Its shape will be (4, 9), where the ith row corresponds to applying the kernel to the input to generate the ith output element.
def kernel2matrix(K):
    k, W = np.zeros(5), np.zeros((4, 9))
    k[:2], k[3:5] = K[0, :], K[1, :]
    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
    return W
W = kernel2matrix(K)
W
array([[0., 1., 0., 2., 3., 0., 0., 0., 0.],
       [0., 0., 1., 0., 2., 3., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 2., 3., 0.],
       [0., 0., 0., 0., 0., 1., 0., 2., 3.]])
然后通過適當的整理,用矩陣乘法實現卷積算子。
Y == np.dot(W, X.reshape(-1)).reshape(2, 2)
array([[ True,  True],
       [ True,  True]])
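kernel2matrix above hard-codes the 2×2 kernel and the 3×3 input. For illustration, a general version might look like the following (a hypothetical kernel2matrix_general helper in plain NumPy, not from the book):

```python
import numpy as np

def kernel2matrix_general(K, in_h, in_w):
    """Build the (out_h*out_w, in_h*in_w) matrix W such that
    W @ X.ravel() equals the 2-D cross-correlation of X with K."""
    h, w = K.shape
    out_h, out_w = in_h - h + 1, in_w - w + 1
    W = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j
            for a in range(h):
                for b in range(w):
                    # Output element (i, j) reads input element (i+a, j+b)
                    W[row, (i + a) * in_w + (j + b)] = K[a, b]
    return W

K = np.array([[0.0, 1.0], [2.0, 3.0]])
W = kernel2matrix_general(K, 3, 3)
X = np.arange(9.0).reshape(3, 3)
print((W @ X.ravel()).reshape(2, 2))  # matches corr2d(X, K)
```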
We can implement transposed convolution as a matrix multiplication as well, by reusing kernel2matrix. To reuse the generated W, we construct a 2×2 input, so the corresponding weight matrix will have a shape of (9, 4), which is Wᵀ. Let us verify the results.
X = np.array([[0, 1], [2, 3]])
Y = trans_conv(X, K)
Y == np.dot(W.T, X.reshape(-1)).reshape(3, 3)
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
- Summary
· Compared to convolutions that reduce inputs through kernels, transposed convolutions broadcast inputs.
· If a convolution layer reduces the input width and height by nw and nh times, respectively, then a transposed convolution layer with the same kernel sizes, padding, and strides will increase the input width and height by nw and nh times, respectively.
· We can implement convolution operations by matrix multiplication; the corresponding transposed convolution can then be done by multiplication with the transposed matrix.