
Deep Learning Autoencoders (2): Fashion MNIST Image Reconstruction in Practice

Published: 2023/12/15, by 豆豆



1. Fashion MNIST Dataset
2. Encoder
3. Decoder
4. Autoencoder
5. Network Training
6. Image Reconstruction
Complete Code

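The autoencoder algorithm is simple in principle, easy to implement, and fairly stable to train. Compared with PCA, the strong expressive power of a neural network lets it learn high-level, abstract hidden feature vectors $\boldsymbol z$ of the input, and also reconstruct the input from $\boldsymbol z$. Here we put this into practice by reconstructing images from the Fashion MNIST dataset.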
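PCA is the linear special case of this encode/decode idea: it encodes with a fixed orthogonal projection onto the top principal directions and decodes with the transpose. The NumPy sketch below shows that baseline so the comparison is concrete. It is only an illustration under assumptions: random data stands in for flattened 784-pixel images, and pca_reconstruct is a hypothetical helper, not part of the article's code.

```python
import numpy as np


def pca_reconstruct(x, k):
    """Linear encode/decode with PCA (hypothetical helper for illustration).

    Encode: z = (x - mean) @ W_k        -> [n, k] latent code
    Decode: x_rec = z @ W_k.T + mean    -> [n, d] reconstruction
    """
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal directions, sorted by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w_k = vt[:k].T                 # [d, k] projection onto the top-k directions
    z = centered @ w_k             # [n, k] latent code
    return z @ w_k.T + mean, z     # reconstruction and code


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 784))    # stand-in for 200 flattened images
for k in (20, 100, 200):
    x_rec, z = pca_reconstruct(x, k)
    print(k, z.shape, float(np.mean((x - x_rec) ** 2)))
```

The reconstruction error shrinks as k grows; an autoencoder replaces the fixed linear projection with learned nonlinear encoder and decoder networks.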

1. Fashion MNIST Dataset

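Fashion MNIST is a dataset positioned as slightly harder than the MNIST image recognition problem. Its setup is almost identical to MNIST: it contains grayscale images of 10 categories such as clothing, shoes, and bags, each of size $28\times28$, with 70000 images in total, of which 60000 form the training set and 10000 the test set. In the figure below, each row shows images from one category.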

Figure: Fashion MNIST dataset

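As the figure shows, Fashion MNIST differs from MNIST only in image content; all other settings are the same, so in most cases it can directly replace MNIST in existing training code without further changes. Because Fashion MNIST images are harder to recognize than MNIST digits, the dataset is useful for testing the performance of somewhat more complex algorithms.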
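The labels in y_train and y_test are integers from 0 to 9. For readers who want to print category names, a small lookup table is sketched below. The class-name order is taken from the public Fashion-MNIST release rather than from this article, and label_name is a hypothetical helper; the autoencoder itself never uses labels.

```python
# Label id -> class name, in the order of the public Fashion-MNIST release
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_name(label: int) -> str:
    """Map an integer label from y_train / y_test to its class name (hypothetical helper)."""
    return FASHION_MNIST_CLASSES[int(label)]


print(label_name(9))  # Ankle boot
```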

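Loading Fashion MNIST in TensorFlow is just as convenient: the keras.datasets.fashion_mnist.load_data() function downloads, manages, and loads it online. Because the online download was very slow, I loaded the data from local files instead. The data loading and test code is as follows: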

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # set before importing TensorFlow so it takes effect

import ssl

import numpy as np
import tensorflow as tf

# Local loader shown below; the module path follows the author's project layout
from Chapter12.Fashion_MNIST_dataload import get_data

ssl._create_default_https_context = ssl._create_unverified_context

batchsz = 512

# Load the Fashion MNIST image dataset
(x_train, y_train), (x_test, y_test) = get_data()
# Normalize pixel values to [0, 1]
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
# The dataset object only needs the images, not the labels
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batchsz * 5).batch(batchsz)
# Build the test set object
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batchsz)
# Print the shapes of the training and test sets
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

The output is:

```
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
```
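These shapes also determine how many steps one training epoch takes. By default .batch() keeps the final partial batch (drop_remainder=False), so with batchsz = 512 the step count is a ceiling division. The short sketch below does that arithmetic and lists which steps the step % 100 == 0 check in the training loop will print.

```python
import math

n_train, n_test, batchsz = 60000, 10000, 512

train_steps = math.ceil(n_train / batchsz)          # batches per epoch, last one partial
last_batch = n_train - (train_steps - 1) * batchsz  # size of that partial batch
test_steps = math.ceil(n_test / batchsz)            # batches in test_db

print(train_steps, last_batch, test_steps)              # 118 96 20
print([s for s in range(train_steps) if s % 100 == 0])  # [0, 100]
```

So each epoch logs the loss twice, and the first test batch holds 512 images, more than enough for the 50 used in the reconstruction grid.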

The data loading function get_data() is implemented as follows:

```python
import gzip

import numpy as np


def get_data():
    # Local copies of the four Fashion MNIST files
    train_image = r"/Users/XXX/.keras/datasets/fashion_mnist/train-images-idx3-ubyte.gz"
    test_image = r"/Users/XXX/.keras/datasets/fashion_mnist/t10k-images-idx3-ubyte.gz"
    train_label = r"/Users/XXX/.keras/datasets/fashion_mnist/train-labels-idx1-ubyte.gz"
    test_label = r"/Users/XXX/.keras/datasets/fashion_mnist/t10k-labels-idx1-ubyte.gz"
    # File paths
    paths = [train_label, train_image, test_label, test_image]

    with gzip.open(paths[0], 'rb') as lbpath:
        y_train = np.frombuffer(lbpath.read(), np.uint8, offset=8)
    with gzip.open(paths[1], 'rb') as imgpath:
        x_train = np.frombuffer(imgpath.read(), np.uint8, offset=16).reshape(len(y_train), 28, 28)
    with gzip.open(paths[2], 'rb') as lbpath:
        y_test = np.frombuffer(lbpath.read(), np.uint8, offset=8)
    with gzip.open(paths[3], 'rb') as imgpath:
        x_test = np.frombuffer(imgpath.read(), np.uint8, offset=16).reshape(len(y_test), 28, 28)

    return (x_train, y_train), (x_test, y_test)
```
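The offsets 8 and 16 in get_data() skip the headers of the IDX format used by the raw files: a 4-byte magic number (two zero bytes, a data-type byte, and a dimension-count byte) followed by one 4-byte big-endian size per dimension. Label files have 1 dimension (8-byte header) and image files have 3 (16-byte header). The sketch below reads the header instead of hard-coding the offset; read_idx is a hypothetical helper, and the demo builds a tiny synthetic file in memory rather than reading the real ones.

```python
import gzip
import io
import struct

import numpy as np


def read_idx(raw: bytes) -> np.ndarray:
    """Parse uncompressed IDX bytes holding unsigned-byte data (hypothetical helper)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    assert zero == 0 and dtype_code == 0x08, "expected unsigned-byte IDX data"
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim  # 8 for labels (ndim=1), 16 for images (ndim=3)
    return np.frombuffer(raw, np.uint8, offset=offset).reshape(dims)


# Synthetic gzipped image file: 2 images of 28x28 pixels
pixels = (np.arange(2 * 28 * 28) % 256).astype(np.uint8)
fake_file = gzip.compress(struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + pixels.tobytes())
with gzip.open(io.BytesIO(fake_file), "rb") as f:
    images = read_idx(f.read())
print(images.shape)  # (2, 28, 28)
```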


2. Encoder

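We use the encoder to reduce the input image $\boldsymbol x\in R^{784}$ to a lower-dimensional hidden vector $\boldsymbol h\in R^{20}$, and use the decoder to reconstruct the image from $\boldsymbol h$. The autoencoder model is shown in the figure below. The encoder consists of 3 fully connected layers with 256, 128, and 20 output nodes; the decoder likewise consists of 3 fully connected layers with 128, 256, and 784 output nodes.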

Figure: Fashion MNIST autoencoder network architecture
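The parameter count follows directly from these layer widths: a fully connected layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. The sketch below assumes h_dim = 20, as in the complete code; the total should match the trainable-parameter count printed by model.summary() in the training section. The helper names are illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_in * n_out + n_out


def mlp_params(widths):
    """Total parameters of a stack of Dense layers with the given widths."""
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))


encoder = mlp_params([784, 256, 128, 20])
decoder = mlp_params([20, 128, 256, 784])
print(encoder, decoder, encoder + decoder)  # 236436 237200 473636
```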

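First comes the encoder sub-network. A 3-layer network successively reduces the length-784 image vector to 256, then 128, and finally h_dim dimensions. Every layer except the last uses the ReLU activation; the last layer has no activation function. The code is as follows: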

```python
# Create the encoder network, inside the autoencoder class's __init__
self.encoder = Sequential([
    layers.Dense(256, activation=tf.nn.relu),
    layers.Dense(128, activation=tf.nn.relu),
    layers.Dense(h_dim)
])
```
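To make the shape flow concrete without TensorFlow, the NumPy sketch below runs the same three-layer stack on a random batch. Assumptions: dense() is a hypothetical stand-in for layers.Dense with random, untrained weights and a zero bias, and the input batch is random data rather than Fashion MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, n_out, activation=None):
    """Hypothetical stand-in for layers.Dense: random untrained weights, zero bias."""
    w = rng.normal(scale=np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    y = x @ w + b
    return activation(y) if activation is not None else y


def relu(t):
    return np.maximum(t, 0.0)


h_dim = 20
x = rng.random((4, 784))     # a batch of 4 flattened images with pixels in [0, 1)
a1 = dense(x, 256, relu)     # [4, 784] -> [4, 256]
a2 = dense(a1, 128, relu)    # [4, 256] -> [4, 128]
h = dense(a2, h_dim)         # [4, 128] -> [4, 20], no activation on the last layer
print(a1.shape, a2.shape, h.shape)  # (4, 256) (4, 128) (4, 20)
```

Because the last layer is linear, the entries of h can be negative, unlike the ReLU outputs a1 and a2.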

3. Decoder

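Next we create the decoder sub-network, which successively expands the hidden vector of length h_dim to lengths 128, 256, and 784. Every layer except the last uses ReLU. The decoder outputs a length-784 vector that represents a flattened $28\times28$ image; a reshape turns it back into an image matrix. The code is as follows: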

```python
# Create the decoder network
self.decoder = Sequential([
    layers.Dense(128, activation=tf.nn.relu),
    layers.Dense(256, activation=tf.nn.relu),
    layers.Dense(784)
])
```
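Because the last decoder layer has no activation, its 784 outputs are unbounded logits rather than pixel values. The NumPy sketch below shows the post-processing this article applies at test time: a sigmoid, then a reshape to 28×28. logits_to_images is a hypothetical name, and the logits are synthetic.

```python
import numpy as np


def logits_to_images(logits):
    """Map decoder outputs [b, 784] (unbounded logits) to [b, 28, 28] pixels in (0, 1)."""
    pixels = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, as tf.sigmoid does at test time
    return pixels.reshape(-1, 28, 28)        # undo the flattening


logits = np.linspace(-6.0, 6.0, 2 * 784).reshape(2, 784)  # synthetic decoder output
imgs = logits_to_images(logits)
print(imgs.shape, imgs.min() > 0.0, imgs.max() < 1.0)  # (2, 28, 28) True True
```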

4. Autoencoder

Both sub-networks above are implemented in the autoencoder class AE and are created together in its initializer. The code is as follows:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential, layers

h_dim = 20  # length of the hidden vector h, as in the complete code


class AE(keras.Model):
    def __init__(self):
        super(AE, self).__init__()
        # Encoder: 784 -> 256 -> 128 -> h_dim
        self.encoder = Sequential([
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(h_dim)
        ])
        # Decoder: h_dim -> 128 -> 256 -> 784
        self.decoder = Sequential([
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(784)
        ])
```

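Next, the forward pass is implemented in the call function: the input image first passes through the encoder sub-network to obtain the hidden vector h, then through the decoder to obtain the reconstructed image. Calling the encoder and decoder in sequence is all it takes: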

```python
    # Method of class AE
    def call(self, inputs, training=None):
        # [b, 784] => [b, h_dim]
        h = self.encoder(inputs)
        # [b, h_dim] => [b, 784]
        x_hat = self.decoder(h)
        return x_hat
```

5. Network Training

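Training an autoencoder is essentially the same as training a classifier: a loss function measures the distance between the reconstruction $\bar{\boldsymbol x}$ and the original input $\boldsymbol x$, TensorFlow's automatic differentiation computes the gradients of the encoder and decoder at the same time, and the parameters are updated in a loop.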
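The same recipe (reconstruction loss, gradients for both halves, one joint update) can be seen without TensorFlow in a toy example. This is a minimal NumPy sketch with several swapped-in assumptions: a linear autoencoder (no ReLU), an MSE loss instead of cross-entropy, plain gradient descent instead of Adam, hand-derived gradients instead of GradientTape, and synthetic low-rank data instead of Fashion MNIST. train_linear_autoencoder is a hypothetical name.

```python
import numpy as np


def train_linear_autoencoder(x, k, lr=0.2, steps=300, seed=0):
    """Toy linear autoencoder: h = x @ w_enc, x_hat = h @ w_dec, trained on MSE."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, k))  # encoder weights
    w_dec = rng.normal(scale=0.1, size=(k, d))  # decoder weights
    losses = []
    for _ in range(steps):
        h = x @ w_enc                    # encode: [n, d] -> [n, k]
        x_hat = h @ w_dec                # decode: [n, k] -> [n, d]
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size     # d(loss)/d(x_hat) for the mean squared error
        g_dec = h.T @ g_out              # decoder gradient
        g_enc = x.T @ (g_out @ w_dec.T)  # encoder gradient, chained back through h
        w_dec -= lr * g_dec              # update both sub-networks together
        w_enc -= lr * g_enc
    return w_enc, w_dec, losses


rng = np.random.default_rng(1)
x = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 64))  # data lying in an 8-dim subspace
x /= x.std()
_, _, losses = train_linear_autoencoder(x, k=8)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Both gradients are computed before either weight matrix changes, which is what tape.gradient followed by a single apply_gradients call does in the TensorFlow version.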

First, create the autoencoder instance and the optimizer, and set a suitable learning rate. For example:

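```python
lr = 1e-3  # learning rate, as in the complete code

# Create the network object
model = AE()
# Specify the input size
model.build(input_shape=(None, 784))
# Print the network summary
model.summary()
# Create the optimizer and set the learning rate (the old `lr=` keyword is deprecated)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
```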

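Training runs for a fixed 100 epochs. In each step the forward pass produces the reconstructed image vector as logits, and the binary cross-entropy loss on logits (binary_crossentropy with from_logits=True, which computes the same per-pixel loss as tf.nn.sigmoid_cross_entropy_with_logits and averages it over the last axis) measures the error between the reconstructed and original images. An MSE loss would also work. The code is as follows: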

```python
for epoch in range(100):  # train for 100 epochs
    for step, x in enumerate(train_db):  # iterate over the training set
        # Flatten, [b, 28, 28] => [b, 784]
        x = tf.reshape(x, [-1, 784])
        # Record operations for automatic differentiation
        with tf.GradientTape() as tape:
            # Forward pass: logits of the reconstructed images
            x_rec_logits = model(x)
            # Reconstruction loss between the reconstruction and the input
            rec_loss = tf.keras.losses.binary_crossentropy(x, x_rec_logits, from_logits=True)
            # Average over the batch
            rec_loss = tf.reduce_mean(rec_loss)
        # Gradients of both sub-networks (encoder and decoder)
        grads = tape.gradient(rec_loss, model.trainable_variables)
        # One update step for both sub-networks
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if step % 100 == 0:
            # Print the training loss periodically
            print(epoch, step, float(rec_loss))
```
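To see what the binary_crossentropy call with from_logits=True computes per pixel, here is a NumPy sketch of binary cross-entropy on logits using the numerically stable form max(l, 0) - l·x + log(1 + exp(-|l|)), the formula documented for tf.nn.sigmoid_cross_entropy_with_logits. The helper names are hypothetical, and the arrays are random stand-ins for a batch of images and decoder outputs.

```python
import numpy as np


def bce_with_logits(x, logits):
    """Stable per-pixel binary cross-entropy between targets x in [0, 1] and raw logits."""
    return np.maximum(logits, 0) - logits * x + np.log1p(np.exp(-np.abs(logits)))


def bce_naive(x, logits):
    """Textbook form via sigmoid probabilities; overflows or hits log(0) for large |logits|."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(x * np.log(p) + (1 - x) * np.log(1 - p))


rng = np.random.default_rng(0)
x = rng.random((3, 784))              # targets: normalized pixel values
logits = rng.normal(size=(3, 784))    # decoder outputs
per_image = bce_with_logits(x, logits).mean(axis=-1)  # mean over the last axis
rec_loss = per_image.mean()                           # like tf.reduce_mean over the batch
print(np.allclose(bce_with_logits(x, logits), bce_naive(x, logits)))  # True
```

The stable form matters because an untrained decoder can emit large logits; for example, a logit of 1000 with target 1 gives exactly 0 here, whereas the naive form would compute log(1 - 1).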

6. Image Reconstruction

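Unlike classification, an autoencoder's performance is generally hard to evaluate quantitatively. Although the loss value reflects how well the network has learned to some extent, what we ultimately want are reconstructions that are faithful and varied in style. The quality of an autoencoder therefore usually has to be judged for the specific problem at hand. For image reconstruction, this typically relies on subjective human judgment of image quality, or on image-fidelity metrics such as the Inception Score and the Fréchet Inception Distance as an aid.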
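Of the metrics mentioned, the Fréchet Inception Distance has a compact core: fit one Gaussian to features of real images and one to features of generated images, then compute the Fréchet distance $\lVert\mu_r-\mu_g\rVert^2 + \mathrm{Tr}\big(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2}\big)$. The NumPy sketch below implements only that distance. It assumes the features are already computed (real FID uses Inception-v3 activations, which are not computed here), and frechet_distance is a hypothetical name.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two [n, d] feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) = sum of square roots of the eigenvalues of the
    # symmetric PSD matrix S_r^{1/2} S_g S_r^{1/2}, so eigh/eigvalsh can be used.
    vals, vecs = np.linalg.eigh(s_r)
    sqrt_r = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    inner = np.linalg.eigvalsh(sqrt_r @ s_g @ sqrt_r)
    tr_sqrt = np.sum(np.sqrt(np.clip(inner, 0.0, None)))
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(s_r) + np.trace(s_g) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
close = real + rng.normal(scale=0.1, size=real.shape)  # slightly perturbed features
far = rng.normal(loc=2.0, size=(1000, 8))              # shifted distribution
print(frechet_distance(real, close) < frechet_distance(real, far))  # True
```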

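To test reconstruction quality, we split the data into a training set and a test set, and the test set does not take part in training. We take test images $\boldsymbol x\in \mathbb{D}^{test}$ (the code simply uses the first batch of the unshuffled test_db), pass them through the autoencoder to obtain reconstructions, and save the real and reconstructed images together as an image grid for visual comparison. The code is as follows: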

```python
# Reconstruct images: take one batch from the test set
x = next(iter(test_db))
logits = model(tf.reshape(x, [-1, 784]))  # flatten and feed into the autoencoder
x_hat = tf.sigmoid(logits)  # convert logits to pixel values with a sigmoid
# Restore to 28x28, [b, 784] => [b, 28, 28]
x_hat = tf.reshape(x_hat, [-1, 28, 28])
# First 50 inputs + first 50 reconstructions, => [100, 28, 28]
x_concat = tf.concat([x[:50], x_hat[:50]], axis=0)
x_concat = x_concat.numpy() * 255.  # back to the 0~255 range
x_concat = x_concat.astype(np.uint8)  # convert to integers
os.makedirs('ae_images', exist_ok=True)  # make sure the output folder exists
save_images(x_concat, 'ae_images/rec_epoch_%d.png' % epoch)  # save the image grid
```

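The reconstruction results are shown below. In each grid, the left 5 columns are real images and the right 5 columns are the corresponding reconstructions.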

Figure: reconstructions after epoch 1

Figure: reconstructions after epoch 50

Figure: reconstructions after epoch 100

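After the first epoch the reconstructions are poor: the images are very blurry and lack fidelity. As training proceeds, the edges of the reconstructed images become sharper and sharper, and by epoch 100 the reconstructions are already quite close to the real images.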

The save_images function merges multiple images and saves them as one large image, using the PIL library to lay out the grid. The code is as follows:

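```python
from PIL import Image


def save_images(imgs, name):
    # Create a 280x280 canvas for a 10x10 grid of 28x28 images
    new_im = Image.new('L', (280, 280))

    index = 0
    for i in range(0, 280, 28):  # i: horizontal offset, i.e. grid column
        for j in range(0, 280, 28):  # j: vertical offset, i.e. grid row
            im = Image.fromarray(imgs[index])  # a 2-D uint8 array becomes a grayscale ('L') image
            new_im.paste(im, (i, j))  # paste at the matching position
            index += 1

    # Save the image grid
    new_im.save(name)
```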
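Because i is the outer loop and is passed to paste() as the horizontal offset, save_images fills the grid column by column: images 0 to 9 form the first column, 10 to 19 the second, and so on. That is why 50 real images followed by 50 reconstructions end up in the left 5 and right 5 columns respectively. The NumPy sketch below reproduces the same layout without PIL; make_grid is a hypothetical helper.

```python
import numpy as np


def make_grid(imgs, n=10, size=28):
    """NumPy equivalent of save_images' column-by-column layout (hypothetical helper)."""
    grid = np.zeros((n * size, n * size), dtype=np.uint8)
    index = 0
    for col in range(n):      # i in save_images: horizontal offset
        for row in range(n):  # j in save_images: vertical offset
            grid[row * size:(row + 1) * size, col * size:(col + 1) * size] = imgs[index]
            index += 1
    return grid


real = np.full((50, 28, 28), 255, dtype=np.uint8)  # stand-in for 50 real images (white)
recon = np.zeros((50, 28, 28), dtype=np.uint8)     # stand-in for 50 reconstructions (black)
grid = make_grid(np.concatenate([real, recon], axis=0))
print(grid[:, :140].min(), grid[:, 140:].max())    # 255 0
```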

Complete Code

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # set before importing TensorFlow so it takes effect

import ssl

import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow import keras
from tensorflow.keras import Sequential, layers

from Chapter12.Fashion_MNIST_dataload import get_data

ssl._create_default_https_context = ssl._create_unverified_context
tf.random.set_seed(22)
np.random.seed(22)
assert tf.__version__.startswith('2.')


def save_images(imgs, name):
    # Create a 280x280 canvas for a 10x10 grid of 28x28 images
    new_im = Image.new('L', (280, 280))

    index = 0
    for i in range(0, 280, 28):  # i: horizontal offset, i.e. grid column
        for j in range(0, 280, 28):  # j: vertical offset, i.e. grid row
            im = Image.fromarray(imgs[index])  # a 2-D uint8 array becomes a grayscale ('L') image
            new_im.paste(im, (i, j))  # paste at the matching position
            index += 1

    # Save the image grid
    new_im.save(name)


h_dim = 20
batchsz = 512
lr = 1e-3

(x_train, y_train), (x_test, y_test) = get_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
# we do not need label
train_db = tf.data.Dataset.from_tensor_slices(x_train)
train_db = train_db.shuffle(batchsz * 5).batch(batchsz)
test_db = tf.data.Dataset.from_tensor_slices(x_test)
test_db = test_db.batch(batchsz)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)


class AE(keras.Model):
    def __init__(self):
        super(AE, self).__init__()
        # Encoder: 784 -> 256 -> 128 -> h_dim
        self.encoder = Sequential([
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(h_dim)
        ])
        # Decoder: h_dim -> 128 -> 256 -> 784
        self.decoder = Sequential([
            layers.Dense(128, activation=tf.nn.relu),
            layers.Dense(256, activation=tf.nn.relu),
            layers.Dense(784)
        ])

    def call(self, inputs, training=None):
        # [b, 784] => [b, h_dim]
        h = self.encoder(inputs)
        # [b, h_dim] => [b, 784]
        x_hat = self.decoder(h)
        return x_hat


# Create the network object
model = AE()
# Specify the input size
model.build(input_shape=(None, 784))
# Print the network summary
model.summary()
# Create the optimizer and set the learning rate
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
# Output folder for the reconstruction grids
os.makedirs('ae_images', exist_ok=True)

for epoch in range(100):  # train for 100 epochs
    for step, x in enumerate(train_db):  # iterate over the training set
        # Flatten, [b, 28, 28] => [b, 784]
        x = tf.reshape(x, [-1, 784])
        with tf.GradientTape() as tape:
            # Forward pass: logits of the reconstructed images
            x_rec_logits = model(x)
            # Reconstruction loss, averaged over the batch
            rec_loss = tf.keras.losses.binary_crossentropy(x, x_rec_logits, from_logits=True)
            rec_loss = tf.reduce_mean(rec_loss)
        # Gradients of both sub-networks, then one joint update
        grads = tape.gradient(rec_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if step % 100 == 0:
            # Print the training loss periodically
            print(epoch, step, float(rec_loss))

    # evaluation: reconstruct one batch of test images
    x = next(iter(test_db))
    logits = model(tf.reshape(x, [-1, 784]))  # flatten and feed into the autoencoder
    x_hat = tf.sigmoid(logits)  # logits -> pixel values in (0, 1)
    # [b, 784] => [b, 28, 28]
    x_hat = tf.reshape(x_hat, [-1, 28, 28])
    # First 50 inputs + first 50 reconstructions => [100, 28, 28]
    x_concat = tf.concat([x[:50], x_hat[:50]], axis=0)
    x_concat = x_concat.numpy() * 255.  # back to the 0~255 range
    x_concat = x_concat.astype(np.uint8)  # convert to integers
    save_images(x_concat, 'ae_images/rec_epoch_%d.png' % epoch)  # save the image grid
```
