【TensorFlow】Convolutional Neural Network Artistic Style Transfer with VGG16
Image style transfer based on a convolutional neural network (suitable, for example, as an undergraduate final-year project): implemented in Python with deep learning and TensorFlow, using VGG16. Starting from the content features of one image, we add the style features of another image to generate a new image. When training a convolutional model, we normally feed in fixed images and adjust the network's parameters, i.e. we use the images to train the network. When generating an image in a particular style, we instead keep the trained network parameters fixed and adjust the image so that it moves toward the target style. For the content part, we adjust the image's pixel values so that its content features at the network's output approach those of the target image. For the style part, we compute a Gram matrix by taking pairwise inner products of the outputs of the feature maps and summing; the style loss is then the mean squared difference between the Gram matrices.
Example code:
```python
import time
import numpy as np
import tensorflow as tf
from PIL import Image
from keras import backend
from keras.models import Model
from keras.applications.vgg16 import VGG16
from scipy.optimize import fmin_l_bfgs_b
# note: scipy.misc.imsave was removed in SciPy 1.2; imageio.imwrite is the
# modern replacement
from scipy.misc import imsave
```

Loading and preprocessing the content and style images
We load the content and style images. Note that the content image we are working with is not of particularly high quality, but the output we end up with at the end of this process still looks really good.
```python
height = 512
width = 512

content_image_path = 'images/elephant.jpg'
content_image = Image.open(content_image_path)
# note: PIL's resize expects (width, height); it makes no difference here
# since both are 512
content_image = content_image.resize((height, width))
content_image

style_image_path = 'images/styles/wave.jpg'
style_image = Image.open(style_image_path)
style_image = style_image.resize((height, width))
style_image
```

We then convert these images into a form suitable for numerical processing.
In particular, note that we add an extra batch dimension (on top of the height x width x 3 dimensions) so that we can later concatenate the representations of the two images into a common data structure.
```python
content_array = np.asarray(content_image, dtype='float32')
content_array = np.expand_dims(content_array, axis=0)
print(content_array.shape)

style_array = np.asarray(style_image, dtype='float32')
style_array = np.expand_dims(style_array, axis=0)
print(style_array.shape)
```

(1, 512, 512, 3)
(1, 512, 512, 3)

We need to perform two transformations:
1. Subtract the mean RGB value from each pixel (see the VGG paper for the specific reason; we omit it here).
2. Flip the ordering of the multidimensional array from RGB to BGR (the ordering used in this post).
```python
content_array[:, :, :, 0] -= 103.939
content_array[:, :, :, 1] -= 116.779
content_array[:, :, :, 2] -= 123.68
content_array = content_array[:, :, :, ::-1]

style_array[:, :, :, 0] -= 103.939
style_array[:, :, :, 1] -= 116.779
style_array[:, :, :, 2] -= 123.68
style_array = style_array[:, :, :, ::-1]
```

Now we can use these arrays to define variables in the Keras backend (a TensorFlow graph).
We also introduce a placeholder variable to store the combination image, which retains the content of the content image while taking on the style of the style image. Finally, we concatenate all of this image data into a single tensor that is suitable for processing by Keras' VGG16 model.
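The actual code for this step does not survive in the extracted post (in the original it builds symbolic tensors with `backend.variable`, `backend.placeholder`, and `backend.concatenate`). As a sketch of the data layout only — the variable names here are assumptions — the same stacking can be illustrated with NumPy:

```python
import numpy as np

# Stand-ins for the three (1, height, width, 3) arrays: the content image,
# the style image, and the (initially random) combination image.
height, width = 512, 512
content_array = np.zeros((1, height, width, 3), dtype='float32')
style_array = np.zeros((1, height, width, 3), dtype='float32')
combination_array = np.random.uniform(
    0, 255, (1, height, width, 3)).astype('float32')

# Concatenate along the batch axis: index 0 holds the content image,
# index 1 the style image, index 2 the combination image -- the same
# indexing used later when slicing layer_features[0/1/2, :, :, :].
input_tensor = np.concatenate(
    [content_array, style_array, combination_array], axis=0)
print(input_tensor.shape)  # (3, 512, 512, 3)
```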
Reusing a pre-trained image-classification model to define the loss functions
Since we are not interested in the classification problem, we do not need the fully connected layers or the final softmax classifier. We only need the convolutional part of the model (the layers marked in green in the table in the original post).

Getting hold of this truncated model is quite simple, because Keras ships with a set of pre-trained models, including the VGG16 model we are interested in. Note that by setting include_top=False in the code below, we do not include any of the fully connected layers.

As the table shows, the model we are using has many layers. Keras has its own names for these layers. Let's list them so that we can conveniently refer to individual layers later.
```python
# Build the truncated VGG16 model on the concatenated input tensor.
# (This line is implied by the text but is missing from the extracted post.)
model = VGG16(input_tensor=input_tensor, weights='imagenet',
              include_top=False)

layers = dict([(layer.name, layer.output) for layer in model.layers])
layers
```

Reading the model from a local file
```python
import scipy.io

def load_vgg_model(path):
    """
    Returns a model for the purpose of 'painting' the picture.
    Takes only the convolution layer weights and wraps them using the
    TensorFlow Conv2d, Relu and AveragePooling layers. VGG actually uses
    maxpool, but the paper indicates that using AveragePooling yields
    better results. The last few fully connected layers are not used.

    Here is the detailed configuration of the VGG model:
        0  is conv1_1 (3, 3, 3, 64)
        1  is relu
        2  is conv1_2 (3, 3, 64, 64)
        3  is relu
        4  is maxpool
        5  is conv2_1 (3, 3, 64, 128)
        6  is relu
        7  is conv2_2 (3, 3, 128, 128)
        8  is relu
        9  is maxpool
        10 is conv3_1 (3, 3, 128, 256)
        11 is relu
        12 is conv3_2 (3, 3, 256, 256)
        13 is relu
        14 is conv3_3 (3, 3, 256, 256)
        15 is relu
        16 is maxpool
        17 is conv4_1 (3, 3, 256, 512)
        18 is relu
        19 is conv4_2 (3, 3, 512, 512)
        20 is relu
        21 is conv4_3 (3, 3, 512, 512)
        22 is relu
        23 is maxpool
        24 is conv5_1 (3, 3, 512, 512)
        25 is relu
        26 is conv5_2 (3, 3, 512, 512)
        27 is relu
        28 is conv5_3 (3, 3, 512, 512)
        29 is relu
        30 is maxpool
        31 is fullyconnected (7, 7, 512, 4096)
        32 is relu
        33 is fullyconnected (1, 1, 4096, 4096)
        34 is relu
        35 is fullyconnected (1, 1, 4096, 1000)
        36 is softmax
    """
    vgg = scipy.io.loadmat(path)
    vgg_layers = vgg['layers']

    def _weights(layer, expected_layer_name):
        """
        Return the weights and bias from the VGG model for a given layer,
        stored at layers[0][layer][0][0][2][0].
        """
        W = vgg_layers[0][layer][0][0][2][0][0]
        b = vgg_layers[0][layer][0][0][2][0][1]
        layer_name = vgg_layers[0][layer][0][0][0]
        assert layer_name == expected_layer_name
        return W, b

    def _relu(conv2d_layer):
        """
        Return the RELU function wrapped over a TensorFlow layer.
        Expects a Conv2d layer input.
        """
        return tf.nn.relu(conv2d_layer)

    def _conv2d(prev_layer, layer, layer_name):
        """
        Return the Conv2D layer using the weights and biases from the VGG
        model at 'layer'.
        """
        W, b = _weights(layer, layer_name)
        W = tf.constant(W)
        b = tf.constant(np.reshape(b, (b.size)))
        return tf.nn.conv2d(
            prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b

    def _conv2d_relu(prev_layer, layer, layer_name):
        """
        Return the Conv2D + RELU layer using the weights and biases from
        the VGG model at 'layer'.
        """
        return _relu(_conv2d(prev_layer, layer, layer_name))

    def _avgpool(prev_layer):
        # Despite its name, this version uses max pooling; swap in the
        # commented tf.nn.avg_pool call to get the average pooling that
        # the docstring recommends.
        # return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1],
        #                       strides=[1, 2, 2, 1], padding='SAME')
        return tf.nn.max_pool(prev_layer, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    # Construct the graph model.
    graph = {}
    graph['input'] = input_tensor
    graph['conv1_1'] = _conv2d_relu(graph['input'], 0, 'conv1_1')
    graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
    graph['_maxpool1'] = _avgpool(graph['conv1_2'])
    graph['conv2_1'] = _conv2d_relu(graph['_maxpool1'], 5, 'conv2_1')
    graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
    graph['_maxpool2'] = _avgpool(graph['conv2_2'])
    graph['conv3_1'] = _conv2d_relu(graph['_maxpool2'], 10, 'conv3_1')
    graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
    graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
    graph['_maxpool3'] = _avgpool(graph['conv3_3'])
    graph['conv4_1'] = _conv2d_relu(graph['_maxpool3'], 17, 'conv4_1')
    graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 19, 'conv4_2')
    graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 21, 'conv4_3')
    graph['_maxpool4'] = _avgpool(graph['conv4_3'])
    graph['conv5_1'] = _conv2d_relu(graph['_maxpool4'], 24, 'conv5_1')
    graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 26, 'conv5_2')
    graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 28, 'conv5_3')
    graph['_maxpool5'] = _avgpool(graph['conv5_3'])
    return graph

model = load_vgg_model('./imagenet-vgg-verydeep-16.mat')
```

If you stare at the listing above, you can convince yourself that we now have everything we wanted (the green cells in the table). Note also that because we gave the model a concrete input tensor, the various TensorFlow tensors get well-defined shapes.
The style-transfer problem can be posed as an optimization problem, where the loss function we want to minimize decomposes into three distinct parts: the content loss, the style loss, and the total variation loss.

The relative importance of these terms is determined by a set of scalar weights. These are somewhat arbitrary, but the set below was chosen after quite a bit of experimentation to produce output that I find aesthetically pleasing.
```python
content_weight = 0.025
style_weight = 5.0
total_variation_weight = 1.0
```

We will now use the feature spaces provided by specific layers of the model to define these three loss functions. We begin by initializing the total loss to 0 and adding to it in stages.
```python
loss = tf.Variable(0.)
loss
```

The content loss
The content_loss is the (scaled, squared) Euclidean distance between the feature representations of the content image and the combination image.
```python
def content_loss(content, combination):
    return tf.reduce_sum(tf.square(combination - content))

layer_features = model['conv2_2']
content_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]

loss += content_weight * content_loss(content_image_features,
                                      combination_features)
```

The style loss
This is where things start to get a little more involved.

For the style loss, we first define something called the Gram matrix. The entries of this matrix are proportional to the covariances of the corresponding sets of features, so it captures information about which features tend to activate together. Because it only captures these aggregate statistics across the image, it is blind to the specific arrangement of objects in the image. This is what allows it to capture style information that is independent of content. (This is not at all trivial; I refer you to [the paper that attempts to explain the idea].)

The Gram matrix can be computed efficiently by reshaping the feature space appropriately and taking an outer product.
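As a toy illustration (not part of the original post) of what this reshape-plus-outer-product does, here is the same computation in plain NumPy on a tiny feature map:

```python
import numpy as np

def gram_matrix_np(x):
    """Gram matrix of a (height, width, channels) feature map:
    flatten each channel, then take all pairwise dot products."""
    channels = x.shape[-1]
    features = x.transpose(2, 0, 1).reshape(channels, -1)  # (C, H*W)
    return features @ features.T                           # (C, C)

# Two 2x2 feature maps that always activate together (identical channels)
# produce a Gram matrix whose off-diagonal entries equal its diagonal.
fmap = np.ones((2, 2, 2), dtype='float32')
print(gram_matrix_np(fmap))  # [[4. 4.]
                             #  [4. 4.]]
```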
```python
def gram_matrix(x):
    features = backend.batch_flatten(backend.permute_dimensions(x, (2, 0, 1)))
    gram = backend.dot(features, backend.transpose(features))
    return gram

# Equivalently, using TensorFlow directly instead of the Keras backend:
# def gram_matrix(x):
#     ret = tf.transpose(x, (2, 0, 1))
#     features = tf.reshape(ret, [ret.shape[0], -1])
#     gram = tf.matmul(features, tf.transpose(features))
#     return gram
```

The style loss is the (scaled, squared) Frobenius norm of the difference between the Gram matrices of the style and combination images.
Again, in the code below I chose to use the style features from the layers defined by Johnson et al. (2016) rather than Gatys et al. (2015), because I find the end results more aesthetically pleasing. I encourage you to experiment with these choices to see different results.
```python
def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = height * width
    return tf.reduce_sum(tf.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

feature_layers = ['conv1_2', 'conv2_2',
                  'conv3_3', 'conv4_3',
                  'conv5_3']
for layer_name in feature_layers:
    layer_features = model[layer_name]
    style_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_features, combination_features)
    loss += (style_weight / len(feature_layers)) * sl
```

The total variation loss
Now we are back on simpler ground.

If you were to solve the optimization problem with only the two loss terms introduced so far (style and content), you would find that the output is quite noisy. So we add another term, called the [total variation loss] (a regularization term), which encourages spatial smoothness.

You can experiment with reducing the total_variation_weight and play with the noise level of the generated image.
```python
def total_variation_loss(x):
    a = tf.square(x[:, :height-1, :width-1, :] - x[:, 1:, :width-1, :])
    b = tf.square(x[:, :height-1, :width-1, :] - x[:, :height-1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))

loss += total_variation_weight * total_variation_loss(combination_image)

# note: this Adam optimizer is defined but never used below; the actual
# optimization is done with L-BFGS.
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
optimizer
```

Defining the gradients and solving the optimization problem
Now that the input images have been preprocessed and our loss-function calculators are in place, all that remains is to define the gradients of the total loss with respect to the combination image, and to use those gradients to iteratively improve the combination image so as to minimize the loss.

We then need to define an Evaluator class that retrieves the loss and the gradients via two separate functions, loss and grads. This is done because scipy.optimize requires separate functions for the loss and the gradients, but computing them separately would be inefficient.
```python
# The gradients of the total loss with respect to the combination image.
# (This definition is implied by the text but missing from the extracted post.)
grads = backend.gradients(loss, combination_image)

outputs = [loss]
outputs += grads
f_outputs = backend.function([combination_image], outputs)

def eval_loss_and_grads(x):
    x = x.reshape((1, height, width, 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    grad_values = outs[1].flatten().astype('float64')
    return loss_value, grad_values

class Evaluator(object):
    def __init__(self):
        self.loss_value = None
        self.grads_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
```

Now we can finally solve our optimization problem. The combination image begins its life as a random (valid) collection of pixels, and we iteratively improve it using the [L-BFGS] algorithm (a quasi-Newton method that converges significantly faster than standard gradient descent).
For time reasons I stopped after 2 iterations; setting it to around ten gives better results, and you can watch the loss for yourself.
```python
x = np.random.uniform(0, 255, (1, height, width, 3)) - 128.

iterations = 2
for i in range(iterations):
    print('Start of iteration', i)
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    end_time = time.time()
    print('Iteration %d completed in %ds' % (i, end_time - start_time))
```

Start of iteration 0
Current loss value: 73757336000.0
Iteration 0 completed in 217s
Start of iteration 1
Current loss value: 36524343000.0
Iteration 1 completed in 196s
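The post imports imsave, but the step that turns the optimized array x back into a viewable image is not shown. Here is a minimal sketch of that post-processing, inverting the preprocessing from earlier (the function name and output file name are assumptions):

```python
import numpy as np

def deprocess_image(x, height=512, width=512):
    """Undo the VGG preprocessing: reshape the flat solver output,
    flip BGR back to RGB, and add the channel means back in."""
    x = x.reshape((height, width, 3))
    x = x[:, :, ::-1]          # BGR -> RGB (undo the earlier flip)
    x[:, :, 0] += 103.939      # add the means back, mirroring the
    x[:, :, 1] += 116.779      # subtraction order used above
    x[:, :, 2] += 123.68
    return np.clip(x, 0, 255).astype('uint8')

# e.g.:
# Image.fromarray(deprocess_image(x)).save('output.jpg')
# or, with the deprecated SciPy helper imported above:
# imsave('output.jpg', deprocess_image(x))
```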
The result (shown as an image in the original post)
Summary
Although the output of this code is quite beautiful, the process used to generate it is very slow. No matter how much you speed the algorithm up (with GPUs and creative hacks), it remains a relatively expensive problem to solve, because we solve a full optimization problem every time we want to generate an image.