Deep Learning Notes: Image Style Transfer implemented in PyTorch (an approach not based on autoencoders or domain-adversarial methods)
Contents
- Paper link:
- Main idea:
- PyTorch implementation:
- Computing the content loss:
- Computing the style loss:
- Computing the total loss:
- Training process:
Paper link:
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf
The author tackles this problem with what is essentially a traditional optimization method wearing a deep-learning skin. The technique itself is not one to emulate today, but the idea behind it is brilliant.
Main idea:
1. Find a way to separate style features from content features.
2. How to get the content features? The author takes a shortcut and simply uses the conv4_2 feature map from VGG19 as the content features.
3. How to get the style features? Here the author's shortcut is ingenious: style is treated as the correlations between feature maps. He takes the feature maps of the five layers conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 in VGG19 to compute style correlations, and, having observed that different layers contribute differently to style (earlier layers capture fine textures, later layers coarser structure), assigns each of these five layers' losses its own weight (see the feature-extraction sketch in the implementation section below).
4. So what is the objective? The new image should match the content image in content and the style image in style: $L_{total} = L_{content} + L_{style}$. Resemblance comes first and style second, so $L_{content}$ should carry the larger weight.
PyTorch implementation:
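First, a minimal sketch of the feature-extraction setup. The helper name `get_features`, the layer indices into `vgg19.features`, and the per-layer style weights are my assumptions to make the later training loop self-contained; the weight values are illustrative, not prescribed by the paper.

```python
import torch
from torchvision import models

# load a pretrained VGG19 and freeze it: only the generated image is optimized
vgg = models.vgg19(pretrained=True).features
for param in vgg.parameters():
    param.requires_grad_(False)

def get_features(image, model):
    """Collect the feature maps used for content (conv4_2)
    and style (conv1_1 ... conv5_1)."""
    # indices of the relevant conv layers inside vgg19.features
    layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
              '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}
    features = {}
    x = image
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features

# per-layer style weights: earlier layers (finer textures) weighted more
# heavily; these particular values are illustrative
style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75, 'conv3_1': 0.2,
                 'conv4_1': 0.2, 'conv5_1': 0.2}
```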
Computing the content loss:
As mentioned above, the author simply uses the conv4_2 features of VGG19 as the content features, so computing the content loss only requires comparing the two feature maps.
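A minimal sketch, assuming `target_features` and `content_features` are dictionaries returned by a helper like `get_features` above:

```python
# mean squared error between the conv4_2 feature maps of the
# generated image and the content image
content_loss = torch.mean((target_features['conv4_2'] -
                           content_features['conv4_2'])**2)
```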
Computing the style loss:
```python
def gram_matrix(tensor):
    """Calculate the Gram matrix of a given tensor.
    Gram matrix: https://en.wikipedia.org/wiki/Gramian_matrix
    """
    # get the batch size, depth, height, and width of the tensor
    _, d, h, w = tensor.size()
    # reshape so we're multiplying the features for each channel
    tensor = tensor.view(d, h * w)
    # calculate the gram matrix
    gram = torch.mm(tensor, tensor.t())
    return gram
```
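The Gram matrices of the style image only need to be computed once and can be cached. The names `style`, `style_features`, and `style_grams` below are assumptions kept consistent with the training loop later:

```python
# run the style image through VGG19 once and cache its Gram matrices
style_features = get_features(style, vgg)
style_grams = {layer: gram_matrix(style_features[layer])
               for layer in style_features}
```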
Computing the total loss:
```python
content_weight = 1    # alpha
style_weight = 1e6    # beta

# calculate the *total* loss
total_loss = content_weight * content_loss + style_weight * style_loss
```

Training process:
Some may ask: if the VGG parameters are frozen, what parameters are we optimizing? The parameters are the new image itself. The loss above is used to adjust the image's pixel values directly, which is a bit different from the training we usually see.
The new image evolves gradually from the original: copy the content image and make the copy trainable:
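A minimal sketch, assuming `content` is the preprocessed content-image tensor:

```python
# start from a copy of the content image and let its pixels be optimized
target = content.clone().requires_grad_(True)
```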
The training code looks roughly like this:
```python
from torch import optim

# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 2000  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):
    # get the features from your target image
    target_features = get_features(target, vgg)

    # the content loss
    content_loss = torch.mean((target_features['conv4_2'] -
                               content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # then add to it for each layer's gram matrix loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        target_gram = gram_matrix(target_feature)
        _, d, h, w = target_feature.shape
        # get the "style" style representation
        style_gram = style_grams[layer]
        # the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean(
            (target_gram - style_gram)**2)
        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    # calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```