Build Your Own PRISMA: Blending Two Images Together

Published: 2023/12/20

Original post: http://blog.askfermi.me/2016/09/27/diy-prisma/

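About two months ago, an app called PRISMA suddenly flooded my WeChat Moments feed, and richer apps such as Ostagram appeared later. It can turn a photo into the style of a world-famous painting. Honestly, I was surprised that the app took off so suddenly, because I had happened to come across the related paper, and an open-source implementation, some time before. A classmate had even implemented it in the "Internet Programming" course in June. Today we will build an advanced version of PRISMA together: not just famous paintings, but any two images can be blended.

This is not a very academic article, so it will introduce the relevant paper and some open-source projects, and then use one of those projects to build a simple PRISMA-like application. Think of it as a record of the pitfalls I hit along the way. This post covers only part of the backend.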

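How It Works

PRISMA is built on convolutional neural networks. The relevant paper is A Neural Algorithm of Artistic Style. This project is based on a Torch implementation of that paper, which its author has open-sourced on GitHub as Neural Style. Once it is installed on our system, a few simple steps are enough to produce PRISMA-like results.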
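To give a rough feel for the idea in the paper: content is compared through a layer's feature activations, while style is compared through the correlations between feature maps, summarized as a Gram matrix. The following is a minimal pure-Python sketch of that Gram matrix and the per-layer style loss. It is not code from Neural Style; the function names and the flattened-list representation of feature maps are illustrative assumptions.

```python
def gram_matrix(features):
    """features: list of C feature maps, each flattened to a list of N floats.

    Returns the C x C matrix of inner products between feature maps.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def style_loss(generated, style):
    """Per-layer style loss as described in the paper: the squared difference
    between the two Gram matrices, scaled by 1 / (4 * C^2 * N^2)."""
    c = len(generated)        # number of feature maps (filters)
    n = len(generated[0])     # number of positions in each feature map
    g = gram_matrix(generated)
    a = gram_matrix(style)
    total = sum((g[i][j] - a[i][j]) ** 2 for i in range(c) for j in range(c))
    return total / (4.0 * c ** 2 * n ** 2)
```

In the real system the feature maps come from layers of a pretrained VGG network, and the generated image is optimized to reduce a weighted sum of the content and style terms.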

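Hardware Requirements

The convolutional neural networks (CNNs) that PRISMA relies on usually demand a lot of computing power. In research and industrial settings, a high-end graphics card running CUDA-based computation is usually needed to finish in a reasonable time. The author of Neural Style also provides CUDA support, so a good graphics card is the recommended setup.

In my tests, Neural Style needs roughly 6 to 8 GB of memory to run well in CPU mode.

Running this program on a host with little memory is strongly discouraged.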
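Before installing anything, it can be worth checking whether the machine meets that memory figure. The sketch below reads total physical RAM through POSIX sysconf, which works on Linux. The helper names are hypothetical, and the 6 GiB threshold is simply the lower end of the range quoted above, not a limit enforced by Neural Style.

```python
import os

GIB = 1024 ** 3


def total_memory_bytes():
    """Total physical RAM in bytes, via POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def enough_for_cpu_mode(total_bytes, required_gib=6):
    """True if the machine has at least `required_gib` GiB of RAM.

    The default of 6 is an assumption taken from the 6 to 8 GB estimate.
    """
    return total_bytes >= required_gib * GIB
```

Calling enough_for_cpu_mode(total_memory_bytes()) on the target host gives a quick yes or no before spending time on the build.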

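Installation

The author of Neural Style provides installation instructions, but problems still come up often. The recommended installation process is as follows (using Ubuntu as an example):

Upgrade GCC

GCC 5 is a required component. I first tried gcc 4.8 and gcc 4.9 and both failed, which was a particularly nasty pitfall. Compilation only succeeds with gcc 5 or later.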

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5

Afterwards, run gcc -v to see the current version. If it is 5, you can move on to the next steps.
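If the version check needs to be scripted, for example in a setup script, something like the following could work. It is a minimal sketch: it calls gcc -dumpversion, which prints only the version number, rather than parsing the longer output of gcc -v, and the function names are hypothetical.

```python
import re
import subprocess


def gcc_major(version_text):
    """Extract the major version from text such as '5.4.0'."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_text)
    if match is None:
        raise ValueError("no version number in %r" % version_text)
    return int(match.group(1))


def installed_gcc_major():
    """Major version of the gcc found on PATH, or None if gcc is unavailable."""
    try:
        out = subprocess.run(["gcc", "-dumpversion"], capture_output=True,
                             text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return gcc_major(out)
```

A setup script could then stop early when installed_gcc_major() returns None or a value below 5.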

Install Torch and Its Dependencies

cd ~/
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
./install.sh

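Running the last command starts the automatic Torch installation. When it finishes, it writes the environment variables into bashrc, so we only need to run source ~/.bashrc for them to take effect. After that, type th on the command line; if the Torch prompt appears, the installation succeeded.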

Install loadcaffe

loadcaffe loads Caffe networks in Torch and is a commonly used library. It depends on Google's Protocol Buffer library, so install that first:

sudo apt-get install libprotobuf-dev protobuf-compiler

Then install loadcaffe through luarocks (a package manager for Lua):

luarocks install loadcaffe

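Install Neural-Style

First clone the repository from GitHub:

cd ~/
git clone https://github.com/jcjohnson/neural-style.git
cd neural-style

Then download the pretrained network data, which is fairly large:

sh models/download_models.sh

Once the download finishes, it is basically ready to use.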
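Since several steps can fail silently, a small preflight check may save a confusing first run. The sketch below checks that th is on PATH and that the model files are present. The two file names are an assumption about what models/download_models.sh fetches; verify them against the contents of your models directory. The helper names are hypothetical.

```python
import os
import shutil

# Assumed file names; check them against what the download script
# actually placed in the models directory.
EXPECTED_MODEL_FILES = (
    "VGG_ILSVRC_19_layers.caffemodel",
    "VGG_ILSVRC_19_layers_deploy.prototxt",
)


def missing_model_files(models_dir, expected=EXPECTED_MODEL_FILES):
    """Return the expected file names that are not present in models_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(models_dir, name))]


def torch_on_path():
    """True if the `th` executable can be found on PATH."""
    return shutil.which("th") is not None
```

An empty list from missing_model_files("models"), run from inside the cloned repository, suggests the download completed.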

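Usage

The most basic usage is:

th neural_style.lua -style_image <image.jpg> -content_image <image.jpg>

This outputs the blended image using the default parameters. We can also add parameters for different behavior, which covers most of what PRISMA does: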
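Because this post is about the backend side, one plausible way to call the tool from a server process is to build the argument list in code and hand it to subprocess. This is a minimal sketch, not part of Neural Style: build_command and run_style_transfer are hypothetical helpers, and the cwd argument assumes the repository was cloned as described above.

```python
import subprocess


def build_command(style_image, content_image, **options):
    """Return the argv list for neural_style.lua.

    Extra keyword arguments become '-name value' pairs, for example
    num_iterations=200 becomes ['-num_iterations', '200']. A value of True
    emits the bare flag (for switches such as normalize_gradients), and
    False omits the option entirely.
    """
    argv = ["th", "neural_style.lua",
            "-style_image", style_image,
            "-content_image", content_image]
    for name, value in options.items():
        if value is True:
            argv.append("-" + name)
        elif value is not False:
            argv += ["-" + name, str(value)]
    return argv


def run_style_transfer(style_image, content_image, cwd, **options):
    """Run neural_style.lua inside the cloned repository at `cwd`."""
    subprocess.run(build_command(style_image, content_image, **options),
                   cwd=cwd, check=True)
```

Passing a list instead of a shell string avoids quoting problems when file names come from user uploads.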

Options:

image_size: Maximum side length (in pixels) of the generated image. Default is 512.
style_blend_weights: The weight for blending the style of multiple style images, as a comma-separated list, such as -style_blend_weights 3,7. By default all style images are equally weighted.
gpu: Zero-indexed ID of the GPU to use; for CPU mode set -gpu to -1.

Optimization options:

content_weight: How much to weight the content reconstruction term. Default is 5e0.
style_weight: How much to weight the style reconstruction term. Default is 1e2.
tv_weight: Weight of total-variation (TV) regularization; this helps to smooth the image. Default is 1e-3. Set to 0 to disable TV regularization.
num_iterations: Default is 1000.
init: Method for generating the generated image; one of random or image. Default is random which uses a noise initialization as in the paper; image initializes with the content image.
optimizer: The optimization algorithm to use; either lbfgs or adam; default is lbfgs. L-BFGS tends to give better results, but uses more memory. Switching to ADAM will reduce memory usage; when using ADAM you will probably need to play with other parameters to get good results, especially the style weight, content weight, and learning rate; you may also want to normalize gradients when using ADAM.
learning_rate: Learning rate to use with the ADAM optimizer. Default is 1e1.
normalize_gradients: If this flag is present, style and content gradients from each layer will be L1 normalized. Idea from andersbll/neural_artistic_style.

Output options:

output_image: Name of the output image. Default is out.png.
print_iter: Print progress every print_iter iterations. Set to 0 to disable printing.
save_iter: Save the image every save_iter iterations. Set to 0 to disable saving intermediate results.

Layer options:

content_layers: Comma-separated list of layer names to use for content reconstruction. Default is relu4_2.
style_layers: Comma-separated list of layer names to use for style reconstruction. Default is relu1_1,relu2_1,relu3_1,relu4_1,relu5_1.

Other options:

style_scale: Scale at which to extract features from the style image. Default is 1.0.
original_colors: If you set this to 1, then the output image will keep the colors of the content image.
proto_file: Path to the deploy.txt file for the VGG Caffe model.
model_file: Path to the .caffemodel file for the VGG Caffe model. Default is the original VGG-19 model; you can also try the normalized VGG-19 model used in the paper.
pooling: The type of pooling layers to use; one of max or avg. Default is max. The VGG-19 model uses max pooling layers, but the paper mentions that replacing these layers with average pooling layers can improve the results. I haven’t been able to get good results using average pooling, but the option is here.
backend: nn, cudnn, or clnn. Default is nn. cudnn requires cudnn.torch and may reduce memory usage. clnn requires cltorch and clnn.
cudnn_autotune: When using the cuDNN backend, pass this flag to use the built-in cuDNN autotuner to select the best convolution algorithms for your architecture. This will make the first iteration a bit slower and can take a bit more memory, but may significantly speed up the cuDNN backend.
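When these options are exposed through a backend, it helps to validate them before launching a long run. The sketch below records the defaults and allowed values exactly as listed above and emits flags only for values that differ from the defaults. The table is hand-copied from this list, not read from the tool, and override_flags is a hypothetical helper. Options with no stated default, such as gpu, are left out.

```python
# Defaults and allowed values as given in the option list above.
DEFAULTS = {
    "image_size": 512,
    "content_weight": 5e0,
    "style_weight": 1e2,
    "tv_weight": 1e-3,
    "num_iterations": 1000,
    "init": "random",
    "optimizer": "lbfgs",
    "learning_rate": 1e1,
    "output_image": "out.png",
    "style_scale": 1.0,
    "pooling": "max",
    "backend": "nn",
}
CHOICES = {
    "init": {"random", "image"},
    "optimizer": {"lbfgs", "adam"},
    "pooling": {"max", "avg"},
    "backend": {"nn", "cudnn", "clnn"},
}


def override_flags(**overrides):
    """Validate overrides; return flags only for values that differ from defaults."""
    flags = []
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError("unknown option: " + name)
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError("%s must be one of %s"
                             % (name, sorted(CHOICES[name])))
        if value != DEFAULTS[name]:
            flags += ["-" + name, str(value)]
    return flags
```

For example, override_flags(image_size=256, optimizer="adam") produces the flags for a smaller image and the ADAM optimizer, which the option list notes reduces memory usage.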

Results

I tested the two images from the examples, using 50, 100, and 200 iterations to get different results. The outputs are shown below:

Content Image:

Style Image:

50 iterations:

100 iterations:

200 iterations:
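A comparison like the one above can be reproduced by running the tool once per iteration count, with a separate output file each time. The sketch below only generates the commands. The file names style.jpg, content.jpg, and out_<count>.png are placeholders, not the images used in this test.

```python
def iteration_sweep(style_image, content_image, iteration_counts):
    """One neural_style.lua command per iteration count, each with its own
    output file so earlier results are not overwritten."""
    commands = []
    for count in iteration_counts:
        commands.append([
            "th", "neural_style.lua",
            "-style_image", style_image,
            "-content_image", content_image,
            "-num_iterations", str(count),
            "-output_image", "out_%d.png" % count,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed.
    for argv in iteration_sweep("style.jpg", "content.jpg", [50, 100, 200]):
        print(" ".join(argv))
```

Each printed line can be pasted into a shell inside the neural-style directory.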
