GAN to Generate Images of Cars
Hello there! This is my story of building a GAN that generates images of cars, using PyTorch.
First of all, let me tell you what a GAN is, at least as far as I understand it.
A Generative Adversarial Network (GAN) is a network we use to generate something (images, sound… anything). What we challenge here is the machine's ability to imagine something. (For the paragraphs below, know that a GAN has two networks: a Generator and a Discriminator.)
How do we make a machine imagine something?
Let's say we're trying to make the machine imagine some form of data. As usual, we'll start with it (our generator) producing random noise, data that doesn't make any sense at all.
We then feed it to another network trained specifically to distinguish between fake and real data (our discriminator). This network tells us how fake the generated data is, and knowing that, we update the generation process to make it more and more realistic over training.
Also note that we'll be training our discriminator at the same time (freezing the generator for a moment, of course), so that it gets better at distinguishing real images from fake ones as our generator improves.
You can imagine it as two people playing a game. One person knows a target picture that the other has to draw; the other just draws pictures. Seeing each drawing, the first person gives the second feedback on how close it looks to the target, based on which the second makes changes and gets better and better, moving toward the ideal picture.
In the simplest terms, that's how it works. (Feel free to correct me, though.)
Our Dataset
I looked through Kaggle for images of cars, and the dataset I found most suitable was the Stanford Cars dataset.
However, this dataset isn't ideal if you want to generate highly accurate images of cars (the images differ a lot from one another, and there are things happening in the background apart from just the car).
I won't say much about processing the data into a PyTorch-usable form, as it isn't really that important. I've just resized all images to 256x256 (try smaller sizes for better results, though!) and converted them into typical 3-channel image tensors.
Let’s jump right into our model architecture!
The Generator
We're working on image generation, so our GAN is going to be a deep convolutional GAN (DCGAN).
For the generator, I've taken an input vector of size 128 and applied about 7 transposed convolution layers to finally generate an image of size 3x256x256 (an RGB 256x256 image).
Here's the generator for reference:
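The original code block didn't survive here, so below is a hedged sketch of a 7-layer transposed-convolution generator that goes from a 128-dim latent vector to a 3x256x256 image. The channel widths are my own illustrative choices, not necessarily the author's:

```python
import torch
import torch.nn as nn

latent_size = 128

# 7 transposed-conv layers: (128, 1, 1) latent -> (3, 256, 256) image.
generator = nn.Sequential(
    # 1x1 -> 4x4
    nn.ConvTranspose2d(latent_size, 512, 4, 1, 0, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(True),
    # 4x4 -> 8x8
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),
    # 8x8 -> 16x16
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),
    # 16x16 -> 32x32
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(True),
    # 32x32 -> 64x64
    nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),
    nn.BatchNorm2d(32), nn.ReLU(True),
    # 64x64 -> 128x128
    nn.ConvTranspose2d(32, 16, 4, 2, 1, bias=False),
    nn.BatchNorm2d(16), nn.ReLU(True),
    # 128x128 -> 256x256; Tanh keeps pixel values in [-1, 1]
    nn.ConvTranspose2d(16, 3, 4, 2, 1, bias=False),
    nn.Tanh(),
)

z = torch.randn(2, latent_size, 1, 1)  # a batch of 2 random latent vectors
fake = generator(z)
print(fake.shape)  # torch.Size([2, 3, 256, 256])
```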
So we do this operation called a transposed convolution, which essentially helps us transform this latent vector into a tensor of our image size.
Transposed convolutions simply work the other way around compared to convolutions: instead of decreasing the size of the image, they increase it.
We take a kernel matrix and slide it over each of the input image's pixels, multiplying the kernel by the pixel value each time and mapping the result to the output image (essentially noting down each multiplication on the output). If we encounter an overlap in the output image while sliding this kernel over the input, we simply take the sum of the overlapping values.
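To make the overlap-and-sum behavior concrete, here's a tiny hand-checkable example with a 2x2 input and a 2x2 kernel of ones (my own toy numbers, not from the article):

```python
import torch
import torch.nn.functional as F

# 2x2 input, 2x2 kernel of ones, stride 1: each input pixel "stamps" the
# kernel (scaled by the pixel value) onto the output, and overlapping
# stamps are summed.
x = torch.tensor([[1., 2.],
                  [3., 4.]]).reshape(1, 1, 2, 2)
k = torch.ones(1, 1, 2, 2)

y = F.conv_transpose2d(x, k, stride=1)
print(y.squeeze())
# tensor([[ 1.,  3.,  2.],
#         [ 4., 10.,  6.],
#         [ 3.,  7.,  4.]])
```

The center cell, for instance, is 1 + 2 + 3 + 4 = 10 because all four stamps overlap there; a 2x2 input grows to a 3x3 output.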
If you'd like to learn more about transposed convolutions, you can check this link. (This is the blog I used to understand transposed convolutions better.)
The Discriminator
The discriminator helps train our generator; all it has to do is predict whether an image is real or fake (a 1 or a 0 in this case). I've used a network consisting only of convolution layers: it starts with an input image of size 3x256x256, runs it through 7 convolutional layers, and ends with a 1x1x1 tensor, which we flatten and pass through a sigmoid activation to get values ranging from 0 to 1.
Here's the code for the discriminator:
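As with the generator, the original snippet is missing, so this is a sketch of a 7-layer convolutional discriminator matching the description above (3x256x256 in, one sigmoid score out); the channel widths are again my guesses:

```python
import torch
import torch.nn as nn

# 7 conv layers: (3, 256, 256) -> (1, 1, 1), then flatten + sigmoid.
discriminator = nn.Sequential(
    # 256 -> 128
    nn.Conv2d(3, 16, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    # 128 -> 64
    nn.Conv2d(16, 32, 4, 2, 1, bias=False),
    nn.BatchNorm2d(32), nn.LeakyReLU(0.2, inplace=True),
    # 64 -> 32
    nn.Conv2d(32, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),
    # 32 -> 16
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    # 16 -> 8
    nn.Conv2d(128, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
    # 8 -> 4
    nn.Conv2d(256, 512, 4, 2, 1, bias=False),
    nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
    # 4 -> 1
    nn.Conv2d(512, 1, 4, 1, 0, bias=False),
    nn.Flatten(),
    nn.Sigmoid(),  # one real/fake score in [0, 1] per image
)

imgs = torch.randn(2, 3, 256, 256)
scores = discriminator(imgs)
print(scores.shape)  # torch.Size([2, 1])
```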
Training our GAN
Ideally, our discriminator has to predict that all the images from our dataset are real, so it should predict a 1 for every image in our training set. It should also say that every image from our generator (the generated images) is fake (a 0 prediction). Keeping this concept in mind, we train the discriminator.
Here's the code for discriminator training:
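The original training snippet is also missing, so here's a hedged sketch of one discriminator step with binary cross-entropy (the function name, optimizer handle, and loss choice are assumptions; it assumes a `generator` and `discriminator` like the ones sketched above):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def train_discriminator(real_images, generator, discriminator, opt_d,
                        latent_size=128):
    """One discriminator step: real images should score 1, fakes 0."""
    opt_d.zero_grad()

    # Real images -> target 1
    real_preds = discriminator(real_images)
    real_targets = torch.ones(real_images.size(0), 1)
    real_loss = bce(real_preds, real_targets)

    # Freshly generated fakes -> target 0
    # (detach() freezes the generator: no gradients flow back into it)
    z = torch.randn(real_images.size(0), latent_size, 1, 1)
    fake_images = generator(z).detach()
    fake_preds = discriminator(fake_images)
    fake_targets = torch.zeros(real_images.size(0), 1)
    fake_loss = bce(fake_preds, fake_targets)

    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    return loss.item()
```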
Coming to our generator: it should make images so well that the discriminator is fooled, so the discriminator should return a 1 for each generated image. This becomes our target, and our score is whatever the discriminator returns for our generated images (a value ranging from 0 to 1).
Here's the code for generator training:
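And a matching sketch of one generator step, under the same assumptions as the discriminator step above (names and loss choice are mine):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def train_generator(batch_size, generator, discriminator, opt_g,
                    latent_size=128):
    """One generator step: we want the discriminator to score our fakes as 1."""
    opt_g.zero_grad()

    z = torch.randn(batch_size, latent_size, 1, 1)
    fake_images = generator(z)

    # Try to fool the discriminator: the target is 1 for every fake
    preds = discriminator(fake_images)
    targets = torch.ones(batch_size, 1)
    loss = bce(preds, targets)

    loss.backward()
    opt_g.step()
    return loss.item()
```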
We train both our generator and discriminator in tandem to fit our dataset; as the generator gets better at generating images, our discriminator should get better at telling them apart, in turn pushing our generator to capture finer details of the images.
Results
So we started off with our first batch of random noise that looked like this:
After about 20 epochs:
大約20個紀元后:
Weird, squiggly images that kinda resemble cars.
After about 50 epochs:
They look far off from real cars.
After about 80 epochs we get this:
To be fair, they don't really look as great as real-life images of cars, but it's getting somewhere.
I could run it for about 8 more epochs, and here's what I ended with:
We’re getting something similar to cars, at least.
The best image of a car that I could isolate from the above images was this one:
and this one:
Not really great, but not bad either.
End Notes
While I didn't end up with really realistic-looking cars, I'm happy that I could get this far. The dataset I used has very varied images of cars and was made for an entirely different purpose, so it's cool to see it get this far.
Citations
Dataset hosted on Kaggle, originally made for this paper:
3D Object Representations for Fine-Grained Categorization
Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei
4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013
Thanks to Kaggle and the makers of this dataset for letting me experiment around 😊.
Notebooks saved on jovian.ml.
That's about it for my story. Hope it was informative.
Translated from: https://medium.com/swlh/gan-to-generate-images-of-cars-5f706ca88da