GANs in Action, Chapter 1: Introduction to Generative Adversarial Networks

Published: 2023/12/8

These are reading notes on Chapter 1 of GANs in Action.

Chapter 1. Introduction to GANs

This chapter covers

An overview of Generative Adversarial Networks
What makes this class of machine learning algorithms special
Some of the exciting GAN applications that this book covers

This chapter covers an overview of GANs, what makes GANs special, and applications of GANs.

The notion of whether machines can think is older than the computer itself. In 1950, the famed mathematician, logician, and computer scientist Alan Turing—perhaps best known for his role in decoding the Nazi wartime enciphering machine, Enigma—penned a paper that would immortalize his name for generations to come, “Computing Machinery and Intelligence.”

In the paper, Turing proposed a test he called the imitation game, better known today as the Turing test. In this hypothetical scenario, an unknowing observer talks with two counterparts behind a closed door: one, a fellow human; the other, a computer. Turing reasons that if the observer is unable to tell which is the person and which is the machine, the computer passed the test and must be deemed intelligent.

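Turing proposed the Turing test: a computer and a person sit behind a door while a tester talks with both. If the tester cannot tell the person from the computer, the computer is deemed intelligent.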

Anyone who has attempted to engage in a dialogue with an automated chatbot or a voice-powered intelligent assistant knows that computers have a long way to go to pass this deceptively simple test. However, in other tasks, computers have not only matched human performance but also surpassed it—even in areas that were until recently considered out of reach for even the smartest algorithms, such as superhumanly accurate face recognition or mastering the game of Go.^[1]

[1]:See “Surpassing Human-Level Face Verification Performance on LFW with GaussianFace,” by Chaochao Lu and Xiaoou Tang, 2014, https://arXiv.org/abs/1404.3840. See also the New York Times article “Google’s AlphaGo Defeats Chinese Go Master in Win for A.I.,” by Paul Mozur, 2017, http://mng.bz/07WJ.

Although artificial intelligence still has a long way to go, it already surpasses humans in some areas, such as face recognition and Go.

Machine learning algorithms are great at recognizing patterns in existing data and using that insight for tasks such as classification (assigning the correct category to an example) and regression (estimating a numerical value based on a variety of inputs). When asked to generate new data, however, computers have struggled. An algorithm can defeat a chess grandmaster, estimate stock price movements, and classify whether a credit card transaction is likely to be fraudulent. In contrast, any attempt at making small talk with Amazon’s Alexa or Apple’s Siri is doomed. Indeed, humanity’s most basic and essential capacities—including a convivial conversation or the crafting of an original creation—can leave even the most sophisticated supercomputers in digital spasms.

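Earlier machine learning algorithms were good at classification and regression on existing data but performed poorly at generating new data.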

This all changed in 2014 when Ian Goodfellow, then a PhD student at the University of Montreal, invented Generative Adversarial Networks (GANs). This technique has enabled computers to generate realistic data by using not one, but two, separate neural networks. GANs were not the first computer program used to generate data, but their results and versatility set them apart from all the rest. GANs have achieved remarkable results that had long been considered virtually impossible for artificial systems, such as the ability to generate fake images with real-world-like quality, turn a scribble into a photograph-like image, or turn video footage of a horse into a running zebra—all without the need for vast troves of painstakingly labeled training data.

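That changed in 2014, when Ian Goodfellow, then a PhD student, proposed Generative Adversarial Networks (GANs). A GAN consists of two neural networks; it is versatile at generating new data and has been widely applied.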

A telling example of how far machine data generation has been able to advance thanks to GANs is the synthesis of human faces, illustrated in figure 1.1. As recently as 2014, when GANs were invented, the best that machines could produce was a blurred countenance—and even that was celebrated as a groundbreaking success. By 2017, just three years later, advances in GANs enabled computers to synthesize fake faces whose quality rivals high-resolution portrait photographs. In this book, we look under the hood of the algorithm that made all this possible.

A good example is face image synthesis: as figure 1.1 shows, GANs can generate high-resolution images.

Figure 1.1. Progress in human face generation

(Source: “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation,” by Miles Brundage et al., 2018, https://arxiv.org/abs/1802.07228.)

1.1. What are Generative Adversarial Networks?

Generative Adversarial Networks (GANs) are a class of machine learning techniques that consist of two simultaneously trained models: one (the Generator) trained to generate fake data, and the other (the Discriminator) trained to discern the fake data from real examples.

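A GAN contains a Generator and a Discriminator: the former generates fake images, and the latter picks the fake images out from the real ones.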

The word generative indicates the overall purpose of the model: creating new data. The data that a GAN will learn to generate depends on the choice of the training set. For example, if we want a GAN to synthesize images that look like Leonardo da Vinci’s, we would use a training dataset of da Vinci’s artwork.

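Generative: the Generator produces data that resembles the training dataset. For example, with da Vinci's works as the training set, it synthesizes images in da Vinci's style.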

The term adversarial points to the game-like, competitive dynamic between the two models that constitute the GAN framework: the Generator and the Discriminator. The Generator’s goal is to create examples that are indistinguishable from the real data in the training set. In our example, this means producing paintings that look just like da Vinci’s. The Discriminator’s objective is to distinguish the fake examples produced by the Generator from the real examples coming from the training dataset. In our example, the Discriminator plays the role of an art expert assessing the authenticity of paintings believed to be da Vinci’s. The two networks are continually trying to outwit each other: the better the Generator gets at creating convincing data, the better the Discriminator needs to be at distinguishing real examples from the fake ones.

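Adversarial: the Generator tries to produce images that pass for real, while the Discriminator tries to tell real from fake. Like a forger and an authenticator, the two compete against each other.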

Finally, the word networks indicates the class of machine learning models most commonly used to represent the Generator and the Discriminator: neural networks. Depending on the complexity of the GAN implementation, these can range from simple feed-forward neural networks (as you’ll see in chapter 3) to convolutional neural networks (as you’ll see in chapter 4) or even more complex variants, such as the U-Net (as you’ll see in chapter 9).

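Networks: the Generator and the Discriminator are usually two neural networks. These can be feed-forward neural networks (chapter 3), convolutional neural networks (chapter 4), or more complex variants such as the U-Net (chapter 9).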

1.2. How do GANs work?

The mathematics underpinning GANs are complex (as you’ll explore in later chapters, especially chapters 3 and 5); fortunately, many real-world analogies can make GANs easier to understand. Previously, we discussed the example of an art forger (the Generator) trying to fool an art expert (the Discriminator). The more convincing the fake paintings the forger makes, the better the art expert must be at determining their authenticity. This is true in the reverse situation as well: the better the art expert is at telling whether a particular painting is genuine, the more the forger must improve to avoid being caught red-handed.

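The mathematics behind GANs is fairly complex, so the analogy of a da Vinci forger and an art expert is a vivid way in. The abilities of the Generator (forging) and the Discriminator (authenticating) push each other to improve during training.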

Another metaphor often used to describe GANs—one that Ian Goodfellow himself likes to use—is that of a criminal (the Generator) who forges money, and a detective (the Discriminator) who tries to catch him. The more authentic-looking the counterfeit bills become, the better the detective must be at detecting them, and vice versa.

Another analogy is a counterfeiter of banknotes and a detective.

In more technical terms, the Generator’s goal is to produce examples that capture the characteristics of the training dataset, so much so that the samples it generates look indistinguishable from the training data. The Generator can be thought of as an object recognition model in reverse. Object recognition algorithms learn the patterns in images to discern an image’s content. Instead of recognizing the patterns, the Generator learns to create them essentially from scratch; indeed, the input into the Generator is often no more than a vector of random numbers.

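The above is the technical description: the Generator takes a random vector as input, captures the characteristics of the training data during training, and generates samples that are hard to tell from real ones. The Discriminator learns the characteristics of the training data in order to identify fake samples.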

The Generator learns through the feedback it receives from the Discriminator’s classifications. The Discriminator’s goal is to determine whether a particular example is real (coming from the training dataset) or fake (created by the Generator). Accordingly, each time the Discriminator is fooled into classifying a fake image as real, the Generator knows it did something well. Conversely, each time the Discriminator correctly rejects a Generator-produced image as fake, the Generator receives the feedback that it needs to improve.

The Discriminator continues to improve as well. Like any classifier, it learns from how far its predictions are from the true labels (real or fake). So, as the Generator gets better at producing realistic-looking data, the Discriminator gets better at telling fake data from the real, and both networks continue to improve simultaneously.

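When the Discriminator classifies correctly, it knows it did well, and the Generator receives the feedback that it needs to improve. The reverse holds when the Discriminator is fooled.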

Table 1.1 summarizes the key takeaways about the two GAN subnetworks.

1.3. GANs in action

Now that you have a high-level understanding of GANs and their constituent networks, let’s take a closer look at the system in action. Imagine that our goal is to teach a GAN to produce realistic-looking handwritten digits. (You’ll learn to implement such a model in chapter 3 and expand on it in chapter 4.) Figure 1.2 illustrates the core GAN architecture.

Figure 1.2 shows the core GAN architecture.

Figure 1.2. The two GAN subnetworks, their inputs and outputs, and their interactions

Let’s walk through the details of the diagram:

1. Training dataset— The dataset of real examples that we want the Generator to learn to emulate with near-perfect quality. In this case, the dataset consists of images of handwritten digits. This dataset serves as input (x) to the Discriminator network.
2. Random noise vector— The raw input (z) to the Generator network. This input is a vector of random numbers that the Generator uses as a starting point for synthesizing fake examples.
3. Generator network— The Generator takes in a vector of random numbers (z) as input and outputs fake examples (x*). Its goal is to make the fake examples it produces indistinguishable from the real examples in the training dataset.
4. Discriminator network— The Discriminator takes as input either a real example (x) coming from the training set or a fake example (x*) produced by the Generator. For each example, the Discriminator determines and outputs the probability of whether the example is real.
5. Iterative training/tuning— For each of the Discriminator’s predictions, we determine how good it is—much as we would for a regular classifier—and use the results to iteratively tune the Discriminator and the Generator networks through backpropagation:

The Discriminator’s weights and biases are updated to maximize its classification accuracy (maximizing the probability of correct prediction: x as real and x* as fake).
The Generator’s weights and biases are updated to maximize the probability that the Discriminator misclassifies x* as real.

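1 is the real training data, which serves as the Discriminator's input $x$.
2 is the random vector, which serves as the Generator's input $z$ and is used to produce fake images.
3 is the Generator network, which takes $z$ as input and outputs a fake image $x^*$.
4 is the Discriminator network, which takes real images $x$ and fake images $x^*$ as input and outputs the probability that an image is real.
5 is iterative training/tuning: the Discriminator's predictions are used to update both networks through backpropagation.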
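
As a rough illustration of step 5, the two update objectives can be written as binary cross-entropy losses on the Discriminator's output probabilities. This is a sketch under stated assumptions, not code from the book: the names `discriminator_loss`, `generator_loss`, and `EPS` are hypothetical, and the Generator loss uses the common "non-saturating" form (minimizing -log D(x*)), which is one way of maximizing the probability that x* is classified as real.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice, not from the book


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the Discriminator.

    d_real: D(x), the predicted probability that a real example is real.
    d_fake: D(x*), the predicted probability that a fake example is real.
    The loss is low when d_real is near 1 and d_fake is near 0.
    """
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))


def generator_loss(d_fake):
    """Loss for the Generator: low when the Discriminator is fooled (D(x*) near 1)."""
    return -math.log(d_fake + EPS)


# A confident, correct Discriminator: low Discriminator loss, high Generator loss.
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# A fooled Discriminator: high Discriminator loss, low Generator loss.
print(discriminator_loss(0.6, 0.8), generator_loss(0.8))
```

The two losses pull in opposite directions on D(x*), which is the competitive dynamic the list describes.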

1.3.1. GAN training

Learning about the purpose of the various GAN components may feel like looking at a snapshot of an engine: it cannot be understood fully until we see it in motion. That’s what this section is all about. First, we present the GAN training algorithm; then, we illustrate the training process so you can see the architecture diagram in action.

We first look at the algorithm, then use the training process to understand how a GAN works.

GAN training algorithm

For each training iteration do
  1. Train the Discriminator:
     1. Take a random real example x from the training dataset.
     2. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
     3. Use the Discriminator network to classify x and x*.
     4. Compute the classification errors and backpropagate the total error to update the Discriminator’s trainable parameters, seeking to minimize the classification errors.
  2. Train the Generator:
     1. Get a new random noise vector z and, using the Generator network, synthesize a fake example x*.
     2. Use the Discriminator network to classify x*.
     3. Compute the classification error and backpropagate the error to update the Generator’s trainable parameters, seeking to maximize the Discriminator’s error.
End for
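
The loop above can be sketched end to end on a deliberately tiny problem. Everything in this example is an assumption made for illustration rather than the book's implementation: the "dataset" is a one-dimensional Gaussian, the Generator is the linear map x* = a*z + b, the Discriminator is a logistic classifier, the gradients are derived by hand instead of by a deep learning library, and the Generator step minimizes -log D(x*) as a stand-in for "maximize the Discriminator's error". The sketch is meant to show the order of the steps, not to demonstrate convergence.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(g, z):
    """Generator: maps a noise value z to a fake example x* = a*z + b."""
    return g["a"] * z + g["b"]


def discriminate(d, x):
    """Discriminator: predicted probability that x is real."""
    return sigmoid(d["w"] * x + d["c"])


def train_discriminator(d, g, x, z, lr):
    """Step 1: one gradient-descent step on the Discriminator's classification error."""
    x_fake = generate(g, z)
    p_real = discriminate(d, x)
    p_fake = discriminate(d, x_fake)
    # Gradient of -log D(x) - log(1 - D(x*)) with respect to each logit:
    # (p_real - 1) for the real example, p_fake for the fake one.
    grad_w = (p_real - 1.0) * x + p_fake * x_fake
    grad_c = (p_real - 1.0) + p_fake
    return {"w": d["w"] - lr * grad_w, "c": d["c"] - lr * grad_c}


def train_generator(d, g, z, lr):
    """Step 2: one gradient step on the Generator, with the Discriminator held fixed."""
    x_fake = generate(g, z)
    p_fake = discriminate(d, x_fake)
    # Gradient of -log D(x*) with respect to x*, pushed back through x* = a*z + b.
    grad_x = (p_fake - 1.0) * d["w"]
    return {"a": g["a"] - lr * grad_x * z, "b": g["b"] - lr * grad_x}


def train(iterations=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    d = {"w": 0.0, "c": 0.0}
    g = {"a": 1.0, "b": 0.0}
    for _ in range(iterations):
        x = rng.gauss(4.0, 0.5)  # a random real example from the toy dataset
        d = train_discriminator(d, g, x, rng.gauss(0.0, 1.0), lr)
        g = train_generator(d, g, rng.gauss(0.0, 1.0), lr)  # fresh noise vector
    return d, g


d, g = train()
print("discriminator parameters:", d)
print("generator parameters:", g)
```

Note that each half of the iteration updates only one network: the Discriminator step leaves `g` untouched, and the Generator step reads `d` without changing it.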

GAN training visualized
Figure 1.3 illustrates the GAN training algorithm. The letters in the diagram refer to the list of steps in the GAN training algorithm.

1.3.2. Reaching equilibrium

You may wonder when the GAN training loop is meant to stop. More precisely, how do we know when a GAN is fully trained so that we can determine the appropriate number of training iterations? With a regular neural network, we usually have a clear objective to achieve and measure. For example, when training a classifier, we measure the classification error on the training and validation sets, and we stop the process when the validation error starts getting worse (to avoid overfitting). In a GAN, the two networks have competing objectives: when one network gets better, the other gets worse. How do we determine when to stop?

Those familiar with game theory may recognize this setup as a zero-sum game—a situation in which one player’s gains equal the other player’s losses. When one player improves by a certain amount, the other player worsens by the same amount. All zero-sum games have a Nash equilibrium, a point at which neither player can improve their situation or payoff by changing their actions.

A GAN reaches Nash equilibrium when the following conditions are met:

The Generator produces fake examples that are indistinguishable from the real data in the training dataset.
The Discriminator can at best randomly guess whether a particular example is real or fake (that is, make a 50/50 guess whether an example is real).

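Training stops when Nash equilibrium is reached, which requires the following conditions:

The fake images produced by the Generator are hard to tell from the real images in the training dataset.
The Discriminator assigns every image a 50% probability of being real.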

NOTE
Nash equilibrium is named after the American economist and mathematician John Forbes Nash Jr., whose life story and career were captured in the biography titled A Beautiful Mind and inspired the eponymous film.

Let us convince you of why this is the case. When each of the fake examples (x*) is truly indistinguishable from the real examples (x) coming from the training dataset, there is nothing the Discriminator can use to tell them apart from one another. Because half of the examples it receives are real and half are fake, the best the Discriminator can do is to flip a coin and classify each example as real or fake with 50% probability.

The Generator is likewise at a point where it has nothing to gain from further tuning. Because the examples it produces are already indistinguishable from the real ones, even a tiny change to the process it uses to turn the random noise vector (z) into a fake example (x*) may give the Discriminator a cue for how to discern the fake example from the real data, making the Generator worse off.

The passage above explains that, at Nash equilibrium, neither the Discriminator nor the Generator can improve any further.
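
The coin-flip argument can be checked numerically on a toy discrete example. The sketch assumes real and fake examples are presented in equal proportion, as the text states, and uses the Bayes-optimal classifier D*(x) = p_real(x) / (p_real(x) + p_fake(x)), a standard result that this chapter does not itself derive. The distributions and the function name `optimal_discriminator` are made up for illustration.

```python
def optimal_discriminator(p_real, p_fake):
    """Best achievable P(real | x) per outcome x, given equal numbers of real and fake examples."""
    return {x: p_real[x] / (p_real[x] + p_fake[x]) for x in p_real}


# Toy "data distribution" over three outcomes (hypothetical numbers).
p_real = {"0": 0.5, "1": 0.3, "2": 0.2}

# A Generator that has not matched the data yet: the Discriminator has something to exploit.
p_fake_early = {"0": 0.2, "1": 0.3, "2": 0.5}
print(optimal_discriminator(p_real, p_fake_early))

# A Generator whose output distribution equals the data distribution:
# every outcome gets probability 0.5, which is a coin flip.
print(optimal_discriminator(p_real, p_real))
```

Once the two distributions coincide, no choice of Discriminator can do better than 0.5 on any outcome, which matches the second equilibrium condition.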

With equilibrium achieved, the GAN is said to have converged. Here is where it gets tricky. In practice, it is nearly impossible to find the Nash equilibrium for GANs because of the immense complexities involved in reaching convergence in nonconvex games (more on convergence in later chapters, particularly chapter 5). Indeed, GAN convergence remains one of the most important open questions in GAN research.

Fortunately, this has not impeded GAN research or the many innovative applications of generative adversarial learning. Even in the absence of rigorous mathematical guarantees, GANs have achieved remarkable empirical results. This book covers a selection of the most impactful ones, and the following section previews some of them.

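In practice it is very hard to reach Nash equilibrium (GAN convergence), and this is one of the most pressing open problems in current GAN research. It has not stopped GANs from achieving remarkable results in research and applications.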

1.4. Why study GANs?

Since their invention, GANs have been hailed by academics and industry experts as one of the most consequential innovations in deep learning. Yann LeCun, the director of AI research at Facebook, went so far as to say that GANs and their variations are “the coolest idea in deep learning in the last 20 years.”^[2]
[2]:See “Google’s Dueling Neural Networks Spar to Get Smarter,” by Cade Metz, Wired, 2017, http://mng.bz/KE1X.

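Since their invention, GANs have been praised in academia and industry as one of the most important innovations in deep learning, with Yann LeCun, director of AI research at Facebook, calling them and their variations "the coolest idea in deep learning in the last 20 years."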

The excitement is well justified. Unlike other advancements in machine learning that may be household names among researchers but would elicit no more than a quizzical look from anyone else, GANs have captured the imagination of researchers and the wider public alike. They have been covered by the New York Times, the BBC, Scientific American, and many other prominent media outlets. Indeed, it was one of those exciting GAN results that probably drove you to buy this book in the first place. (Right?)

GANs have captured the imagination of researchers and the general public alike and have been covered by major media outlets.

Perhaps most notable is the capacity of GANs to create hyperrealistic imagery. None of the faces in figure 1.4 belongs to a real human; they are all fake, showcasing GANs’ ability to synthesize images with photorealistic quality. The faces were produced using Progressive GANs, a technique covered in chapter 6.

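GANs can generate hyperrealistic images. Hard as it is to believe, every face in figure 1.4 is fake. They were generated with Progressive GANs, which chapter 6 covers.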

Figure 1.4. These photorealistic but fake human faces were synthesized by a Progressive GAN trained on high-resolution portrait photos of celebrities.

(Source: “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras et al., 2017, https://arxiv.org/abs/1710.10196.)

Another remarkable GAN achievement is image-to-image translation. Similarly to the way a sentence can be translated from, say, Chinese to Spanish, GANs can translate an image from one domain to another. As shown in figure 1.5, GANs can turn an image of a horse into an image of zebra (and back!), and a photo into a Monet-like painting—all with virtually no supervision and no labels whatsoever. The GAN variant that made this possible is called CycleGAN; you’ll learn all about it in chapter 9.

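Another application of GANs is image-to-image translation. As figure 1.5 shows, GANs can turn an image of a horse into a zebra (or the reverse) and a photo into a Monet-style painting. This is done with CycleGAN, which chapter 9 covers.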

Figure 1.5. By using a GAN variant called CycleGAN, we can turn a Monet painting into a photograph or turn an image of a zebra into a depiction of a horse, and vice versa.

(Source: See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/abs/1703.10593.)

The more practically minded GAN use cases are just as fascinating. The online giant Amazon is experimenting with harnessing GANs for fashion recommendations: by analyzing countless outfits, the system learns to produce new items matching any given style.^[3] In medical research, GANs are used to augment datasets with synthetic examples to improve diagnostic accuracy.^[4] In chapter 11—after you’ve mastered the ins and outs of training GANs and their variants—you’ll explore both of these applications in detail.
[3]: See “Amazon Has Developed an AI Fashion Designer,” by Will Knight, MIT Technology Review, 2017, http://mng.bz/9wOj.
[4]: See “Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification,” by Maayan Frid-Adar et al., 2018, https://arxiv.org/abs/1801.02385.

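Amazon is experimenting with GANs for fashion design, and GANs are also used in medical research to improve diagnostic accuracy. Chapter 11 covers both.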

GANs are also seen as an important stepping stone toward achieving artificial general intelligence,^[5] an artificial system capable of matching human cognitive capacity to acquire expertise in virtually any domain—from motor skills involved in walking, to language, to creative skills needed to compose sonnets.
[5]: See “OpenAI Founder: Short-Term AGI Is a Serious Possibility,” by Tony Peng, Synced, 2018, http://mng.bz/j5Oa. See also “A Path to Unsupervised Learning Through Adversarial Networks,” by Soumith Chintala, f Code, 2016, http://mng.bz/WOag.

GANs are seen as an important stepping stone toward artificial general intelligence.

But with the ability to generate new data and imagery, GANs also have the capacity to be dangerous. Much has been discussed about the spread and dangers of fake news, but the potential of GANs to create credible fake footage is disturbing. At the end of an aptly titled 2018 piece about GANs—“How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos”—the New York Times journalists Cade Metz and Keith Collins discuss the worrying prospect of GANs being exploited to create and spread convincing misinformation, including fake video footage of statements by world leaders. Martin Giles, the San Francisco bureau chief of MIT Technology Review, echoes their concern and mentions another potential risk in his 2018 article “The GANfather: The Man Who’s Given Machines the Gift of Imagination”: in the hands of skilled hackers, GANs can be used to intuit and exploit system vulnerabilities at an unprecedented scale. These concerns are what motivated us to discuss the ethical considerations of GANs in chapter 12.

GANs can be used to create fake images and video and to mount cyberattacks, which is worrying. Chapter 12 discusses these concerns.

GANs can do much good for the world, but all technological innovations have misuses. Here the philosophy has to be one of awareness: because it is impossible to “uninvent” a technique, it is crucial to make sure people like you are aware of this technique’s rapid emergence and its substantial potential.

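Technology is a double-edged sword. Since we cannot stop it from arriving, we should understand its potential and put it to work for the good of the world.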

In this book, we are only able to scratch the surface of what is possible with GANs. However, we hope that this book will provide you with the necessary theoretical knowledge and practical skills to continue exploring any facet of this field that you find most interesting.

So, without further ado, let’s dive in!

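The book explores only the tip of the GAN iceberg and hopes to give you the knowledge and skills you need to keep exploring the areas that interest you. Without further ado, let's begin!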

Summary

GANs are a deep learning technique that uses a competitive dynamic between two neural networks to synthesize realistic data samples, such as fake photorealistic imagery. The two networks that constitute a GAN are as follows:
- The Generator, whose goal is to fool the Discriminator by producing data indistinguishable from the training dataset
- The Discriminator, whose goal is to correctly distinguish between real data coming from the training dataset and the fake data produced by the Generator
GANs have extensive applications across many different sectors, such as fashion, medicine, and cybersecurity.

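A GAN synthesizes realistic data samples, such as images, through two neural networks that compete with each other. It has two parts: the Generator and the Discriminator.
GANs have extensive applications in many fields, such as fashion, medicine, and cybersecurity.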
