

MachineRay: Using AI to Create Abstract Art

Published: 2023/12/10


For the past three months, I have been exploring the latest techniques in Artificial Intelligence (AI) and Machine Learning (ML) to create abstract art. During my investigation, I learned that three things are needed to create abstract paintings: (A) source images, (B) an ML model, and (C) a lot of time to train the model on a high-end GPU. Before I discuss my work, let’s take a look at some prior research.


Background

Artificial Neural Networks

Warren McCulloch and Walter Pitts created a computational model for Neural Networks (NNs) back in 1943 [1]. Their work led to research into both the biological processing in brains and the use of NNs for AI. Richard Nagyfi discusses the differences between Artificial Neural Networks (ANNs) and biological brains in this post. He describes an apt analogy that I will summarize here: ANNs are to brains as planes are to birds. Although the development of these technologies was inspired by biology, the actual implementations are very different!


Visual Analogy. Neural network chip artwork by mikemacmarketin CC BY 2.0, brain model by biologycorner CC BY-NC 2.0, plane photo by Moto@Club4AG CC BY 2.0, bird photo by ksblack99 CC PDM 1.0

Both ANNs and biological brains learn from external stimuli to understand things and predict outcomes. One of the key differences is that ANNs work with floating-point numbers and not just binary firing of neurons. With ANNs it’s numbers in and numbers out.


The diagram below shows the structure of a typical ANN. The inputs on the left are the numerical values that contain the incoming stimuli. The input layer is connected to one or more hidden layers that contain the memory of prior learning. The output layer, in this case just one number, is connected to each of the nodes in the hidden layer.


Diagram of a Typical ANN

Each of the internal arrows represents numerical weights that are used as multipliers to modify the numbers in the layers as they get processed in the network from left to right. The system is trained with a dataset of input values and expected output values. The weights are initially set to random values. For the training process, the system runs through the training set multiple times, adjusting the weights to achieve the expected outputs. Eventually, the system will not only predict the outputs correctly from the training set, but it will also be able to predict outputs for unseen input values. This is the essence of Machine Learning (ML). The intelligence is in the weights. A more detailed discussion of the training process for ANNs can be found in Conor McDonald’s post, here.
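The mechanism just described (forward pass, error, weight adjustment over many passes through the training set) can be sketched as a toy two-layer network in NumPy. This is an illustration of the idea only, not code from this project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn y = mean of three inputs
X = rng.random((100, 3))
y = X.mean(axis=1, keepdims=True)

# The weights are initially set to random values
W1 = rng.normal(0, 0.5, (3, 4))   # input layer -> hidden layer
W2 = rng.normal(0, 0.5, (4, 1))   # hidden layer -> output (one number)

def forward(x):
    h = np.tanh(x @ W1)           # hidden-layer activations
    return h, h @ W2              # numbers in, numbers out

# Run through the training set many times, adjusting the weights
# to move the outputs toward the expected values
for epoch in range(5000):
    h, pred = forward(X)
    err = pred - y
    grad_W2 = h.T @ err / len(X)
    grad_h = (err @ W2.T) * (1 - h ** 2)   # backpropagate through tanh
    grad_W1 = X.T @ grad_h / len(X)
    W2 -= 0.5 * grad_W2
    W1 -= 0.5 * grad_W1
```

After training, the network also predicts reasonable outputs for inputs it has never seen; the "intelligence" lives entirely in W1 and W2.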


Generative Adversarial Networks

In 2014, Ian Goodfellow and seven coauthors at the Université de Montréal presented a paper on Generative Adversarial Networks (GANs)[2]. They came up with a way to train two ANNs that effectively compete with each other to create content like photos, songs, prose, and yes, paintings. The first ANN is called the Generator and the second is called the Discriminator. The Generator is trying to create realistic output, in this case, a color painting. The Discriminator is trying to discern real paintings from the training set as opposed to fake paintings from the generator. Here’s what a GAN architecture looks like.


Generative Adversarial Network

A random noise vector is fed into the Generator, which then uses its trained weights to generate the resultant output, in this case, a color image. The Discriminator is trained by alternating between processing real paintings, with an expected output of 1, and fake paintings, with an expected output of -1. After each painting is sent to the Discriminator, the Discriminator sends back detailed feedback about why the painting does not look real, and the Generator adjusts its weights with this new knowledge to try to do better the next time. The two networks in the GAN are effectively trained together in an adversarial fashion. The Generator gets better at trying to pass off a fake image as real, and the Discriminator gets better at determining which input is real and which is fake. Eventually, the Generator gets pretty good at generating realistic-looking images. You can read more about GANs, and the math they use, in Shweta Goyal's post here.
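The alternating scheme above, with real examples labeled 1 and fakes labeled -1, can be illustrated with a deliberately tiny one-dimensional "GAN" in NumPy, where both networks are just linear functions and the "paintings" are numbers drawn from a target distribution. This is a sketch of the training dynamic, not the StyleGAN code used in this project:

```python
import numpy as np

rng = np.random.default_rng(7)

# "Real paintings" are samples from N(3, 0.5); the Generator maps
# noise z ~ N(0, 1) through a linear function a*z + b.
a, b = 1.0, 0.0          # Generator weights
v, c = 0.0, 0.0          # Discriminator: D(x) = v*x + c
lr = 0.02

for step in range(5000):
    real = rng.normal(3.0, 0.5, 64)
    z = rng.normal(0.0, 1.0, 64)
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward -1
    d_real, d_fake = v * real + c, v * fake + c
    grad_v = 2 * np.mean((d_real - 1) * real) + 2 * np.mean((d_fake + 1) * fake)
    grad_c = 2 * np.mean(d_real - 1) + 2 * np.mean(d_fake + 1)
    v -= lr * grad_v
    c -= lr * grad_c

    # Generator step: adjust weights so the Discriminator scores fakes as real
    d_fake = v * fake + c
    grad_a = 2 * np.mean((d_fake - 1) * v * z)
    grad_b = 2 * np.mean((d_fake - 1) * v)
    a -= lr * grad_a
    b -= lr * grad_b

gen_mean = float(np.mean(a * rng.normal(0, 1, 10_000) + b))
print(f"generated mean ~ {gen_mean:.2f} (real mean is 3.0)")
```

Even in this stripped-down setting, the adversarial loop pulls the Generator's output distribution toward the real one, which is the same dynamic that drives the painting generator.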


Improved GANs for Large Images

Although the basic GAN described above works well with small images (i.e., 64x64 pixels), there are issues with larger images (i.e., 1024x1024 pixels). The basic GAN architecture has difficulty converging on good results for large images due to the unstructured nature of the pixels. It can't see the forest for the trees. Researchers at NVIDIA developed a series of improved methods that allow for the training of GANs with larger images. The first is called "Progressive Growing of GANs" [3].


The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality. — Tero Karras et al., NVIDIA


The team at NVIDIA continued their work on using GANs to generate large, realistic images, naming their architecture StyleGAN [4]. They started with their Progressive Growing of GANs as a base model and added a Style Mapping Network, which injects style information at various resolutions into the Generator Network.


StyleGAN Component Diagram

The team further improved the image creation results with StyleGAN2, allowing the GAN to efficiently create high-quality images with fewer unwanted artifacts [5]. You can read more about these developments in Akria’s post, “From GAN basic to StyleGAN2”.


Prior Work to Create Art with GANs

Researchers have been looking to use GANs to create art since the GAN was introduced in 2014. A description of a system called ArtGAN was published in 2017 by Wei Ren Tan et al. from Shinshu University, Nagano, Japan [6]. Their paper proposes to extend GANs…


… to synthetically generate more challenging and complex images such as artwork that have abstract characteristics. This is in contrast to most of the current solutions that focused on generating natural images such as room interiors, birds, flowers and faces. — Wei Ren Tan et al., Shinshu University


A broader survey of using GANs to create art was conducted by Drew Flaherty for his Masters Thesis at the Queensland University of Technology in Brisbane, Australia [7]. He experimented with various GANs including basic GANs, CycleGAN [8], BigGAN [9], Pix2Pix, and StyleGAN. Of everything he tried, he liked StyleGAN the best.


The best visual result from the research came from StyleGAN. … Visual quality of the outputs were relatively high considering the model was only partially trained, with progressive improvements from earlier iterations showing more defined lines, textures and forms, sharper detail, and more developed compositions overall. — Drew Flaherty, Queensland University of Technology


For his experiments, Flaherty used a large library of artwork gleaned from various sources, including WikiArt.org, the Google Arts Project, Saatchi Art, and Tumblr blogs. He noted that not all of the source images are in the public domain, but he discusses the doctrine of fair use and its implications for ML and AI.


MachineRay

Overview

For my experiment, named MachineRay, I gathered images of abstract paintings from WikiArt.org, processed them, and fed them into StyleGAN2 at the size of 1024x1024. I trained the GAN for three weeks on a GPU using Google Colab. I then processed the output images by adjusting the aspect ratio and running them through another ANN for a super-resolution resize. The resultant images are 4096 pixels wide or tall, depending on the aspect ratio. Here’s a diagram of the components.


MachineRay Component Diagram

Gathering Source Images

To gather the source images, I wrote a Python script to scrape abstract paintings from WikiArt.org. Note that I filtered the images to only get paintings that were labeled in the “Abstract” genre, and only images that are labeled as being in the Public Domain. These include images that were published before 1925 or images that were created by artists who died before 1950. The top artists represented in the set are Wassily Kandinsky, Theo van Doesburg, Paul Klee, Kazimir Malevich, Janos Mattis-Teutsch, Giacomo Balla, and Piet Mondrian. A snippet of the Python code is below, and the full source file is here.
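A simplified sketch of the gathering-and-filtering step is below. The listing URL and the JSON field names here are illustrative assumptions, not WikiArt's actual API; the real script linked in the text handles the site's actual page structure:

```python
# Sketch of the image-gathering step: keep only "abstract" genre
# paintings that meet the article's public-domain criteria.
import json
import urllib.request

PUBLISHED_BEFORE = 1925   # published before 1925, or...
ARTIST_DIED_BEFORE = 1950  # ...artist died before 1950

def is_public_domain(painting):
    """Apply the public-domain criteria described in the text."""
    year = painting.get("completionYear")
    died = painting.get("artistDeathYear")
    if year is not None and year < PUBLISHED_BEFORE:
        return True
    if died is not None and died < ARTIST_DIED_BEFORE:
        return True
    return False

def keep(painting):
    return painting.get("genre") == "abstract" and is_public_domain(painting)

def download(painting, folder="images"):
    url = painting["image"]
    name = url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, f"{folder}/{name}")

if __name__ == "__main__":
    # Hypothetical listing endpoint; the real script scrapes the site's pages.
    with urllib.request.urlopen("https://www.wikiart.org/...") as r:
        paintings = json.load(r)
    for p in filter(keep, paintings):
        download(p)
```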


I gathered about 900 images, but I removed images that had representational components or ones that were too small, cutting the number down to 850. Here is a random sampling of the source images.


Random Sample of Abstract Paintings from WikiArt.org in the Public Domain

Removing Frames

As you can see above, some of the paintings retain their wooden frames in the images, but some of them have the frames cropped out. For example, you can see the frame in Arthur Dove’s Storm Clouds. To make the source images consistent, and to allow the GAN to focus on the content of the paintings, I automatically removed the frames using a Python script. A snippet is below, and the full script is here.


The code opens each image and looks for square regions around the edges that have a different color from most of the painting. Once the edges are found, the image is cropped to omit the frame. Here are some pictures of source paintings before and after the frame removal.
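A simplified NumPy sketch of that edge-scanning idea (not the script linked above, which handles many more edge cases):

```python
import numpy as np

def crop_frame(img, threshold=60):
    """Crop a uniform frame by scanning inward from each edge until the
    row/column color stops differing from the painting's central color.
    img is an H x W x 3 uint8 array.
    """
    h, w, _ = img.shape
    # Estimate the painting's dominant color from the central region
    center = img[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    ref = np.median(center.reshape(-1, 3), axis=0)

    def differs(line):
        # Mean color distance of an edge row/column from the central color
        return np.linalg.norm(line.reshape(-1, 3).mean(axis=0) - ref) > threshold

    top = 0
    while top < h // 4 and differs(img[top]):
        top += 1
    bottom = h
    while bottom > 3 * h // 4 and differs(img[bottom - 1]):
        bottom -= 1
    left = 0
    while left < w // 4 and differs(img[:, left]):
        left += 1
    right = w
    while right > 3 * w // 4 and differs(img[:, right - 1]):
        right -= 1
    return img[top:bottom, left:right]
```

The scan is capped at a quarter of each dimension so a dark painting is never mistaken for one giant frame.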


Automatically Cropped Images from WikiArt.org in the Public Domain

Image Augmentation

Although 850 images may seem like a lot, it's not really enough to properly train a GAN. If there isn't enough variety in the images, the GAN may overfit the model, which will yield poor results, or, worse yet, fall into the dreaded state of "mode collapse", which will yield nearly identical images.


StyleGAN2 has a built-in feature to randomly mirror the source images left-to-right. So this will effectively double the number of sample images to 1,700. This is better, but still not great. I used a technique called Image Augmentation to increase the number of images by a factor of 7, making it 11,900 images. Below is a code snippet for the Image Augmentation I used. The full source file is here.


The augmentation uses random rotation, scaling, cropping, and mild color correction to create more variety in the image samples. Note that I resize the images to 1024 by 1024 before applying the Image Augmentation. I will discuss the aspect ratio further down in this post. Here are some examples of Image Augmentation. The original is on the left, and there are six additional variations to the right.
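A minimal NumPy sketch of that augmentation step is below: random crop, mirror, and a mild per-channel color scale. The actual script (linked above) also applies small rotations and rescales everything to 1024x1024, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img, n_variants=6):
    """Generate simple variants of an H x W x 3 uint8 image."""
    h, w, _ = img.shape
    out = []
    for _ in range(n_variants):
        # Random crop keeping at least 80% of each dimension
        ch = rng.integers(int(0.8 * h), h + 1)
        cw = rng.integers(int(0.8 * w), w + 1)
        y = rng.integers(0, h - ch + 1)
        x = rng.integers(0, w - cw + 1)
        v = img[y : y + ch, x : x + cw].astype(np.float32)
        # Random horizontal mirror
        if rng.random() < 0.5:
            v = v[:, ::-1]
        # Mild color correction: scale each channel by up to +/-10%
        v *= rng.uniform(0.9, 1.1, size=3)
        out.append(np.clip(v, 0, 255).astype(np.uint8))
    return out
```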


Examples of Image Augmentation. Painting images from WikiArt.org in the Public Domain

Training the GAN

I ran the training using Google Colab Pro. Using that service, I could run for up to 24 hours at a time on a high-end GPU, an NVIDIA Tesla P100 with 16 GB of memory. I also used Google Drive to retain the work in progress between runs. It took about 13 days to train the GAN, sending 5 million source images through the system. Here is a random sample of the results.


Sample Output from MachineRay

You can see from the sample of 28 images above that MachineRay produced paintings in a variety of styles, although there are some visual commonalities between them. The paintings hint at the styles of the source images, but there are no exact copies.


Adjusting the Aspect Ratio

Although the original source images had various aspect ratios, ranging from a thinner portrait shape to a wider landscape shape, I made them all dead square to help with the training of the GAN. In order to have a variety of aspect ratios for the output images, I imposed a new aspect ratio prior to the upscaling. Instead of just choosing a purely random aspect ratio, I created a function that chooses an aspect ratio that is based on the statistical distribution of aspect ratios in the source images. Here’s what the distribution looks like.


Aspect Ratio Distribution. Images from WikiArt.org in the Public Domain.

The graph above plots the aspect ratio of all 850 source images. It ranges from about 0.5, which is a thin 1:2 ratio, to about 2.0, which is a wide 2:1 ratio. The chart shows four of the source images to indicate where they fall on the chart horizontally. Here's my Python code that maps a random number from 0 to 850 into an aspect ratio based on the distribution of the source images.
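The mapping described above (a random index into the 850 source ratios) amounts to inverse-CDF sampling from the empirical distribution. A sketch of one way to implement it (the real code is linked above):

```python
import numpy as np

def make_aspect_sampler(source_ratios, seed=None):
    """Return a function that samples an aspect ratio matching the
    empirical distribution of the source paintings."""
    ratios = np.sort(np.asarray(source_ratios, dtype=float))
    rng = np.random.default_rng(seed)

    def sample():
        # A random position along the sorted ratios, interpolated between
        # neighbors, is an inverse-CDF draw from the empirical distribution.
        u = rng.random() * (len(ratios) - 1)
        return float(np.interp(u, np.arange(len(ratios)), ratios))

    return sample
```

Because the draw follows the source distribution rather than a uniform range, common ratios (near square) come up often and extremes stay rare, just as in the chart.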


I adjusted the MachineRay output from above to have varying aspect ratios in the pictures below. You can see that the images seem a bit more natural and less homogenous with just this small change.


Sample Output from MachineRay with Varying Aspect Ratios

Super-Resolution Resizing

The images generated by MachineRay have a maximum height or width of 1024 pixels, which is fine for viewing on a computer but not for printing. At 300 DPI, they would only print at a size of about 3.5 inches. The images could be resized up, but they would look very soft if printed at 12 inches. There is a technique, called Image Super-Resolution (ISR), that uses ANNs to resize images while maintaining crisp features. For more information on Super-Resolution, check out Bharath Raj's post here.


There is a nice open-source ISR system with pre-trained models available from Idealo, a German company. Their GAN model does a 4x resize using a GAN trained on photographs. I found that adding a little bit of random noise to the image prior to the ISR creates a painterly effect. Here is the Python code I used to post-process the images.
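A sketch of that post-processing step is below. The noise function is self-contained; the ISR calls follow the idealo package's documented usage, but the model and weight names should be checked against the package before relying on them:

```python
import numpy as np

def add_noise(img, amount=4.0, seed=None):
    """Add a little Gaussian noise before super-resolution; the upscaler
    turns the noise into brushstroke-like texture."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, amount, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    # Idealo's ISR package with its pretrained GAN weights; install with
    # `pip install ISR`. Names here are from the package docs and may change.
    from ISR.models import RRDN
    from PIL import Image

    model = RRDN(weights="gans")          # pretrained 4x GAN upscaler
    img = np.array(Image.open("machineray_out.png"))
    big = model.predict(add_noise(img))   # 4x super-resolution resize
    Image.fromarray(big).save("machineray_out_4x.png")
```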


You can see the results of adding noise and Image Super-Resolution resizing here. Note that the texture detail looks a bit like brushstrokes.


Left: Sample Image After Added Noise and ISR. Right: Close-up to Show Detail

Check out the gallery in Appendix A to see high-resolution output samples from MachineRay.


Next Steps

Additional work might include running the GAN at sizes greater than 1024x1024. Porting the code to run on Tensor Processing Units (TPUs) instead of GPUs would make the training run faster. Also, the ISR GAN from Idealo could be trained using paintings instead of photos. This may add a more realistic painterly effect to the images.


Acknowledgments

I would like to thank Jennifer Lim and Oliver Strimpel for their help and feedback on this project.


Source Code

All source code for this project is available on GitHub. A Google Colab for generating images is available here. The sources are released under the CC BY-NC-SA license.


Attribution-NonCommercial-ShareAlike

Translated from: https://towardsdatascience.com/machineray-using-ai-to-create-abstract-art-39829438076a

