Ghost Writing with TensorFlow
Music has long been considered one of the most influential and powerful forms of art. As such, it has been used to express the raw emotion of the artist and transfer it to the listener.
As a fan of music myself, I naturally wondered how difficult it would be to generate lyrics using recurrent neural networks (RNNs). I really enjoy rap and hip hop music, so I chose to work with artists in those genres. It was also a good fit, since there is existing research on rap lyric generation.
Recurrent neural networks can be used for many language modeling tasks, such as chat bots, predictive keyboards, and language translation. They work well for text generation because of their ability to handle sequential data, which lets us preserve the context of a sentence or, in this case, a verse.
At a high level, an RNN looks at previous data from the sequence to predict the next element in the sequence. Let's say we have an RNN trained to perform text prediction on your phone's keyboard (you know, the word predictions that pop up as you type). Based on previous messages I've typed, I could input something like "Wezley is super ..." and the neural network would take that sequence and give a set of predicted words to go off of, such as "cool", "smart", and "funny".
Overview of Architectures
To add to this experiment, I wanted to train different recurrent neural network architectures to perform the rap lyric generation. I chose SimpleRNN, Gated Recurrent Unit (GRU), Long Short Term Memory (LSTM), and Convolutional Neural Network + Long Short Term Memory (CNN+LSTM) based architectures. Testing each architecture against the others lets us determine which performs best on this task. We don't know if one model will outperform another unless we try, right?
The SimpleRNN architecture was there mostly as a baseline against which to judge the other architectures. A SimpleRNN is not very good for this specific task because of the vanishing gradient problem. This means the SimpleRNN won't be very useful at remembering context throughout a bar/verse, because it loses early information about the sequence the further into the sequence we go. This leads to the incoherent verses you'll see later on in the article. If you are curious and want a TL;DR of how the model performed: we get verses such as "I am, what stone private bedroom now" or "And how the low changed up last gas guitar thing." Both of these verses were generated from a dataset of Drake lyrics. Neither of them makes much sense. However, I'd argue that they're still fire bars.
The Gated Recurrent Unit architecture was the next architecture I tested. The gated recurrent unit differs from the SimpleRNN by being able to remember a little further back in the sequence. It accomplishes this by utilizing two gates, a reset gate and an update gate. These gates control whether the previous sequence information continues through the network or gets updated by the most recent step. I'll go a little more in-depth on this further into the article.
The Long Short Term Memory architecture was another architecture tested for this project. The LSTM differs from the SimpleRNN by, again, being able to remember further back in the sequence. The LSTM has an advantage over the GRU in that it can remember longer sequences, at the cost of being a little more complex. The LSTM has three gates, instead of two, that control the information it forgets, carries on through the sequence, and updates from the latest step. Again, the LSTM will be covered a little more in-depth later in the article.
The final architecture I tested was a mixture of a convolutional neural network and a long short term memory RNN. I threw this one in as a thought experiment, based on a paper I read that used a C-LSTM architecture for text classification (reference in the Colab notebook). I wondered if the CNN would allow the LSTM to generalize a bar and better understand the stylistic elements of an artist. While it was fun to see a CNN in a text generation problem, I didn't notice much of a difference between this and the LSTM model.
Obtaining the Dataset
With a defined set of architectures created, I set out to find the dataset I wanted to use for this problem.
The dataset didn't really matter to me, so long as it contained lyrics from prominent artists. I wanted to generate lyrics based on artists I listen to often, so I could recognize whether the model was able to generate similar lyrics. Don't worry though! I didn't judge a model's performance solely on what I thought sounded good. I also used a set of metrics that have been described in recent literature on the subject.
The dataset I found was here on Kaggle and was provided by Paul Mooney.
This dataset was great because it contained lyrics from many of the rap/hip hop artists I listen to. It also didn't have any weird characters and took care of some of the censoring of explicit lyrics.
Preparing the Data
With the dataset in hand, I set out to load and prepare the data for training.
The first thing I did was load in the data and finish censoring it. I used a preexisting Python library to perform the censorship so that I didn't have to create a "naughty words" list manually. Unfortunately, the library didn't censor every word, so I apologize if you stumble across something explicit in the published notebook for this article.
With the lyrics read in and censored, I went ahead and split them into an array of bars. I didn’t do any other processing to the bars, but in the future I may try this again and add <start> and <end> tags to each bar. This way the model can possibly learn when to end the sequence. For now, I had it generate bars of randomized lengths and the results were good enough for the initial experiment.
Once I finished splitting the data, I created a Markov model utilizing the markovify Python library. The Markov model will be used to generate the beginning sequences for each bar. This will help us ensure that the beginning of the sequence is somewhat coherent before passing it to the trained models. The models will then take the sequence and finish generating the lyrics for the bar.
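With markovify, building that model from the array of bars is nearly a one-liner. A minimal sketch, assuming bars is the array produced by the splitting step:

```python
import markovify

# NewlineText treats each line as its own sentence, which matches one-bar-per-line lyrics
markov_model = markovify.NewlineText("\n".join(bars))
print(markov_model.make_sentence(tries=100))  # e.g. a seed phrase for a new bar
```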
The next step was to tokenize the lyrics so that they would be in a format the models could understand. Tokenization is actually a pretty cool process: it splits the words into a dictionary that maps each word to an ID, and changes each bar into an array of the corresponding word IDs. There is an example of this in the published notebook, but here's another example in action:
For example, let's say we were to tokenize the following sentences:
“Wezley is cool”
“You are cool”
“TensorFlow is very cool”
The following sequences would be produced:
[1, 2, 3]
[4, 5, 3]
[6, 2, 7, 3]
Where the word dictionary is:
{'Wezley': 1, 'is': 2, 'cool': 3, 'You': 4, 'are': 5, 'TensorFlow': 6, 'very': 7}
As-is, these sequences can’t be fed into a model since they are of different lengths. To fix this, we add padding to the front of the arrays.
With padding we get:
[0, 1, 2, 3]
[0, 4, 5, 3]
[6, 2, 7, 3]
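Here is a minimal sketch of that tokenization and padding with Keras. Note that the real Tokenizer assigns IDs by word frequency (so "cool", the most common word here, would get ID 1), meaning the actual IDs will differ from the hand-worked example above:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

bars = ["Wezley is cool", "You are cool", "TensorFlow is very cool"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(bars)                    # build the word -> ID dictionary
sequences = tokenizer.texts_to_sequences(bars)  # turn each bar into a list of IDs
padded = pad_sequences(sequences)               # pads at the front by default

print(tokenizer.word_index)
print(padded)
```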
With the bars tokenized, I was finally able to create my X and y data for training. The train_X data consisted of an entire bar, minus the last word. The train_y data was the last word in the bar.
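Continuing the sketch above, the split looks roughly like this (the variable names are mine, not necessarily the notebook's):

```python
from tensorflow.keras.utils import to_categorical

vocab_size = len(tokenizer.word_index) + 1  # +1 because Keras word IDs start at 1

train_X = padded[:, :-1]  # every word in the bar except the last
train_y = to_categorical(padded[:, -1], num_classes=vocab_size)  # the last word, one-hot
```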
Looking into the future, along with adding the <start> and <end> tags to the bars, I want to try changing the way I split the training data. Maybe the next version of this will predict an entire bar based on the previous bar. That'll be a project for another day, though.
Defining the Models
With the data imported and split into the train_X and train_y sets, it's time to define the model architectures and begin training.
First up is the SimpleRNN architecture! The SimpleRNN will give a good baseline against the GRU, LSTM, and CNN+LSTM architectures.
The SimpleRNN unit can be expressed arithmetically as:
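$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b_h)$$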
Here h(t) is the hidden state at a given point in time t. As you can see in the equation, the SimpleRNN relies on the previous hidden state h(t-1) and the current input x(t) to give us the current hidden state.
The SimpleRNN is great because of its ability to work with sequence data. The shortfall is in its simplicity. The SimpleRNN is unable to remember data further back in the sequence, and thus suffers from the vanishing gradient problem: the further down the sequence we get, the harder it is for earlier states to be expressed. There is no mechanism in a SimpleRNN to help it keep track of previous states.
[Figure: vanishing gradient visualization]

In code, the SimpleRNN network looks like:
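The original post showed the model as a screenshot, so this is a minimal sketch of an equivalent Keras model. The vocab_size and max_sequence_len values are assumed to come from the tokenization step, and the layer sizes are illustrative rather than the notebook's exact choices:

```python
import tensorflow as tf

def build_simple_rnn(vocab_size, max_sequence_len):
    model = tf.keras.Sequential([
        # Embedding lifts the N x T integer input into an N x T x D tensor
        tf.keras.layers.Embedding(vocab_size, 64, input_length=max_sequence_len - 1),
        tf.keras.layers.SimpleRNN(128),
        # softmax over the vocabulary: classify the next word in the sequence
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```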
The data being fed into the network is only an N×T matrix, while the SimpleRNN expects an N×T×D tensor. We correct this by adding an embedding layer to give the input the D dimension. The embedding layer transforms the inputs into dense vectors that can be fed into the SimpleRNN cells. For more information on the embedding layer, see the TensorFlow documentation here.
I'm utilizing the Adam optimizer with a learning rate of 0.001, and categorical cross-entropy as the loss function. Categorical cross-entropy is used because we are trying to classify the next word in the sequence given the previous steps.
[Figure: SimpleRNN cell]

Next up is the network utilizing the Gated Recurrent Unit.
The GRU improves on the SimpleRNN cell by introducing a reset gate and an update gate. At a high level, these gates decide which information from previous states we want to retain or lose.
The GRU is expressed as:
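$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$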
Where z(t) is the update gate, r(t) is the reset gate, and h(t) is the hidden cell state.
Here’s how the GRU looks in action:
[Figure: GRU cell]

Here is how the GRU network is constructed in TensorFlow:
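Again, a sketch rather than the notebook's exact code; it is identical to the SimpleRNN model except for the recurrent layer:

```python
import tensorflow as tf

def build_gru(vocab_size, max_sequence_len):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64, input_length=max_sequence_len - 1),
        tf.keras.layers.GRU(128),  # the only change: a GRU cell instead of a SimpleRNN
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```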
Again, I’m utilizing Adam for the optimizer and categorical cross-entropy as the loss function.
The Long Short Term Memory architecture was the next to be utilized.
The long short term memory cell has an advantage over the SimpleRNN and GRU cells in being able to retain even more information further down the sequence. The LSTM utilizes three different gates, as opposed to the GRU's two, and retains a cell state throughout the network. The GRU is known to have a speed advantage over the LSTM, in that it is able to generalize faster and utilizes fewer parameters. However, the LSTM tends to take the cake when it comes to retaining more contextual data throughout a sequence.
The LSTM cell can be expressed as:
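$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$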
Here f(t) represents the forget gate and determines how much of the previous state to forget. Then i(t) represents the input gate, which determines how much of the new information we will add to the cell state. The o(t) is the output gate, which determines which information will progress to the next hidden state. The cell state is represented by c(t), and the hidden state is h(t).
Here is a visualization of data progressing through an LSTM cell:
[Figure: LSTM cell]

See below for the implementation in code:
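Once more, a sketch of the model under the same assumptions as the previous two:

```python
import tensorflow as tf

def build_lstm(vocab_size, max_sequence_len):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64, input_length=max_sequence_len - 1),
        tf.keras.layers.LSTM(128),  # swap in an LSTM cell for the recurrent layer
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```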
The final architecture I wanted to test was a combination of a convolutional neural network and an LSTM.
This network was a thought experiment to see how the results would differ from the LSTM, GRU, and SimpleRNN. I was actually surprised at some of the verses it managed to put out.
Here is the code for the architecture:
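A sketch in the spirit of the C-LSTM paper; the filter count, kernel size, and pooling are my own illustrative choices, not necessarily the notebook's:

```python
import tensorflow as tf

def build_cnn_lstm(vocab_size, max_sequence_len):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64, input_length=max_sequence_len - 1),
        # Conv1D extracts local, n-gram-like features from the embedded sequence
        tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        # the LSTM then models the sequence of extracted features
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```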
Generating Fire with the Models
Creating the models for this project was only about half of the work. The other half was generating song lyrics utilizing the trained model.
In my opinion, this is where the project became really fun. I was able to take the models I trained and utilize them for a non-trivial task.
This project was heavily inspired by "Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting" by Peter Potash, Alexey Romanov, and Anna Rumshisky. With that, I'm going to utilize some of the methods outlined in their paper for evaluating the output of the models against the original lyrics from the artist.
The methods I’m utilizing to evaluate bars and generate raps are: comprehension score, rhyme index, and lyrical uniqueness. I’ll discuss how I calculated these shortly.
A high level overview of how I’m generating songs can be described as:
- Utilize the Markov model to generate the first four words of a bar
- Take the output of the Markov model and feed it into the RNN
- Evaluate the output of the RNN against the original lyrics for uniqueness, similar rhyme index, and similar comprehension score
- Either throw out the bar (if it's trash) or add it to the song (if it's fire)
Fairly simple, right?
Let’s jump into the code of how this is done.
First, I have a function named generate_rap, which handles the main work of generating a rap song. generate_rap takes in the model I want to use to generate the rap (SimpleRNN, GRU, LSTM, or CNN+LSTM), the max bar length, how many bars we want in the rap, the score thresholds, and how many tries we allow for generating a fire bar. The score thresholds define how well a bar must score before it is considered fire; in this case, the closer the score is to 0, the more fire the bar is. Here is how the function looks in code:
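The published notebook has the real implementation; the sketch below reconstructs the logic described above. The helper functions generate_bar and score_bar are shown in the next sections, and the markovify call and default threshold values here are my assumptions:

```python
def generate_rap(model, markov_model, artist_bars, avg_comprehension, avg_rhyme_index,
                 num_bars=16, max_bar_len=10, score_min=-0.5, score_max=0.5, max_tries=25):
    song = []
    for _ in range(num_bars):
        best_bar, best_score = "", float("inf")
        for _ in range(max_tries):
            # seed the bar with the first four words of a Markov-generated sentence
            sentence = markov_model.make_sentence(tries=100) or ""
            seed = " ".join(sentence.split()[:4])
            bar = generate_bar(seed, model, max_bar_len)
            score = score_bar(bar, artist_bars, avg_comprehension, avg_rhyme_index)
            if score_min <= score <= score_max:  # fire: graduate it into the song
                song.append(bar)
                break
            if abs(score) < abs(best_score):  # remember the closest-to-zero attempt
                best_bar, best_score = bar, score
        else:
            song.append(best_bar)  # no fire bar within max_tries: settle for the best
    return "\n".join(song)
```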
As you can see, we generate a random bar and score it based on the artist's average rhyme index, average comprehension, and the uniqueness of the bar. If the bar meets the score threshold, it is graduated into the final song. If the algorithm fails to generate a fire bar within the defined max tries, it puts the best-scoring bar in the song and moves on.
Within generate_rap I’m utilizing another function named generate_bar. This function takes in a seed phrase, the model we are using to generate the sequence, and the sequence’s length. generate_bar will then tokenize the seed phrase and feed it into the provided model until the sequence hits the desired length, then return the output. Here is the code:
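A sketch of that loop, assuming tokenizer and max_sequence_len are the globals from the data preparation step:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_bar(seed_text, model, bar_length):
    # feed the growing sequence back into the model until the bar is long enough
    while len(seed_text.split()) < bar_length:
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1)
        predicted_id = int(np.argmax(model.predict(token_list, verbose=0)))
        next_word = tokenizer.index_word.get(predicted_id, "")
        if not next_word:  # ID 0 is padding; stop early if the model predicts it
            break
        seed_text += " " + next_word
    return seed_text
```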
To score the bars, I’m utilizing a function named score_bar. This function takes in the bar we want to score, the artist’s original lyrics, the artist’s average comprehension score, and the artist’s average rhyme index. score_bar calculates the input bar’s comprehension score, rhyme index, and uniqueness index then scores the bar.
The bar's score can be positive or negative, with 0 being the best score a bar can achieve. A score of 0 means that the bar has the same rhyme index and comprehension score as the artist while remaining completely distinct from the original lyrics. A perfect score of 0 is impossible to achieve, which is why we define min and max thresholds.
The score_bar function looks like:
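A sketch of the scoring. The exact way the three terms are combined, and the use of textstat's Flesch-Kincaid grade as the comprehension score, are my assumptions about the notebook's implementation; rhyme_index and uniqueness are sketched in the next sections:

```python
import textstat

def comprehension_score(text):
    # readability proxy; the notebook's exact metric may differ
    return textstat.flesch_kincaid_grade(text)

def score_bar(bar, artist_bars, avg_comprehension, avg_rhyme_index):
    # 0 is perfect: artist-like readability and rhyme, and a fully unique bar
    comprehension_diff = comprehension_score(bar) - avg_comprehension
    rhyme_diff = rhyme_index(bar) - avg_rhyme_index
    originality_penalty = 1.0 - uniqueness(bar, artist_bars)  # uniqueness: 1.0 = fully unique
    return comprehension_diff + rhyme_diff + originality_penalty
```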
To calculate the rhyme index of a bar, I'm utilizing the method described in "Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting." The rhyme index is calculated by taking the number of rhymed syllables and dividing it by the total number of syllables in the bar or song. Here is that implementation in code:
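A sketch of that calculation using the pronouncing library (built on the CMU Pronouncing Dictionary). The paper's rhyme detection is more involved, so treat this as an approximation:

```python
import pronouncing

def rhyme_index(text):
    # rhymed syllables divided by total syllables, after Potash et al.
    words = [w.strip(".,!?\"").lower() for w in text.split()]
    syllables = {}
    for w in words:
        phones = pronouncing.phones_for_word(w)
        syllables[w] = pronouncing.syllable_count(phones[0]) if phones else 0

    total = sum(syllables[w] for w in words)
    rhymed = set()
    for i, w in enumerate(words):
        for other in words[i + 1:]:
            if other != w and other in pronouncing.rhymes(w):
                rhymed.update((w, other))

    rhymed_syllables = sum(syllables[w] for w in rhymed)
    return rhymed_syllables / total if total else 0.0
```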
To compare the uniqueness of the generated bar, I compute the cosine distance between the generated bar and each of the artist's bars, then take the average distance as the total uniqueness score. Here is how that looks:
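A sketch using scikit-learn count vectors; the notebook's vectorization may differ:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def uniqueness(bar, artist_bars):
    # average cosine distance between the generated bar and every original bar
    vectorizer = CountVectorizer().fit(artist_bars + [bar])
    artist_matrix = vectorizer.transform(artist_bars)
    bar_vector = vectorizer.transform([bar])
    return float(np.mean(cosine_distances(bar_vector, artist_matrix)))
```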
The Results
With all of this, I was finally able to generate a full rap utilizing the four models I trained. After generating the rap, I took the generated song and calculated its rhyme index and comprehension score. Surprisingly, the full song still remained fairly close to the original artist's rhyme index and comprehension score.
Here are some of the outputs when training on Drake lyrics.
The SimpleRNN:
Generated rap with avg rhyme density: 0.5030674846625767 and avg readability of: 2.0599999999999996

Rap Generated with SimpleRNN:
Now you're throwing me baby know it know
Look I gotta started with you hook drake
I swear it happened no tellin' yeah yeah
....
The GRU:
Generated rap with avg rhyme density: 0.5176470588235295 and avg readability of: 1.9449999999999998

Rap Generated with GRU:
That's why I died everything big crazy on me
Who keepin' score up yeah yeah yeah yeah
I've loved and you everything big crazy on me on
....
The LSTM:
Generated rap with avg rhyme density: 0.3684210526315789 and avg readability of: 1.9749999999999996

Rap Generated with LSTM:
Get the **** lick alone same that wait now
up ****, see what uh huh heart thing up yeah
Despite the things though up up up up yeah yeah
....
The CNN+LSTM:
Generated rap with avg rhyme density: 0.33519553072625696 and avg readability of: 2.2599999999999993

Rap Generated with CNN+LSTM:
They still out know play through now out out
I got it dedicate dedicate you yeah
I've been waiting much much aye aye days aye aye
....
For the full lyrics and list of references, take a look at the Google Colab notebook. Also feel free to try it yourself and change the artist for the style you want to mimic.
As far as the SimpleRNN vs GRU vs LSTM vs CNN+LSTM experiment goes, I would say the LSTM tended to have the best results. The CNN+LSTM had too many repetitive words in a bar, and I think this has to do with the CNN generalizing the sequence as a whole. The SimpleRNN and GRU produced pretty incoherent bars, and their rhyme densities were really far off from the original artist's.
That's it! Let me know what you think in the comments. I’d love to build upon this project in the future. If you have any suggestions for things I need to change to get better results, let me know! Thank you for reading.
Check out my GitHub for the code to this project, and other cool projects!
Source: https://towardsdatascience.com/ghost-writing-with-tensorflow-49e77e26978f