Assignment | 05-week2 -Part_2-Emojify!
This series only adds my personal study notes to the homework sections of the original course. If there are any mistakes, corrections and feedback are welcome. - ZJ
Coursera 課程 |deeplearning.ai |網(wǎng)易云課堂
CSDN:http://blog.csdn.net/JUNJUN_ZHAO/article/details/79470246
Welcome to the second assignment of Week 2. You are going to use word vector representations to build an Emojifier.
Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. So rather than writing "Congratulations on the promotion! Let's get coffee and talk. Love you!" the emojifier can automatically turn this into "Congratulations on the promotion! ? Let's get coffee and talk. ?? Love you! ??"
You will implement a model which inputs a sentence (such as “Let’s go see the baseball game tonight!”) and finds the most appropriate emoji to be used with this sentence (??). In many emoji interfaces, you need to remember that ?? is the “heart” symbol rather than the “l(fā)ove” symbol. But using word vectors, you’ll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate words in the test set to the same emoji even if those words don’t even appear in the training set. This allows you to build an accurate classifier mapping from sentences to emojis, even using a small training set.
In this exercise, you’ll start with a baseline model (Emojifier-V1) using word embeddings, then build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.
Let's get started! Run the following cell to load the packages you are going to use.
```python
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt

%matplotlib inline
```

For reference, the helper functions provided in `emo_utils.py`:

```python
''' emo_utils.py '''
import csv
import numpy as np
import emoji
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


def read_glove_vecs(glove_file):
    with open(glove_file, 'r', encoding='utf-8') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map


def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


def read_csv(filename='data/emojify_data.csv'):
    phrase = []
    emoji = []

    with open(filename) as csvDataFile:
        csvReader = csv.reader(csvDataFile)
        for row in csvReader:
            phrase.append(row[0])
            emoji.append(row[1])

    X = np.asarray(phrase)
    Y = np.asarray(emoji, dtype=int)
    return X, Y


def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)]
    return Y


emoji_dictionary = {"0": "\u2764\uFE0F",  # :heart: prints a black instead of red heart depending on the font
                    "1": ":baseball:",
                    "2": ":smile:",
                    "3": ":disappointed:",
                    "4": ":fork_and_knife:"}


def label_to_emoji(label):
    """Converts a label (int or string) into the corresponding emoji code (string) ready to be printed"""
    return emoji.emojize(emoji_dictionary[str(label)], use_aliases=True)


def print_predictions(X, pred):
    print()
    for i in range(X.shape[0]):
        print(X[i], label_to_emoji(int(pred[i])))


def plot_confusion_matrix(y_actu, y_pred, title='Confusion matrix', cmap=plt.cm.gray_r):
    df_confusion = pd.crosstab(y_actu, y_pred.reshape(y_pred.shape[0],),
                               rownames=['Actual'], colnames=['Predicted'], margins=True)
    df_conf_norm = df_confusion / df_confusion.sum(axis=1)
    plt.matshow(df_confusion, cmap=cmap)  # imshow
    # plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(df_confusion.columns))
    plt.xticks(tick_marks, df_confusion.columns, rotation=45)
    plt.yticks(tick_marks, df_confusion.index)
    # plt.tight_layout()
    plt.ylabel(df_confusion.index.name)
    plt.xlabel(df_confusion.columns.name)


def predict(X, Y, W, b, word_to_vec_map):
    """
    Given X (sentences) and Y (emoji indices), predict emojis and compute the accuracy of your model over the given set.

    Arguments:
    X -- input data containing sentences, numpy array of shape (m, None)
    Y -- labels, containing index of the label emoji, numpy array of shape (m, 1)

    Returns:
    pred -- numpy array of shape (m, 1) with your predictions
    """
    m = X.shape[0]
    pred = np.zeros((m, 1))

    for j in range(m):  # Loop over training examples
        # Split jth test example (sentence) into list of lower case words
        words = X[j].lower().split()

        # Average words' vectors
        avg = np.zeros((50,))
        for w in words:
            avg += word_to_vec_map[w]
        avg = avg / len(words)

        # Forward propagation
        Z = np.dot(W, avg) + b
        A = softmax(Z)
        pred[j] = np.argmax(A)

    print("Accuracy: " + str(np.mean((pred[:] == Y.reshape(Y.shape[0], 1)[:]))))
    return pred
```

1 - Baseline model: Emojifier-V1
1.1 - Dataset EMOJISET
Let’s start by building a simple baseline classifier.
You have a tiny dataset (X, Y) where:
- X contains 127 sentences (strings)
- Y contains an integer label between 0 and 4 corresponding to an emoji for each sentence
Let’s load the dataset using the code below. We split the dataset between training (127 examples) and testing (56 examples).
```python
X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')
maxLen = len(max(X_train, key=len).split())
```

Run the following cell to print sentences from X_train and corresponding labels from Y_train. Change index to see different examples. Because of the font the iPython notebook uses, the heart emoji may be colored black rather than red.
```python
index = 7
print(X_train[index], label_to_emoji(Y_train[index]))
```

```
congratulations on your acceptance ?
```

1.2 - Overview of the Emojifier-V1
In this part, you are going to implement a baseline model called “Emojifier-v1”.
The input of the model is a string corresponding to a sentence (e.g. "I love you"). In the code, the output will be a probability vector of shape (1,5), which you then pass through an argmax layer to extract the index of the most likely emoji output.
To get our labels into a format suitable for training a softmax classifier, let's convert $Y$ from its current shape $(m, 1)$ into a "one-hot representation" $(m, 5)$, where each row is a one-hot vector giving the label of one example. You can do so using the next code snippet. Here, Y_oh stands for "Y one-hot" in the variable names Y_oh_train and Y_oh_test:
```python
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
```

Let's see what convert_to_one_hot() did. Feel free to change index to print out different values.
```python
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])
```

```
0 is converted into one hot [1. 0. 0. 0. 0.]
```

All the data is now ready to be fed into the Emojify-V1 model. Let's implement the model!
1.3 - Implementing Emojifier-V1
As shown in Figure (2), the first step is to convert an input sentence into the word vector representations, which are then averaged together. Similar to the previous exercise, we will use pretrained 50-dimensional GloVe embeddings. Run the following cell to load the word_to_vec_map, which contains all the vector representations.
```python
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')
```

You've loaded:
- word_to_index: dictionary mapping from words to their indices in the vocabulary (400,001 words, with the valid indices ranging from 0 to 400,000)
- index_to_word: dictionary mapping from indices to their corresponding words in the vocabulary
- word_to_vec_map: dictionary mapping words to their GloVe vector representation.
Run the following cell to check if it works.
word = "cucumber" index = 289846 print("the index of", word, "in the vocabulary is", word_to_index[word]) print("the", str(index) + "th word in the vocabulary is", index_to_word[index]) the index of cucumber in the vocabulary is 113317 the 289846th word in the vocabulary is potatosExercise: Implement sentence_to_avg(). You will need to carry out two steps:
1. Convert every sentence to lower-case, then split the sentence into a list of words. X.lower() and X.split() might be useful.
2. For each word in the sentence, access its GloVe representation. Then, average all these values.
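The graded cell for sentence_to_avg() is not reproduced in this post. A minimal sketch following the two steps above (assuming the 50-dimensional GloVe vectors loaded in this notebook, and assuming the notebook's test sentence "Morrocan couscous is my favorite dish") could look like this:

```python
def sentence_to_avg(sentence, word_to_vec_map):
    """Average the GloVe vectors of the words in `sentence` (a string)."""
    # Step 1: lower-case the sentence and split it into a list of words.
    words = sentence.lower().split()

    # Step 2: sum the GloVe vector of every word, then divide by the number of words.
    avg = np.zeros((50,))          # 50 = dimension of the GloVe vectors used here
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)

    return avg

avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)
```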
Expected Output:
| **avg= ** | [-0.008005 0.56370833 -0.50427333 0.258865 0.55131103 0.03104983 -0.21013718 0.16893933 -0.09590267 0.141784 -0.15708967 0.18525867 0.6495785 0.38371117 0.21102167 0.11301667 0.02613967 0.26037767 0.05820667 -0.01578167 -0.12078833 -0.02471267 0.4128455 0.5152061 0.38756167 -0.898661 -0.535145 0.33501167 0.68806933 -0.2156265 1.797155 0.10476933 -0.36775333 0.750785 0.10282583 0.348925 -0.27262833 0.66768 -0.10706167 -0.283635 0.59580117 0.28747333 -0.3366635 0.23393817 0.34349183 0.178405 0.1166155 -0.076433 0.1445417 0.09808667] |
Model
You now have all the pieces to finish implementing the model() function. After using sentence_to_avg() you need to pass the average through forward propagation, compute the cost, and then backpropagate to update the softmax’s parameters.
Exercise: Implement the model() function described in Figure (2). Assuming here that $Y_{oh}$ ("Y one hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:
$$ z^{(i)} = W \cdot avg^{(i)} + b $$
$$ a^{(i)} = softmax(z^{(i)}) $$
$$ \mathcal{L}^{(i)} = - \sum_{k=0}^{n_y - 1} Y^{(i)}_{oh,k} \, \log\left(a^{(i)}_k\right) $$
It is possible to come up with a more efficient vectorized implementation. But since we are using a for-loop to convert the sentences one at a time into the $avg^{(i)}$ representation anyway, let's not bother this time.
We have provided you with a function softmax().
```python
# GRADED FUNCTION: model

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.

    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 7, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations

    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]    # number of training examples
    n_y = 5           # number of classes
    n_h = 50          # dimensions of the GloVe vectors

    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))

    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y)

    # Optimization loop
    for t in range(num_iterations):    # Loop over the number of iterations
        for i in range(m):             # Loop over the training examples

            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example of X
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one-hot representation and "a" (the output of the softmax)
            cost = -np.sum(Y_oh[i] * np.log(a))
            ### END CODE HERE ###

            # Compute gradients
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y, 1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db

        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b
```

```python
print(X_train.shape)
print(Y_train.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(X_train[0])
print(type(X_train))
Y = np.asarray([5, 0, 0, 5, 4, 4, 4, 6, 6, 4, 1, 1, 5, 6, 6, 3, 6, 3, 4, 4])
print(Y.shape)

X = np.asarray(['I am going to the bar tonight', 'I love you', 'miss you my dear',
                'Lets go party and drinks', 'Congrats on the new job', 'Congratulations',
                'I am so happy for you', 'Why are you feeling bad', 'What is wrong with you',
                'You totally deserve this prize', 'Let us go play football',
                'Are you down for football this afternoon', 'Work hard play harder',
                'It is suprising how people can be dumb sometimes', 'I am very disappointed',
                'It is the best day in my life', 'I think I will end up alone',
                'My life is so boring', 'Good job', 'Great so awesome'])

print(X.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(type(X_train))
```

```
(132,)
(132,)
(132, 5)
never talk to me again
<class 'numpy.ndarray'>
(20,)
(20,)
(132, 5)
<class 'numpy.ndarray'>
```

Run the next cell to train your model and learn the softmax parameters (W, b).
```python
pred, W, b = model(X_train, Y_train, word_to_vec_map)
print(pred)
```

```
Epoch: 0 --- cost = 1.952049881281007
Accuracy: 0.3484848484848485
Epoch: 100 --- cost = 0.07971818726014807
Accuracy: 0.9318181818181818
Epoch: 200 --- cost = 0.04456369243681402
Accuracy: 0.9545454545454546
Epoch: 300 --- cost = 0.03432267378786059
Accuracy: 0.9696969696969697
[[3.] [2.] [3.] [0.] [4.] [0.] [3.] [2.] [3.] [1.] [3.] [3.] [1.] [3.] [2.] [3.]
 [2.] [3.] [1.] [2.] [3.] [0.] [2.] [2.] [2.] [1.] [4.] [3.] [3.] [4.] [0.] [3.]
 [4.] [2.] [0.] [3.] [2.] [2.] [3.] [4.] [2.] [2.] [0.] [2.] [3.] [0.] [3.] [2.]
 [4.] [3.] [0.] [3.] [3.] [3.] [4.] [2.] [1.] [1.] [1.] [2.] [3.] [1.] [0.] [0.]
 [0.] [3.] [4.] [4.] [2.] [2.] [1.] [2.] [0.] [3.] [2.] [2.] [0.] [3.] [3.] [1.]
 [2.] [1.] [2.] [2.] [4.] [3.] [3.] [2.] [4.] [0.] [0.] [3.] [3.] [3.] [3.] [2.]
 [0.] [1.] [2.] [3.] [0.] [2.] [2.] [2.] [3.] [2.] [2.] [2.] [4.] [1.] [1.] [3.]
 [3.] [4.] [1.] [2.] [1.] [1.] [3.] [1.] [0.] [4.] [0.] [3.] [3.] [4.] [4.] [1.]
 [4.] [3.] [0.] [2.]]
```

Expected Output (on a subset of iterations):
| **Epoch: 0** | cost = 1.95204988128 | Accuracy: 0.348484848485 |
| **Epoch: 100** | cost = 0.0797181872601 | Accuracy: 0.931818181818 |
| **Epoch: 200** | cost = 0.0445636924368 | Accuracy: 0.954545454545 |
| **Epoch: 300** | cost = 0.0343226737879 | Accuracy: 0.969696969697 |
Great! Your model has pretty high accuracy on the training set. Let's now see how it does on the test set.
1.4 - Examining test set performance
print("Training set:") pred_train = predict(X_train, Y_train, W, b, word_to_vec_map) print('Test set:') pred_test = predict(X_test, Y_test, W, b, word_to_vec_map) Training set: Accuracy: 0.9772727272727273 Test set: Accuracy: 0.8571428571428571Expected Output:
| **Train set accuracy** | 97.7 |
| **Test set accuracy** | 85.7 |
Random guessing would have had 20% accuracy given that there are 5 classes. This is pretty good performance after training on only 127 examples.
In the training set, the algorithm saw the sentence "I love you" with the label ??. You can check, however, that the word "adore" does not appear in the training set. Nonetheless, let's see what happens if you write "I adore you."
```python
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4], [3]])

pred = predict(X_my_sentences, Y_my_labels, W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
```

```
Accuracy: 0.8333333333333334

i adore you ??
i love you ??
funny lol ?
lets play with a ball ?
food is ready ?
not feeling happy ?
```

Amazing! Because adore has a similar embedding as love, the algorithm has generalized correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love, and so might work too. Feel free to modify the inputs above and try out a variety of input sentences. How well does it work?
Note though that it doesn’t get “not feeling happy” correct. This algorithm ignores word ordering, so is not good at understanding phrases like “not happy.”
Printing the confusion matrix can also help understand which classes are more difficult for your model. A confusion matrix shows how often an example whose label is one class (“actual” class) is mislabeled by the algorithm with a different class (“predicted” class).
```python
print(Y_test.shape)
print(' ' + label_to_emoji(0) + ' ' + label_to_emoji(1) + ' ' + label_to_emoji(2) + ' ' + label_to_emoji(3) + ' ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)
```

```
(56,)
           ??   ?   ?   ?   ?
Predicted  0.0  1.0  2.0  3.0  4.0  All
Actual
0            6    0    0    1    0    7
1            0    8    0    0    0    8
2            2    0   16    0    0   18
3            1    1    2   12    0   16
4            0    0    1    0    6    7
All          9    9   19   13    6   56
```
What you should remember from this part:
- Even with only 127 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power that word vectors give you.
- Emojify-V1 will perform poorly on sentences such as "This movie is not good and not enjoyable" because it doesn't understand combinations of words; it just averages all the words' embedding vectors together, without paying attention to the ordering of words. You will build a better algorithm in the next part.
2 - Emojifier-V2: Using LSTMs in Keras
Let’s build an LSTM model that takes as input word sequences. This model will be able to take word ordering into account. Emojifier-V2 will continue to use pre-trained word embeddings to represent words, but will feed them into an LSTM, whose job it is to predict the most appropriate emoji.
Run the following cell to load the Keras packages.
```python
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)
```

2.1 - Overview of the model
Here is the Emojifier-v2 you will implement:
2.2 Keras and mini-batching
In this exercise, we want to train Keras using mini-batches. However, most deep learning frameworks require that all sequences in the same mini-batch have the same length. This is what allows vectorization to work: If you had a 3-word sentence and a 4-word sentence, then the computations needed for them are different (one takes 3 steps of an LSTM, one takes 4 steps) so it’s just not possible to do them both at the same time.
The common solution to this is to use padding. Specifically, set a maximum sequence length, and pad all sequences to the same length. For example, if the maximum sequence length is 20, we could pad every sentence with "0"s so that each input sentence is of length 20. Thus, the sentence "i love you" would be represented as $(e_{i}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$. In this example, any sentence longer than 20 words would have to be truncated. One simple way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.
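As an illustration (not part of the graded code), zero-padding lists of word indices to a common length could look like the sketch below; the index values are made up:

```python
import numpy as np

# Hypothetical word indices for a 3-word and a 5-word sentence.
sentences_as_indices = [[41, 217, 399], [12, 7, 88, 301, 5]]
max_len = 6  # chosen maximum sequence length

padded = np.zeros((len(sentences_as_indices), max_len))
for i, idx_list in enumerate(sentences_as_indices):
    padded[i, :len(idx_list)] = idx_list   # post-pad with zeros

print(padded)
# [[ 41. 217. 399.   0.   0.   0.]
#  [ 12.   7.  88. 301.   5.   0.]]
```

Keras also ships a helper, keras.preprocessing.sequence.pad_sequences, which does this (it pads at the front by default); in this assignment the padding is instead written by hand inside sentences_to_indices(), which you will implement below.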
2.3 - The Embedding layer
In Keras, the embedding matrix is represented as a "layer" and maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. In this part, you will learn how to create an Embedding() layer in Keras and initialize it with the GloVe 50-dimensional vectors loaded earlier in the notebook. Because our training set is quite small, we will not update the word embeddings but will instead leave their values fixed. But in the code below, we'll show you how Keras allows you to either train this layer or leave it fixed.
The Embedding() layer takes an integer matrix of size (batch size, max input length) as input. This corresponds to sentences converted into lists of indices (integers), as shown in the figure below.
The largest integer (i.e. word index) in the input should be no larger than the vocabulary size. The layer outputs an array of shape (batch size, max input length, dimension of word vectors).
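A quick way to convince yourself of these shapes (purely illustrative, with a made-up vocabulary of 10 indices embedded into 3 dimensions) is to run a toy Embedding() layer through a Keras Model:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding

vocab_size, emb_dim, max_len = 10, 3, 4
indices_in = Input(shape=(max_len,), dtype='int32')    # (batch size, max input length)
emb_out = Embedding(vocab_size, emb_dim)(indices_in)
toy_model = Model(inputs=indices_in, outputs=emb_out)

batch = np.array([[1, 2, 3, 0],
                  [4, 5, 0, 0]])                        # two zero-padded "sentences"
print(toy_model.predict(batch).shape)                   # (2, 4, 3): one embedding vector per index
```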
The first step is to convert all your training sentences into lists of indices, and then zero-pad all these lists so that their length is the length of the longest sentence.
Exercise: Implement the function below to convert X (array of sentences as strings) into an array of indices corresponding to words in the sentences. The output shape should be such that it can be given to Embedding() (described in Figure 4).
```python
# GRADED FUNCTION: sentences_to_indices

def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4).

    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.

    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    m = X.shape[0]                            # number of training examples

    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices = np.zeros((m, max_len))

    for i in range(m):                        # loop over training examples
        # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
        sentence_words = X[i].lower().split()

        # Initialize j to 0
        j = 0

        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i, j] = word_to_index[w]
            # Increment j to j + 1
            j = j + 1
    ### END CODE HERE ###

    return X_indices
```

Run the following cell to check what sentences_to_indices() does, and check your results.
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"]) X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5) print("X1 =", X1) print("X1_indices =", X1_indices) X1 = ['funny lol' 'lets play baseball' 'food is ready for you'] X1_indices = [[155345. 225122. 0. 0. 0.][220930. 286375. 69714. 0. 0.][151204. 192973. 302254. 151349. 394475.]]Expected Output:
| **X1 =** | [‘funny lol’ ‘lets play football’ ‘food is ready for you’] |
| **X1_indices =** | [[ 155345. 225122. 0. 0. 0.] [ 220930. 286375. 151266. 0. 0.] [ 151204. 192973. 302254. 151349. 394475.]] |
Let’s build the Embedding() layer in Keras, using pre-trained word vectors. After this layer is built, you will pass the output of sentences_to_indices() to it as an input, and the Embedding() layer will return the word embeddings for a sentence.
Exercise: Implement pretrained_embedding_layer(). You will need to carry out the following steps:
1. Initialize the embedding matrix as a numpy array of zeroes with the correct shape.
2. Fill in the embedding matrix with all the word embeddings extracted from word_to_vec_map.
3. Define the Keras embedding layer. Use Embedding(). Be sure to make this layer non-trainable by setting trainable = False when calling Embedding(). If you were to set trainable = True, the optimization algorithm would be allowed to modify the values of the word embeddings.
4. Set the embedding weights to be equal to the embedding matrix
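The graded cell for pretrained_embedding_layer() is not reproduced in this post. A sketch following the four steps above might look like this (vocab_len = len(word_to_index) + 1 is assumed so that index 0 stays reserved for padding, which also matches the 400,001 x 50 parameter count reported later):

```python
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """Creates a Keras Embedding() layer and loads in the pre-trained GloVe 50-dimensional vectors."""
    vocab_len = len(word_to_index) + 1               # adding 1 so index 0 stays reserved for padding (assumption)
    emb_dim = word_to_vec_map["cucumber"].shape[0]   # dimensionality of the GloVe word vectors (= 50)

    # Step 1: initialize the embedding matrix with zeros, shape (vocab_len, emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Step 2: fill each row with the GloVe vector of the corresponding word
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Step 3: define a non-trainable Keras Embedding layer
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # The layer must be built before its weights can be set
    embedding_layer.build((None,))

    # Step 4: set the layer weights to the pretrained embedding matrix
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer

embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])
```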
Expected Output:
| **weights[0][1][3] =** | -0.3403 |
2.4 - Building the Emojifier-V2
Let's now build the Emojifier-V2 model. You will do so using the embedding layer you have built, and feed its output to an LSTM network.
Exercise: Implement Emojify_V2(), which builds a Keras graph of the architecture shown in Figure 3. The model takes as input an array of sentences of shape (m, max_len) defined by input_shape. It should output a softmax probability vector of shape (m, C = 5). You may need Input(shape = ..., dtype = '...'), LSTM(), Dropout(), Dense(), and Activation().
```python
# GRADED FUNCTION: Emojify_V2

def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.

    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph; it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')

    # Create the embedding layer pretrained with GloVe vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Propagate sentence_indices through your embedding layer; you get back the embeddings
    embeddings = embedding_layer(sentence_indices)

    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)

    # Create a Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
    ### END CODE HERE ###

    return model
```

Run the following cell to create your model and check its summary. Because all sentences in the dataset are less than 10 words, we chose max_len = 10. You should see that the architecture uses 20,223,927 parameters, of which 20,000,050 (the word embeddings) are non-trainable and the remaining 223,877 are trainable. Because our vocabulary has 400,001 words (with valid indices from 0 to 400,000), there are 400,001 * 50 = 20,000,050 non-trainable parameters.
```python
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 10)                0
_________________________________________________________________
embedding_2 (Embedding)      (None, 10, 50)            20000050
_________________________________________________________________
lstm_1 (LSTM)                (None, 10, 128)           91648
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 128)           0
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 645
_________________________________________________________________
activation_1 (Activation)    (None, 5)                 0
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________
```

As usual, after creating your model in Keras, you need to compile it and define which loss, optimizer and metrics you want to use. Compile your model using categorical_crossentropy loss, the adam optimizer and ['accuracy'] metrics:
```python
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```

It's time to train your model. Your Emojifier-V2 model takes as input an array of shape (m, max_len) and outputs probability vectors of shape (m, number of classes). We thus have to convert X_train (an array of sentences as strings) to X_train_indices (an array of sentences as lists of word indices), and Y_train (labels as indices) to Y_train_oh (labels as one-hot vectors).
```python
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
```

Fit the Keras model on X_train_indices and Y_train_oh. We will use epochs = 50 and batch_size = 32.
```python
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
```

```
Epoch 1/50
132/132 [==============================] - 3s 21ms/step - loss: 1.6086 - acc: 0.1818
Epoch 2/50
132/132 [==============================] - 0s 773us/step - loss: 1.5870 - acc: 0.3409
Epoch 3/50
132/132 [==============================] - 0s 773us/step - loss: 1.5725 - acc: 0.2652
............
Epoch 37/50
132/132 [==============================] - 0s 713us/step - loss: 1.2161 - acc: 0.6894
Epoch 38/50
132/132 [==============================] - 0s 796us/step - loss: 1.2403 - acc: 0.6591
Epoch 39/50
132/132 [==============================] - 0s 841us/step - loss: 1.2404 - acc: 0.6591
Epoch 40/50
132/132 [==============================] - 0s 872us/step - loss: 1.2219 - acc: 0.6742
Epoch 41/50
132/132 [==============================] - 0s 834us/step - loss: 1.2183 - acc: 0.6818
Epoch 42/50
132/132 [==============================] - 0s 917us/step - loss: 1.1985 - acc: 0.6970
Epoch 43/50
132/132 [==============================] - 0s 864us/step - loss: 1.1996 - acc: 0.6970
Epoch 44/50
132/132 [==============================] - 0s 993us/step - loss: 1.1839 - acc: 0.7197
Epoch 45/50
132/132 [==============================] - 0s 834us/step - loss: 1.1949 - acc: 0.7121
Epoch 46/50
132/132 [==============================] - 0s 758us/step - loss: 1.1841 - acc: 0.7121
Epoch 47/50
132/132 [==============================] - 0s 781us/step - loss: 1.1618 - acc: 0.7424
Epoch 48/50
132/132 [==============================] - 0s 796us/step - loss: 1.1614 - acc: 0.7348
Epoch 49/50
132/132 [==============================] - 0s 773us/step - loss: 1.1440 - acc: 0.7727
Epoch 50/50
132/132 [==============================] - 0s 758us/step - loss: 1.1098 - acc: 0.7955

<keras.callbacks.History at 0x237004d0518>
```

Your model should perform close to 100% accuracy on the training set. The exact accuracy you get may be a little different. Run the following cell to evaluate your model on the test set.
```python
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
```

```
56/56 [==============================] - 0s 2ms/step

Test accuracy =  0.839285705770765
```

You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples.
```python
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    x = X_test_indices
    num = np.argmax(pred[i])
    if num != Y_test[i]:
        print('Expected emoji:' + label_to_emoji(Y_test[i]) + ' prediction: ' + X_test[i] + label_to_emoji(num).strip())
```

```
Expected emoji:? prediction: she got me a nice present ??
Expected emoji:? prediction: work is hard ?
Expected emoji:? prediction: This girl is messing with me ??
Expected emoji:? prediction: This stupid grader is not working ??
Expected emoji:? prediction: work is horrible ?
Expected emoji:? prediction: you brighten my day ??
Expected emoji:? prediction: she is a bully ??
Expected emoji:? prediction: Why are you feeling bad ??
Expected emoji:? prediction: My life is so boring ??
```

Now you can try it on your own example. Write your own sentence below.
```python
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] + ' ' + label_to_emoji(np.argmax(model.predict(X_test_indices))))
```

```
not feeling happy ?
```

Previously, the Emojify-V1 model did not correctly label "not feeling happy," but our implementation of Emojify-V2 got it right. (Keras' outputs are slightly random each time, so you may not have obtained the same result.) The current model still isn't very robust at understanding negation (like "not happy") because the training set is small and doesn't have many examples of negation. But if the training set were larger, the LSTM model would be much better than the Emojify-V1 model at understanding such complex sentences.
Congratulations!
You have completed this notebook! ??????
What you should remember:
- If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.
- Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
- To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
- An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it’s usually not worth trying to train a large pre-trained set of embeddings.
- LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one (see the short sketch after this list).
- You can use Dropout() right after LSTM() to regularize your network.
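As a small illustration of the return_sequences flag (the shapes match the two LSTM layers used in Emojifier-V2, but the snippet is otherwise standalone):

```python
from keras.models import Model
from keras.layers import Input, LSTM

x_in = Input(shape=(10, 50))                        # (time steps, features), e.g. 10 words of 50-d embeddings
seq_out = LSTM(128, return_sequences=True)(x_in)    # one 128-d hidden state per time step
last_out = LSTM(128, return_sequences=False)(x_in)  # only the last 128-d hidden state

print(Model(x_in, seq_out).output_shape)   # (None, 10, 128)
print(Model(x_in, last_out).output_shape)  # (None, 128)
```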
Congratulations on finishing this assignment and building an Emojifier. We hope you’re happy with what you’ve accomplished in this notebook!
??????
Acknowledgments
Thanks to Alison Darcy and the Woebot team for their advice on the creation of this assignment. Woebot is a chatbot friend that is ready to speak with you 24/7. As part of Woebot’s technology, it uses word embeddings to understand the emotions of what you say. You can play with it by going to http://woebot.io