
Machine Translation - Date Translation

Published: 2024/1/17 35 豆豆


Neural Machine Translation

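You will build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25"). You will do this using an attention model, one of the most sophisticated sequence-to-sequence models.

Let's load all the packages needed for this assignment.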

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np
from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt
%matplotlib inline

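1 - Translating human-readable dates into machine-readable dates

The model you will build here could be used to translate from one language to another, for example from English to Hindi. However, language translation requires massive datasets and usually takes days of training on GPUs. To let you experiment with these models without such large datasets, we will use a simpler "date translation" task.

The network takes as input a date written in a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and translates it into a standardized, machine-readable date (e.g. "1958-08-29", "1968-03-30", "1987-06-24"). We will have the network learn to output dates in the machine-readable format YYYY-MM-DD.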

Take a look at nmt_utils.py to see all the formatting. Count and figure out how the formats work, you will need this knowledge later.

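1.1 - Dataset

We will train the model on a dataset of 10000 human-readable dates and their equivalent, standardized, machine-readable dates. Let's run the following cell to load the dataset and print some examples.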

m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)

100%|██████████████████████████████████████████| 10000/10000 [00:00<00:00, 30957.97it/s]

dataset[:10]

[('15 october 1989', '1989-10-15'),
 ('sunday july 15 1984', '1984-07-15'),
 ('wednesday april 5 1978', '1978-04-05'),
 ('3/26/16', '2016-03-26'),
 ('sunday april 5 1992', '1992-04-05'),
 ('sunday october 16 2005', '2005-10-16'),
 ('11 dec 1992', '1992-12-11'),
 ('21 03 75', '1975-03-21'),
 ('tuesday july 23 2013', '2013-07-23'),
 ('sunday november 3 1996', '1996-11-03')]

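You've loaded:

dataset: a list of tuples of (human-readable date, machine-readable date)
human_vocab: a Python dictionary mapping all characters used in the human-readable dates to an integer-valued index
machine_vocab: a Python dictionary mapping all characters used in the machine-readable dates to an integer-valued index. These indices are not necessarily consistent with human_vocab.
inv_machine_vocab: the inverse dictionary of machine_vocab, mapping from indices back to characters.

Let's preprocess the data and map the raw text data to index values. We will also use Tx = 30 (which we assume is the maximum length of a human-readable date; if we get a longer input, we truncate it) and Ty = 10 (since "YYYY-MM-DD" is 10 characters long).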

Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)
print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)

X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)
Yoh.shape: (10000, 10, 11)

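You now have:

X: a processed version of the human-readable dates in the training set, where each character is replaced by the index it maps to in human_vocab. Each date is further padded to $T_x$ (30) values with a special character (<pad>). X.shape = (m, Tx)
Y: a processed version of the machine-readable dates in the training set, where each character is replaced by the index it maps to in machine_vocab. You should have Y.shape = (m, Ty).
Xoh: one-hot version of X. Each index in X becomes a vector of length len(human_vocab) with a 1 at the position of that character in human_vocab and 0 elsewhere, so Xoh.shape = (m, Tx, len(human_vocab)): m examples, Tx characters, and one one-hot vector of length len(human_vocab) per character.
Yoh: one-hot version of Y. Yoh.shape = (m, Ty, len(machine_vocab)). Here, len(machine_vocab) = 11 since there are 11 characters ('-' as well as 0-9).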
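To make the index and one-hot representations concrete, here is a minimal sketch of the same idea on a toy vocabulary. The names toy_vocab, to_indices and one_hot are hypothetical and are not the helpers from nmt_utils.py; the real preprocess_data() may differ in detail.

```python
import numpy as np

# Hypothetical toy vocabulary (an assumption for illustration only);
# the real human_vocab is returned by load_dataset().
toy_vocab = {" ": 0, "1": 1, "5": 2, "9": 3, "o": 4, "c": 5, "t": 6,
             "<unk>": 7, "<pad>": 8}

def to_indices(text, length, vocab):
    """Map characters to indices, truncate to `length`, then pad with <pad>."""
    text = text.lower()[:length]
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in text]
    return ids + [vocab["<pad>"]] * (length - len(ids))

def one_hot(ids, num_classes):
    """Turn a list of indices into a (len(ids), num_classes) one-hot matrix."""
    return np.eye(num_classes)[ids]

ids = to_indices("15 oct", 8, toy_vocab)
oh = one_hot(ids, len(toy_vocab))
print(ids)       # [1, 2, 0, 4, 5, 6, 8, 8]
print(oh.shape)  # (8, 9)
```

Each row of oh has exactly one entry equal to 1, at the column given by the corresponding index in ids.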

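Let's also look at some preprocessed training examples. Feel free to change index in the cell below to navigate the dataset and see how source/target dates are preprocessed.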

index = 0
print("Source date:", dataset[index][0])
print("Target date:", dataset[index][1])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])

Source date: 15 october 1989
Target date: 1989-10-15

Source after preprocessing (indices): [ 4  8  0 26 15 30 26 14 17 28  0  4 12 11 12 36 36 36 36 36 36 36 36 36
 36 36 36 36 36 36]
Target after preprocessing (indices): [ 2 10  9 10  0  2  1  0  2  6]

Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]

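2 - Neural machine translation with attention

If you had to translate a book's paragraph from French to English, you would not read the whole paragraph, then close the book and translate. Even during the translation process, you would read and re-read, focusing on the parts of the French paragraph that correspond to the parts of the English you are writing down.

The attention mechanism tells a Neural Machine Translation model where it should pay attention to at any step.

2.1 - Attention mechanism

In this part, you will implement the attention mechanism presented in the lecture videos. Here is a figure to remind you how the model works. The diagram on the left shows the attention model. The diagram on the right shows what one "Attention" step does to calculate the attention variables $\alpha^{\langle t, t' \rangle}$, which are used to compute the context variable $context^{\langle t \rangle}$ for each timestep in the output ($t=1, \ldots, T_y$).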

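Note: $t'$ is a time step of the bidirectional LSTM and $t$ is a time step of the LSTM above it. For example, $\alpha^{\langle 1, 2 \rangle}$ is the weight between the first time step of the upper LSTM and the second time step of the lower bidirectional LSTM.
For every time step of the upper LSTM, for example the first one, the attention weights sum to 1:
\[\sum_{t'} \alpha^{\langle 1, t' \rangle} = 1\]
The attention weight $\alpha^{\langle t, t' \rangle}$ expresses how much attention time step $t$ (of the upper LSTM) pays to $a^{\langle t' \rangle}$ (from the lower bidirectional LSTM), that is, how much attention to spend on the $t'$-th input token when generating the $t$-th output token.

The context at the first step is the sum of the bidirectional LSTM activations $a^{\langle t' \rangle}$, each multiplied by its attention weight. Every step therefore takes all of these states into account, but with different weights.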
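The two properties above (the weights for one output step sum to 1, and the context is the weighted sum of the activations) can be checked numerically. This is a small sketch with made-up energies and random stand-in activations, not values from the trained model.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

Tx, n_a = 4, 3
rng = np.random.default_rng(0)
a = rng.normal(size=(Tx, 2 * n_a))           # stand-in Bi-LSTM activations a^{<t'>}
energies = np.array([0.1, 2.0, 0.3, -1.0])   # made-up scores e^{<1,t'>}

alphas = softmax(energies)   # attention weights alpha^{<1,t'>} for output step 1
context = alphas @ a         # sum over t' of alpha^{<1,t'>} * a^{<t'>}

print(alphas.sum())      # 1.0 up to floating-point rounding
print(context.shape)     # (6,)
```

The input position with the largest energy (here the second one) receives the largest weight and therefore dominates the context vector.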

**Figure 1**: Neural machine translation with attention

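Here are some properties of the model:

There are two separate LSTMs in this model (see the diagram on the left). The one at the bottom of the picture is a bidirectional LSTM and comes before the attention mechanism, so we call it the pre-attention Bi-LSTM. The LSTM at the top of the diagram comes after the attention mechanism, so we call it the post-attention LSTM. The pre-attention Bi-LSTM goes through $T_x$ time steps; the post-attention LSTM goes through $T_y$ time steps.
The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next. In the lecture videos, we used only a basic RNN for the post-activation sequence model, so the state captured by the RNN was the output activation $s^{\langle t\rangle}$. Since we are using an LSTM here, the LSTM has both the output activation $s^{\langle t\rangle}$ and the hidden cell state $c^{\langle t\rangle}$. However, unlike previous text generation examples (such as Dinosaurus in week 1), in this model the post-attention LSTM at time $t$ does not take the specific generated $y^{\langle t-1 \rangle}$ as input; it only takes $s^{\langle t\rangle}$ and $c^{\langle t\rangle}$, together with the context vector, as input. We designed the model this way because, unlike language generation where adjacent characters are highly correlated, there is not as strong a dependency between the previous character and the next character in a YYYY-MM-DD date.
We use $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}; \overleftarrow{a}^{\langle t \rangle}]$ to represent the concatenation of the activations of the forward and backward directions of the pre-attention Bi-LSTM.
The diagram on the right uses a RepeatVector node to copy the value of $s^{\langle t-1 \rangle}$ $T_x$ times, and then Concatenation to concatenate $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ to compute $e^{\langle t, t'\rangle}$, which is then passed through a softmax to compute $\alpha^{\langle t, t' \rangle}$. We explain how to use RepeatVector and Concatenation in Keras below.

Let's implement this model. You will start by implementing two functions: one_step_attention() and model().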

1) one_step_attention(): At step $t$, given all the hidden states of the Bi-LSTM ($[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$) and the previous hidden state of the second LSTM ($s^{<t-1>}$), one_step_attention() computes the attention weights ($[\alpha^{<t,1>},\alpha^{<t,2>}, ..., \alpha^{<t,T_x>}]$) and outputs the context vector (see Figure 1 (right) for details):
\[context^{<t>} = \sum_{t' = 1}^{T_x} \alpha^{<t,t'>}a^{<t'>}\tag{1}\]

Note that we denote the attention in this notebook as $context^{\langle t \rangle}$. In the lecture videos the context was denoted $c^{\langle t \rangle}$, but here we call it $context^{\langle t \rangle}$ to avoid confusion with the internal memory cell of the post-attention LSTM.

2) model(): Implements the entire model. It first runs the input through the Bi-LSTM to get $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$. Then it calls one_step_attention() $T_y$ times (in a for loop). At each iteration of this loop, it gives the computed context vector $context^{<t>}$ to the second LSTM, and runs the output of that LSTM through a dense layer with softmax activation to generate a prediction $\hat{y}^{<t>}$.

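Exercise: Implement one_step_attention(). The function model() will call one_step_attention() $T_y$ times using a for loop, and it is important that all $T_y$ copies have the same weights. That is, the weights should not be re-initialized every time; all $T_y$ steps should share the same weights. Here is how you can implement layers with shareable weights in Keras:
1. Define the layer objects (as global variables, for example).
2. Call these objects when propagating the input.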
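The "define once, call many times" pattern can be illustrated without Keras. The class TinyDense below is a hypothetical stand-in for a layer object, written only for this sketch: its weights are created in the constructor, so calling the same object repeatedly reuses them, while constructing a new object at every step would create new weights each time.

```python
import numpy as np

class TinyDense:
    """Hypothetical stand-in for a layer: weights are created once, in __init__."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b

x = np.ones((1, 4))

# Step 1: define the layer object once. Step 2: call it at every time step.
shared = TinyDense(4, 2)
shared_outs = [shared(x) for _ in range(3)]

# Anti-pattern: building a new layer inside the loop gives new weights each step.
fresh_outs = [TinyDense(4, 2, seed=t)(x) for t in range(3)]

print(np.allclose(shared_outs[0], shared_outs[2]))  # True: same weights reused
print(np.allclose(fresh_outs[0], fresh_outs[2]))    # False: weights differ per step
```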

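We have defined the layers you need as global variables. Please run the following cell to create them, and check the Keras documentation to make sure you understand what these layers are:
RepeatVector(), Concatenate(), Dense(), Activation(), Dot().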

# Define shared layers as global variables
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation = "tanh")
densor2 = Dense(1, activation = "relu")
activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
dotor = Dot(axes = 1)

Now you can use these layers to implement one_step_attention(). To propagate a Keras tensor object X through one of these layers, use layer(X) (or layer([X,Y]) if it requires multiple inputs); e.g. densor2(X) will propagate X through the Dense(1) layer defined above.

# GRADED FUNCTION: one_step_attention
def one_step_attention(a, s_prev):
    """
    Performs one step of attention (the computation shown in the right-hand diagram above): outputs a context
    vector computed as a dot product of the attention weights "alphas" and the hidden states "a" of the Bi-LSTM.

    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)

    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell
    """
    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator([a, s_prev])  # pairs (a[1], s_prev), (a[2], s_prev), ...
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈ 1 line)
    e = densor1(concat)  # first fully-connected layer
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈ 1 line)
    energies = densor2(e)  # second fully-connected layer
    # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
    alphas = activator(energies)  # softmax activation
    # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
    context = dotor([alphas, a])
    ### END CODE HERE ###
    return context
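To follow the tensor shapes through one_step_attention(), here is a NumPy sketch that mirrors each Keras layer with an explicit array operation. The function one_step_attention_np and the weight arrays W1, b1, W2, b2 are hypothetical stand-ins with random values; this is an illustration of the shapes, not the trained model.

```python
import numpy as np

def softmax_axis1(x):
    """Softmax over axis 1 (the Tx axis), like the custom softmax(axis=1)."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def one_step_attention_np(a, s_prev, W1, b1, W2, b2):
    m, Tx, _ = a.shape
    s_rep = np.repeat(s_prev[:, None, :], Tx, axis=1)   # RepeatVector(Tx): (m, Tx, n_s)
    concat = np.concatenate([a, s_rep], axis=-1)        # Concatenate: (m, Tx, 2*n_a + n_s)
    e = np.tanh(concat @ W1 + b1)                       # Dense(10, tanh): (m, Tx, 10)
    energies = np.maximum(e @ W2 + b2, 0.0)             # Dense(1, relu): (m, Tx, 1)
    alphas = softmax_axis1(energies)                    # attention weights: (m, Tx, 1)
    context = np.einsum("mti,mtj->mij", alphas, a)      # Dot(axes=1): (m, 1, 2*n_a)
    return context, alphas

rng = np.random.default_rng(0)
m, Tx, n_a, n_s = 2, 30, 32, 64
a = rng.normal(size=(m, Tx, 2 * n_a))
s_prev = rng.normal(size=(m, n_s))
W1, b1 = rng.normal(scale=0.1, size=(2 * n_a + n_s, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(10, 1)), np.zeros(1)

context, alphas = one_step_attention_np(a, s_prev, W1, b1, W2, b2)
print(context.shape, alphas.shape)   # (2, 1, 64) (2, 30, 1)
```

The shapes (m, 30, 1) for the attention weights and (m, 1, 64) for the context match the attention_weights and dot_1 rows of the model summary printed for n_a = 32.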

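You will be able to check the expected output of one_step_attention() after you have coded the model() function.

Exercise: Implement model() as explained in Figure 2 and the text above. Again, we have defined global layers that will share weights to be used in model().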

n_a = 32
n_s = 64
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(machine_vocab), activation=softmax)

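Now you can use these layers $T_y$ times in a for loop to generate the outputs, and their parameters will not be reinitialized. You will have to carry out the following steps: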

1. Propagate the input into a Bidirectional LSTM.
2. Iterate for $t = 0, \dots, T_y-1$:
    1. Call one_step_attention() on $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$ and $s^{<t-1>}$ to get the context vector $context^{<t>}$.
    2. Give $context^{<t>}$ to the post-attention LSTM cell. Remember to pass in the previous hidden state $s^{\langle t-1\rangle}$ and cell state $c^{\langle t-1\rangle}$ of this LSTM using initial_state = [previous hidden state, previous cell state]. Get back the new hidden state $s^{<t>}$ and the new cell state $c^{<t>}$.
    3. Apply a softmax layer to $s^{<t>}$ to get the output.
    4. Save the output by adding it to the list of outputs.
3. Create your Keras model instance. It should have three inputs ("inputs", $s^{<0>}$ and $c^{<0>}$) and output the list of "outputs".
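The control flow of these steps can be sketched in plain NumPy. This is a deliberately simplified illustration under several assumptions: the Bi-LSTM output is replaced by random numbers, the attention scorer is a single linear layer (not the two dense layers defined earlier), biases are omitted, and all weights (Wa, Wl, Wy) are random and untrained. What it does show is that the parameters are created once and reused at all $T_y$ steps, and that the result is a list of $T_y$ softmax outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, Tx, Ty, n_a, n_s, vocab = 2, 5, 3, 4, 6, 11

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Parameters are created ONCE, outside the loop, so all Ty steps share them.
Wa = rng.normal(scale=0.1, size=(2 * n_a + n_s, 1))        # simplified attention scorer
Wl = rng.normal(scale=0.1, size=(2 * n_a + n_s, 4 * n_s))  # LSTM gate weights
Wy = rng.normal(scale=0.1, size=(n_s, vocab))              # output layer

def attention(a, s):
    """Return a context vector of shape (m, 2*n_a)."""
    s_rep = np.repeat(s[:, None, :], a.shape[1], axis=1)
    energies = np.concatenate([a, s_rep], axis=-1) @ Wa    # (m, Tx, 1)
    alphas = softmax(energies, axis=1)
    return (alphas * a).sum(axis=1)

def lstm_step(x, s, c):
    """One LSTM step: input, forget, output gates and candidate cell value."""
    i, f, o, g = np.split(np.concatenate([x, s], axis=-1) @ Wl, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    s = sigmoid(o) * np.tanh(c)
    return s, c

a = rng.normal(size=(m, Tx, 2 * n_a))            # step 1 stand-in: Bi-LSTM output
s, c = np.zeros((m, n_s)), np.zeros((m, n_s))    # s0 and c0
outputs = []
for t in range(Ty):                              # step 2
    context = attention(a, s)                    # 2.1
    s, c = lstm_step(context, s, c)              # 2.2
    out = softmax(s @ Wy, axis=-1)               # 2.3
    outputs.append(out)                          # 2.4

print(len(outputs), outputs[0].shape)   # 3 (2, 11)
```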

# GRADED FUNCTION: model
def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the python dictionary "human_vocab"
    machine_vocab_size -- size of the python dictionary "machine_vocab"

    Returns:
    model -- Keras model instance
    """
    # Define the inputs of your model with a shape (Tx, human_vocab_size)
    # Define s0 and c0, initial hidden state for the decoder LSTM of shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0
    # Initialize empty list of outputs
    outputs = []
    ### START CODE HERE ###
    # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
    a = Bidirectional(LSTM(n_a, return_sequences=True), name='bidirectional_1')(X)
    # Step 2: Iterate for Ty steps
    for t in range(Ty):
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = one_step_attention(a, s)
        # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
        # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
        # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
        out = output_layer(s)
        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)
    # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
    model = Model(inputs=[X, s0, c0], outputs=outputs)
    ### END CODE HERE ###
    return model

Run the following cell to create your model.

model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))

Let's get a summary of the model to check if it matches the expected output.

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 30, 37)       0
__________________________________________________________________________________________________
s0 (InputLayer)                 (None, 64)           0
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 30, 64)       17920       input_1[0][0]
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 30, 64)       0           s0[0][0]
                                                                 lstm_2[0][0]
                                                                 lstm_2[1][0]
                                                                 lstm_2[2][0]
                                                                 lstm_2[3][0]
                                                                 lstm_2[4][0]
                                                                 lstm_2[5][0]
                                                                 lstm_2[6][0]
                                                                 lstm_2[7][0]
                                                                 lstm_2[8][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 30, 128)      0           bidirectional_1[0][0] repeat_vector_1[0][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[1][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[2][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[3][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[4][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[5][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[6][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[7][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[8][0]
                                                                 bidirectional_1[0][0] repeat_vector_1[9][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 30, 10)       1290        concatenate_1[0][0]
                                                                 concatenate_1[1][0]
                                                                 concatenate_1[2][0]
                                                                 concatenate_1[3][0]
                                                                 concatenate_1[4][0]
                                                                 concatenate_1[5][0]
                                                                 concatenate_1[6][0]
                                                                 concatenate_1[7][0]
                                                                 concatenate_1[8][0]
                                                                 concatenate_1[9][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30, 1)        11          dense_1[0][0]
                                                                 dense_1[1][0]
                                                                 dense_1[2][0]
                                                                 dense_1[3][0]
                                                                 dense_1[4][0]
                                                                 dense_1[5][0]
                                                                 dense_1[6][0]
                                                                 dense_1[7][0]
                                                                 dense_1[8][0]
                                                                 dense_1[9][0]
__________________________________________________________________________________________________
attention_weights (Activation)  (None, 30, 1)        0           dense_2[0][0]
                                                                 dense_2[1][0]
                                                                 dense_2[2][0]
                                                                 dense_2[3][0]
                                                                 dense_2[4][0]
                                                                 dense_2[5][0]
                                                                 dense_2[6][0]
                                                                 dense_2[7][0]
                                                                 dense_2[8][0]
                                                                 dense_2[9][0]
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1, 64)        0           attention_weights[0][0] bidirectional_1[0][0]
                                                                 attention_weights[1][0] bidirectional_1[0][0]
                                                                 attention_weights[2][0] bidirectional_1[0][0]
                                                                 attention_weights[3][0] bidirectional_1[0][0]
                                                                 attention_weights[4][0] bidirectional_1[0][0]
                                                                 attention_weights[5][0] bidirectional_1[0][0]
                                                                 attention_weights[6][0] bidirectional_1[0][0]
                                                                 attention_weights[7][0] bidirectional_1[0][0]
                                                                 attention_weights[8][0] bidirectional_1[0][0]
                                                                 attention_weights[9][0] bidirectional_1[0][0]
__________________________________________________________________________________________________
c0 (InputLayer)                 (None, 64)           0
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 64), (None,  33024       dot_1[0][0] s0[0][0] c0[0][0]
                                                                 dot_1[1][0] lstm_2[0][0] lstm_2[0][2]
                                                                 dot_1[2][0] lstm_2[1][0] lstm_2[1][2]
                                                                 dot_1[3][0] lstm_2[2][0] lstm_2[2][2]
                                                                 dot_1[4][0] lstm_2[3][0] lstm_2[3][2]
                                                                 dot_1[5][0] lstm_2[4][0] lstm_2[4][2]
                                                                 dot_1[6][0] lstm_2[5][0] lstm_2[5][2]
                                                                 dot_1[7][0] lstm_2[6][0] lstm_2[6][2]
                                                                 dot_1[8][0] lstm_2[7][0] lstm_2[7][2]
                                                                 dot_1[9][0] lstm_2[8][0] lstm_2[8][2]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 11)           715         lstm_2[0][0]
                                                                 lstm_2[1][0]
                                                                 lstm_2[2][0]
                                                                 lstm_2[3][0]
                                                                 lstm_2[4][0]
                                                                 lstm_2[5][0]
                                                                 lstm_2[6][0]
                                                                 lstm_2[7][0]
                                                                 lstm_2[8][0]
                                                                 lstm_2[9][0]
==================================================================================================
Total params: 52,960
Trainable params: 52,960
Non-trainable params: 0
__________________________________________________________________________________________________

Expected Output:

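Here is the summary you should see. Note that these reference values do not match the run above, which reports 52,960 parameters and, for example, an output shape of (None, 30, 64) for bidirectional_1. The reference output shapes (128 and 256) are consistent with n_a = 64 and n_s = 128 rather than the n_a = 32 and n_s = 64 set in this post.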

Total params:	185,484
Trainable params:	185,484
Non-trainable params:	0
bidirectional_1's output shape	(None, 30, 128)
repeat_vector_1's output shape	(None, 30, 128)
concatenate_1's output shape	(None, 30, 256)
attention_weights's output shape	(None, 30, 1)
dot_1's output shape	(None, 1, 128)
dense_2's output shape	(None, 11)
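The parameter counts can be checked by hand with the standard formulas for LSTM and dense layers. The sketch below reproduces the 52,960 parameters printed by model.summary() for n_a = 32 and n_s = 64. It also shows one configuration that yields the reference total of 185,484: n_a = 64, n_s = 128 and a single Dense(1) energy layer. That second configuration is an inference from the arithmetic, not something the text states.

```python
def lstm_params(n_in, n_units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_units * (n_in + n_units) + n_units)

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out

human_vocab_size, machine_vocab_size = 37, 11

# Configuration used in this post: n_a = 32, n_s = 64, Dense(10) then Dense(1).
n_a, n_s = 32, 64
this_post = (
    2 * lstm_params(human_vocab_size, n_a)     # bidirectional_1: 17920
    + dense_params(2 * n_a + n_s, 10)          # dense_1: 1290
    + dense_params(10, 1)                      # dense_2: 11
    + lstm_params(2 * n_a, n_s)                # lstm_2: 33024
    + dense_params(n_s, machine_vocab_size)    # dense_4: 715
)
print(this_post)   # 52960

# Assumed configuration matching the reference total: n_a = 64, n_s = 128,
# with a single Dense(1) layer computing the energies.
n_a, n_s = 64, 128
reference = (
    2 * lstm_params(human_vocab_size, n_a)
    + dense_params(2 * n_a + n_s, 1)
    + lstm_params(2 * n_a, n_s)
    + dense_params(n_s, machine_vocab_size)
)
print(reference)   # 185484
```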

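As usual, after creating your model in Keras, you need to compile it and define the loss, optimizer and metrics you want to use. Compile your model using the categorical_crossentropy loss, an Adam optimizer (learning rate = 0.005, $\beta_1 = 0.9$, $\beta_2 = 0.999$, decay = 0.01) and ['accuracy'] metrics: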

### START CODE HERE ### (≈2 lines)
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)  # decay = 0.01, as specified in the text above
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
### END CODE HERE ###
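To get a feel for what the decay argument does, the sketch below assumes the time-based schedule used by the older Keras optimizers that accept decay, where the base learning rate is divided by (1 + decay * iterations) before each update. Treat the formula as an assumption about that Keras version; the helper decayed_lr is hypothetical and ignores Adam's own bias correction.

```python
def decayed_lr(base_lr, decay, iteration):
    """Time-based decay (assumed): lr_t = base_lr / (1 + decay * iteration)."""
    return base_lr / (1.0 + decay * iteration)

# 10000 training examples with batch_size = 100 gives 100 updates per epoch.
steps_per_epoch = 10000 // 100

for iteration in (0, steps_per_epoch, 10 * steps_per_epoch):
    print(iteration, decayed_lr(0.005, 0.01, iteration))
```

Under this assumption, with decay = 0.01 the effective learning rate has halved after one epoch (100 updates) and has dropped to one eleventh of its initial value after ten epochs.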

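The last step is to define all your inputs and outputs to fit the model:

You already have X of shape $(m = 10000, T_x = 30)$ containing the training examples.
You need to create s0 and c0 to initialize your post_activation_LSTM_cell with 0s.
Given the model() you coded, "outputs" must be a list of $T_y$ = 10 elements, each of shape (m, len(machine_vocab)) = (m, 11). outputs[j][i] is then the one-hot true label of the $j^{th}$ character of the $i^{th}$ training example (X[i]).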

s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))
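The effect of list(Yoh.swapaxes(0,1)) is easiest to see on a small synthetic array. The labels below are random integers used only to show the shapes; they are not taken from the dataset.

```python
import numpy as np

m, Ty, vocab = 4, 10, 11
rng = np.random.default_rng(0)
Y = rng.integers(0, vocab, size=(m, Ty))   # synthetic index labels, shape (m, Ty)
Yoh = np.eye(vocab)[Y]                     # one-hot labels, shape (m, Ty, vocab)

# Swap the example axis and the time axis, then split along the new first axis.
outputs = list(Yoh.swapaxes(0, 1))         # Ty arrays, each of shape (m, vocab)

print(len(outputs), outputs[0].shape)      # 10 (4, 11)
# outputs[j][i] is the one-hot label of character j of example i.
print(np.array_equal(outputs[3][2], Yoh[2, 3]))   # True
```

This gives one target array per output position, which is the layout a Keras model with $T_y$ separate outputs expects.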

Let's now fit the model and run it for one epoch.

model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)

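While training you can see the loss as well as the accuracy on each of the 10 positions of the output. The table below gives an example of what the accuracies could be if the batch had 2 examples: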

Thus, dense_2_acc_8: 0.89 means that you are predicting the 7th character of the output correctly 89% of the time in the current batch of data.

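We have run this model for longer and saved the weights. Run the next cell to load our weights. (By training a model for several minutes, you should be able to obtain a model of similar accuracy, but loading our model will save you time.)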

model.load_weights('models/model.h5')

You can now see the results on new examples.

EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0,1)
    prediction = model.predict([[source.T], s0, c0])
    # prediction = model.predict([source, s0, c0])  # the original call; its input dimensions do not match
    prediction = np.argmax(prediction, axis = -1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
    print("source:", example)
    print("output:", ''.join(output))

source: 3 May 1979
output: 1979-05-03
source: 5 April 09
output: 2009-05-05
source: 21th of August 2016
output: 2016-08-21
source: Tue 10 Jul 2007
output: 2007-07-10
source: Saturday May 9 2018
output: 2018-05-09
source: March 3 2001
output: 2001-03-03
source: March 3rd 2001
output: 2001-03-03
source: 1 March 2001
output: 2001-03-01
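The decoding step (argmax over the last axis, then a lookup in the inverse vocabulary) can be tried in isolation. In this sketch the "prediction" is fabricated from a known target string purely to exercise the decoding; it is not a model output. The toy inv_vocab assumes the ordering '-', '0', ..., '9', which agrees with the target indices printed earlier for "1989-10-15", but the real mapping is whatever load_dataset() returns.

```python
import numpy as np

# Assumed inverse vocabulary: index 0 is '-', indices 1..10 are the digits 0..9.
inv_vocab = {i: ch for i, ch in enumerate("-0123456789")}
vocab = {ch: i for i, ch in inv_vocab.items()}

# Fake "model output" for one example: a list of Ty arrays of shape (1, 11),
# each putting probability 0.9 (plus a small uniform share) on the target character.
target = "1979-05-03"
prediction = [0.9 * np.eye(11)[[vocab[ch]]] + 0.1 / 11 for ch in target]

idx = np.argmax(prediction, axis=-1)                     # shape (Ty, 1)
decoded = "".join(inv_vocab[int(i[0])] for i in idx)     # index -> character

print(idx.shape)   # (10, 1)
print(decoded)     # 1979-05-03
```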

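You can also change these examples to test with your own examples. The next part will give you a better sense of what the attention mechanism is doing, i.e. what part of the input the network is paying attention to when generating a particular output character.

3 - Visualizing attention (Optional / Ungraded)

Since the problem has a fixed output length of 10, it is also possible to carry out this task using 10 different softmax units to generate the 10 characters of the output. But one advantage of the attention model is that each part of the output (say the month) knows it needs to depend only on a small part of the input (the characters in the input giving the month). We can visualize which part of the output is looking at which part of the input.

Consider the task of translating "Saturday 9 May 2018" to "2018-05-09". If we visualize the computed attention weights $\alpha^{\langle t, t' \rangle}$ we get this: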

Figure 8: Full Attention Map

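Notice how the output ignores the "Saturday" portion of the input. None of the output timesteps pay much attention to that portion of the input. We also see that 9 has been translated as 09 and May has been correctly translated into 05, with the output paying attention to the parts of the input it needs to make the translation. The year mostly requires it to pay attention to the input's "18" in order to generate "2018".

3.1 - Getting the activations from the network

Let's now visualize the attention values in your network. We will propagate an example through the network, then visualize the values of $\alpha^{\langle t, t' \rangle}$.

To figure out where the attention values are located, let's start by printing a summary of the model.

Browse the model.summary() output above. You can see that the layer named attention_weights outputs the alphas of shape (m, 30, 1) before the dot layer (dot_1 in the summary printed above) computes the context vector for every time step $t = 0, \ldots, T_y-1$. Let's get the activations from this layer.

The function plot_attention_map() pulls out the attention values from your model and plots them.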

attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)

<Figure size 432x288 with 0 Axes>
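Conceptually, an attention map is the $T_y$ per-step weight vectors stacked into a $(T_y, T_x)$ matrix. The sketch below builds such a matrix from synthetic weights (random numbers normalized to sum to 1, not real model activations) and reads off, for each output position, the input character with the largest weight. How plot_attention_map() extracts the real activations is not shown here.

```python
import numpy as np

source = "sat 9 may 2018"
Tx, Ty = len(source), 10
rng = np.random.default_rng(0)

# Synthetic stand-ins for the Ty outputs of the attention_weights layer,
# each of shape (1, Tx, 1) for a single example. These are NOT real model outputs.
per_step = []
for _ in range(Ty):
    w = rng.random((1, Tx, 1))
    per_step.append(w / w.sum(axis=1, keepdims=True))

# Stack the Ty steps and drop the trailing axis: one row per output character.
attention_map = np.concatenate(per_step, axis=0)[:, :, 0]   # (Ty, Tx)

# For each output position, the input character that receives the most attention.
most_attended = [source[j] for j in attention_map.argmax(axis=1)]

print(attention_map.shape)   # (10, 14)
print(len(most_attended))    # 10
```

Plotting this matrix as a heat map, with input characters on one axis and output characters on the other, gives a figure of the kind described in this section.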

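On the generated plot you can observe the values of the attention weights for each character of the predicted output. Examine this plot and check that where the network is paying attention makes sense to you.

In the date translation application, you will observe that most of the time attention helps predict the year, and does not have much impact on predicting the day/month.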

Congratulations!

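You have come to the end of this assignment.

Here is what you should remember from it:

Machine translation models can be used to map from one sequence to another. They are useful not just for translating human languages (like French->English) but also for tasks like date format translation.
An attention mechanism allows a network to focus on the most relevant parts of the input when producing a specific part of the output.
A network using an attention mechanism can translate from inputs of length $T_x$ to outputs of length $T_y$, where $T_x$ and $T_y$ can be different.
You can visualize attention weights $\alpha^{\langle t,t' \rangle}$ to see what the network is paying attention to while generating each output.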

Reposted from: https://www.cnblogs.com/Moonshade/p/10953450.html
