
Study Notes CB012: LSTM Simple Implementation, Full Implementation, Torch, and a word2vec LSTM Chatbot Trained on a Novel

Published: 2024/8/23


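The most practical way to truly master an algorithm is to write it out completely by hand.

LSTM (Long Short-Term Memory) is a special kind of recurrent neural network whose neurons retain a memory of past inputs. It addresses a limitation of statistical NLP methods, which consider only the most recent n words and ignore anything earlier. Typical uses include word representation (embedding, i.e. word vectors), sequence-to-sequence learning (predicting a sentence from an input sentence), machine translation, and speech recognition.

A binary adder built on an LSTM-style recurrent network takes about 100 lines of plain Python. Source: https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/ (Chinese translation: http://blog.csdn.net/zzukun/article/details/49968129):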

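import copy, numpy as np
np.random.seed(0)

First import the numpy library for matrix operations and fix the random seed.

def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

This declares the sigmoid activation function, a neural-network basic. Common activation functions include sigmoid, tanh, and ReLU; sigmoid outputs lie in (0, 1) and tanh outputs in (-1, 1). Here x is a vector and the returned output is a vector of the same shape.

def sigmoid_output_to_derivative(output):
    return output*(1-output)

This declares the derivative of sigmoid, written in terms of the sigmoid output rather than its input.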
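The identity behind sigmoid_output_to_derivative, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), can be checked numerically. The following is a small sketch, not part of the original program; the step size eps and the test points are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # Takes the sigmoid OUTPUT, not the raw input.
    return output * (1.0 - output)

x = np.linspace(-4.0, 4.0, 9)
eps = 1e-6  # illustrative step size for the central difference

numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_output_to_derivative(sigmoid(x))

# Largest disagreement between the two estimates.
max_gap = float(np.max(np.abs(numeric - analytic)))
print(max_gap)
```

Passing the output instead of the input is convenient because the forward pass has already computed it, so the backward pass needs no extra call to exp.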
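The idea behind the adder: binary addition adds bit by bit and carries whenever a position overflows. During training, random samples satisfying c = a + b are generated; feeding in a and b and producing c is one full prediction pass of the network. Training learns the transformation matrices (the weights of the neural network) that map the bits of a and b to the bits of c.

int2binary = {}

Declare a dictionary that maps integers to their binary representation. Storing the table up front is faster than recomputing the conversion on every lookup.

binary_dim = 8
largest_number = pow(2,binary_dim)

Declare the number of binary digits, 8. Eight bits can represent 2^8 = 256 distinct values (0 through 255); largest_number holds that count.

binary = np.unpackbits(np.array([range(largest_number)],dtype=np.uint8).T,axis=1)
for i in range(largest_number):
    int2binary[i] = binary[i]

Precompute the integer-to-binary table and store it in the dictionary.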
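To make the lookup table concrete, here is a minimal sketch that builds it and converts a bit array back to an integer. The helper binary2int is a hypothetical name introduced only for this illustration; it mirrors the conversion loop the original program uses when printing results.

```python
import numpy as np

binary_dim = 8
largest_number = 2 ** binary_dim

# One row per integer, most significant bit first.
binary = np.unpackbits(
    np.arange(largest_number, dtype=np.uint8).reshape(-1, 1), axis=1)
int2binary = {i: binary[i] for i in range(largest_number)}

def binary2int(bits):
    """Hypothetical helper: convert an MSB-first bit array to an integer."""
    return sum(int(b) << k for k, b in enumerate(reversed(bits)))

a_int, b_int = 5, 3
print(int2binary[a_int])          # [0 0 0 0 0 1 0 1]
print(int2binary[b_int])          # [0 0 0 0 0 0 1 1]
print(int2binary[a_int + b_int])  # [0 0 0 0 1 0 0 0]
```

In the last row the carry ripples through the three lowest bits, which is exactly the dependency between positions that the recurrent hidden state has to learn.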

alpha = 0.1
input_dim = 2
hidden_dim = 16
output_dim = 1

Set the hyperparameters. alpha is the learning rate. input_dim is the input-layer dimension: each step takes one bit of a and one bit of b, so it is 2. hidden_dim is the hidden-layer dimension, i.e. the number of hidden neurons. output_dim is the output-layer dimension: one bit of c, so it is 1. The input-to-hidden weight matrix is therefore 2×16, the hidden-to-output matrix is 16×1, and the hidden-to-hidden matrix is 16×16:

synapse_0 = 2*np.random.random((input_dim,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,output_dim)) - 1
synapse_h = 2*np.random.random((hidden_dim,hidden_dim)) - 1

np.random.random draws floats uniformly from [0, 1), so 2x-1 rescales them to [-1, 1).

synapse_0_update = np.zeros_like(synapse_0)
synapse_1_update = np.zeros_like(synapse_1)
synapse_h_update = np.zeros_like(synapse_h)

Declare three update matrices, one per weight matrix, to accumulate the deltas.

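for j in range(10000):

Run 10,000 training iterations.

    a_int = np.random.randint(largest_number//2)
    a = int2binary[a_int]
    b_int = np.random.randint(largest_number//2)
    b = int2binary[b_int]
    c_int = a_int + b_int
    c = int2binary[c_int]

Generate a random sample consisting of binary a, b, and c with c = a + b; a_int, b_int, and c_int are the corresponding integers. Both operands are drawn below largest_number/2 so that the sum still fits in 8 bits.

    d = np.zeros_like(c)

d holds the model's prediction for c.

    overallError = 0

The overall error, used to monitor how well the model is doing.

    layer_2_deltas = list()

Stores the layer-2 (output layer) deltas. The formula for the output-layer delta is derived at http://deeplearning.stanford.edu/wiki/index.php/%E5%8F%8D%E5%90%91%E4%BC%A0%E5%AF%BC%E7%AE%97%E6%B3%95

    layer_1_values = list()
    layer_1_values.append(np.zeros(hidden_dim))

Stores the layer-1 (hidden layer) outputs. A zero vector is appended first to stand in for the hidden state before the first time step.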

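    for position in range(binary_dim):

Iterate over every binary digit.

        X = np.array([[a[binary_dim - position - 1],b[binary_dim - position - 1]]])
        y = np.array([[c[binary_dim - position - 1]]]).T

X and y are the input and target bits at the current position, counted from the least significant bit. X holds two values per sample, the corresponding bits of a and b. The sample is split into individual bits for training. Because binary addition passes information along through the carry, it suits a recurrent network with memory: the 8 bits of each sample form one time series.

        layer_1 = sigmoid(np.dot(X,synapse_0) + np.dot(layer_1_values[-1],synapse_h))

The formula is C_t = sigma(W0·X_t + Wh·C_{t-1}).

        layer_2 = sigmoid(np.dot(layer_1,synapse_1))

The formula used here is C2 = sigma(W1·C1).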
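The two forward formulas can be run in isolation to see how the hidden state is carried from one bit to the next. This is a sketch under assumptions: it uses its own RandomState rather than the global seed, and the three-bit inputs are made up for the example, so the numbers will not match the original program.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.RandomState(0)  # local generator, an assumption of this sketch
input_dim, hidden_dim, output_dim = 2, 16, 1
synapse_0 = 2 * rng.random_sample((input_dim, hidden_dim)) - 1
synapse_1 = 2 * rng.random_sample((hidden_dim, output_dim)) - 1
synapse_h = 2 * rng.random_sample((hidden_dim, hidden_dim)) - 1

a_bits = [1, 0, 1]  # least significant bit first, i.e. a = 5
b_bits = [1, 1, 0]  # b = 3
prev_hidden = np.zeros((1, hidden_dim))  # hidden state before the first step
outputs = []
for a_bit, b_bit in zip(a_bits, b_bits):
    X = np.array([[a_bit, b_bit]])
    # C_t = sigma(W0 . X_t + Wh . C_{t-1})
    hidden = sigmoid(X.dot(synapse_0) + prev_hidden.dot(synapse_h))
    # output = sigma(W1 . C_t)
    out = sigmoid(hidden.dot(synapse_1))
    outputs.append(out)
    prev_hidden = hidden

print(hidden.shape, out.shape)  # (1, 16) (1, 1)
```

With untrained weights the outputs are meaningless; the point is only that each step sees the current pair of bits plus the previous hidden state.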

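        layer_2_error = y - layer_2

Compute the error between the target and the prediction.

        layer_2_deltas.append((layer_2_error)*sigmoid_output_to_derivative(layer_2))

First step of backpropagation: compute the output-layer delta and append it to the list layer_2_deltas.

        overallError += np.abs(layer_2_error[0])

Accumulate the total error, for display and monitoring.

        d[binary_dim - position - 1] = np.round(layer_2[0][0])

Store the predicted output bit for this position.

        layer_1_values.append(copy.deepcopy(layer_1))

Store the hidden-layer value produced at this step so the backward pass can use it.

    future_layer_1_delta = np.zeros(hidden_dim)

Holds the hidden-layer delta of the following time step. It starts as zeros because the last time step has no successor.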

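    for position in range(binary_dim):

Iterate over every binary digit again, this time for the backward pass.

        X = np.array([[a[position],b[position]]])

Fetch X starting from the most significant bit, which is the last time step: backpropagation walks back through time one step at a time.

        layer_1 = layer_1_values[-position-1]

The hidden-layer output for this position.

        prev_layer_1 = layer_1_values[-position-2]

The hidden-layer output of the previous time step.

        layer_2_delta = layer_2_deltas[-position-1]

The output-layer delta for this position.

        layer_1_delta = (future_layer_1_delta.dot(synapse_h.T) + layer_2_delta.dot(synapse_1.T)) * sigmoid_output_to_derivative(layer_1)

The backpropagation formula for the hidden layer: the delta arriving from the output layer plus the hidden-layer delta arriving from the following time step, multiplied by the sigmoid derivative.

        synapse_1_update += np.atleast_2d(layer_1).T.dot(layer_2_delta)

Accumulate the weight update. The partial derivative with respect to a weight matrix is the product of this layer's output and the next layer's delta.

        synapse_h_update += np.atleast_2d(prev_layer_1).T.dot(layer_1_delta)

The update for the recurrent (hidden-to-hidden) weights: the previous time step's hidden output times this time step's delta.

        synapse_0_update += X.T.dot(layer_1_delta)

The update for the input-layer weights.

        future_layer_1_delta = layer_1_delta

Remember this time step's hidden delta for the next iteration, which handles the preceding time step.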

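    synapse_0 += synapse_0_update * alpha
    synapse_1 += synapse_1_update * alpha
    synapse_h += synapse_h_update * alpha

Apply the weight updates.

    synapse_0_update *= 0
    synapse_1_update *= 0
    synapse_h_update *= 0

Reset the update accumulators to zero.

    if(j % 1000 == 0):
        print("Error:" + str(overallError))
        print("Pred:" + str(d))
        print("True:" + str(c))
        out = 0
        for index,x in enumerate(reversed(d)):
            out += x*pow(2,index)
        print(str(a_int) + " + " + str(b_int) + " = " + str(out))
        print("------------")

Every 1000 samples, print the total error so that convergence can be watched while the program runs.

This is the simplest implementation: it leaves out bias terms and has only two layers of neurons (a 16-unit hidden layer and a 1-unit output layer). Note that the code has a recurrent hidden state but no gates or cell state, so strictly speaking it is a plain recurrent network rather than a full LSTM.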

A complete LSTM implementation in Python, written to follow the "great intro paper" referenced by its author. Code: https://github.com/nicodjimenez/lstm ; the author's explanation: http://nicodjimenez.github.io/2014/08/08/lstm.html ; for the overall process, see the diagrams at http://colah.github.io/posts/2015-08-Understanding-LSTMs/

import random
import numpy as np
import math

def sigmoid(x):
    return 1. / (1 + np.exp(-x))

Declare the sigmoid function.

def rand_arr(a, b, *args):
    np.random.seed(0)
    return np.random.rand(*args) * (b - a) + a

Generate a random matrix with values in [a, b); the shape is given by args.

class LstmParam:
    def __init__(self, mem_cell_ct, x_dim):
        self.mem_cell_ct = mem_cell_ct
        self.x_dim = x_dim
        concat_len = x_dim + mem_cell_ct
        # weight matrices
        self.wg = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wf = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        self.wo = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len)
        # bias terms
        self.bg = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bi = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bf = rand_arr(-0.1, 0.1, mem_cell_ct)
        self.bo = rand_arr(-0.1, 0.1, mem_cell_ct)
        # diffs (derivative of loss function w.r.t. all parameters)
        self.wg_diff = np.zeros((mem_cell_ct, concat_len))
        self.wi_diff = np.zeros((mem_cell_ct, concat_len))
        self.wf_diff = np.zeros((mem_cell_ct, concat_len))
        self.wo_diff = np.zeros((mem_cell_ct, concat_len))
        self.bg_diff = np.zeros(mem_cell_ct)
        self.bi_diff = np.zeros(mem_cell_ct)
        self.bf_diff = np.zeros(mem_cell_ct)
        self.bo_diff = np.zeros(mem_cell_ct)

The LstmParam class holds the parameters. mem_cell_ct is the number of LSTM memory cells, x_dim is the input dimension, and concat_len is the sum of mem_cell_ct and x_dim. wg is the weight matrix of the input node, wi of the input gate, wf of the forget gate, and wo of the output gate; bg, bi, bf, and bo are the corresponding biases. wg_diff, wi_diff, wf_diff, and wo_diff hold the loss gradients for those weights, and bg_diff, bi_diff, bf_diff, and bo_diff hold the loss gradients for the biases. Everything is initialized to the appropriate shape, with the gradient matrices set to zero.

    def apply_diff(self, lr = 1):
        self.wg -= lr * self.wg_diff
        self.wi -= lr * self.wi_diff
        self.wf -= lr * self.wf_diff
        self.wo -= lr * self.wo_diff
        self.bg -= lr * self.bg_diff
        self.bi -= lr * self.bi_diff
        self.bf -= lr * self.bf_diff
        self.bo -= lr * self.bo_diff
        # reset diffs to zero
        self.wg_diff = np.zeros_like(self.wg)
        self.wi_diff = np.zeros_like(self.wi)
        self.wf_diff = np.zeros_like(self.wf)
        self.wo_diff = np.zeros_like(self.wo)
        self.bg_diff = np.zeros_like(self.bg)
        self.bi_diff = np.zeros_like(self.bi)
        self.bf_diff = np.zeros_like(self.bf)
        self.bo_diff = np.zeros_like(self.bo)

This defines the weight update: subtract the gradients scaled by the learning rate lr, then reset the gradient matrices to zero.

class LstmState:
    def __init__(self, mem_cell_ct, x_dim):
        self.g = np.zeros(mem_cell_ct)
        self.i = np.zeros(mem_cell_ct)
        self.f = np.zeros(mem_cell_ct)
        self.o = np.zeros(mem_cell_ct)
        self.s = np.zeros(mem_cell_ct)
        self.h = np.zeros(mem_cell_ct)
        self.bottom_diff_h = np.zeros_like(self.h)
        self.bottom_diff_s = np.zeros_like(self.s)
        self.bottom_diff_x = np.zeros(x_dim)

LstmState stores the state of the LSTM cells: g, i, f, o, s, and h. s is the internal state (the memory) and h is the hidden-layer output.

class LstmNode:
    def __init__(self, lstm_param, lstm_state):
        # store reference to parameters and to activations
        self.state = lstm_state
        self.param = lstm_param
        # non-recurrent input to node
        self.x = None
        # non-recurrent input concatenated with recurrent input
        self.xc = None

LstmNode corresponds to one input sample, i.e. one time step. x is the input sample, and xc is x concatenated with the recurrent input using hstack (hstack joins arrays horizontally, vstack vertically).

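    def bottom_data_is(self, x, s_prev = None, h_prev = None):
        # if this is the first lstm node in the network
        if s_prev is None: s_prev = np.zeros_like(self.state.s)
        if h_prev is None: h_prev = np.zeros_like(self.state.h)
        # save data for use in backprop
        self.s_prev = s_prev
        self.h_prev = h_prev

        # concatenate x(t) and h(t-1)
        xc = np.hstack((x, h_prev))
        self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
        self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
        self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
        self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
        self.state.s = self.state.g * self.state.i + s_prev * self.state.f
        self.state.h = self.state.s * self.state.o
        self.x = x
        self.xc = xc

Bottom and top are the two directions: input samples enter from the bottom, and backpropagation runs from the top down. bottom_data_is is the forward pass for one input. It concatenates x with the previous hidden output, computes g, i, f, and o with the formula wx+b, and applies the tanh and sigmoid activations. (The None checks use "is None" because comparing a NumPy array with == is elementwise.)

At each time step the cell contains four neural-network layers (activation functions). The leftmost is the forget gate, which acts directly on the memory C. The second is the input gate, which depends on the input sample and controls in what proportion new content is written to the memory C. The content being written comes from the third layer (tanh), whose values lie in (-1, 1) and can therefore push the memory up or down. The last is the output gate: the output at each time step depends on the input x and the previous output, and also on the memory C. The design imitates the memory function of biological neurons. In the code, the memory C is state.s and the tanh layer is state.g.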
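The forward step can also be written as a standalone function, which makes the role of each gate easier to see. This is a minimal sketch that mirrors bottom_data_is above (including h = s * o with no tanh on s); the function name lstm_step and the dictionary layout of W and b are assumptions made for the example, not part of the original code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, s_prev, h_prev, W, b):
    """One forward step. W and b are dicts keyed by 'g', 'i', 'f', 'o'
    (a hypothetical layout chosen for this sketch)."""
    xc = np.hstack((x, h_prev))            # concatenate x(t) and h(t-1)
    g = np.tanh(W['g'].dot(xc) + b['g'])   # candidate content
    i = sigmoid(W['i'].dot(xc) + b['i'])   # input gate
    f = sigmoid(W['f'].dot(xc) + b['f'])   # forget gate
    o = sigmoid(W['o'].dot(xc) + b['o'])   # output gate
    s = g * i + s_prev * f                 # new memory
    h = s * o                              # new output
    return s, h

x_dim, mem_cell_ct = 3, 2
# All-zero parameters: g = tanh(0) = 0 and every gate is sigmoid(0) = 0.5.
W = {k: np.zeros((mem_cell_ct, x_dim + mem_cell_ct)) for k in 'gifo'}
b = {k: np.zeros(mem_cell_ct) for k in 'gifo'}

s, h = lstm_step(np.ones(x_dim), np.array([1.0, -2.0]),
                 np.zeros(mem_cell_ct), W, b)
print(s, h)  # s = [0.5, -1.0], h = [0.25, -0.5]
```

With zero parameters nothing new is written (g is 0), the forget gate keeps half of the old memory, and the output gate lets half of that through, which is why the result is easy to verify by hand.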

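    def top_diff_is(self, top_diff_h, top_diff_s):
        # notice that top_diff_s is carried along the constant error carousel
        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds

        # diffs w.r.t. vector inside sigma / tanh function
        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        dg_input = (1. - self.state.g ** 2) * dg

        # diffs w.r.t. inputs
        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input

        # compute bottom diff
        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)

        # save bottom diffs
        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]

Backpropagation is the core of the whole training process. Suppose that at time t the LSTM outputs the prediction h(t) while the true value is y(t); the difference between them is the loss. Take the loss function to be the squared Euclidean distance l(t) = f(h(t), y(t)) = ||h(t) - y(t)||^2, and the total loss to be L = ∑ l(t) for t from 1 to T, where T is the length of the whole sequence. The goal is to minimize L by gradient descent: find weights w at which a small change in w no longer changes L, i.e. a local optimum where the gradient of L with respect to w is zero.

dL/dw is how much L changes per unit change in w; dh(t)/dw is how much h(t) changes per unit change in w; dL/dh(t) is how much L changes per unit change in h(t). The product (dL/dh(t)) * (dh(t)/dw), taken for the i-th memory cell at time step t, is that cell's contribution to the change in L per unit change in w. Summing it over all i from 1 to M and all t from 1 to T gives the overall dL/dw.

For the i-th memory cell, dL/dh(t) is the change in the summed local losses l over all time steps 1 to T when h(t) changes by one unit. Since h(t) only affects the local losses from time step t to T, only those terms contribute.

So let L(t) denote the loss summed from t to T: L(t) = ∑ l(s) for s from t to T. Then dL/dh(t) = dL(t)/dh(t), and this is the quantity used when taking the derivative with respect to w.

Because L(t) = l(t) + L(t+1), we have dL(t)/dh(t) = dl(t)/dh(t) + dL(t+1)/dh(t): the derivative at the current time step follows from the derivative at the next one. The recursion is evaluated backward starting from time T, where dL(T)/dh(T) = dl(T)/dh(T).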
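Written out in one place, the chain of equalities described above reads as follows. The subscript i for the memory cell and the symbol M for the number of cells follow the prose; this is a restatement of the text, not an additional result.

```latex
\begin{aligned}
L &= \sum_{t=1}^{T} l(t), \qquad l(t) = \lVert h(t) - y(t) \rVert^{2},\\[4pt]
\frac{dL}{dw} &= \sum_{t=1}^{T} \sum_{i=1}^{M}
    \frac{dL}{dh_i(t)}\,\frac{dh_i(t)}{dw},\\[4pt]
L(t) &= \sum_{s=t}^{T} l(s) = l(t) + L(t+1),\\[4pt]
\frac{dL}{dh_i(t)} &= \frac{dL(t)}{dh_i(t)}
    = \frac{dl(t)}{dh_i(t)} + \frac{dL(t+1)}{dh_i(t)},\\[4pt]
\frac{dL(T)}{dh_i(T)} &= \frac{dl(T)}{dh_i(T)}.
\end{aligned}
```

The fourth line is the recursion the code implements: the first term comes from the loss layer at step t, and the second is the bottom_diff_h handed back by the node at step t+1.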

class LstmNetwork():
    def __init__(self, lstm_param):
        self.lstm_param = lstm_param
        self.lstm_node_list = []
        # input sequence
        self.x_list = []

    def y_list_is(self, y_list, loss_layer):
        """
        Updates diffs by setting target sequence
        with corresponding loss layer.
        Will *NOT* update parameters. To update parameters,
        call self.lstm_param.apply_diff()
        """
        assert len(y_list) == len(self.x_list)
        idx = len(self.x_list) - 1
        # first node only gets diffs from label ...
        loss = loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
        diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
        # here s is not affecting loss due to h(t+1), hence we set equal to zero
        diff_s = np.zeros(self.lstm_param.mem_cell_ct)
        self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
        idx -= 1

        ### ... following nodes also get diffs from next nodes, hence we add diffs to diff_h
        ### we also propagate error along constant error carousel using diff_s
        while idx >= 0:
            loss += loss_layer.loss(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
            diff_h += self.lstm_node_list[idx + 1].state.bottom_diff_h
            diff_s = self.lstm_node_list[idx + 1].state.bottom_diff_s
            self.lstm_node_list[idx].top_diff_is(diff_h, diff_s)
            idx -= 1

        return loss

diff_h is the numerical value of dL(t)/dh(t), i.e. how much the loss L changes per unit change in the prediction. It is computed by walking idx backward from T to 1: at each step diff_h is the sum of loss_layer.bottom_diff and the bottom_diff_h of the following time step (on the first pass, at T, no bottom_diff_h is added).

loss_layer.bottom_diff is:

    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

The derivative of l(t) = f(h(t), y(t)) = ||h(t) - y(t)||^2 is l'(t) = 2 * (h(t) - y(t)).

When s(t) changes, L(t) changes because s(t) influences both h(t) and h(t+1). Since h(t+1) does not affect l(t), this gives dL(t)/ds(t) = (dL(t)/dh(t)) * (dh(t)/ds(t)) + dL(t+1)/ds(t). The first term is computed at the current step and the second is passed back from step t+1, so dL(t)/ds(t) is obtained recursively from t+1 to t.

In the node, self.state.h = self.state.s * self.state.o, i.e. h(t) = s(t) * o(t), so dh(t)/ds(t) = o(t); dL(t)/dh(t) is top_diff_h.

On the naming in top_diff_is: "bottom" means the input to a layer and "top" means the output of a layer, the same terminology Caffe uses.

    def top_diff_is(self, top_diff_h, top_diff_s):

top_diff_h is dL(t)/dh(t) at the current time step t; top_diff_s is dL(t+1)/ds(t), the memory-cell term passed back from time step t+1.

        ds = self.state.o * top_diff_h + top_diff_s
        do = self.state.s * top_diff_h
        di = self.state.g * ds
        dg = self.state.i * ds
        df = self.s_prev * ds

The prefix d denotes the derivative of the loss L with respect to the named quantity.
ds computes dL(t)/ds(t) for the current time step: ds = o(t) * top_diff_h + top_diff_s, i.e. the path through h(t) plus the term carried back from t+1.
do computes dL(t)/do(t). Since h(t) = s(t) * o(t), dh(t)/do(t) = s(t), so dL(t)/do(t) = (dL(t)/dh(t)) * (dh(t)/do(t)) = top_diff_h * s(t).
di computes dL(t)/di(t). Since s(t) = f(t) * s(t-1) + i(t) * g(t), dL(t)/di(t) = (dL(t)/ds(t)) * (ds(t)/di(t)) = ds * g(t).
dg computes dL(t)/dg(t): dL(t)/dg(t) = (dL(t)/ds(t)) * (ds(t)/dg(t)) = ds * i(t).
df computes dL(t)/df(t): dL(t)/df(t) = (dL(t)/ds(t)) * (ds(t)/df(t)) = ds * s(t-1).
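These five formulas can be checked with finite differences. The sketch below is an illustration under assumptions: it replaces the real loss with a linear surrogate, top_diff_h · h + top_diff_s · s, whose gradients with respect to h and s are exactly top_diff_h and top_diff_s, and it draws all values from (-1, 1) for convenience even though real gate activations lie in (0, 1). The names surrogate_loss and numeric_diff are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 4
# Illustrative values only; real gates i, f, o would lie in (0, 1).
g, i, f, o, s_prev = (rng.uniform(-1, 1, n) for _ in range(5))
top_diff_h = rng.uniform(-1, 1, n)
top_diff_s = rng.uniform(-1, 1, n)

def surrogate_loss(g, i, f, o, s_prev):
    """Linear stand-in for the loss seen from this node."""
    s = g * i + s_prev * f
    h = s * o
    return float(np.dot(top_diff_h, h) + np.dot(top_diff_s, s))

# Analytic diffs, written exactly as in top_diff_is.
s = g * i + s_prev * f
ds = o * top_diff_h + top_diff_s
analytic = {'o': s * top_diff_h, 'i': g * ds, 'g': i * ds, 'f': s_prev * ds}

def numeric_diff(name, eps=1e-6):
    """Central-difference gradient of the surrogate w.r.t. one input."""
    args = {'g': g, 'i': i, 'f': f, 'o': o, 's_prev': s_prev}
    grad = np.zeros(n)
    for k in range(n):
        plus = {key: val.copy() for key, val in args.items()}
        minus = {key: val.copy() for key, val in args.items()}
        plus[name][k] += eps
        minus[name][k] -= eps
        grad[k] = (surrogate_loss(**plus) - surrogate_loss(**minus)) / (2 * eps)
    return grad

max_gap = max(float(np.max(np.abs(numeric_diff(k) - analytic[k])))
              for k in analytic)
print(max_gap)
```

A gradient check like this is a cheap way to catch sign or indexing mistakes when writing backpropagation by hand, which is the exercise these notes recommend.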

        di_input = (1. - self.state.i) * self.state.i * di
        df_input = (1. - self.state.f) * self.state.f * df
        do_input = (1. - self.state.o) * self.state.o * do
        dg_input = (1. - self.state.g ** 2) * dg

These lines apply the derivatives of sigmoid and tanh. Take di_input: (1. - self.state.i) * self.state.i is the sigmoid derivative, i.e. how much the output of gate i changes per unit change in its input. Multiplying by di gives how much the loss L(t) changes per unit change in the input of gate i, that is dL(t)/d i_input(t). For g, the tanh derivative 1 - g^2 is used instead.

        self.param.wi_diff += np.outer(di_input, self.xc)
        self.param.wf_diff += np.outer(df_input, self.xc)
        self.param.wo_diff += np.outer(do_input, self.xc)
        self.param.wg_diff += np.outer(dg_input, self.xc)
        self.param.bi_diff += di_input
        self.param.bf_diff += df_input
        self.param.bo_diff += do_input
        self.param.bg_diff += dg_input

The w*_diff matrices are the gradients of the loss with respect to the weight matrices and the b*_diff vectors are the gradients with respect to the biases; both are used for the update.

        dxc = np.zeros_like(self.xc)
        dxc += np.dot(self.param.wi.T, di_input)
        dxc += np.dot(self.param.wf.T, df_input)
        dxc += np.dot(self.param.wo.T, do_input)
        dxc += np.dot(self.param.wg.T, dg_input)

Accumulate the diff with respect to the concatenated input xc. xc feeds into four places (i, f, o, and g), so the four diffs are summed.

        self.state.bottom_diff_s = ds * self.state.f
        self.state.bottom_diff_x = dxc[:self.param.x_dim]
        self.state.bottom_diff_h = dxc[self.param.x_dim:]

bottom_diff_s relates the diff of s at time t-1 to the diff of s at time t: it is ds scaled by f. dxc is the diff for the horizontally concatenated x and h, so it is split into its two parts, bottom_diff_x and bottom_diff_h.

    def x_list_clear(self):
        self.x_list = []

    def x_list_add(self, x):
        self.x_list.append(x)
        if len(self.x_list) > len(self.lstm_node_list):
            # need to add new lstm node, create new state mem
            lstm_state = LstmState(self.lstm_param.mem_cell_ct, self.lstm_param.x_dim)
            self.lstm_node_list.append(LstmNode(self.lstm_param, lstm_state))

        # get index of most recent x input
        idx = len(self.x_list) - 1
        if idx == 0:
            # no recurrent inputs yet
            self.lstm_node_list[idx].bottom_data_is(x)
        else:
            s_prev = self.lstm_node_list[idx - 1].state.s
            h_prev = self.lstm_node_list[idx - 1].state.h
            self.lstm_node_list[idx].bottom_data_is(x, s_prev, h_prev)

x_list_add adds a training sample: it feeds in the input x and runs the forward pass for the corresponding node.

def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 100
    x_dim = 50
    concat_len = x_dim + mem_cell_ct
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-0.5,0.2,0.1, -0.5]
    input_val_arr = [np.random.random(x_dim) for _ in y_list]

    for cur_iter in range(100):
        print("cur iter: %d" % cur_iter)
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])
            print("y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0]))

        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        print("loss: %s" % loss)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()

Initialize LstmParam with 100 memory cells and an input dimension of 50. Initialize the LstmNetwork to be trained, and generate 4 groups of 50 random numbers, which are trained against the y values [-0.5, 0.2, 0.1, -0.5] respectively: each step feeds 50 random numbers and one y value, for 100 iterations.

As a small test, feed the LSTM a run of consecutive primes and have it estimate the next one. Generate the primes below 100, take a sequence of 50 primes (cycling through the list) as x and the 51st as y, and train on 10 such samples for 10,000 iterations. The squared-error loss drops from 0.17973 to 1.05172e-06, so the predictions are almost exactly right:

import numpy as np
import sys

from lstm import LstmParam, LstmNetwork

class ToyLossLayer:
    """
    Computes square loss with first element of hidden layer array.
    """
    @classmethod
    def loss(self, pred, label):
        return (pred[0] - label) ** 2

    @classmethod
    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[0] = 2 * (pred[0] - label)
        return diff

class Primes:
    def __init__(self):
        self.primes = list()
        for i in range(2, 100):
            is_prime = True
            for j in range(2, i-1):
                if i % j == 0:
                    is_prime = False
            if is_prime:
                self.primes.append(i)
        self.primes_count = len(self.primes)

    def get_sample(self, x_dim, y_dim, index):
        result = np.zeros((x_dim+y_dim))
        for i in range(index, index + x_dim + y_dim):
            result[i-index] = self.primes[i%self.primes_count]/100.0
        return result

def example_0():
    mem_cell_ct = 100
    x_dim = 50
    concat_len = x_dim + mem_cell_ct
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)

    primes = Primes()
    x_list = []
    y_list = []
    for i in range(0, 10):
        sample = primes.get_sample(x_dim, 1, i)
        x = sample[0:x_dim]
        y = sample[x_dim:x_dim+1].tolist()[0]
        x_list.append(x)
        y_list.append(y)

    for cur_iter in range(10000):
        if cur_iter % 1000 == 0:
            print("y_list= %s" % y_list)
        for ind in range(len(y_list)):
            lstm_net.x_list_add(x_list[ind])
            if cur_iter % 1000 == 0:
                print("y_pred[%d] : %f" % (ind, lstm_net.lstm_node_list[ind].state.h[0]))

        loss = lstm_net.y_list_is(y_list, ToyLossLayer)
        if cur_iter % 1000 == 0:
            print("loss: %s" % loss)
        lstm_param.apply_diff(lr=0.01)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()

The primes are all divided by 100 because this code requires the training data to be values less than 1.
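The sample construction can be looked at on its own. The sketch below reproduces the same windowing with a sieve instead of trial division; primes_below and the scale parameter are names chosen for this illustration and are not part of the original script.

```python
import numpy as np

def primes_below(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p < limit."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
    return [p for p in range(limit) if is_prime[p]]

def get_sample(primes, x_dim, y_dim, index, scale=100.0):
    """Window of x_dim + y_dim scaled primes, cycling through the list."""
    idx = [(index + k) % len(primes) for k in range(x_dim + y_dim)]
    return np.array([primes[j] / scale for j in idx])

primes = primes_below(100)
sample = get_sample(primes, 50, 1, 0)
x, y = sample[:50], sample[50]
print(len(primes), x[:3], y)  # 25 primes; x starts 0.02, 0.03, 0.05; y = 0.02
```

One thing this makes visible: there are only 25 primes below 100, so a window of 51 values wraps around the list twice. The "51st prime" for index 0 is therefore the first prime again (2, scaled to 0.02), which means the network is learning a periodic sequence rather than extrapolating to unseen primes.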

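Torch is a deep learning framework. For comparison: 1) TensorFlow, backed by Google, is currently the most popular; it suits both small experiments and large-scale computation and is Python-based; its drawbacks are a relatively steep learning curve and average speed. 2) Torch, backed by Facebook, is used for small experiments and has many open-source applications; it is Lua-based, quick to pick up, and well documented online; its drawback is that Lua is a relatively niche language. 3) MXNet, backed by Amazon, is mainly used for large-scale computation and supports Python and R; its drawback is that few open-source projects use it. 4) Caffe, developed at UC Berkeley, is used for large-scale computation and is based on C++ and Python; its drawback is that development is not very convenient. 5) Theano has average speed, is Python-based, and is well regarded.

There are quite a few LSTM implementations for Torch on GitHub.

Installing Torch on a Mac: see https://github.com/torch/torch7/wiki/Cheatsheet#installing-and-running-torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

If the qt installation fails, install it separately:

brew install cartr/qt4/qt

After installation, add the following line to ~/.bash_profile by hand:

. ~/torch/install/bin/torch-activate

Run source ~/.bash_profile, then run th to start Torch.

To install iTorch, first install its dependencies:

brew install zeromq
brew install openssl
luarocks install luacrypto OPENSSL_DIR=/usr/local/opt/openssl/
git clone https://github.com/facebook/iTorch.git
cd iTorch
luarocks make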
Image recognition with a convolutional neural network.

Create pattern_recognition.lua:

require 'nn'
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}
setmetatable(trainset,
    {__index = function(t, i)
                    return {t.data[i], t.label[i]}
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size()
    return self.data:size(1)
end

mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction

    stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end

net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.ReLU())                       -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU())                       -- non-linearity
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.ReLU())                       -- non-linearity
net:add(nn.Linear(120, 84))
net:add(nn.ReLU())                       -- non-linearity
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (in this case, 10 classes)
net:add(nn.LogSoftMax())                     -- converts the output to a log-probability. Useful for classification problems

criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5
trainer:train(trainset)

testset.data = testset.data:double()   -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
    testset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end

predicted = net:forward(testset.data[100])
print(classes[testset.label[100]])
print(predicted:exp())
for i=1,predicted:size(1) do
    print(classes[i], predicted[i])
end

correct = 0
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        correct = correct + 1
    end
end
print(correct, 100*correct/10000 .. ' % ')

class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        class_performance[groundtruth] = class_performance[groundtruth] + 1
    end
end
for i=1,#classes do
    print(classes[i], 100*class_performance[i]/1000 .. ' %')
end

Run it with th pattern_recognition.lua.

The script first downloads the cifar10torchsmall.zip sample set, which has 50,000 training images and 10,000 test images, all labeled with one of 10 classes such as airplane and automobile. It attaches an __index function and a size method to trainset so that the trainer (nn.StochasticGradient) can index the dataset and ask for its size; for how such functions are attached, see the Lua tutorial at http://tylerneylon.com/a/learn-lua/ . The training data is then normalized per channel and converted to a double tensor with mean 0 and standard deviation 1. Next the script initializes the convolutional network, which has two convolution layers, two pooling layers, three fully connected layers, and a log-softmax layer, and trains it with a learning rate of 0.001 for 5 iterations. Once the model is trained, it predicts image 100 of the test set, and the script prints the overall accuracy and the accuracy for each class. Source: https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb
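The 16*5*5 passed to nn.View and the first nn.Linear follows from the image size and the layer settings. Here is the arithmetic as a short Python sketch (Python rather than Lua, to match the rest of these notes); conv_out is a hypothetical helper, and the 32x32 input size is the standard CIFAR-10 image size.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width of a convolution or pooling layer along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)            # SpatialConvolution(3, 6, 5, 5)  -> 28
size = conv_out(size, kernel=2, stride=2)  # SpatialMaxPooling(2,2,2,2)      -> 14
size = conv_out(size, kernel=5)            # SpatialConvolution(6, 16, 5, 5) -> 10
size = conv_out(size, kernel=2, stride=2)  # SpatialMaxPooling(2,2,2,2)      -> 5

flat = 16 * size * size                    # 16 channels of 5x5
print(size, flat)                          # 5 400
```

If the kernel sizes or the input resolution change, this number changes too, and both nn.View and the first nn.Linear have to be updated to match.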

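Torch makes GPU computation easy to support, though the code needs some changes.

Most popular seq2seq models are encoder-decoder models built from LSTMs, and most open-source implementations are based on one-hot embedding, which carries less information than word vectors. What follows is a chatbot built as a seq2seq-style model on word2vec vectors, with a single LSTM layer.

Download the original text of the novel Zhen Huan Zhuan (《甄嬛传》), for example by searching Baidu for "甄嬛传 txt". Convert the file to UTF-8 and replace all Windows line endings with \n to simplify later processing.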
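The conversion step could look like the following sketch. The source encoding is an assumption: text files downloaded from Chinese sites are often GBK/GB18030, but the actual file should be checked, and normalize_text is a name made up for this example.

```python
def normalize_text(raw_bytes, source_encoding="gb18030"):
    """Decode raw bytes (source_encoding is an assumption about the
    downloaded file) and return UTF-8 bytes with Unix line endings."""
    text = raw_bytes.decode(source_encoding)
    # Windows (\r\n) and old Mac (\r) line endings both become \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

sample = "甄嬛\r\n传".encode("gb18030")
cleaned = normalize_text(sample)
print(cleaned.decode("utf-8").splitlines())  # ['甄嬛', '传']
```

To apply it to a file, read the download in binary mode, pass the bytes through normalize_text, and write the result in binary mode to zhenhuanzhuan.txt, the input name used by the segmentation command in these notes.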

Segment the novel into words. The segmentation tool word_segment.py can be downloaded from GitHub at https://github.com/warmheartli/ChatBotCourse/blob/master/word_segment.py

python ./word_segment.py zhenhuanzhuan.txt zhenhuanzhuan.segment

Generate the word vectors with word2vec. The word2vec source is at https://github.com/warmheartli/ChatBotCourse/tree/master/word2vec and can be run after compiling with make.

./word2vec -train ./zhenhuanzhuan.segment -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 20 -binary 1 -iter 15

This produces vectors.bin, a word-vector file trained on the text of the novel.

The training code:

# -*- coding: utf-8 -*-
import io
import sys
import math
import tflearn
import numpy as np
import struct

seq = []

max_w = 50
float_size = 4
word_vector_dict = {}

def load_vectors(input):
    """Load word vectors from vectors.bin into word_vector_dict;
    the key is the word and the value is its 200-dimensional vector.
    """
    print("begin load vectors")

    input_file = open(input, "rb")

    # read the vocabulary size and the vector dimension
    words_and_size = input_file.readline()
    words_and_size = words_and_size.strip()
    words = int(words_and_size.split()[0])
    size = int(words_and_size.split()[1])
    print("words = %d" % words)
    print("size = %d" % size)

    for b in range(0, words):
        a = 0
        word = b''
        # read one word, byte by byte, up to the separating space
        while True:
            c = input_file.read(1)
            word = word + c
            if not c or c == b' ':
                break
            if a < max_w and c != b'\n':
                a = a + 1
        word = word.strip()

        vector = []
        for index in range(0, size):
            m = input_file.read(float_size)
            (weight,) = struct.unpack('f', m)
            vector.append(weight)

        # store the word and its vector in the dict
        word_vector_dict[word.decode('utf-8')] = vector

    input_file.close()
    print("load vectors finish")

def init_seq():
    """Read the segmented text file and load the full word sequence."""
    file_object = io.open('zhenhuanzhuan.segment', 'r', encoding='utf-8')
    for line in file_object:
        for word in line.split(' '):
            if word in word_vector_dict:
                seq.append(word_vector_dict[word])
    file_object.close()

def vector_sqrtlen(vector):
    total = 0
    for item in vector:
        total += item * item
    return math.sqrt(total)

def vector_cosine(v1, v2):
    if len(v1) != len(v2):
        sys.exit(1)
    sqrtlen1 = vector_sqrtlen(v1)
    sqrtlen2 = vector_sqrtlen(v2)
    value = 0
    for item1, item2 in zip(v1, v2):
        value += item1 * item2
    return value / (sqrtlen1*sqrtlen2)

def vector2word(vector):
    max_cos = -10000
    match_word = ''
    for word in word_vector_dict:
        v = word_vector_dict[word]
        cosine = vector_cosine(vector, v)
        if cosine > max_cos:
            max_cos = cosine
            match_word = word
    return (match_word, max_cos)

def main():
    load_vectors("./vectors.bin")
    init_seq()
    xlist = []
    ylist = []
    test_X = None
    #for i in range(len(seq)-100):
    for i in range(10):
        sequence = seq[i:i+20]
        xlist.append(sequence)
        ylist.append(seq[i+20])
        if test_X is None:
            test_X = np.array(sequence)
            (match_word, max_cos) = vector2word(seq[i+20])
            print("right answer= %s %s" % (match_word, max_cos))

    X = np.array(xlist)
    Y = np.array(ylist)
    net = tflearn.input_data([None, 20, 200])
    net = tflearn.lstm(net, 200)
    net = tflearn.fully_connected(net, 200, activation='linear')
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1,
                             loss='mean_square')
    model = tflearn.DNN(net)
    model.fit(X, Y, n_epoch=500, batch_size=10, snapshot_epoch=False, show_metric=True)
    model.save("model")
    predict = model.predict([test_X])
    #print(predict)
    #for v in test_X:
    #    print(vector2word(v))
    (match_word, max_cos) = vector2word(predict[0])
    print("predict= %s %s" % (match_word, max_cos))

main()

load_vectors loads the word vectors from vectors.bin; init_seq loads the segmented text of the novel into a sequence; vector2word finds the word nearest to a given vector; the model has a single LSTM layer.
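vector2word scans the whole vocabulary in pure Python on every call, which becomes slow once predictions are generated repeatedly. A possible speed-up, sketched here under the assumption that no word vector is all zeros, is to stack the vectors into a normalized NumPy matrix once and take a single matrix-vector product per lookup. build_index and vector2word_fast are hypothetical names, and the three-word dictionary is toy data.

```python
import numpy as np

def build_index(word_vector_dict):
    """Stack the vectors into a row-normalized matrix (assumes no zero vector)."""
    words = list(word_vector_dict)
    matrix = np.array([word_vector_dict[w] for w in words], dtype=np.float64)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return words, matrix

def vector2word_fast(vector, words, matrix):
    """Return the word with the highest cosine similarity and that similarity."""
    v = np.asarray(vector, dtype=np.float64)
    cosines = matrix.dot(v / np.linalg.norm(v))
    best = int(np.argmax(cosines))
    return words[best], float(cosines[best])

# Toy vocabulary standing in for word_vector_dict.
toy = {'a': [1.0, 0.0], 'b': [0.0, 1.0], 'c': [1.0, 1.0]}
words, matrix = build_index(toy)
match_word, max_cos = vector2word_fast([2.0, 1.9], words, matrix)
print(match_word, max_cos)  # nearest word is 'c'
```

The result is the same ranking as the loop-based version, since both compute cosine similarity; only the normalization is done once up front instead of on every comparison.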
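After 500 epochs of training the mean-square loss drops to 0.33673, and the model predicts the next word with a cosine similarity of 0.941794432002.

With a powerful GPU, one could tune the parameters, train on the whole text, and modify the predict part of the code to keep emitting the next word, automatically generating text in the style of the novel. This implementation is built on tflearn. The seq2seq example in the official tflearn examples calls tensorflow/python/ops/seq2seq.py in TensorFlow directly and is based on one-hot embedding, which the author expects to perform worse than word vectors.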
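The "keep emitting the next word" idea amounts to a sliding window: predict a vector, look up its nearest word, append the vector to the window, drop the oldest entry, and repeat. The sketch below shows only that loop. predict_fn and nearest_word_fn are placeholders for whatever model and lookup are used, and the two toy functions exist purely so the example runs without tflearn.

```python
import numpy as np

def generate(predict_fn, nearest_word_fn, seed_vectors, n_words):
    """Repeatedly predict the next vector and slide the window forward.

    predict_fn maps a (window_len, dim) array to a (dim,) vector;
    nearest_word_fn maps a vector to a word. Both are supplied by the caller.
    """
    window = [np.asarray(v, dtype=np.float64) for v in seed_vectors]
    words = []
    for _ in range(n_words):
        next_vec = np.asarray(predict_fn(np.array(window)))
        words.append(nearest_word_fn(next_vec))
        window = window[1:] + [next_vec]  # drop the oldest, append the newest
    return words

# Toy stand-ins (not a real model): predict the window mean, name by sign.
def mean_predict(window):
    return window.mean(axis=0)

def sign_word(vec):
    return 'pos' if vec[0] >= 0 else 'neg'

seed = [[-3.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
result = generate(mean_predict, sign_word, seed, 3)
print(result)  # ['neg', 'pos', 'pos']
```

With the training script in these notes, the window would hold 20 vectors of dimension 200, predict_fn could wrap model.predict([window])[0], and nearest_word_fn could wrap vector2word(v)[0]. One design choice left open here is whether to feed back the raw predicted vector, as this sketch does, or the stored vector of the nearest word, which keeps the inputs on the set of vectors seen in training.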

Author: 利炳根
