Weight Initialization of GRU and LSTM in TensorFlow
When writing a model, you sometimes want the RNN's weight matrices initialized in a particular way, e.g. Xavier or orthogonal. In that case, all you need is:
```python
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, LSTMCell, MultiRNNCell

cell = LSTMCell if self.args.use_lstm else GRUCell
# Note: variable_scope() requires a scope name; "rnn" is added here so the
# snippet actually runs.
with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):
    input = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_bw=cell_bw,
        cell_fw=cell_fw,
        dtype="float32",
        inputs=input,
        swap_memory=True)
```
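To see how deeply nested the resulting scopes are, one quick check (a sketch of mine, not from the original post; the exact variable names depend on the TF version) is to list every variable the graph created:

```python
# Print each variable's full name; the scope nesting shows up as
# slash-separated prefixes, e.g. rnn/bidirectional_rnn/fw/.../gates/weights.
for v in tf.global_variables():
    print(v.name, v.get_shape())
```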
Does writing it this way actually initialize the weights, though? Let's follow the code into bidirectional_dynamic_rnn, looking only at the forward direction for now:
```python
with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)
```
We see that it opens a variable_scope called fw_scope. Reading further into dynamic_rnn, that scope is only used for variable-caching bookkeeping; what dynamic_rnn actually calls is the following:
```python
(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)
```
In short, after a chain of calls, everything ends up at a single statement:
```python
call_cell = lambda: cell(input_t, state)
```
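Conceptually, _dynamic_rnn_loop invokes call_cell once per time step, so every variable the cell creates is requested inside whatever variable_scopes are open at that point. The sketch below is a simplified picture of mine, not TensorFlow's code, which does this with tf.while_loop:

```python
def unrolled_rnn_sketch(cell, inputs, initial_state):
    """Simplified picture of _dynamic_rnn_loop: call the cell once per
    time step. The real implementation uses tf.while_loop."""
    state = initial_state
    outputs = []
    for input_t in inputs:  # one [batch, depth] slice per time step
        output, state = cell(input_t, state)  # this is call_cell()
        outputs.append(output)
    return outputs, state
```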
So ultimately the GRUCell's or LSTMCell's __call__() method gets invoked. Following it in, GRU's __call__() looks like this:
```python
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
    with vs.variable_scope("gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      value = sigmoid(_linear(
          [inputs, state], 2 * self._num_units, True, 1.0))
      r, u = array_ops.split(
          value=value,
          num_or_size_splits=2,
          axis=1)
    with vs.variable_scope("candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
```
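For reference, the two inner scopes above implement the standard GRU equations; note that in this code u is the update gate and it multiplies the old state:

$$
\begin{aligned}
[r_t;\ u_t] &= \sigma\big(W\,[x_t;\ h_{t-1}] + b\big) && \text{(scope ``gates'')} \\
\tilde{h}_t &= \tanh\big(W_c\,[x_t;\ r_t \odot h_{t-1}] + b_c\big) && \text{(scope ``candidate'')} \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t
\end{aligned}
$$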
Wait, where are the weights and biases? They don't appear in __init__() either. See that _linear() call? All the weights actually live inside that function (the same holds for LSTMCell), and this is where the trick is:
```python
with vs.variable_scope(scope) as outer_scope:
  weights = vs.get_variable(
      _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  # ... some code
  with vs.variable_scope(outer_scope) as inner_scope:
    inner_scope.set_partitioner(None)
    biases = vs.get_variable(
        _BIAS_VARIABLE_NAME, [output_size],
        dtype=dtype,
        initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
```
So this function opens yet another variable_scope and then calls get_variable() to fetch the weights and biases. Which raises the question: after our variable_scope has had several more variable_scopes nested inside it, does the initializer we set still apply? Let's run an experiment:
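Here is a minimal version of that experiment (my reconstruction; the shape and scope names are arbitrary): create a variable inside a nested scope that sets no initializer of its own, and check whether it comes out orthogonal:

```python
import tensorflow as tf

# The outer scope sets an initializer; the inner scope does not.
with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
    with tf.variable_scope("inner"):
        w = tf.get_variable("w", shape=[4, 4])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_val = sess.run(w)
    # For an orthogonal matrix, w^T w should be (numerically) the identity.
    print(w_val.T.dot(w_val))
```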
Sure enough, the test shows that with nested variable_scopes, if the inner scopes set no initializer of their own, the outer scope's initializer is used. The conclusion follows:
- In the TensorFlow 1.1.0 implementation of these two RNN variants, it is enough to call them inside a variable_scope that carries an initializer; their weights will then be initialized that way;
- However, neither LSTMCell nor GRUCell exposes a way to choose a bias initializer (although the initial bias value itself can apparently be set, like the 1.0 passed to _linear() for the GRU gates); a possible workaround is sketched below.
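If you do need to control the biases, one possible workaround (my own sketch, not something the cells provide) is to locate the bias variables by name, matching _BIAS_VARIABLE_NAME ("biases") from _linear() above, and overwrite them after initialization. Be aware this would also clobber the 1.0 gate bias the GRU code deliberately sets:

```python
# Find the RNN bias variables and assign them new values (zeros here,
# purely for illustration) after the regular initializer has run.
bias_vars = [v for v in tf.global_variables() if "biases" in v.name]
reset_biases = [tf.assign(v, tf.zeros_like(v)) for v in bias_vars]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(reset_biases)
```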