Weight Initialization of GRU and LSTM in TensorFlow
When writing a model, you sometimes want the RNN's weight matrices initialized in a particular way, e.g. Xavier or orthogonal. In that case, all you need is:
```python
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell, LSTMCell, MultiRNNCell

cell = LSTMCell if self.args.use_lstm else GRUCell
# Note: variable_scope() requires a scope name; "rnn" is added here so the
# snippet actually runs.
with tf.variable_scope("rnn", initializer=tf.orthogonal_initializer()):
    input = tf.nn.embedding_lookup(embedding, questions_bt)
    cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
    outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
        cell_bw=cell_bw,
        cell_fw=cell_fw,
        dtype="float32",
        inputs=input,
        swap_memory=True)
```
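To see how deeply nested the resulting scopes are, one quick check (a sketch of mine, not from the original post; the exact variable names depend on the TF version) is to list every variable the graph created:

```python
# Print each variable's full name; the scope nesting shows up as
# slash-separated prefixes, e.g. rnn/bidirectional_rnn/fw/.../gates/weights.
for v in tf.global_variables():
    print(v.name, v.get_shape())
```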
Does writing it this way actually initialize the weights, though? Let's follow the code into bidirectional_dynamic_rnn, looking only at the forward direction for now:
```python
with vs.variable_scope("fw") as fw_scope:
    output_fw, output_state_fw = dynamic_rnn(
        cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
        initial_state=initial_state_fw, dtype=dtype,
        parallel_iterations=parallel_iterations, swap_memory=swap_memory,
        time_major=time_major, scope=fw_scope)
```
We see that it opens a variable_scope called fw_scope. Reading further into dynamic_rnn, that scope is only used for variable-caching bookkeeping; what dynamic_rnn actually calls is the following:
```python
(outputs, final_state) = _dynamic_rnn_loop(
    cell,
    inputs,
    state,
    parallel_iterations=parallel_iterations,
    swap_memory=swap_memory,
    sequence_length=sequence_length,
    dtype=dtype)
```
In short, after a chain of calls, everything ends up at a single statement:
```python
call_cell = lambda: cell(input_t, state)
```
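Conceptually, _dynamic_rnn_loop invokes call_cell once per time step, so every variable the cell creates is requested inside whatever variable_scopes are open at that point. The sketch below is a simplified picture of mine, not TensorFlow's code, which does this with tf.while_loop:

```python
def unrolled_rnn_sketch(cell, inputs, initial_state):
    """Simplified picture of _dynamic_rnn_loop: call the cell once per
    time step. The real implementation uses tf.while_loop."""
    state = initial_state
    outputs = []
    for input_t in inputs:  # one [batch, depth] slice per time step
        output, state = cell(input_t, state)  # this is call_cell()
        outputs.append(output)
    return outputs, state
```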
So ultimately the GRUCell's or LSTMCell's __call__() method gets invoked. Following it in, GRU's __call__() looks like this:
```python
def __call__(self, inputs, state, scope=None):
  """Gated recurrent unit (GRU) with nunits cells."""
  with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
    with vs.variable_scope("gates"):  # Reset gate and update gate.
      # We start with bias of 1.0 to not reset and not update.
      value = sigmoid(_linear(
          [inputs, state], 2 * self._num_units, True, 1.0))
      r, u = array_ops.split(
          value=value,
          num_or_size_splits=2,
          axis=1)
    with vs.variable_scope("candidate"):
      c = self._activation(_linear([inputs, r * state],
                                   self._num_units, True))
    new_h = u * state + (1 - u) * c
  return new_h, new_h
```
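For reference, the two inner scopes above implement the standard GRU equations; note that in this code u is the update gate and it multiplies the old state:

$$
\begin{aligned}
[r_t;\ u_t] &= \sigma\big(W\,[x_t;\ h_{t-1}] + b\big) && \text{(scope ``gates'')} \\
\tilde{h}_t &= \tanh\big(W_c\,[x_t;\ r_t \odot h_{t-1}] + b_c\big) && \text{(scope ``candidate'')} \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t
\end{aligned}
$$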
Wait, where are the weights and biases? They don't appear in __init__() either. See that _linear() call? All the weights actually live inside that function (the same holds for LSTMCell), and this is where the trick is:
```python
with vs.variable_scope(scope) as outer_scope:
  weights = vs.get_variable(
      _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
  # ... some code
  with vs.variable_scope(outer_scope) as inner_scope:
    inner_scope.set_partitioner(None)
    biases = vs.get_variable(
        _BIAS_VARIABLE_NAME, [output_size],
        dtype=dtype,
        initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
```
So this function opens yet another variable_scope and then calls get_variable() to fetch the weights and biases. Which raises the question: after our variable_scope has had several more variable_scopes nested inside it, does the initializer we set still apply? Let's run an experiment:
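Here is a minimal version of that experiment (my reconstruction; the shape and scope names are arbitrary): create a variable inside a nested scope that sets no initializer of its own, and check whether it comes out orthogonal:

```python
import tensorflow as tf

# The outer scope sets an initializer; the inner scope does not.
with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
    with tf.variable_scope("inner"):
        w = tf.get_variable("w", shape=[4, 4])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_val = sess.run(w)
    # For an orthogonal matrix, w^T w should be (numerically) the identity.
    print(w_val.T.dot(w_val))
```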
Sure enough, the test shows that with nested variable_scopes, if the inner scopes set no initializer of their own, the outer scope's initializer is used. The conclusion follows:
- In the TensorFlow 1.1.0 implementation of these two RNN variants, it is enough to call them inside a variable_scope that carries an initializer; their weights will then be initialized that way;
- However, neither LSTMCell nor GRUCell exposes a way to choose a bias initializer (although the initial bias value itself can apparently be set, like the 1.0 passed to _linear() for the GRU gates); a possible workaround is sketched below.
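If you do need to control the biases, one possible workaround (my own sketch, not something the cells provide) is to locate the bias variables by name, matching _BIAS_VARIABLE_NAME ("biases") from _linear() above, and overwrite them after initialization. Be aware this would also clobber the 1.0 gate bias the GRU code deliberately sets:

```python
# Find the RNN bias variables and assign them new values (zeros here,
# purely for illustration) after the regular initializer has run.
bias_vars = [v for v in tf.global_variables() if "biases" in v.name]
reset_biases = [tf.assign(v, tf.zeros_like(v)) for v in bias_vars]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(reset_biases)
```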