

Using batch normalization in TensorFlow


Reposted article; original sources are listed below. If this infringes, contact me and I will remove it.

https://www.cnblogs.com/hrlnw/p/7227447.html

https://www.cnblogs.com/eilearn/p/9780696.html

https://www.cnblogs.com/stingsl/p/6428694.html

Training a neural network is, in essence, learning the distribution of the data: once the training data and the test data follow different distributions, the network's ability to generalize drops sharply. On top of that, if every mini-batch has a different distribution (as with batch gradient descent), the network has to adapt to a new distribution at each iteration, which greatly slows down training. This is exactly why we normalize the data as a preprocessing step.

Training a deep network is a complicated process: even a small change in the first few layers gets amplified as it propagates through the later layers. Once the input distribution of some layer changes, that layer has to adapt to the new distribution, so if the distribution of the training signal keeps shifting during training, training slows down.

Once the network starts training, its parameters are updated, so except for the input layer (whose data we have already normalized per sample by hand), the input distribution of every later layer keeps changing: updates to the parameters of earlier layers change the distribution of the inputs seen by later layers. Take the second layer as an example: its input is computed from the first layer's parameters and the network input, and since the first layer's parameters change throughout training, the input distribution of every subsequent layer necessarily changes as well. This change in the distribution of intermediate-layer activations during training is called "Internal Covariate Shift". The algorithm proposed in the paper addresses exactly this problem, and that is how Batch Normalization came about.

1. Principle

The formula is:

y=γ(x-μ)/σ+β

where x is the input, y is the output, μ is the mean, σ is the standard deviation (in practice √(σ² + ε), with a small ε added to the variance for numerical stability), and γ and β are the scale and offset coefficients.

In general these parameters are computed per channel: for example, if the input x is a 16×32×32×128 feature map (NHWC format), each of the above parameters is a 128-dimensional vector. γ and β are optional; when present they are learnable parameters (they take part in the forward and backward passes), and when absent the formula reduces to y = (x - μ)/σ. During training, μ and σ are the statistics of the current batch; at test/inference time, the moving averages accumulated during training are used instead.
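As a rough illustration of what the layer computes, here is a minimal NumPy sketch (not TensorFlow's actual implementation; the epsilon value is an assumption):

import numpy as np

def batch_norm_nhwc(x, gamma, beta, eps=1e-3):
    # x has shape (N, H, W, C); statistics are taken over the N, H and W axes,
    # so mu and var are C-dimensional vectors, one entry per channel
    mu = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.randn(16, 32, 32, 128).astype(np.float32)   # NHWC feature map
gamma = np.ones(128, dtype=np.float32)    # scale, learnable when present
beta = np.zeros(128, dtype=np.float32)    # offset, learnable when present
y = batch_norm_nhwc(x, gamma, beta)
print(y.mean(), y.std())                  # roughly 0 and 1 per channel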


2. Usage in TensorFlow

TensorFlow provides three main implementations of batch normalization:

tf.nn.batch_normalization

tf.layers.batch_normalization

tf.contrib.layers.batch_norm

Each is a higher-level wrapper than the one before it. I recommend tf.layers.batch_normalization or tf.contrib.layers.batch_norm, since both are documented in detail on the TensorFlow site. I mostly use tf.layers.batch_normalization, so the steps below are based on it.
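A minimal sketch of how the layer is typically wired in (the tensor names here are illustrative, not from any particular codebase). The training argument accepts either a Python bool or a tf.bool scalar tensor, so a placeholder lets one graph serve both training and testing:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
is_training = tf.placeholder(tf.bool, name='is_training')  # feed True while training, False otherwise

conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
# uses batch statistics or the moving averages, depending on the flag
bn = tf.layers.batch_normalization(conv, training=is_training, name='bn')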


3. Training

Two things need attention during training: (1) pass training=True, and (2) when building the training op from the loss, add the following code (i.e. attach the ops in UPDATE_OPS to the final train_op). Only then are the moving averages of μ and σ updated (they are needed at test time).

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
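An equivalent way to attach the update ops (a sketch; optimizer and loss are assumed to exist already) is to group them with the train op instead of using control dependencies:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = optimizer.minimize(loss)
# running the grouped op runs both the parameter update and the moving-average updates
train_op = tf.group(train_op, *update_ops)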


4. Testing

Only one thing matters at test time: pass training=False. That's all.
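If the training flag was wired through a tf.bool placeholder as in the sketch in section 2, testing just means feeding False (the tensor names below are assumptions carried over from that sketch):

# accuracy, x, y_ and is_training come from the graph built earlier
test_acc = sess.run(accuracy,
                    feed_dict={x: test_images, y_: test_labels, is_training: False})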


5. Prediction

Prediction is a bit special, because this step usually restores the model parameters from a checkpoint file and then runs inference. Typically one does not save every variable to the checkpoint, since irrelevant data inflates the model size; the common practice is to save only the parameters that are updated during training (the trainable variables), like this:

var_list = tf.trainable_variables()
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)


However, with batch_normalization, while γ and β are indeed trainable variables, μ and σ are not: they are only computed through moving averages. If you save the model as above, restoring it for prediction fails with an error that μ and σ cannot be found. Even stranger, tf.moving_average_variables() does not return the μ and σ of the bn layers either (possibly I am using it incorrectly). Fortunately, all parameters are in tf.global_variables(), so you can write:

var_list = tf.trainable_variables()
g_list = tf.global_variables()
bn_moving_vars = [g for g in g_list if 'moving_mean' in g.name]
bn_moving_vars += [g for g in g_list if 'moving_variance' in g.name]
var_list += bn_moving_vars
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)

Written this way, μ and σ are saved along with the model, and restoring it for prediction no longer raises an error. Of course you still need to pass training=False.

Note that the above is not strictly rigorous: in my network only the bn layers contain moving_mean and moving_variance, so I filtered on those two substrings alone. If other layers in your network also have these variables but you do not want to save them, filter on a more specific string such as 'bn/moving_mean'.
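For instance, a sketch of a stricter filter that keys on the bn layers' variable-scope prefix (the 'bn' prefix is an assumption; adjust it to your own layer names):

g_list = tf.global_variables()
bn_moving_vars = [g for g in g_list
                  if ('moving_mean' in g.name or 'moving_variance' in g.name)
                  and g.name.split('/')[0].startswith('bn')]   # e.g. 'bn1/moving_mean:0'
var_list = tf.trainable_variables() + bn_moving_vars
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)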


Update 2018-04-22

Here is an MNIST-based example for reference. It consists of two files, one for training and one for testing. Note the block in bn_train.py commented "# only save trainable and bn variables": only the network's trainable variables and the mean/var statistics collected by the bn layers are saved. The example downloads the MNIST dataset, so make sure your machine has internet access.

bn_train.py:

import tensorflow as tf
import os
from tensorflow.examples.tutorials.mnist import input_data

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    mnist = input_data.read_data_sets('mnist', one_hot=True)

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    image = tf.reshape(x, [-1, 28, 28, 1])

    conv1 = tf.layers.conv2d(image, filters=32, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv1')
    # training=True: use batch statistics and register moving-average update ops
    bn1 = tf.layers.batch_normalization(conv1, training=True, name='bn1')
    pool1 = tf.layers.max_pooling2d(bn1, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool1')
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv2')
    bn2 = tf.layers.batch_normalization(conv2, training=True, name='bn2')
    pool2 = tf.layers.max_pooling2d(bn2, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool2')

    flatten_layer = tf.contrib.layers.flatten(pool2, 'flatten_layer')
    weights = tf.get_variable(shape=[flatten_layer.shape[-1], 10], dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(stddev=0.1), name='fc_weights')
    biases = tf.get_variable(shape=[10], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0), name='fc_biases')
    logit_output = tf.nn.bias_add(tf.matmul(flatten_layer, weights), biases, name='logit_output')

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logit_output))
    pred_label = tf.argmax(logit_output, 1)
    label = tf.argmax(y_, 1)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_label, label), tf.float32))

    # the bn layers' moving-average update ops live in this collection
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    global_step = tf.get_variable('global_step', [], dtype=tf.int32,
                                  initializer=tf.constant_initializer(0), trainable=False)
    learning_rate = tf.train.exponential_decay(learning_rate=0.1, global_step=global_step, decay_steps=5000,
                                               decay_rate=0.1, staircase=True)
    opt = tf.train.AdadeltaOptimizer(learning_rate=learning_rate, name='optimizer')
    with tf.control_dependencies(update_ops):
        grads = opt.compute_gradients(cross_entropy)
        train_op = opt.apply_gradients(grads, global_step=global_step)

    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    tf_config.allow_soft_placement = True
    sess = tf.InteractiveSession(config=tf_config)
    sess.run(tf.global_variables_initializer())

    # only save trainable and bn variables
    var_list = tf.trainable_variables()
    if global_step is not None:
        var_list.append(global_step)
    g_list = tf.global_variables()
    bn_moving_vars = [g for g in g_list if 'moving_mean' in g.name]
    bn_moving_vars += [g for g in g_list if 'moving_variance' in g.name]
    var_list += bn_moving_vars
    saver = tf.train.Saver(var_list=var_list, max_to_keep=5)
    # save all variables
    # saver = tf.train.Saver(max_to_keep=5)

    if tf.train.latest_checkpoint('ckpts') is not None:
        saver.restore(sess, tf.train.latest_checkpoint('ckpts'))

    train_loops = 10000
    for i in range(train_loops):
        batch_xs, batch_ys = mnist.train.next_batch(32)
        _, step, loss, acc = sess.run([train_op, global_step, cross_entropy, accuracy],
                                      feed_dict={x: batch_xs, y_: batch_ys})
        if step % 100 == 0:  # print training info
            log_str = 'step:%d \t loss:%.6f \t acc:%.6f' % (step, loss, acc)
            tf.logging.info(log_str)
        if step % 1000 == 0:  # save current model
            save_path = os.path.join('ckpts', 'mnist-model.ckpt')
            saver.save(sess, save_path, global_step=step)

    sess.close()

The test file:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    mnist = input_data.read_data_sets('mnist', one_hot=True)

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    image = tf.reshape(x, [-1, 28, 28, 1])

    conv1 = tf.layers.conv2d(image, filters=32, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv1')
    # training=False: use the moving averages restored from the checkpoint
    bn1 = tf.layers.batch_normalization(conv1, training=False, name='bn1')
    pool1 = tf.layers.max_pooling2d(bn1, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool1')
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv2')
    bn2 = tf.layers.batch_normalization(conv2, training=False, name='bn2')
    pool2 = tf.layers.max_pooling2d(bn2, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool2')

    flatten_layer = tf.contrib.layers.flatten(pool2, 'flatten_layer')
    weights = tf.get_variable(shape=[flatten_layer.shape[-1], 10], dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(stddev=0.1), name='fc_weights')
    biases = tf.get_variable(shape=[10], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0), name='fc_biases')
    logit_output = tf.nn.bias_add(tf.matmul(flatten_layer, weights), biases, name='logit_output')

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logit_output))
    pred_label = tf.argmax(logit_output, 1)
    label = tf.argmax(y_, 1)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_label, label), tf.float32))

    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    tf_config.allow_soft_placement = True
    sess = tf.InteractiveSession(config=tf_config)

    saver = tf.train.Saver()
    if tf.train.latest_checkpoint('ckpts') is not None:
        saver.restore(sess, tf.train.latest_checkpoint('ckpts'))
    else:
        assert False, 'can not find checkpoint folder path!'

    loss, acc = sess.run([cross_entropy, accuracy],
                         feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    log_str = 'loss:%.6f \t acc:%.6f' % (loss, acc)
    tf.logging.info(log_str)
    sess.close()

