

Using batch normalization in TensorFlow


Reposted article; original sources are listed below. If this infringes, contact me and I will remove it.

https://www.cnblogs.com/hrlnw/p/7227447.html

https://www.cnblogs.com/eilearn/p/9780696.html

https://www.cnblogs.com/stingsl/p/6428694.html

Training a neural network is, in essence, learning the distribution of the data: once the training data and the test data follow different distributions, the network's ability to generalize drops sharply. On top of that, if every mini-batch has a different distribution (as with batch gradient descent), the network has to adapt to a new distribution at each iteration, which greatly slows down training. This is exactly why we normalize the data as a preprocessing step.

Training a deep network is a complicated process: even a small change in the first few layers gets amplified as it propagates through the later layers. Once the input distribution of some layer changes, that layer has to adapt to the new distribution, so if the distribution of the training signal keeps shifting during training, training slows down.

Once the network starts training, its parameters are updated, so except for the input layer (whose data we have already normalized per sample by hand), the input distribution of every later layer keeps changing: updates to the parameters of earlier layers change the distribution of the inputs seen by later layers. Take the second layer as an example: its input is computed from the first layer's parameters and the network input, and since the first layer's parameters change throughout training, the input distribution of every subsequent layer necessarily changes as well. This change in the distribution of intermediate-layer activations during training is called "Internal Covariate Shift". The algorithm proposed in the paper addresses exactly this problem, and that is how Batch Normalization came about.

1. Principle

The formula is:

y=γ(x-μ)/σ+β

where x is the input, y is the output, μ is the mean, σ is the standard deviation (in practice √(σ² + ε), with a small ε added to the variance for numerical stability), and γ and β are the scale and offset coefficients.

In general these parameters are computed per channel: for example, if the input x is a 16×32×32×128 feature map (NHWC format), each of the above parameters is a 128-dimensional vector. γ and β are optional; when present they are learnable parameters (they take part in the forward and backward passes), and when absent the formula reduces to y = (x - μ)/σ. During training, μ and σ are the statistics of the current batch; at test/inference time, the moving averages accumulated during training are used instead.
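As a rough illustration of what the layer computes, here is a minimal NumPy sketch (not TensorFlow's actual implementation; the epsilon value is an assumption):

import numpy as np

def batch_norm_nhwc(x, gamma, beta, eps=1e-3):
    # x has shape (N, H, W, C); statistics are taken over the N, H and W axes,
    # so mu and var are C-dimensional vectors, one entry per channel
    mu = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.randn(16, 32, 32, 128).astype(np.float32)   # NHWC feature map
gamma = np.ones(128, dtype=np.float32)    # scale, learnable when present
beta = np.zeros(128, dtype=np.float32)    # offset, learnable when present
y = batch_norm_nhwc(x, gamma, beta)
print(y.mean(), y.std())                  # roughly 0 and 1 per channel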


2. Usage in TensorFlow

TensorFlow provides three main implementations of batch normalization:

tf.nn.batch_normalization

tf.layers.batch_normalization

tf.contrib.layers.batch_norm

Each is a higher-level wrapper than the one before it. I recommend tf.layers.batch_normalization or tf.contrib.layers.batch_norm, since both are documented in detail on the TensorFlow site. I mostly use tf.layers.batch_normalization, so the steps below are based on it.
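A minimal sketch of how the layer is typically wired in (the tensor names here are illustrative, not from any particular codebase). The training argument accepts either a Python bool or a tf.bool scalar tensor, so a placeholder lets one graph serve both training and testing:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
is_training = tf.placeholder(tf.bool, name='is_training')  # feed True while training, False otherwise

conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
# uses batch statistics or the moving averages, depending on the flag
bn = tf.layers.batch_normalization(conv, training=is_training, name='bn')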


3. Training

Two things need attention during training: (1) pass training=True, and (2) when building the training op from the loss, add the following code (i.e. attach the ops in UPDATE_OPS to the final train_op). Only then are the moving averages of μ and σ updated (they are needed at test time).

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
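An equivalent way to attach the update ops (a sketch; optimizer and loss are assumed to exist already) is to group them with the train op instead of using control dependencies:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = optimizer.minimize(loss)
# running the grouped op runs both the parameter update and the moving-average updates
train_op = tf.group(train_op, *update_ops)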


4. Testing

Only one thing matters at test time: pass training=False. That's all.
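If the training flag was wired through a tf.bool placeholder as in the sketch in section 2, testing just means feeding False (the tensor names below are assumptions carried over from that sketch):

# accuracy, x, y_ and is_training come from the graph built earlier
test_acc = sess.run(accuracy,
                    feed_dict={x: test_images, y_: test_labels, is_training: False})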


5. Prediction

Prediction is a bit special, because this step usually restores the model parameters from a checkpoint file and then runs inference. Typically one does not save every variable to the checkpoint, since irrelevant data inflates the model size; the common practice is to save only the parameters that are updated during training (the trainable variables), like this:

var_list = tf.trainable_variables()
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)


However, with batch_normalization, while γ and β are indeed trainable variables, μ and σ are not: they are only computed through moving averages. If you save the model as above, restoring it for prediction fails with an error that μ and σ cannot be found. Even stranger, tf.moving_average_variables() does not return the μ and σ of the bn layers either (possibly I am using it incorrectly). Fortunately, all parameters are in tf.global_variables(), so you can write:

var_list = tf.trainable_variables()
g_list = tf.global_variables()
bn_moving_vars = [g for g in g_list if 'moving_mean' in g.name]
bn_moving_vars += [g for g in g_list if 'moving_variance' in g.name]
var_list += bn_moving_vars
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)

Written this way, μ and σ are saved along with the model, and restoring it for prediction no longer raises an error. Of course you still need to pass training=False.

Note that the above is not strictly rigorous: in my network only the bn layers contain moving_mean and moving_variance, so I filtered on those two substrings alone. If other layers in your network also have these variables but you do not want to save them, filter on a more specific string such as 'bn/moving_mean'.
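For instance, a sketch of a stricter filter that keys on the bn layers' variable-scope prefix (the 'bn' prefix is an assumption; adjust it to your own layer names):

g_list = tf.global_variables()
bn_moving_vars = [g for g in g_list
                  if ('moving_mean' in g.name or 'moving_variance' in g.name)
                  and g.name.split('/')[0].startswith('bn')]   # e.g. 'bn1/moving_mean:0'
var_list = tf.trainable_variables() + bn_moving_vars
saver = tf.train.Saver(var_list=var_list, max_to_keep=5)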


Update 2018-04-22

Here is an MNIST-based example for reference. It consists of two files, one for training and one for testing. Note the block in bn_train.py commented "# only save trainable and bn variables": only the network's trainable variables and the mean/var statistics collected by the bn layers are saved. The example downloads the MNIST dataset, so make sure your machine has internet access.

bn_train.py:

import tensorflow as tf
import os
from tensorflow.examples.tutorials.mnist import input_data

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    mnist = input_data.read_data_sets('mnist', one_hot=True)

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    image = tf.reshape(x, [-1, 28, 28, 1])

    conv1 = tf.layers.conv2d(image, filters=32, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv1')
    # training=True: use batch statistics and register moving-average update ops
    bn1 = tf.layers.batch_normalization(conv1, training=True, name='bn1')
    pool1 = tf.layers.max_pooling2d(bn1, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool1')
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv2')
    bn2 = tf.layers.batch_normalization(conv2, training=True, name='bn2')
    pool2 = tf.layers.max_pooling2d(bn2, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool2')

    flatten_layer = tf.contrib.layers.flatten(pool2, 'flatten_layer')
    weights = tf.get_variable(shape=[flatten_layer.shape[-1], 10], dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(stddev=0.1), name='fc_weights')
    biases = tf.get_variable(shape=[10], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0), name='fc_biases')
    logit_output = tf.nn.bias_add(tf.matmul(flatten_layer, weights), biases, name='logit_output')

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logit_output))
    pred_label = tf.argmax(logit_output, 1)
    label = tf.argmax(y_, 1)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_label, label), tf.float32))

    # the bn layers' moving-average update ops live in this collection
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    global_step = tf.get_variable('global_step', [], dtype=tf.int32,
                                  initializer=tf.constant_initializer(0), trainable=False)
    learning_rate = tf.train.exponential_decay(learning_rate=0.1, global_step=global_step, decay_steps=5000,
                                               decay_rate=0.1, staircase=True)
    opt = tf.train.AdadeltaOptimizer(learning_rate=learning_rate, name='optimizer')
    with tf.control_dependencies(update_ops):
        grads = opt.compute_gradients(cross_entropy)
        train_op = opt.apply_gradients(grads, global_step=global_step)

    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    tf_config.allow_soft_placement = True
    sess = tf.InteractiveSession(config=tf_config)
    sess.run(tf.global_variables_initializer())

    # only save trainable and bn variables
    var_list = tf.trainable_variables()
    if global_step is not None:
        var_list.append(global_step)
    g_list = tf.global_variables()
    bn_moving_vars = [g for g in g_list if 'moving_mean' in g.name]
    bn_moving_vars += [g for g in g_list if 'moving_variance' in g.name]
    var_list += bn_moving_vars
    saver = tf.train.Saver(var_list=var_list, max_to_keep=5)
    # save all variables
    # saver = tf.train.Saver(max_to_keep=5)

    if tf.train.latest_checkpoint('ckpts') is not None:
        saver.restore(sess, tf.train.latest_checkpoint('ckpts'))

    train_loops = 10000
    for i in range(train_loops):
        batch_xs, batch_ys = mnist.train.next_batch(32)
        _, step, loss, acc = sess.run([train_op, global_step, cross_entropy, accuracy],
                                      feed_dict={x: batch_xs, y_: batch_ys})
        if step % 100 == 0:  # print training info
            log_str = 'step:%d \t loss:%.6f \t acc:%.6f' % (step, loss, acc)
            tf.logging.info(log_str)
        if step % 1000 == 0:  # save current model
            save_path = os.path.join('ckpts', 'mnist-model.ckpt')
            saver.save(sess, save_path, global_step=step)

    sess.close()

The test file:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

tf.logging.set_verbosity(tf.logging.INFO)

if __name__ == '__main__':
    mnist = input_data.read_data_sets('mnist', one_hot=True)

    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    image = tf.reshape(x, [-1, 28, 28, 1])

    conv1 = tf.layers.conv2d(image, filters=32, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv1')
    # training=False: use the moving averages restored from the checkpoint
    bn1 = tf.layers.batch_normalization(conv1, training=False, name='bn1')
    pool1 = tf.layers.max_pooling2d(bn1, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool1')
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=[3, 3], strides=[1, 1], padding='same',
                             activation=tf.nn.relu,
                             kernel_initializer=tf.truncated_normal_initializer(stddev=0.1),
                             name='conv2')
    bn2 = tf.layers.batch_normalization(conv2, training=False, name='bn2')
    pool2 = tf.layers.max_pooling2d(bn2, pool_size=[2, 2], strides=[2, 2], padding='same', name='pool2')

    flatten_layer = tf.contrib.layers.flatten(pool2, 'flatten_layer')
    weights = tf.get_variable(shape=[flatten_layer.shape[-1], 10], dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(stddev=0.1), name='fc_weights')
    biases = tf.get_variable(shape=[10], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0), name='fc_biases')
    logit_output = tf.nn.bias_add(tf.matmul(flatten_layer, weights), biases, name='logit_output')

    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logit_output))
    pred_label = tf.argmax(logit_output, 1)
    label = tf.argmax(y_, 1)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(pred_label, label), tf.float32))

    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    tf_config.allow_soft_placement = True
    sess = tf.InteractiveSession(config=tf_config)

    saver = tf.train.Saver()
    if tf.train.latest_checkpoint('ckpts') is not None:
        saver.restore(sess, tf.train.latest_checkpoint('ckpts'))
    else:
        assert False, 'can not find checkpoint folder path!'

    loss, acc = sess.run([cross_entropy, accuracy],
                         feed_dict={x: mnist.test.images, y_: mnist.test.labels})
    log_str = 'loss:%.6f \t acc:%.6f' % (loss, acc)
    tf.logging.info(log_str)
    sess.close()

