Deep Learning (16) TensorFlow Advanced Operations 5: Tensor Clipping

Published: 2023/12/15 · 豆豆

1. clip_by_value
2. relu
3. clip_by_norm
4. Gradient clipping
5. Gradient explosion example and fixing it with clip_by_global_norm
6. Hands-on practice

Outline

clip_by_value
relu
clip_by_norm
gradient clipping

1. clip_by_value

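(1) tf.maximum(a, 2): clips the values in a that are below 2, i.e. every value smaller than 2 becomes 2;
(2) tf.minimum(a, 8): clips the values in a that are above 8, i.e. every value larger than 8 becomes 8;
(3) tf.clip_by_value(a, 2, 8): clips a on both sides, i.e. every value smaller than 2 becomes 2 and every value larger than 8 becomes 8;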
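The three operations above can be sketched in plain Python to make the element-wise behaviour concrete. This is only an illustration on a Python list; the helper name `clip_by_value` below is a stand-in written for this example and is not TensorFlow's implementation.

```python
def clip_by_value(values, lo, hi):
    """Element-wise clamp: each v becomes max(lo, min(v, hi))."""
    return [max(lo, min(v, hi)) for v in values]


a = list(range(10))                 # [0, 1, ..., 9]

floor_2 = [max(v, 2) for v in a]    # mirrors tf.maximum(a, 2)
cap_8 = [min(v, 8) for v in a]      # mirrors tf.minimum(a, 8)
both = clip_by_value(a, 2, 8)       # mirrors tf.clip_by_value(a, 2, 8)

print(floor_2)  # [2, 2, 2, 3, 4, 5, 6, 7, 8, 9]
print(cap_8)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 8]
print(both)     # [2, 2, 2, 3, 4, 5, 6, 7, 8, 8]
```

Clipping on both sides is the same as applying the lower bound and the upper bound one after the other.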

2. relu

When a value is less than 0 it is set to 0; otherwise it keeps its original value.

(1) tf.nn.relu(a): applies the relu operation to a;
(2) tf.maximum(a, 0): has the same effect as tf.nn.relu(a);
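The equivalence between relu and an element-wise maximum with 0 can be checked with a small plain-Python sketch. The function `relu` below is a hypothetical helper for this example, operating on a list rather than a tensor.

```python
def relu(values):
    """Set negative entries to 0 and keep the rest unchanged."""
    return [v if v > 0 else 0 for v in values]


a = [-5, -2, 0, 3, 7]

via_relu = relu(a)                 # mirrors tf.nn.relu(a)
via_max = [max(v, 0) for v in a]   # mirrors tf.maximum(a, 0)

print(via_relu)             # [0, 0, 0, 3, 7]
print(via_relu == via_max)  # True
```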

3. clip_by_norm

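Clipping individual values into a desired range can change the direction of the gradient, which is not what we want. This is where clip_by_norm() comes in. The idea is to compute the 2-norm of the tensor, divide by it so that the tensor has norm 1, and then scale the result back up to the desired norm. Because every element is multiplied by the same factor, this kind of clipping changes the magnitude of the gradient but not its direction.

(1) tf.norm(a): computes the 2-norm of a, i.e. $\sqrt{\sum_i x_i^2}$;
(2) aa = tf.clip_by_norm(a, 15): rescales a so that its 2-norm is at most 15 without changing its direction; 15 is the new norm. If the norm of a is already at most 15, a is returned unchanged;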
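A minimal sketch of the norm-based clipping described above, written in plain Python on a flat list. The helper names `l2_norm` and `clip_by_norm` are assumptions made for this example; they follow the documented behaviour of tf.clip_by_norm (rescale only when the norm exceeds the limit) but are not TensorFlow's code.

```python
import math


def l2_norm(values):
    """2-norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


def clip_by_norm(values, clip_norm):
    """Rescale values so their 2-norm is at most clip_norm."""
    norm = l2_norm(values)
    if norm <= clip_norm:
        return list(values)          # already small enough: unchanged
    scale = clip_norm / norm         # same factor for every element
    return [v * scale for v in values]


a = [3.0, 4.0]                       # 2-norm is 5.0
clipped = clip_by_norm(a, 2.5)

print(l2_norm(a))                    # 5.0
print(clipped)                       # [1.5, 2.0]
print(l2_norm(clipped))              # 2.5
print(clip_by_norm(a, 15))           # [3.0, 4.0]
```

The ratio between the two components is 0.75 before and after clipping, which is what "the direction is preserved" means here.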

4. Gradient clipping

Gradient Exploding or vanishing
set lr=1
new_grads, total_norm = tf.clip_by_global_norm(grads, 25)

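The goal is to keep the direction of the overall parameter gradient unchanged. For example, if the gradients are originally $[w_1,w_2,w_3]=[2,4,8]$, clip_by_global_norm shrinks $w_1,w_2,w_3$ by the same factor n at once; with n = 2 this gives $[w_1,w_2,w_3]=[1,2,4]$, so the direction of the gradient does not change. The argument 25 means that the global norm of the gradients, i.e. the 2-norm taken over all of them together, will not exceed 25.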
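The [2, 4, 8] example can be reproduced with a short plain-Python sketch. The function `clip_by_global_norm` below is a hypothetical re-implementation for illustration, assuming each gradient is a flat list; it follows the documented formula of tf.clip_by_global_norm, where every tensor is multiplied by clip_norm / max(global_norm, clip_norm). The limit sqrt(21) is chosen here only because it is exactly half of the global norm sqrt(84), so the gradients are halved.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by one shared factor so that the norm taken
    over every element of every gradient is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


# Three one-element "gradients" standing in for w1, w2, w3.
grads = [[2.0], [4.0], [8.0]]
new_grads, total_norm = clip_by_global_norm(grads, math.sqrt(21))

print(round(total_norm, 4))                                  # 9.1652
print([[round(g, 6) for g in grad] for grad in new_grads])   # [[1.0], [2.0], [4.0]]
```

Like the TensorFlow call, the sketch returns both the clipped gradients and the norm measured before clipping.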

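5. Gradient explosion example and fixing it with clip_by_global_norm

(1) Before
To demonstrate gradient explosion we set the learning rate a bit higher, so that even the simple MNIST dataset runs into the problem.

Step0: we can see (g-x denotes the gradient of x):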
g- $w_1$ =89.03711
g- $b_1$ =2.6179454
g- $w_2$ =118.17449
g- $b_2$ =2.1617627
g- $w_3$ =134.27968
g- $b_3$ =2.5254946
Step1:
g- $w_1$ =1143.292
g- $b_1$ =35.148225
g- $w_2$ =1279.236
g- $b_2$ =24.312374
g- $w_3$ =1185.6311
g- $b_3$ =17.80448

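Generally speaking, gradient values in the range [0, 20] are acceptable, so the Gradient Exploding problem already appears after the first training step;

(2) Gradient Clipping

The gradients of $w_1,b_1,w_2,b_2,w_3,b_3$ are clipped by the same ratio with clip_by_global_norm; 15 means that the global norm of the gradients will not exceed 15;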
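As a rough worked example, the Step1 values listed earlier can be pushed through the same arithmetic. This sketch assumes those six numbers are the per-tensor 2-norms printed by tf.norm(g); under that assumption the global norm is the square root of the sum of their squares, and every per-tensor norm shrinks by the shared factor 15 / global_norm. The variable names are illustrative only.

```python
import math

# Step1 gradient norms from the text (assumed to be per-tensor 2-norms).
step1_norms = {
    "w1": 1143.292, "b1": 35.148225,
    "w2": 1279.236, "b2": 24.312374,
    "w3": 1185.6311, "b3": 17.80448,
}
clip_norm = 15.0

# Global norm over all six tensors together.
global_norm = math.sqrt(sum(n * n for n in step1_norms.values()))

# One shared scale factor keeps the overall direction unchanged.
scale = clip_norm / max(global_norm, clip_norm)
clipped_norms = {name: n * scale for name, n in step1_norms.items()}

print("global norm before clipping:", round(global_norm, 1))
for name, n in clipped_norms.items():
    print(name, round(n, 4))
```

After clipping, the six norms combine to a global norm of 15, and the largest individual norm (w2) stays below that limit.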
(3) After

As we can see, the gradients are much better than before the clipping was applied, so the updates are more stable.

6. Hands-on practice

(1) Before

    import os

    # Set before importing TensorFlow so the C++ log level takes effect.
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    import tensorflow as tf
    from tensorflow.keras import datasets, optimizers

    print(tf.__version__)

    (x, y), _ = datasets.mnist.load_data()
    # Divide by 50 so the inputs are large enough to provoke exploding gradients.
    x = tf.convert_to_tensor(x, dtype=tf.float32) / 50.
    y = tf.convert_to_tensor(y)
    y = tf.one_hot(y, depth=10)
    print('x:', x.shape, 'y:', y.shape)
    train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128).repeat(30)
    x, y = next(iter(train_db))
    print('sample:', x.shape, y.shape)
    # print(x[0], y[0])


    def main():
        # 784 => 512
        w1, b1 = tf.Variable(tf.random.truncated_normal([784, 512], stddev=0.1)), tf.Variable(tf.zeros([512]))
        # 512 => 256
        w2, b2 = tf.Variable(tf.random.truncated_normal([512, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
        # 256 => 10
        w3, b3 = tf.Variable(tf.random.truncated_normal([256, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))

        # `learning_rate` replaces the deprecated `lr` argument.
        optimizer = optimizers.SGD(learning_rate=0.01)

        for step, (x, y) in enumerate(train_db):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 784))

            with tf.GradientTape() as tape:
                # layer1
                h1 = x @ w1 + b1
                h1 = tf.nn.relu(h1)
                # layer2
                h2 = h1 @ w2 + b2
                h2 = tf.nn.relu(h2)
                # output
                out = h2 @ w3 + b3
                # out = tf.nn.relu(out)

                # compute loss
                # [b, 10] - [b, 10]
                loss = tf.square(y - out)
                # [b, 10] => [b]
                loss = tf.reduce_mean(loss, axis=1)
                # [b] => scalar
                loss = tf.reduce_mean(loss)

            # compute gradient
            grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
            print('==before==')
            for g in grads:
                print(tf.norm(g))

            # grads, _ = tf.clip_by_global_norm(grads, 15)

            # print('==after==')
            # for g in grads:
            #     print(tf.norm(g))

            # update w' = w - lr*grad
            optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))

            if step % 100 == 0:
                print(step, 'loss:', float(loss))


    if __name__ == '__main__':
        main()

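Note: gradient explosion is hard to trigger in this setup, so to test the effect of clip_by_global_norm we divide the raw input data by 50 in order to provoke it.
The output is as follows: [output screenshot not preserved]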

(2) After

    import os

    # Set before importing TensorFlow so the C++ log level takes effect.
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    import tensorflow as tf
    from tensorflow.keras import datasets, optimizers

    print(tf.__version__)

    (x, y), _ = datasets.mnist.load_data()
    # Divide by 50 so the inputs are large enough to provoke exploding gradients.
    x = tf.convert_to_tensor(x, dtype=tf.float32) / 50.
    y = tf.convert_to_tensor(y)
    y = tf.one_hot(y, depth=10)
    print('x:', x.shape, 'y:', y.shape)
    train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128).repeat(30)
    x, y = next(iter(train_db))
    print('sample:', x.shape, y.shape)
    # print(x[0], y[0])


    def main():
        # 784 => 512
        w1, b1 = tf.Variable(tf.random.truncated_normal([784, 512], stddev=0.1)), tf.Variable(tf.zeros([512]))
        # 512 => 256
        w2, b2 = tf.Variable(tf.random.truncated_normal([512, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
        # 256 => 10
        w3, b3 = tf.Variable(tf.random.truncated_normal([256, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))

        # `learning_rate` replaces the deprecated `lr` argument.
        optimizer = optimizers.SGD(learning_rate=0.01)

        for step, (x, y) in enumerate(train_db):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 784))

            with tf.GradientTape() as tape:
                # layer1
                h1 = x @ w1 + b1
                h1 = tf.nn.relu(h1)
                # layer2
                h2 = h1 @ w2 + b2
                h2 = tf.nn.relu(h2)
                # output
                out = h2 @ w3 + b3
                # out = tf.nn.relu(out)

                # compute loss
                # [b, 10] - [b, 10]
                loss = tf.square(y - out)
                # [b, 10] => [b]
                loss = tf.reduce_mean(loss, axis=1)
                # [b] => scalar
                loss = tf.reduce_mean(loss)

            # compute gradient
            grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
            print('==before==')
            for g in grads:
                print(tf.norm(g))

            # Rescale all gradients together so their global norm is at most 15.
            grads, _ = tf.clip_by_global_norm(grads, 15)

            print('==after==')
            for g in grads:
                print(tf.norm(g))

            # update w' = w - lr*grad
            optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))

            if step % 100 == 0:
                print(step, 'loss:', float(loss))


    if __name__ == '__main__':
        main()

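The output is as follows: [output screenshot not preserved]

As we can see, even after dozens of iterations the parameters are still updated within a normal range.
(3) Checking convergence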

    import os

    # Set before importing TensorFlow so the C++ log level takes effect.
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    import tensorflow as tf
    from tensorflow.keras import datasets, optimizers

    print(tf.__version__)

    (x, y), _ = datasets.mnist.load_data()
    # Divide by 50 so the inputs are large enough to provoke exploding gradients.
    x = tf.convert_to_tensor(x, dtype=tf.float32) / 50.
    y = tf.convert_to_tensor(y)
    y = tf.one_hot(y, depth=10)
    print('x:', x.shape, 'y:', y.shape)
    train_db = tf.data.Dataset.from_tensor_slices((x, y)).batch(128).repeat(30)
    x, y = next(iter(train_db))
    print('sample:', x.shape, y.shape)
    # print(x[0], y[0])


    def main():
        # 784 => 512
        w1, b1 = tf.Variable(tf.random.truncated_normal([784, 512], stddev=0.1)), tf.Variable(tf.zeros([512]))
        # 512 => 256
        w2, b2 = tf.Variable(tf.random.truncated_normal([512, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
        # 256 => 10
        w3, b3 = tf.Variable(tf.random.truncated_normal([256, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))

        # `learning_rate` replaces the deprecated `lr` argument.
        optimizer = optimizers.SGD(learning_rate=0.01)

        for step, (x, y) in enumerate(train_db):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 784))

            with tf.GradientTape() as tape:
                # layer1
                h1 = x @ w1 + b1
                h1 = tf.nn.relu(h1)
                # layer2
                h2 = h1 @ w2 + b2
                h2 = tf.nn.relu(h2)
                # output
                out = h2 @ w3 + b3
                # out = tf.nn.relu(out)

                # compute loss
                # [b, 10] - [b, 10]
                loss = tf.square(y - out)
                # [b, 10] => [b]
                loss = tf.reduce_mean(loss, axis=1)
                # [b] => scalar
                loss = tf.reduce_mean(loss)

            # compute gradient
            grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
            # print('==before==')
            # for g in grads:
            #     print(tf.norm(g))

            # Rescale all gradients together so their global norm is at most 15.
            grads, _ = tf.clip_by_global_norm(grads, 15)

            # print('==after==')
            # for g in grads:
            #     print(tf.norm(g))

            # update w' = w - lr*grad
            optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2, w3, b3]))

            if step % 100 == 0:
                print(step, 'loss:', float(loss))


    if __name__ == '__main__':
        main()

The output is as follows: [output screenshot not preserved]

As we can see, the model converges well.

References:
[1] Long Liangqu (龍良曲): 《深度學習與TensorFlow2入門實戰》 (Deep Learning and TensorFlow 2 Introductory Practice)
