Notes 2: Gradients and the Gradient Method in Deep Learning

Published: 2024/9/30 pytorch 111 豆豆


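The enumerate() function turns an iterable data object (such as a list, tuple, or string) into an indexed sequence, yielding each item together with its index. It is generally used in for loops.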
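As a small illustration of that pattern, the sketch below walks over the rows of a 2-D NumPy array with enumerate, which is how the batch version of the gradient code in these notes uses it. The array `X` and the name `row_sums` are made up for this example.

```python
import numpy as np

# Hypothetical batch of two points; each row is one point (x0, x1).
X = np.array([[3.0, 4.0],
              [0.0, 2.0]])

row_sums = np.zeros(X.shape[0])
for idx, row in enumerate(X):
    # idx is the row number, row is the 1-D array stored at that position.
    row_sums[idx] = np.sum(row**2)

print(row_sums)
```

Having the index available is what lets the loop write each result into the matching slot of a preallocated array.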

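Function implementation

Consider the function f(x0, x1) = x0^2 + x1^2.
The function itself, whose partial derivatives are computed numerically below, can be implemented like this: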

    def function_2(x):
        return x[0]**2 + x[1]**2
        # or: return np.sum(x**2)

The gradient can be implemented like this:

    def _numerical_gradient_no_batch(f, x):
        h = 1e-4  # 0.0001
        grad = np.zeros_like(x)

        for idx in range(x.size):
            tmp_val = x[idx]
            x[idx] = float(tmp_val) + h
            fxh1 = f(x)  # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x)  # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2*h)

            x[idx] = tmp_val  # restore the original value

        return grad

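numerical_gradient(f, X): the parameter f is a function and X is a NumPy array. The function computes the numerical derivative with respect to each element of X.
Now use it to compute the gradient at (3, 4):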

    numerical_gradient(function_2, np.array([3.0, 4.0]))

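The gradient points in the direction of steepest increase, so the negative gradient points in the direction in which the function value decreases the most at each point.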
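A quick way to check this at the point (3, 4) is to compare the central-difference gradient with the analytic one, which for this function is (2*x0, 2*x1), and then take a small step against it. This is only a sketch: the helper name `central_difference_gradient` and the step size 0.01 are choices made for this example.

```python
import numpy as np

def function_2(x):
    return np.sum(x**2)

def central_difference_gradient(f, x, h=1e-4):
    """Numerical gradient of f at x (1-D float array) via central differences."""
    grad = np.zeros_like(x)
    for idx in range(x.size):
        original = x[idx]
        x[idx] = original + h
        f_plus = f(x)
        x[idx] = original - h
        f_minus = f(x)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original  # restore the original value
    return grad

point = np.array([3.0, 4.0])
numeric = central_difference_gradient(function_2, point)
analytic = 2 * point  # d/dx_i of sum(x_i^2) is 2 * x_i

# Moving a small distance against the gradient lowers the function value.
step = 0.01
lowered = function_2(point - step * numeric)

print(numeric, analytic)
print(function_2(point), lowered)
```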

Gradient method

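The main task of machine learning is to find the optimal parameters during learning. A neural network likewise has to find its optimal parameters (weights and biases) during learning. Here, the optimal parameters are the parameters at which the loss function takes its minimum value. The method that finds the minimum of a function by making clever use of the gradient is the gradient method.
The gradient is 0 at a function's local minima, at its global minimum, and at the places called saddle points.
Repeatedly moving along the gradient direction so that the function value gradually decreases is the gradient method. A gradient method that looks for a minimum is called gradient descent, and one that looks for a maximum is called gradient ascent. In neural networks (deep learning), the gradient method usually means gradient descent.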
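The remark about saddle points can be made concrete with a function that is not part of these notes, f(x0, x1) = x0^2 - x1^2. The sketch below uses its analytic gradient (2*x0, -2*x1), and the names `saddle` and `saddle_grad` are made up for the example. The gradient at the origin is zero, yet the function rises along one axis and falls along the other, so a zero gradient alone does not identify a minimum.

```python
import numpy as np

def saddle(x):
    # Hypothetical example function: f(x0, x1) = x0^2 - x1^2
    return x[0]**2 - x[1]**2

def saddle_grad(x):
    # Analytic gradient of the function above: (2*x0, -2*x1)
    return np.array([2 * x[0], -2 * x[1]])

origin = np.array([0.0, 0.0])
grad_at_origin = saddle_grad(origin)

# Zero gradient, but the origin is neither a minimum nor a maximum:
up = saddle(np.array([0.1, 0.0]))    # larger than f(origin)
down = saddle(np.array([0.0, 0.1]))  # smaller than f(origin)

print(grad_at_origin, saddle(origin), up, down)
```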

Gradient descent implemented in Python:

    import numpy as np
    import matplotlib.pyplot as plt


    def _numerical_gradient_no_batch(f, x):
        h = 1e-4  # 0.0001
        grad = np.zeros_like(x)

        for idx in range(x.size):
            tmp_val = x[idx]
            x[idx] = float(tmp_val) + h
            fxh1 = f(x)  # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x)  # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2 * h)

            x[idx] = tmp_val  # restore the original value

        return grad


    def numerical_gradient(f, X):
        if X.ndim == 1:  # ndim is a single integer: the number of array dimensions
            return _numerical_gradient_no_batch(f, X)
        else:
            grad = np.zeros_like(X)

            for idx, x in enumerate(X):
                grad[idx] = _numerical_gradient_no_batch(f, x)

            return grad


    def gradient_descent(f, init_x, lr=0.01, step_num=100):
        x = init_x  # note: init_x itself is updated in place
        x_history = []

        for i in range(step_num):
            x_history.append(x.copy())

            grad = numerical_gradient(f, x)
            x -= lr * grad

        return x, np.array(x_history)


    def function_2(x):
        return x[0]**2 + x[1]**2


    init_x = np.array([-3.0, 4.0])

    lr = 0.1
    step_num = 20
    x, x_history = gradient_descent(function_2, init_x, lr=lr, step_num=step_num)

    plt.plot([-5, 5], [0, 0], '--b')
    plt.plot([0, 0], [-5, 5], '--b')
    plt.plot(x_history[:, 0], x_history[:, 1], 'o')

    plt.xlim(-3.5, 3.5)
    plt.ylim(-4.5, 4.5)
    plt.xlabel("X0")
    plt.ylabel("X1")
    plt.show()

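init_x = np.array([-3.0, 4.0]): sets the initial value to (-3, 4). The final result is very close to (0, 0).
def gradient_descent(f, init_x, lr=0.01, step_num=100):
The first parameter is the function to optimize, the second is the initial value, the third is the learning rate, and the fourth is the number of repetitions of the gradient method. This function can be used to find a local minimum of a function and, if things go well, the global minimum.
If lr is too small or too large, a good result cannot be obtained.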
When lr is 10:
[Figure: result with lr = 10]
When lr is 1e-8:
[Figure: result with lr = 1e-8]

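Parameters such as the learning rate are called hyperparameters. Whereas the weight parameters of a neural network are obtained automatically from the training data and the learning algorithm, hyperparameters such as the learning rate are set by hand. In general, several values have to be tried for a hyperparameter in order to find a setting that lets learning proceed smoothly.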
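The effect of the learning rate can be tried out with a small sweep over the three values mentioned in these notes (10, 0.1, and 1e-8). The sketch below is a simplified variant of the gradient descent code, not the original: it uses the analytic gradient 2*x of x0^2 + x1^2 instead of the numerical one, and it copies the initial value so that every run starts from the same point.

```python
import numpy as np

def function_2(x):
    return np.sum(x**2)

def function_2_grad(x):
    # Analytic gradient, used here in place of the numerical one for brevity.
    return 2 * x

def gradient_descent(grad_f, init_x, lr, step_num=20):
    x = init_x.copy()  # copy so the caller's array is not modified
    for _ in range(step_num):
        x -= lr * grad_f(x)
    return x

init_x = np.array([-3.0, 4.0])
results = {lr: gradient_descent(function_2_grad, init_x, lr)
           for lr in (10.0, 0.1, 1e-8)}

for lr, x in results.items():
    print(f"lr={lr:g}: x={x}, f(x)={function_2(x):.3g}")
```

With lr = 10 every step overshoots and the values blow up, with lr = 1e-8 the point barely moves in 20 steps, and with lr = 0.1 the point ends up close to the minimum at the origin.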

Gradients of a neural network

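The gradient here means the gradient of the loss function with respect to the weight parameters.
Neural network learning is implemented with the mini-batch learning introduced earlier: a subset of the training data (called a mini-batch) is selected at random, and the parameters are then updated with the gradient method using these mini-batches.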
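Selecting a mini-batch can be sketched as follows. The data here is random stand-in data with assumed sizes (1000 samples, 784 features, 10 classes), not a real dataset, and sampling without replacement is a choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 1000 samples, 784 features, 10 one-hot classes.
train_size, batch_size = 1000, 100
x_train = rng.random((train_size, 784))
t_train = np.eye(10)[rng.integers(0, 10, size=train_size)]

# Pick batch_size random row indices and slice inputs and labels with them.
batch_mask = rng.choice(train_size, size=batch_size, replace=False)
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

print(x_batch.shape, t_batch.shape)
```

Each training step would then compute the gradients on x_batch and t_batch and subtract the learning rate times each gradient from the matching parameter.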
common/functions.py:

    import numpy as np


    def identity_function(x):
        return x


    def step_function(x):
        # np.int was removed in NumPy 1.24; the builtin int works everywhere
        return np.array(x > 0, dtype=int)


    def sigmoid(x):
        return 1 / (1 + np.exp(-x))


    def sigmoid_grad(x):
        return (1.0 - sigmoid(x)) * sigmoid(x)


    def relu(x):
        return np.maximum(0, x)


    def relu_grad(x):
        grad = np.zeros_like(x)  # was np.zeros(x), which fails for array input
        grad[x >= 0] = 1
        return grad


    def softmax(x):
        if x.ndim == 2:
            x = x.T
            x = x - np.max(x, axis=0)
            y = np.exp(x) / np.sum(np.exp(x), axis=0)
            return y.T

        x = x - np.max(x)  # guard against overflow
        return np.exp(x) / np.sum(np.exp(x))


    def mean_squared_error(y, t):
        return 0.5 * np.sum((y-t)**2)


    def cross_entropy_error(y, t):
        if y.ndim == 1:
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)

        # If the labels are one-hot vectors, convert them to class indices
        if t.size == y.size:
            t = t.argmax(axis=1)

        batch_size = y.shape[0]
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size


    def softmax_loss(X, t):
        y = softmax(X)
        return cross_entropy_error(y, t)

common/gradient.py:

    import numpy as np


    def _numerical_gradient_1d(f, x):
        h = 1e-4  # 0.0001
        grad = np.zeros_like(x)

        for idx in range(x.size):
            tmp_val = x[idx]
            x[idx] = float(tmp_val) + h
            fxh1 = f(x)  # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x)  # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2*h)

            x[idx] = tmp_val  # restore the original value

        return grad


    def numerical_gradient_2d(f, X):
        if X.ndim == 1:
            return _numerical_gradient_1d(f, X)
        else:
            grad = np.zeros_like(X)

            for idx, x in enumerate(X):
                grad[idx] = _numerical_gradient_1d(f, x)

            return grad


    def numerical_gradient(f, x):
        h = 1e-4  # 0.0001
        grad = np.zeros_like(x)

        # Visit every element of x, whatever its number of dimensions
        it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            idx = it.multi_index
            tmp_val = x[idx]
            x[idx] = float(tmp_val) + h
            fxh1 = f(x)  # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x)  # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2*h)

            x[idx] = tmp_val  # restore the original value
            it.iternext()

        return grad
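The last function relies on np.nditer with the multi_index flag to reach every element of an array of any shape. A minimal sketch of what that loop does on its own, using a made-up 2x3 array `W` and an arbitrary in-place edit:

```python
import numpy as np

W = np.arange(6, dtype=float).reshape(2, 3)

visited = []
it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index   # a tuple index such as (0, 0), then (0, 1), ...
    visited.append(idx)
    W[idx] += 0.5          # element-wise in-place edit through the tuple index
    it.iternext()

print(visited)
print(W)
```

Because the index is a tuple, the same loop body works for a bias vector and for a weight matrix, which is why a single numerical_gradient can serve every parameter array.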

The enumerate function:

    >>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
    >>> list(enumerate(seasons))
    [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]

A class implementing a two-layer neural network:

    import numpy as np
    from common.functions import *
    from common.gradient import numerical_gradient


    class TwoLayerNet:

        def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
            # Initialize the weights
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
            self.params['b2'] = np.zeros(output_size)

        def predict(self, x):
            # Run recognition (inference); x is the image data
            W1, W2 = self.params['W1'], self.params['W2']
            b1, b2 = self.params['b1'], self.params['b2']

            a1 = np.dot(x, W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1, W2) + b2
            y = softmax(a2)

            return y

        # x: input data, t: supervised (label) data
        def loss(self, x, t):
            # Compute the loss; x is the image data, t is the correct labels
            y = self.predict(x)

            return cross_entropy_error(y, t)

        def accuracy(self, x, t):
            # Compute the recognition accuracy (t is expected to be one-hot)
            y = self.predict(x)
            y = np.argmax(y, axis=1)
            t = np.argmax(t, axis=1)

            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy

        # x: input data, t: supervised (label) data
        def numerical_gradient(self, x, t):
            # Compute the gradients of the weight parameters.
            # W is unused: numerical_gradient perturbs the arrays in
            # self.params in place, and self.loss reads them from there.
            loss_W = lambda W: self.loss(x, t)

            grads = {}
            grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W, self.params['b2'])

            return grads

        def gradient(self, x, t):
            # Compute the gradients of the weight parameters;
            # a fast version of numerical_gradient
            W1, W2 = self.params['W1'], self.params['W2']
            b1, b2 = self.params['b1'], self.params['b2']
            grads = {}

            batch_num = x.shape[0]

            # forward
            a1 = np.dot(x, W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1, W2) + b2
            y = softmax(a2)

            # backward
            dy = (y - t) / batch_num
            grads['W2'] = np.dot(z1.T, dy)
            grads['b2'] = np.sum(dy, axis=0)

            da1 = np.dot(dy, W2.T)
            dz1 = sigmoid_grad(a1) * da1
            grads['W1'] = np.dot(x.T, dz1)
            grads['b1'] = np.sum(dz1, axis=0)

            return grads

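params['W1']: the weights of the first layer
params['b1']: the biases of the first layer
grads: a dictionary variable that holds the gradients (the return value of numerical_gradient)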
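The gradient method of the class starts its backward pass from dy = (y - t) / batch_num. One way to gain confidence in that expression is a gradient check against central differences, sketched below for the softmax plus cross-entropy step alone. The scores `a2` and labels `t` are made-up values, and the helper functions are simplified stand-ins written for this check rather than the ones from the common package.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a, axis=1, keepdims=True)  # guard against overflow
    e = np.exp(a)
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy_error(y, t):
    # t is one-hot; the loss is averaged over the batch
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

def numerical_gradient(f, x, h=1e-4):
    # f takes no arguments and reads x, which is perturbed in place
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        original = x[idx]
        x[idx] = original + h
        f_plus = f()
        x[idx] = original - h
        f_minus = f()
        grad[idx] = (f_plus - f_minus) / (2 * h)
        x[idx] = original  # restore the original value
        it.iternext()
    return grad

# Hypothetical output-layer scores for a batch of 4 samples and 3 classes
a2 = np.array([[0.3, 2.9, 4.0],
               [0.1, -1.0, 0.5],
               [1.2, 0.0, -0.7],
               [0.0, 0.0, 0.0]])
t = np.eye(3)[[2, 0, 1, 1]]  # hypothetical one-hot labels

loss = lambda: cross_entropy_error(softmax(a2), t)
numeric = numerical_gradient(loss, a2)
analytic = (softmax(a2) - t) / a2.shape[0]  # same form as dy in gradient()

print(np.max(np.abs(numeric - analytic)))
```

The two results should agree up to a small numerical error, part of which comes from the 1e-7 term inside the logarithm.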
