3.10 Program Example: Neural Network Design - Machine Learning Notes - Stanford, Andrew Ng
Neural Network Design
When designing the structure of a neural network, a few standard guidelines are usually followed: the size of the input layer matches the number of features, the size of the output layer matches the number of classes, and all hidden layers (if any) share a common number of units. Accordingly, we consider the following design for the neural network module:
- Design the sigmoid function as the activation function:
$$g(z)=\frac{1}{1+e^{-z}}$$

$$g'(z)=g(z)\bigl(1-g(z)\bigr)=a(1-a),\qquad a=g(z)$$
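These two functions appear as `sigmoid` and `sigmoidDerivative` in the complete module at the end of this article:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-z))

def sigmoidDerivative(a):
    """Derivative of sigmoid, in terms of the activation a = g(z)."""
    return np.multiply(a, (1 - a))
```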
- Design a function that initializes the weight matrices:
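Weights are drawn uniformly from $[-\epsilon,\epsilon]$, with one extra column per layer for the bias term; this is the `initThetas` function from the complete module below:

```python
def initThetas(hiddenNum, unitNum, inputSize, classNum, epsilon):
    """Initialize the weight matrices."""
    hiddens = [unitNum for i in range(hiddenNum)]
    units = [inputSize] + hiddens + [classNum]
    Thetas = []
    for idx, unit in enumerate(units):
        if idx == len(units) - 1:
            break
        nextUnit = units[idx + 1]
        # an extra column accounts for the bias term
        Theta = np.random.rand(nextUnit, unit + 1) * 2 * epsilon - epsilon
        Thetas.append(Theta)
    return Thetas
```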
- Define functions for parameter unrolling and parameter recovery:
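Unrolling flattens the list of weight matrices into one long vector (as needed by gradient checking); rolling restores the matrices from the vector given their shapes. These are `unroll` and `roll` in the complete module below:

```python
def unroll(matrixes):
    """Unroll a list of matrices into a single vector."""
    vec = []
    for matrix in matrixes:
        vector = matrix.reshape(1, -1)[0]
        vec = np.concatenate((vec, vector))
    return vec

def roll(vector, shapes):
    """Recover the matrices from a vector, given their shapes."""
    matrixes = []
    begin = 0
    for shape in shapes:
        end = begin + shape[0] * shape[1]
        matrix = vector[begin:end].reshape(shape)
        begin = end
        matrixes.append(matrix)
    return matrixes
```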
- Define the gradient-checking procedure:
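Gradient checking perturbs each parameter by a small $\epsilon$ and compares the two-sided numerical gradient $\frac{J(\Theta+\epsilon)-J(\Theta-\epsilon)}{2\epsilon}$ against the backpropagation gradient; the check passes when the Euclidean distance between the two is below $10^{-2}$. This is the `gradientCheck` function from the complete module below:

```python
def gradientCheck(Thetas, X, y, theLambda):
    """Compare backprop gradients with a numerical approximation."""
    # forward propagation, backpropagation, and the analytic gradients
    a = fp(Thetas, X)
    D = bp(Thetas, a, y, theLambda)
    DVec = unroll(D)
    # two-sided numerical gradient approximation
    epsilon = 1e-4
    gradApprox = np.zeros(DVec.shape)
    ThetaVec = unroll(Thetas)
    shapes = [Theta.shape for Theta in Thetas]
    for i, item in enumerate(ThetaVec):
        ThetaVec[i] = item - epsilon
        JMinus = computeCost(roll(ThetaVec, shapes), y, theLambda, X=X)
        ThetaVec[i] = item + epsilon
        JPlus = computeCost(roll(ThetaVec, shapes), y, theLambda, X=X)
        # restore the original value before moving on
        ThetaVec[i] = item
        gradApprox[i] = (JPlus - JMinus) / (2 * epsilon)
    # Euclidean distance measures how close the two gradients are
    diff = np.linalg.norm(gradApprox - DVec)
    return diff < 1e-2
```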
- Define the cost-computation function:
$$J(\Theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\bigl((h_\Theta(x^{(i)}))_k\bigr)+\bigl(1-y_k^{(i)}\bigr)\log\bigl(1-(h_\Theta(x^{(i)}))_k\bigr)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\bigl(\Theta_{j,i}^{(l)}\bigr)^2$$

In matrix form:

$$J(\Theta)=-\frac{1}{m}\sum\Bigl(Y^T.\!*\log(\Theta A)+\bigl(1-Y^T\bigr).\!*\log\bigl(1-\Theta A\bigr)\Bigr)$$
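In code, the cost is computed from the output-layer activations, and the bias columns of each $\Theta$ are excluded from the regularization term. This is the `computeCost` function from the complete module below:

```python
def computeCost(Thetas, y, theLambda, X=None, a=None):
    """Compute the regularized cross-entropy cost."""
    m = y.shape[0]
    # activations may be passed in; otherwise run forward propagation
    if a is None:
        a = fp(Thetas, X)
    error = -np.sum(np.multiply(y.T, np.log(a[-1])) +
                    np.multiply((1 - y).T, np.log(1 - a[-1])))
    # regularization term (bias columns are excluded)
    reg = np.sum([np.sum(np.power(Theta[:, 1:], 2)) for Theta in Thetas])
    return (1.0 / m) * error + (1.0 / (2 * m)) * theLambda * reg
```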
- Design the forward-propagation process:
$$a^{(1)}=x$$

$$z^{(2)}=\Theta^{(1)}a^{(1)},\qquad a^{(2)}=g(z^{(2)})$$

$$z^{(3)}=\Theta^{(2)}a^{(2)},\qquad a^{(3)}=g(z^{(3)})$$

$$h_\Theta(x)=a^{(3)}$$
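The `fp` function from the complete module below implements this for an arbitrary number of layers, prepending a bias row of ones to every layer except the output:

```python
def fp(Thetas, X):
    """Forward propagation; returns the activations of every layer."""
    layerNum = len(Thetas) + 1
    a = [None] * layerNum
    for l in range(layerNum):
        if l == 0:
            a[l] = X.T
        else:
            # X is an np.matrix, so `*` is matrix multiplication
            z = Thetas[l - 1] * a[l - 1]
            a[l] = sigmoid(z)
        # every layer except the output gets a bias row
        if l != layerNum - 1:
            a[l] = np.concatenate((np.ones((1, a[l].shape[1])), a[l]))
    return a
```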
- Design the backpropagation process:
$$\delta^{(l)}=\begin{cases}a^{(l)}-y & l=L\\[4pt]\bigl(\Theta^{(l)}\bigr)^T\delta^{(l+1)}.\!*\,g'(z^{(l)}) & l=2,3,\dots,L-1\end{cases}$$
$$\Delta^{(l)}=\delta^{(l+1)}\bigl(a^{(l)}\bigr)^T$$
$$D_{i,j}^{(l)}=\begin{cases}\dfrac{1}{m}\Bigl(\Delta_{i,j}^{(l)}+\lambda\Theta_{i,j}^{(l)}\Bigr) & \text{if } j\neq 0\\[6pt]\dfrac{1}{m}\Delta_{i,j}^{(l)} & \text{if } j=0\end{cases}$$
```python
def bp(Thetas, a, y, theLambda):
    """Backpropagation.

    Args:
        Thetas: weight matrices
        a: activations of every layer (from fp)
        y: labels
        theLambda: regularization parameter
    Returns:
        D: weight gradients
    """
    m = y.shape[0]
    layerNum = len(Thetas) + 1
    d = [None] * layerNum
    delta = [np.zeros(Theta.shape) for Theta in Thetas]
    # the input layer (l == 0) has no error term
    for l in range(layerNum - 1, 0, -1):
        if l == layerNum - 1:
            # output-layer error
            d[l] = a[l] - y.T
        else:
            # hidden-layer error; the bias row is skipped
            d[l] = np.multiply((Thetas[l][:, 1:].T * d[l + 1]),
                               sigmoidDerivative(a[l][1:, :]))
    for l in range(layerNum - 1):
        delta[l] = d[l + 1] * (a[l].T)
    D = [np.zeros(Theta.shape) for Theta in Thetas]
    for l in range(len(Thetas)):
        Theta = Thetas[l]
        # gradient for the bias column (not regularized)
        D[l][:, 0] = (1.0 / m) * np.asarray(delta[l][:, 0]).ravel()
        # gradients for the remaining weights (regularized)
        D[l][:, 1:] = (1.0 / m) * (delta[l][:, 1:] + theLambda * Theta[:, 1:])
    return D
```

- Once the gradients are available, design the weight-update step:
$$\Theta^{(l)}:=\Theta^{(l)}-\alpha D^{(l)}$$
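This is the `updateThetas` function from the complete module below (the `m` and `theLambda` arguments are kept for interface symmetry; regularization is already folded into $D$ by `bp`):

```python
def updateThetas(m, Thetas, D, alpha, theLambda):
    """Gradient-descent weight update: Theta := Theta - alpha * D."""
    for l in range(len(Thetas)):
        Thetas[l] = Thetas[l] - alpha * D[l]
    return Thetas
```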
- Putting these together, one gradient-descent iteration proceeds as follows (see the `gradientDescent` function after this list):
1. Forward-propagate to compute the activations of every layer.
2. Backpropagate to compute the weight-update gradients.
3. Update the weights.
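Here is `gradientDescent` from the complete module below; it computes the cost at the current weights before updating, and maps a NaN cost to infinity so the training loop can abort:

```python
def gradientDescent(Thetas, X, y, alpha, theLambda):
    """One iteration of gradient descent; returns (cost, updated Thetas)."""
    m, n = X.shape
    # 1. forward propagation
    a = fp(Thetas, X)
    # 2. backpropagation for the gradients
    D = bp(Thetas, a, y, theLambda)
    # cost at the current weights
    J = computeCost(Thetas, y, theLambda, a=a)
    # 3. weight update
    Thetas = updateThetas(m, Thetas, D, alpha, theLambda)
    if np.isnan(J):
        J = np.inf
    return J, Thetas
```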
- The overall training procedure of the network is then as follows, with these defaults:
- By default, the weight matrices are randomly initialized by the system.
- The default network is a perceptron with no hidden layer.
- The default number of hidden units is 5.
- The default learning rate is 1.
- No regularization is applied by default.
- The default error precision is $10^{-2}$.
- The default maximum number of iterations is 50.
Before training, we run a gradient check to confirm that the network is implemented correctly:
```python
def train(X, y, Thetas=None, hiddenNum=0, unitNum=5, epsilon=1,
          alpha=1, theLambda=0, precision=0.01, maxIters=50):
    """Train the network.

    Args:
        X: training samples
        y: labels
        Thetas: initial weights; if None, randomly initialized
        hiddenNum: number of hidden layers
        unitNum: number of units per hidden layer
        epsilon: weights are initialized in [-epsilon, epsilon]
        alpha: learning rate
        theLambda: regularization parameter
        precision: error precision
        maxIters: maximum number of iterations
    """
    # use matrix semantics so that `*` means matrix multiplication
    X, y = np.matrix(X), np.matrix(y)
    # number of samples, number of features
    m, n = X.shape
    # adjust the label set to a logical (one-hot) representation
    y = adjustLabels(y)
    classNum = y.shape[1]
    # initialize the weights if none were given
    if Thetas is None:
        Thetas = initThetas(
            inputSize=n,
            hiddenNum=hiddenNum,
            unitNum=unitNum,
            classNum=classNum,
            epsilon=epsilon
        )
    # run a gradient check first
    print('Doing Gradient Checking....')
    checked = gradientCheck(Thetas, X, y, theLambda)
    if checked:
        for i in range(maxIters):
            error, Thetas = gradientDescent(
                Thetas, X, y, alpha=alpha, theLambda=theLambda)
            if error < precision:
                break
            if error == np.inf:
                break
        success = error < precision
        return {
            'error': error,
            'Thetas': Thetas,
            'iters': i,
            'success': success
        }
    else:
        print('Error: Gradient Checking Failed!!!')
        return {
            'error': None,
            'Thetas': None,
            'iters': 0,
            'success': False
        }
```

The training result contains the following information: (1) the network's prediction error `error`; (2) the weight matrices of each layer `Thetas`; (3) the iteration count `iters`; (4) whether training succeeded `success`.
- The prediction function:
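Prediction is just a forward pass, returning the output-layer activations (the `predict` function from the complete module below):

```python
def predict(X, Thetas):
    """Predict by forward propagation; returns output-layer activations."""
    a = fp(Thetas, np.matrix(X))
    return a[-1]
```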
The complete neural network module is as follows:
```python
# coding: utf-8
# neural_network/nn.py
import numpy as np


def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-z))


def sigmoidDerivative(a):
    """Derivative of sigmoid, in terms of the activation a = g(z)."""
    return np.multiply(a, (1 - a))


def initThetas(hiddenNum, unitNum, inputSize, classNum, epsilon):
    """Initialize the weight matrices.

    Args:
        hiddenNum: number of hidden layers
        unitNum: number of units per hidden layer
        inputSize: size of the input layer
        classNum: number of classes
        epsilon: weights are drawn from [-epsilon, epsilon]
    Returns:
        Thetas: list of weight matrices
    """
    hiddens = [unitNum for i in range(hiddenNum)]
    units = [inputSize] + hiddens + [classNum]
    Thetas = []
    for idx, unit in enumerate(units):
        if idx == len(units) - 1:
            break
        nextUnit = units[idx + 1]
        # an extra column accounts for the bias term
        Theta = np.random.rand(nextUnit, unit + 1) * 2 * epsilon - epsilon
        Thetas.append(Theta)
    return Thetas


def computeCost(Thetas, y, theLambda, X=None, a=None):
    """Compute the cost.

    Args:
        Thetas: weight matrices
        y: labels
        theLambda: regularization parameter
        X: samples
        a: activations of every layer
    Returns:
        J: prediction cost
    """
    m = y.shape[0]
    # activations may be passed in; otherwise run forward propagation
    if a is None:
        a = fp(Thetas, X)
    error = -np.sum(np.multiply(y.T, np.log(a[-1])) +
                    np.multiply((1 - y).T, np.log(1 - a[-1])))
    # regularization term (bias columns are excluded)
    reg = np.sum([np.sum(np.power(Theta[:, 1:], 2)) for Theta in Thetas])
    return (1.0 / m) * error + (1.0 / (2 * m)) * theLambda * reg


def gradientCheck(Thetas, X, y, theLambda):
    """Compare backprop gradients with a numerical approximation.

    Args:
        Thetas: weight matrices
        X: samples
        y: labels
        theLambda: regularization parameter
    Returns:
        checked: whether the check passed
    """
    # forward propagation, backpropagation, and the analytic gradients
    a = fp(Thetas, X)
    D = bp(Thetas, a, y, theLambda)
    DVec = unroll(D)
    # two-sided numerical gradient approximation
    epsilon = 1e-4
    gradApprox = np.zeros(DVec.shape)
    ThetaVec = unroll(Thetas)
    shapes = [Theta.shape for Theta in Thetas]
    for i, item in enumerate(ThetaVec):
        ThetaVec[i] = item - epsilon
        JMinus = computeCost(roll(ThetaVec, shapes), y, theLambda, X=X)
        ThetaVec[i] = item + epsilon
        JPlus = computeCost(roll(ThetaVec, shapes), y, theLambda, X=X)
        # restore the original value before moving on
        ThetaVec[i] = item
        gradApprox[i] = (JPlus - JMinus) / (2 * epsilon)
    # Euclidean distance measures how close the two gradients are
    diff = np.linalg.norm(gradApprox - DVec)
    return diff < 1e-2


def adjustLabels(y):
    """Adjust the class labels to a logical (one-hot) representation.

    Args:
        y: label set
    Returns:
        yAdjusted: adjusted label set
    """
    if y.shape[1] == 1:
        classes = set(np.ravel(y))
        classNum = len(classes)
        minClass = min(classes)
        if classNum > 2:
            # multi-class: one-hot encoding
            yAdjusted = np.zeros((y.shape[0], classNum), np.float64)
            for row, label in enumerate(y):
                yAdjusted[row, int(label - minClass)] = 1
        else:
            # binary: a single 0/1 column
            yAdjusted = np.zeros((y.shape[0], 1), np.float64)
            for row, label in enumerate(y):
                if label != minClass:
                    yAdjusted[row, 0] = 1.0
        return yAdjusted
    return y


def unroll(matrixes):
    """Unroll a list of matrices into a single vector.

    Args:
        matrixes: list of matrices
    Returns:
        vec: vector
    """
    vec = []
    for matrix in matrixes:
        vector = matrix.reshape(1, -1)[0]
        vec = np.concatenate((vec, vector))
    return vec


def roll(vector, shapes):
    """Recover the matrices from a vector, given their shapes.

    Args:
        vector: vector
        shapes: list of matrix shapes
    Returns:
        matrixes: recovered matrices
    """
    matrixes = []
    begin = 0
    for shape in shapes:
        end = begin + shape[0] * shape[1]
        matrix = vector[begin:end].reshape(shape)
        begin = end
        matrixes.append(matrix)
    return matrixes


def fp(Thetas, X):
    """Forward propagation.

    Args:
        Thetas: weight matrices
        X: input samples (np.matrix, so `*` is matrix multiplication)
    Returns:
        a: activations of every layer
    """
    layerNum = len(Thetas) + 1
    a = [None] * layerNum
    for l in range(layerNum):
        if l == 0:
            a[l] = X.T
        else:
            z = Thetas[l - 1] * a[l - 1]
            a[l] = sigmoid(z)
        # every layer except the output gets a bias row
        if l != layerNum - 1:
            a[l] = np.concatenate((np.ones((1, a[l].shape[1])), a[l]))
    return a


def bp(Thetas, a, y, theLambda):
    """Backpropagation.

    Args:
        Thetas: weight matrices
        a: activations of every layer (from fp)
        y: labels
        theLambda: regularization parameter
    Returns:
        D: weight gradients
    """
    m = y.shape[0]
    layerNum = len(Thetas) + 1
    d = [None] * layerNum
    delta = [np.zeros(Theta.shape) for Theta in Thetas]
    # the input layer (l == 0) has no error term
    for l in range(layerNum - 1, 0, -1):
        if l == layerNum - 1:
            # output-layer error
            d[l] = a[l] - y.T
        else:
            # hidden-layer error; the bias row is skipped
            d[l] = np.multiply((Thetas[l][:, 1:].T * d[l + 1]),
                               sigmoidDerivative(a[l][1:, :]))
    for l in range(layerNum - 1):
        delta[l] = d[l + 1] * (a[l].T)
    D = [np.zeros(Theta.shape) for Theta in Thetas]
    for l in range(len(Thetas)):
        Theta = Thetas[l]
        # gradient for the bias column (not regularized)
        D[l][:, 0] = (1.0 / m) * np.asarray(delta[l][:, 0]).ravel()
        # gradients for the remaining weights (regularized)
        D[l][:, 1:] = (1.0 / m) * (delta[l][:, 1:] + theLambda * Theta[:, 1:])
    return D


def updateThetas(m, Thetas, D, alpha, theLambda):
    """Gradient-descent weight update: Theta := Theta - alpha * D.

    Args:
        m: number of samples
        Thetas: weight matrices
        D: gradients
        alpha: learning rate
        theLambda: regularization parameter
    Returns:
        Thetas: updated weight matrices
    """
    for l in range(len(Thetas)):
        Thetas[l] = Thetas[l] - alpha * D[l]
    return Thetas


def gradientDescent(Thetas, X, y, alpha, theLambda):
    """One iteration of gradient descent.

    Args:
        X: samples
        y: labels
        alpha: learning rate
        theLambda: regularization parameter
    Returns:
        J: prediction cost
        Thetas: updated weight matrices
    """
    m, n = X.shape
    # 1. forward propagation
    a = fp(Thetas, X)
    # 2. backpropagation for the gradients
    D = bp(Thetas, a, y, theLambda)
    # cost at the current weights
    J = computeCost(Thetas, y, theLambda, a=a)
    # 3. weight update
    Thetas = updateThetas(m, Thetas, D, alpha, theLambda)
    if np.isnan(J):
        J = np.inf
    return J, Thetas


def train(X, y, Thetas=None, hiddenNum=0, unitNum=5, epsilon=1,
          alpha=1, theLambda=0, precision=0.01, maxIters=50):
    """Train the network.

    Args:
        X: training samples
        y: labels
        Thetas: initial weights; if None, randomly initialized
        hiddenNum: number of hidden layers
        unitNum: number of units per hidden layer
        epsilon: weights are initialized in [-epsilon, epsilon]
        alpha: learning rate
        theLambda: regularization parameter
        precision: error precision
        maxIters: maximum number of iterations
    """
    # use matrix semantics so that `*` means matrix multiplication
    X, y = np.matrix(X), np.matrix(y)
    # number of samples, number of features
    m, n = X.shape
    # adjust the label set to a logical (one-hot) representation
    y = adjustLabels(y)
    classNum = y.shape[1]
    # initialize the weights if none were given
    if Thetas is None:
        Thetas = initThetas(
            inputSize=n,
            hiddenNum=hiddenNum,
            unitNum=unitNum,
            classNum=classNum,
            epsilon=epsilon
        )
    # run a gradient check first
    print('Doing Gradient Checking....')
    checked = gradientCheck(Thetas, X, y, theLambda)
    if checked:
        for i in range(maxIters):
            error, Thetas = gradientDescent(
                Thetas, X, y, alpha=alpha, theLambda=theLambda)
            if error < precision:
                break
            if error == np.inf:
                break
        success = error < precision
        return {
            'error': error,
            'Thetas': Thetas,
            'iters': i,
            'success': success
        }
    else:
        print('Error: Gradient Checking Failed!!!')
        return {
            'error': None,
            'Thetas': None,
            'iters': 0,
            'success': False
        }


def predict(X, Thetas):
    """Prediction function.

    Args:
        X: samples
        Thetas: trained weights
    Returns:
        a: output-layer activations
    """
    a = fp(Thetas, np.matrix(X))
    return a[-1]
```
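A minimal usage sketch, assuming the module is saved as nn.py; the data here is made up for illustration (a 2-feature binary problem), and with the default 50 iterations the network may or may not reach the default precision:

```python
import numpy as np
import nn

# hypothetical toy data: the label equals the (rounded) first feature
X = np.matrix([[0.0, 0.2],
               [0.1, 0.9],
               [1.0, 0.1],
               [0.9, 0.8]])
y = np.matrix([[0], [0], [1], [1]])

result = nn.train(X, y, hiddenNum=1, unitNum=5, alpha=1, maxIters=50)
print(result['success'], result['error'], result['iters'])
print(nn.predict(X, result['Thetas']))
```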