01. Neural Networks and Deep Learning - W2. Neural Network Basics (Assignment: Logistic Regression for Image Recognition)


Table of Contents

    • Programming Assignment 1
      • 1. Basic numpy functions
        • 1.1 Implementing the sigmoid function
        • 1.2 Implementing the derivative of the sigmoid function
        • 1.3 The reshape operation
        • 1.4 Normalization
        • 1.5 Broadcasting
      • 2. Vectorization
        • 2.1 L1 and L2 loss functions
    • Programming Assignment 2. Cat 🐱 Image Recognition
      • 1. Import packages
      • 2. Data overview
      • 3. General architecture of the algorithm
      • 4. Building the algorithm
        • 4.1 Helper functions
        • 4.2 Initializing parameters
        • 4.3 Forward and backward propagation
        • 4.4 Updating parameters with gradient descent
        • 4.5 Merging all functions into a model
        • 4.6 Analysis
        • 4.7 Testing the model with your own image
      • 5. Summary

For the multiple-choice quiz, please refer to the linked blog post.

Programming Assignment 1

1. Basic numpy functions

1.1 Implementing the sigmoid function

import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+math.pow(math.e, -x))  # or s = 1/(1+math.exp(-x))
    ### END CODE HERE ###

    return s
  • The math package is not recommended here: deep learning works mostly with vectors, and math functions cannot operate on vectors.
### One reason why we use "numpy" instead of "math" in Deep Learning ###
x = [1, 2, 3]
basic_sigmoid(x)  # you will see this give an error when you run it, because x is a vector.
                  # it raises an error!

import numpy as np

# example of np.exp
x = np.array([1, 2, 3])
print(np.exp(x))  # result is (exp(1), exp(2), exp(3))
# [ 2.71828183  7.3890561  20.08553692]
# numpy can operate on whole vectors element-wise
  • The sigmoid function written with numpy
import numpy as np  # this means you can access numpy functions by writing np.function() instead of numpy.function()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-x))
    ### END CODE HERE ###

    return s

x = np.array([1, 2, 3])
sigmoid(x)
# array([0.73105858, 0.88079708, 0.95257413])

1.2 Implementing the derivative of the sigmoid function

# GRADED FUNCTION: sigmoid_derivative

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    s = sigmoid(x)
    ds = s*(1-s)
    ### END CODE HERE ###

    return ds

x = np.array([1, 2, 3])
sigmoid_derivative(x)
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]

1.3 The reshape operation

Flatten the image data. A dimension you do not want to compute by hand can be set to -1 and numpy will infer it automatically (a quick check follows the code below).

# GRADED FUNCTION: image2vector
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape(-1, 1)
    ### END CODE HERE ###

    return v

# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y, 3) where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
                   [ 0.90714982,  0.52835647],
                   [ 0.4215251 ,  0.45017551]],
                  [[ 0.92814219,  0.96677647],
                   [ 0.85304703,  0.52351845],
                   [ 0.19981397,  0.27417313]],
                  [[ 0.60659855,  0.00533165],
                   [ 0.10820313,  0.49978937],
                   [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))

# Output:
# image2vector(image) = [[0.67826139]
#  [0.29380381]
#  [0.90714982]
#  [0.52835647]
#  [0.4215251 ]
#  [0.45017551]
#  [0.92814219]
#  [0.96677647]
#  [0.85304703]
#  [0.52351845]
#  [0.19981397]
#  [0.27417313]
#  [0.60659855]
#  [0.00533165]
#  [0.10820313]
#  [0.49978937]
#  [0.34144279]
#  [0.94630077]]
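A quick check of the -1 trick (a small sketch; the array here is just an illustration, not part of the assignment): letting numpy infer the dimension gives the same result as spelling it out.

import numpy as np

a = np.arange(18).reshape(3, 3, 2)        # same shape as the example image above
v1 = a.reshape(-1, 1)                     # numpy infers the first dimension (18)
v2 = a.reshape(3 * 3 * 2, 1)              # explicit size
print(v1.shape, np.array_equal(v1, v2))   # (18, 1) True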

1.4 Normalization

Normalization usually makes gradient descent converge faster.

For example, if

$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix}$$

then

$$\| x \| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix}$$

and

$$x\_normalized = \frac{x}{\| x \|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix}$$

# GRADED FUNCTION: normalizeRows

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    # Divide x by its norm.
    x = x/x_norm
    ### END CODE HERE ###

    return x

x = np.array([[0, 3, 4],
              [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0.         0.6        0.8       ]
#  [0.13736056 0.82416338 0.54944226]]

1.5 Broadcasting

See the official documentation.

For a row vector $x \in \mathbb{R}^{1 \times n}$:

$$softmax(x) = softmax(\begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}) = \begin{bmatrix} \frac{e^{x_1}}{\sum_{j}e^{x_j}} & \frac{e^{x_2}}{\sum_{j}e^{x_j}} & \dots & \frac{e^{x_n}}{\sum_{j}e^{x_j}} \end{bmatrix}$$

For a matrix $x \in \mathbb{R}^{m \times n}$, where $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, we have:

$$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax(\text{first row of } x) \\ softmax(\text{second row of } x) \\ \vdots \\ softmax(\text{last row of } x) \end{pmatrix}$$

# GRADED FUNCTION: softmax

def softmax(x):
    """
    Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp/x_sum
    ### END CODE HERE ###

    return s

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]

2. Vectorization

Vectorized computation is more concise and more efficient, as the timing sketch below illustrates.
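A minimal illustration of that claim (the array size and timing code are illustrative, not part of the assignment): the vectorized np.dot is typically far faster than an explicit Python loop.

import time
import numpy as np

x1 = np.random.rand(1_000_000)
x2 = np.random.rand(1_000_000)

# Explicit Python loop
tic = time.time()
dot = 0.0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.time()
print("loop dot       = %f, took %.2f ms" % (dot, 1000 * (toc - tic)))

# Vectorized version
tic = time.time()
dot_vec = np.dot(x1, x2)
toc = time.time()
print("vectorized dot = %f, took %.2f ms" % (dot_vec, 1000 * (toc - tic)))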

2.1 L1 and L2 loss functions

$$L_1(\hat{y}, y) = \sum_{i=0}^{m}|y^{(i)} - \hat{y}^{(i)}|$$

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(abs(yhat-y))
    ### END CODE HERE ###

    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))
# L1 = 1.1
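As a quick check of the printed value: $|1-0.9| + |0-0.2| + |0-0.1| + |1-0.4| + |1-0.9| = 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1$.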

$$L_2(\hat{y}, y) = \sum_{i=0}^{m}(y^{(i)} - \hat{y}^{(i)})^2$$

Hint: the dot product of a vector with itself is the sum of its squared entries, which is exactly what the L2 loss needs:

import numpy as np
a = np.array([1, 2, 3])
np.dot(a, a)
# 14

# GRADED FUNCTION: L2

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot(yhat-y, yhat-y)
    ### END CODE HERE ###

    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))
# L2 = 0.43
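Checking by hand again: $(0.1)^2 + (0.2)^2 + (0.1)^2 + (0.6)^2 + (0.1)^2 = 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43$.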

Programming Assignment 2. Cat 🐱 Image Recognition

Use a neural network to recognize cats.

1. Import packages

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

2. Data overview

Figure out the dimensions of the data
Reshape the data
Normalize the data

There is a training set whose labels are y = 1 for a cat picture and y = 0 for a non-cat picture
There is a test set, also labeled
Each image has 3 channels (RGB)

  • Load the data
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
  • Preview a picture
# Example of a picture
index = 24
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")

y = [1], it's a 'cat' picture.

  • Data dimensions
### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
  • Flatten the sample image matrices
# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]
  • The pixel values range from 0 to 255; standardize the data by dividing by 255
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

3. General architecture of the algorithm

Build a logistic regression classifier with a neural-network mindset.
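As a brief recap of the model about to be implemented (the standard logistic-regression equations used in this assignment), for one example $x^{(i)}$:

$$z^{(i)} = w^T x^{(i)} + b$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = -\,y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$$

and the cost is the average loss over all $m$ training examples:

$$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$$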

4. Building the algorithm

Define the model structure (e.g., the number of input features)
Initialize the model parameters
Loop:

  • Compute the current cost (forward propagation)
  • Compute the current gradients (backward propagation)
  • Update the parameters (gradient descent); the update rule is sketched right after this list
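For reference, the gradient-descent update used in that last step is, with learning rate $\alpha$:

$$w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b}$$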
4.1 Helper functions

  • The sigmoid function
# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-z))
    ### END CODE HERE ###

    return s

4.2 Initializing parameters

For logistic regression the parameters can all be initialized to 0 (this is not true for a neural network: with all-zero weights every hidden unit computes the same value and receives the same gradient, so the symmetry is never broken).

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b

4.3 Forward and backward propagation

Forward propagation:

  • $X$ is the matrix of input features
  • Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$
  • Compute the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$

Backward propagation uses the two gradient equations:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)} - y^{(i)})$$
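These follow from the standard sigmoid/cross-entropy derivative (a one-line sketch, not spelled out in the original post): per example $\frac{\partial \mathcal{L}}{\partial z^{(i)}} = a^{(i)} - y^{(i)}$, so writing $dZ = A - Y$ and averaging over the columns of $X$ gives $dw = \frac{1}{m} X\,dZ^T$ and $db = \frac{1}{m}\sum_{i} dz^{(i)}$.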

# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X)+b)  # compute activation
    # w is a column vector, A is a row vector; np.dot is matrix multiplication
    cost = np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))/(-m)  # compute cost
    # Y is a row vector; * is element-wise multiplication
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = np.dot(X, (A-Y).T)/m
    db = np.sum(A-Y, axis=1)/m
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

dw = [[0.99993216]
 [1.99980262]]
db = [0.49993523]
cost = 6.000064773192205

4.4 Updating parameters with gradient descent

# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """

    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

w = [[0.1124579 ]
 [0.23106775]]
b = [1.55930492]
dw = [[0.90158428]
 [1.76250842]]
db = [0.43046207]
  • The learned parameters can now be used to make predictions:

Compute the predictions $\hat{Y} = A = \sigma(w^T X + b)$
Convert each probability to a class label: 0 if the activation is <= 0.5, otherwise 1

# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''

    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        Y_prediction[0][i] = 0 if A[0][i] <= 0.5 else 1
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction

print ("predictions = " + str(predict(w, b, X)))

predictions = [[1. 1.]]

4.5 Merging all functions into a model

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """

    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
  • The model does very well on the training set but only moderately on the test set, which indicates overfitting.
# Example of a picture that was wrongly classified.
index = 24
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") + "\" picture.")

y = 1, you predicted that it is a "cat" picture.


Change index to inspect the prediction and the true label of other test-set examples.

  • Plot the learning curve (cost)
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

  • Increase the number of training iterations to 3000 (it was 2000 above):

train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

Training accuracy rises, but test accuracy drops: this is overfitting.

4.6 Analysis

  • Comparison of different learning rates
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

-------------------------------------------------------

learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %

-------------------------------------------------------

  • If the learning rate is too large, the cost can oscillate and fail to converge (in this example 0.01 is not too bad, and it converges in the end)
  • A low cost does not by itself mean a good model; you still have to check for overfitting (very good on the training set, poor on the test set)

4.7 Testing the model with your own image

## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "cat1.jpg"   # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = Image.open(fname)
my_image = np.array(image.resize((num_px, num_px))).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")


5. Summary

  • Preprocessing the data matters: check the dimensions, reshape, and normalize
  • Write separate building-block functions: initialization, forward/backward propagation, gradient-descent parameter updates
  • Combine them into a model
  • Tune the learning rate and other hyperparameters
