Deep Learning (3): The Handwritten Digit Recognition Problem

Published: 2023/12/15

1. Problem Category
2. Dataset
3. Image
4. Input and Output
5. Regression VS Classification
6. Computation Graph
7. Two Problems
8. Particularly
9. How to Train the Model? $\to$ Loss
10. Summary
11. Deep Learning?
12. Classification Procedure
13. We need TensorFlow
14. Next

1. Problem Category

Discrete Prediction

$y = w \cdot x + b$
[up, left, down, right]
[dog, cat, whale, bird, …]
Handwritten digit recognition is a discrete-value prediction problem.

2. Dataset

MNIST
- 7000 images per category
- train/test splitting: 60k vs 10k

3. Image

[28, 28, 1]
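Each image has 28 rows × 28 columns, 784 pixels in total. Each pixel stores a grayscale value in [0, 255], where 0 is pure black and 255 is pure white. The trailing 1 is the channel count: each pixel has a single component, its grayscale value.
$\to$ [784]
Flatten the 28×28 data into one dimension: append the second row to the end of the first row, then do the same for the remaining 26 rows, so that one image becomes a one-dimensional array of 784 elements.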
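The row-by-row flattening described above can be sketched with NumPy's `reshape`, which uses row-major order by default. The `image` array here is a hypothetical stand-in, not real MNIST data.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: shape [28, 28, 1], values in [0, 255].
image = (np.arange(28 * 28) % 256).reshape(28, 28, 1)

# Row-major flattening: row 0 first, then row 1 appended after it, and so on.
flat = image.reshape(784)

# The first pixel of row 1 lands right after the last pixel of row 0.
assert flat[27] == image[0, 27, 0]
assert flat[28] == image[1, 0, 0]
print(flat.shape)  # (784,)
```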

4. Input and Output

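(1) Input
$x : [b, 784]$
The input has shape [b, 784], where b is the number of images and 784 is the number of pixels per image.
(2) Encoding

dog=0, cat=1, fish=2, …
Drawback: integer codes are ambiguous. For example, a predicted value of 1.5 falls halfway between cat and fish, so the class cannot be decided reliably.
dog = [1, 0, 0, …], where the first entry is the probability that the sample is "dog", the second is the probability that it is "cat", and so on; the probabilities sum to 1.
cat = [0, 1, 0, …]
fish = [0, 0, 1, …]

This encoding is called one-hot encoding.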
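As a minimal sketch of the encoding above, the helper below turns integer class ids into one-hot rows. The function name `one_hot` and the three-class setup are illustrative assumptions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a [len(labels), num_classes] array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# dog=0, cat=1, fish=2
codes = one_hot([0, 1, 2], num_classes=3)
print(codes.tolist())  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Each row sums to 1, so it can be read as a probability distribution that puts all of its mass on the true class.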

5. Regression VS Classification

(1) Model

$y = w \cdot x + b$
$y \in \mathbb{R}^d$

(2) Output

$out = X @ W + b$
$out : [0.1, 0.8, 0.02, 0.08]$

(3) Prediction

$pred = argmax(out)$
- $pred : 1$
- $label : 2$
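The prediction step above can be checked directly: `argmax` returns the index of the largest score, which here differs from the label, so this sample is misclassified.

```python
import numpy as np

out = np.array([0.1, 0.8, 0.02, 0.08])  # scores from the example above
label = 2                                # true class from the example above

pred = int(np.argmax(out))               # index of the largest score
is_correct = pred == label
print(pred, is_correct)  # 1 False
```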

6. Computation Graph

$out = X @ W + b$
$X : [b, 784]$
$W : [784, 10]$
$b : [10]$
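The shapes in this graph can be verified with a short NumPy sketch. The batch size of 4 and the random values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4                                # hypothetical batch size b

X = rng.random((batch, 784))             # [b, 784]
W = rng.normal(size=(784, 10))           # [784, 10]
b = np.zeros(10)                         # [10], broadcast over the batch

out = X @ W + b                          # [b, 784] @ [784, 10] + [10] -> [b, 10]
print(out.shape)  # (4, 10)
```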

7. Two Problems

(1) It’s Linear!

$out = X @ W + b$
$\to$
$out = f(X @ W + b)$

$out = relu(X @ W + b)$
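For reference, relu is the element-wise function $\max(z, 0)$. A minimal NumPy version, with an arbitrary test vector, might look like this:

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0): negative entries become 0, the rest pass through."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z).tolist())  # [0.0, 0.0, 0.0, 1.5]
```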

(2) It’s too simple!

$out = relu(X @ W + b)$
$\to$
$h_1=relu(X@W_1+b_1)$
$h_2=relu(h_1@W_2+b_2)$
$out=relu(h_2@W_3+b_3)$
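Stacking layers only adds expressive power because relu sits between them. The sketch below, using arbitrary random weights, shows that two layers without an activation collapse into a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))
W1, b1 = rng.normal(size=(784, 512)), rng.normal(size=512)
W2, b2 = rng.normal(size=(512, 256)), rng.normal(size=256)

# Two stacked layers with no activation in between...
stacked = (X @ W1 + b1) @ W2 + b2

# ...equal one linear layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
single = X @ W_merged + b_merged

print(np.allclose(stacked, single))  # True
```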

8. Particularly

(1) $X:[v_1,v_2,…,v_{784}]$

$X:[1,784]$

(2) $h_1=relu(X@W_1+b_1)$

$W_1:[784,512]$
$\to [1,784]@[784,512]+[512]=[1,512]+[512]=[1,512]$
$b_1:[512]$, broadcast to $[1,512]$

(3) $h_2=relu(h_1@W_2+b_2)$

$W_2:[512,256]$
$\to [1,512]@[512,256]+[256]=[1,256]+[256]=[1,256]$
$b_2:[256]$, broadcast to $[1,256]$

(4) $out=relu(h_2@W_3+b_3)$

$W_3:[256,10]$
$\to [1,256]@[256,10]+[10]=[1,10]+[10]=[1,10]$
$b_3:[10]$, broadcast to $[1,10]$
As the computation above shows, this network acts as a dimensionality-reduction process: the image goes from $[1, 784]$ to $[1, 512]$, then to $[1, 256]$, and finally to $[1, 10]$.

$\to [0,0,0.01,0.1,0.8,0,…]$
In this output the largest value, 0.8, sits at index 4 (counting from 0), so the predicted digit for this image is "4".
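The shape walkthrough above can be reproduced as a forward pass. This is a sketch only: the weights are random placeholders (a trained model would have learned values), and `X` is a hypothetical flattened image.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Randomly initialised parameters; real values would come from training.
W1, b1 = rng.normal(scale=0.01, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, 256)), np.zeros(256)
W3, b3 = rng.normal(scale=0.01, size=(256, 10)), np.zeros(10)

X = rng.random((1, 784))                 # one flattened image, [1, 784]

h1 = relu(X @ W1 + b1)                   # [1, 512]
h2 = relu(h1 @ W2 + b2)                  # [1, 256]
out = relu(h2 @ W3 + b3)                 # [1, 10]
pred = int(np.argmax(out, axis=1)[0])    # index of the largest output

print(h1.shape, h2.shape, out.shape)  # (1, 512) (1, 256) (1, 10)
```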

9. How to Train the Model? $\to$ Loss

out: [1,10]
$\to$
Y/label: 0~9
- e.g.: 1 $\to$ [0,1,0,0,0,0,0,0,0,0]
- e.g.: 3 $\to$ [0,0,0,1,0,0,0,0,0,0]

$\to$

Euclidean distance: $out \to Label$
- MSE, i.e. $\sum (y - out)^2$
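As a worked example of the loss above, the sketch below compares a hypothetical 10-element output with the one-hot label for digit 4. The values in `out` are made up for illustration; the sum-of-squares form follows the formula given here, and dividing by the number of elements would give the mean.

```python
import numpy as np

# Hypothetical network output for one image, [1, 10].
out = np.array([[0.0, 0.0, 0.01, 0.1, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0]])
label = 4
y = np.eye(10)[[label]]                  # one-hot label, [1, 10]

loss = np.sum((y - out) ** 2)            # squared Euclidean distance
print(round(float(loss), 4))  # 0.0501
```

The three non-zero terms are $0.01^2 + 0.1^2 + (1 - 0.8)^2 = 0.0001 + 0.01 + 0.04 = 0.0501$.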

10. Summary

$out=relu\{relu\{relu[X@W_1+b_1]@W_2+b_2\}@W_3+b_3\}$
$pred = argmax(out)$
$loss = MSE(out, label)$
$minimize\ loss$
- $[W_1',b_1',W_2',b_2',W_3',b_3']$

11. Deep Learning?

We have not seen it.
But we have already mastered it.
We will show you it’s (almost) Deep Learning!

12. Classification Procedure

Step1. Compute $h_1,h_2,out$
Step2. Compute $Loss$
Step3. Compute gradient and update $[W_1',b_1',W_2',b_2',W_3',b_3']$
Step4. Loop
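The four steps above can be sketched in plain NumPy with hand-written gradients. Everything outside the steps themselves is an assumption for illustration: the random stand-in data, the He-style initialisation, the learning rate, and the number of iterations. A framework such as TensorFlow would compute the gradients automatically.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 64 random "images" with random digit labels.
X = rng.random((64, 784))
Y = np.eye(10)[rng.integers(0, 10, size=64)]   # one-hot labels, [64, 10]

def init(fan_in, fan_out):
    # He-style initialisation (an assumption; the source does not specify one).
    W = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return W, np.zeros(fan_out)

(W1, b1), (W2, b2), (W3, b3) = init(784, 512), init(512, 256), init(256, 10)
lr = 1e-4                                      # assumed learning rate
losses = []

for step in range(20):                         # Step 4: loop
    # Step 1: compute h1, h2, out
    z1 = X @ W1 + b1
    h1 = relu(z1)
    z2 = h1 @ W2 + b2
    h2 = relu(z2)
    z3 = h2 @ W3 + b3
    out = relu(z3)

    # Step 2: compute the loss (squared error, averaged over the batch)
    loss = np.sum((out - Y) ** 2) / len(X)
    losses.append(loss)

    # Step 3: compute gradients with the chain rule, then update
    dz3 = 2.0 * (out - Y) / len(X) * (z3 > 0)
    dz2 = (dz3 @ W3.T) * (z2 > 0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)
    W3 -= lr * (h2.T @ dz3)
    b3 -= lr * dz3.sum(axis=0)
    W2 -= lr * (h1.T @ dz2)
    b2 -= lr * dz2.sum(axis=0)
    W1 -= lr * (X.T @ dz1)
    b1 -= lr * dz1.sum(axis=0)

print(losses[0], losses[-1])
```

All three `dz` terms are computed before any weight is changed, so every gradient refers to the same set of parameters.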

13. We need TensorFlow

The amount of data is huge;
TensorFlow makes computation and processing faster.

14. Next

Step1. have fun on MNIST classification
Step2. and we learn TensorFlow
Step3. and we implement Step1. by ourselves!

References:
[1] 龙良曲, 《深度学习与TensorFlow2入门实战》
