41 Activation Functions and GPU Acceleration
Sigmoid and tanh suffer from vanishing gradients: in their saturated regions the derivative is (almost) 0, so the gradient dies out.
ReLU is not differentiable at x=0; its gradient is 0 for x<0 and a constant 1 for x>0, so the gradient passes through the layers unchanged, which helps avoid both exploding and vanishing gradients.
Because ReLU's gradient is 0 for x<0, Leaky ReLU sets y = a*x on the negative side so there is a small slope a instead of 0; a is typically a small value such as 0.01 (PyTorch's default) or 0.02.
SELU is roughly ReLU combined with an exponential on the negative side (a scaled ELU), so the curve behaves smoothly around x=0 instead of having a hard kink.
Softplus is a smoothed version of ReLU: it rounds off the corner at x=0 so the function is smooth there. (A small gradient check for these activations follows below.)
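A quick way to see these behaviours is to query the gradients directly with autograd. This is a small sketch, not from the lecture; all the functions used are standard torch / torch.nn.functional ops.

import torch
import torch.nn.functional as F

x = torch.linspace(-5, 5, 11, requires_grad=True)

for name, fn in [('sigmoid', torch.sigmoid),
                 ('tanh', torch.tanh),
                 ('relu', torch.relu),
                 ('leaky_relu', F.leaky_relu),   # default negative_slope=0.01
                 ('elu', F.elu),                 # exponential branch for x < 0
                 ('selu', F.selu),               # scaled ELU
                 ('softplus', F.softplus)]:      # smooth approximation of relu
    y = fn(x)
    grad, = torch.autograd.grad(y.sum(), x)      # dy/dx at each of the 11 points
    print(f'{name:10s} {grad.detach().numpy().round(3)}')

# sigmoid/tanh gradients shrink toward 0 at both ends (vanishing gradient);
# relu's gradient is exactly 0 for x < 0, while leaky_relu keeps a small slope there.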
GPU acceleration
Use the .to(device) method.
Note that for a tensor, data and data.to(device) are not the same thing: one is the CPU version and the other is the GPU copy. For everything else (modules, loss functions), .to(device) gives back the same object, just moved onto the GPU.
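A minimal sketch of that difference (it falls back to the CPU if no GPU is available):

import torch
from torch import nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

data = torch.randn(4, 784)
data_gpu = data.to(device)      # a new tensor on `device`; `data` itself stays on the CPU
print(data.device, data_gpu.device)

net = nn.Linear(784, 10)
net_moved = net.to(device)      # moves the parameters and returns the same module
print(net_moved is net)         # True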
Switching the activation to LeakyReLU and moving the model onto the GPU raises the test accuracy from about 83% (the earlier ReLU version) to 94%:

# Hyperparameters
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data; train=True selects the training split
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([                 # transform does the preprocessing
                              transforms.ToTensor(),                     # convert to a Tensor
                              transforms.Normalize((0.1307,), (0.3081,)) # standardize: subtract the mean, divide by the std
                          ]))
# DataLoader splits the training data into batches and yields one batch at a time until all data have been seen
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(        # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)                     # the network structure, i.e. the forward function
optimizer = optim.SGD(net.parameters(), lr=learning_rate)  # nn.Module replaces the earlier [w1, b1, w2, b2, ...]
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)   # equivalent to target.cuda()
        logits = net(data)                 # no extra softmax here; the logits are the predictions
        loss = criteon(logits, target)     # compute the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

test_loss = 0
correct = 0
for data, target in test_loader:
    data = data.view(-1, 28 * 28)          # -1 keeps the first dimension unchanged
    data, target = data.to(device), target.to(device)
    logits = net(data)
    test_loss += criteon(logits, target).item()
    pred = logits.data.max(1)[1]           # index of the max value along dim=1
    correct += pred.eq(target.data).sum()
test_loss /= len(train_loader.dataset)     # kept as in the original; len(test_loader.dataset) is arguably what is meant
print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))

'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)]	 Loss:2.315502
Train Epoch:0 [20000/60000 (33%)]	 Loss:2.117644
Train Epoch:0 [40000/60000 (67%)]	 Loss:1.659186
Train Epoch:1 [0/60000 (0%)]	 Loss:1.290930
Train Epoch:1 [20000/60000 (33%)]	 Loss:1.049087
Train Epoch:1 [40000/60000 (67%)]	 Loss:0.872082
Train Epoch:2 [0/60000 (0%)]	 Loss:0.528612
Train Epoch:2 [20000/60000 (33%)]	 Loss:0.402818
Train Epoch:2 [40000/60000 (67%)]	 Loss:0.400452
Train Epoch:3 [0/60000 (0%)]	 Loss:0.318432
Train Epoch:3 [20000/60000 (33%)]	 Loss:0.344411
Train Epoch:3 [40000/60000 (67%)]	 Loss:0.443066
Train Epoch:4 [0/60000 (0%)]	 Loss:0.310835
Train Epoch:4 [20000/60000 (33%)]	 Loss:0.263893
Train Epoch:4 [40000/60000 (67%)]	 Loss:0.292117
Train Epoch:5 [0/60000 (0%)]	 Loss:0.331171
Train Epoch:5 [20000/60000 (33%)]	 Loss:0.192741
Train Epoch:5 [40000/60000 (67%)]	 Loss:0.396357
Train Epoch:6 [0/60000 (0%)]	 Loss:0.363707
Train Epoch:6 [20000/60000 (33%)]	 Loss:0.225204
Train Epoch:6 [40000/60000 (67%)]	 Loss:0.218652
Train Epoch:7 [0/60000 (0%)]	 Loss:0.209941
Train Epoch:7 [20000/60000 (33%)]	 Loss:0.210056
Train Epoch:7 [40000/60000 (67%)]	 Loss:0.296629
Train Epoch:8 [0/60000 (0%)]	 Loss:0.361880
Train Epoch:8 [20000/60000 (33%)]	 Loss:0.213277
Train Epoch:8 [40000/60000 (67%)]	 Loss:0.170169
Train Epoch:9 [0/60000 (0%)]	 Loss:0.301176
Train Epoch:9 [20000/60000 (33%)]	 Loss:0.175931
Train Epoch:9 [40000/60000 (67%)]	 Loss:0.214820

 test set:average loss:0.0002,Accuracy:9370/10000 (94%)

Process finished with exit code 0
'''
42 Test Methods
If you keep training on the training set, the training loss can get very low and the training accuracy very high, but the model may just be memorizing shallow patterns instead of learning anything essential — that is over-fitting. Testing on a validation set exposes this: in the later stages the accuracy becomes unstable or even drops, and the loss becomes unstable or even rises. So more training is not automatically better; the amount of data and the architecture are what really matter.
import torch
import torch.nn.functional as F

logits = torch.rand(4, 10)        # 4 images, each a 10-dim vector of scores for the classes 0-9
pred = F.softmax(logits, dim=1)   # softmax over dim=1, because we want one distribution per image;
                                  # softmax over dim=0 would also give a [4, 10] tensor, but the values would differ
pred_label = pred.argmax(dim=1)
logits.argmax(dim=1)              # argmax of pred and argmax of logits both return a [b]-sized tensor,
                                  # and they are identical: each of the 4 images has one most-likely label,
                                  # so taking the argmax of pred or of logits makes no difference
label = torch.randint(0, 10, [4])          # ground-truth labels (added here for illustration; the original snippet assumes they exist)
correct = torch.eq(pred_label, label)      # element-wise check: was each prediction right?
acc = correct.sum().float().item() / 4     # this is the accuracy; correct is a tensor, so .item() extracts the scalar
Testing more often gives tighter, more fine-grained monitoring of the accuracy; testing less often leaves more of the time for actual training.
# -*- coding: utf-8 -*-
# @Time :2021/5/14 21:06
# @Author:sueong
# @File:ll.py
# @Software:PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data; train=True selects the training split
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)
        logits = net(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

    # test once per epoch: the accuracy can be seen increasing epoch by epoch
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)      # -1 keeps the first dimension unchanged
        data, target = data.to(device), target.to(device)
        logits = net(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()
    test_loss /= len(train_loader.dataset) # kept as in the original; len(test_loader.dataset) is arguably what is meant
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)]	 Loss:2.308717
Train Epoch:0 [20000/60000 (33%)]	 Loss:2.017611
Train Epoch:0 [40000/60000 (67%)]	 Loss:1.563952

 test set:average loss:0.0011,Accuracy:6175/10000 (62%)

Train Epoch:1 [0/60000 (0%)]	 Loss:1.301144
Train Epoch:1 [20000/60000 (33%)]	 Loss:1.313298
Train Epoch:1 [40000/60000 (67%)]	 Loss:1.184744

 test set:average loss:0.0008,Accuracy:7102/10000 (71%)

Train Epoch:2 [0/60000 (0%)]	 Loss:0.946402
Train Epoch:2 [20000/60000 (33%)]	 Loss:0.762401
Train Epoch:2 [40000/60000 (67%)]	 Loss:0.697880

 test set:average loss:0.0004,Accuracy:8841/10000 (88%)

Train Epoch:3 [0/60000 (0%)]	 Loss:0.579781
Train Epoch:3 [20000/60000 (33%)]	 Loss:0.480412
Train Epoch:3 [40000/60000 (67%)]	 Loss:0.347749

 test set:average loss:0.0003,Accuracy:9047/10000 (90%)

Train Epoch:4 [0/60000 (0%)]	 Loss:0.363675
Train Epoch:4 [20000/60000 (33%)]	 Loss:0.304079
Train Epoch:4 [40000/60000 (67%)]	 Loss:0.401550

 test set:average loss:0.0003,Accuracy:9118/10000 (91%)

Train Epoch:5 [0/60000 (0%)]	 Loss:0.324268
Train Epoch:5 [20000/60000 (33%)]	 Loss:0.269142
Train Epoch:5 [40000/60000 (67%)]	 Loss:0.284855

 test set:average loss:0.0002,Accuracy:9195/10000 (92%)

Train Epoch:6 [0/60000 (0%)]	 Loss:0.181122
Train Epoch:6 [20000/60000 (33%)]	 Loss:0.214253
Train Epoch:6 [40000/60000 (67%)]	 Loss:0.310929

 test set:average loss:0.0002,Accuracy:9229/10000 (92%)

Train Epoch:7 [0/60000 (0%)]	 Loss:0.233558
Train Epoch:7 [20000/60000 (33%)]	 Loss:0.345559
Train Epoch:7 [40000/60000 (67%)]	 Loss:0.240973

 test set:average loss:0.0002,Accuracy:9286/10000 (93%)

Train Epoch:8 [0/60000 (0%)]	 Loss:0.197916
Train Epoch:8 [20000/60000 (33%)]	 Loss:0.368038
Train Epoch:8 [40000/60000 (67%)]	 Loss:0.367101

 test set:average loss:0.0002,Accuracy:9310/10000 (93%)

Train Epoch:9 [0/60000 (0%)]	 Loss:0.221928
Train Epoch:9 [20000/60000 (33%)]	 Loss:0.190280
Train Epoch:9 [40000/60000 (67%)]	 Loss:0.183632

 test set:average loss:0.0002,Accuracy:9351/10000 (94%)

Process finished with exit code 0
'''
43 Visualization
TensorBoard
visdom
1. Install visdom
2. Run the server daemon
legend holds the labels of the y1/y2 curves drawn in the same window
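Visdom is usually installed with pip install visdom and its server started with python -m visdom.server before the training script runs; the plots then appear in the browser. Below is a minimal, stand-alone sketch (separate from the full script that follows) of one window holding two curves with a shared legend, using the same win / update='append' pattern as the script:

import math
from visdom import Visdom

viz = Visdom()   # connects to the default 'main' environment of the running server

# create the window once with placeholder points, then keep appending to it
viz.line(Y=[[0.0, 0.0]], X=[0.0], win='demo',
         opts=dict(title='y1 & y2', legend=['y1', 'y2']))
for step in range(1, 50):
    y1, y2 = math.sin(step / 5), math.cos(step / 5)
    viz.line(Y=[[y1, y2]], X=[step], win='demo', update='append')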
# -*- coding: utf-8 -*-
# @Time :2021/5/14 21:06
# @Author:sueong
# @File:ll.py
# @Software:PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms
from visdom import Visdom

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

# Before the train/test loop, define the two curves as placeholders; points are
# appended during training so the curves grow dynamically.
'''
Visdom() can take env='xxx' to name the environment window; nothing is passed here,
so everything goes to the default main window. In viz.line the first two arguments
are the curve's Y and X coordinates (Y first, then X); they are set to 0 here only
as placeholders (starting the loss curve at Y=0 looks a bit odd, since the loss
actually starts high and decreases). The two calls use different win parameters, so
they are drawn in different windows. The second window holds the test loss and
accuracy curves, which is why two initial Y values are given at X=0.
'''
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train loss'))
viz.line([[0.0, 0.0]], [0.], win='test',
         opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))

# a global counter to keep track of how many batches have been trained
global_step = 0

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)   # equivalent to target.cuda()
        logits = net(data)                 # no extra softmax; the logits are the predictions
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
        # After each batch, append a point to the training curve so it grows in real
        # time; win selects the curve, update='append' adds the point (Y first, then X).
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()))

    # test once per epoch: the accuracy can be seen increasing epoch by epoch
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)      # -1 keeps the first dimension unchanged
        data, target = data.to(device), target.to(device)
        logits = net(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]       # index of the max value along dim=1
        correct += pred.eq(target).float().sum().item()
    # After each test pass, append a point to the test curves and show the current
    # batch of images (viz.images) and the predicted labels as text (viz.text) in
    # two more windows; .detach() and moving the data to the CPU are needed before
    # they can be displayed.
    viz.line([[test_loss, correct / len(test_loader.dataset)]],
             [global_step], win='test', update='append')
    viz.images(data.view(-1, 1, 28, 28), win='x')
    viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
    test_loss /= len(train_loader.dataset) # kept as in the original; len(test_loader.dataset) is arguably what is meant
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
44 Underfitting and Overfitting
The real data come from some distribution that matches our intuition, but we do not know that true distribution or the parameters of the underlying function,
and the function is generally not linear and the observations may contain noise.
The higher the polynomial degree, the larger and more jittery the waveform, i.e. the more complex the shapes it can take.
This measures the model's learning ability: raising the degree increases its expressive power, so it can represent more complex distributions and learn more complex mappings — the model capacity grows (see the small numerical sketch after these notes).
Estimated: the expressive power of the model we actually use.
Ground-truth: the complexity of the true model.
Case 1: Estimated < Ground-truth: under-fitting — the model we use is not expressive enough.
When under-fitting shows up, check whether increasing the model complexity / number of layers improves things.
Case 2: Ground-truth < Estimated: over-fitting — the model is too complex and, on a finite dataset, also fits the noise, so it generalizes poorly.
It does well on the training set but poorly on the test set.
In practice the usual problem is over-fitting.
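To make the capacity idea concrete, here is a small sketch (not from the lecture; the data and degrees are made up for illustration) that fits the same noisy samples with polynomials of increasing degree:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y_true = np.sin(3 * x)                        # the 'ground-truth' function
y = y_true + 0.1 * rng.standard_normal(20)    # observations with noise

for degree in (1, 3, 9):                      # increasing model capacity
    coeffs = np.polyfit(x, y, degree)         # the 'estimated' model
    y_hat = np.polyval(coeffs, x)
    train_err = np.mean((y_hat - y) ** 2)
    true_err = np.mean((y_hat - y_true) ** 2)
    print(f'degree={degree}  train MSE={train_err:.4f}  MSE vs ground truth={true_err:.4f}')

# degree 1 under-fits (both errors stay high); a high degree can push the train MSE
# very low while fitting the noise, i.e. over-fitting.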
45 Cross-Validation 1: Train-Val-Test Split
The point of testing during training is to detect over-fitting and to pick the best parameters from before it sets in; the test_loader used in the code above is really playing the role of a validation set.
We usually pick the point where the validation accuracy is highest, stop training there, and take that point as the final state of the model,
i.e. save the parameters w and b that perform best before over-fitting.
val set: used to pick model parameters and to stop training before over-fitting.
test set: used for the final evaluation, e.g. by the client during acceptance testing (the test data are unseen by the model; this keeps them separate from anything used during training — if the client evaluated on the validation set, that set has already influenced training, so the result would look good only because of cheating).
Because the test set must stay unseen: if we tuned parameters based on accuracy reported on the test set, the test set would play the same role as the validation set and the data would be contaminated. (A sketch of the split follows below.)
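A minimal sketch of carving a validation set out of the MNIST training set with torch.utils.data.random_split and keeping the best pre-overfitting checkpoint. The split sizes, the helper function, and the file name are illustrative assumptions, not from the original post:

import torch
from torchvision import datasets, transforms

full_train = datasets.MNIST('../data', train=True, download=True,
                            transform=transforms.ToTensor())
train_db, val_db = torch.utils.data.random_split(full_train, [50000, 10000])
train_loader = torch.utils.data.DataLoader(train_db, batch_size=200, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_db, batch_size=200, shuffle=False)

def accuracy(net, loader, device):
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loader:
            data = data.view(-1, 28 * 28).to(device)
            target = target.to(device)
            correct += net(data).argmax(dim=1).eq(target).sum().item()
            total += target.size(0)
    return correct / total

# inside the usual epoch loop, after training one epoch:
#     val_acc = accuracy(net, val_loader, device)
#     if val_acc > best_acc:                          # keep the best pre-overfitting weights
#         best_acc = val_acc
#         torch.save(net.state_dict(), 'best.pth')    # file name is just an example
# and only once, at the very end, evaluate accuracy(net, test_loader, device).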