
Multi-task training in torch: Hands-on PyTorch (2) - requires_grad & computation graph

Published 2023/12/10 by 豆豆


```python
import torch
```

1. Requires_grad

However, a model is not a person: it is not smart enough to work out on its own which quantities need gradients, so we have to mark them manually.

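In PyTorch, the general-purpose data structure, the tensor, has an attribute requires_grad that specifies whether gradient information for that tensor should be kept during computation. Take the linear regression described earlier as an example: the parameter w is clearly what needs to be trained. To find its best value, we define a suitable loss function and train it by propagating gradients backward.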
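A minimal sketch of the linear-regression case, assuming a model y ≈ w * x with a single scalar parameter w and a mean-squared-error loss (the data values are made up for illustration). Only w is marked with requires_grad=True, and after backward() its gradient matches the analytic value 2/N * Σ (w * x_i - y_i) * x_i.

```python
import torch

# Toy data (illustrative values); inputs do not need gradients
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

# The trainable parameter w is the only tensor marked for gradient tracking
w = torch.tensor(0.5, requires_grad=True)

pred = w * x
loss = ((pred - y) ** 2).mean()  # mean-squared error
loss.backward()

# Analytic gradient of the MSE loss with respect to w, for comparison
analytic = 2 * ((w.detach() * x - y) * x).mean()
print(w.grad, analytic)  # both equal -14.0 for these values
print(x.grad)            # None: x was never marked
```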

The official documentation puts it as follows:

If there’s a single input to an operation that requires gradient, its output will also require gradient.

In other words, as soon as any one input requires gradients, the output keeps gradient information too, which guarantees that gradients can flow back to that input.

Conversely, if none of the inputs require gradients, the output's requires_grad is automatically set to False. With no gradients to compute, this part of the graph is simply left out of the backward pass.

Conversely, only if all inputs don’t require gradient, the output also won’t require it. Backward computation is never performed in the subgraphs, where all Tensors didn’t require gradients.
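The propagation rule can be checked directly. In this sketch (tensor names are illustrative), c is computed only from a, which does not require gradients, so c has no grad_fn and takes no part in backpropagation; d mixes in b, so it requires gradients, and after backward() only b receives a .grad.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])                      # requires_grad=False (default)
b = torch.tensor([1.0, 4.0, 2.0], requires_grad=True)

c = a * 2            # no input requires gradients -> nothing is recorded
d = (c * b).sum()    # one input (b) requires gradients -> so does the output

print(c.requires_grad, c.grad_fn)   # False None
print(d.requires_grad)              # True

d.backward()
print(a.grad)   # None: the subgraph a -> c is skipped
print(b.grad)   # dd/db = c = [2., 4., 6.]
```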

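For tensors that require gradients, PyTorch stores gradient-related information and the operations that produced them, which costs extra memory. To keep memory usage down, newly created tensors do not require gradients by default.

The parameters of neural-network layers such as fully connected and convolutional layers, on the other hand, require gradients by default.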
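A quick check of both defaults, using nn.Linear as a stand-in for a fully connected layer and nn.Conv2d for a convolutional layer (layer sizes are arbitrary): freshly created tensors do not track gradients, layer parameters do, and requires_grad_() marks an existing tensor manually.

```python
import torch
import torch.nn as nn

t = torch.zeros(3)
default_flag = t.requires_grad
print(default_flag)  # False: the memory-saving default

fc = nn.Linear(4, 2)
conv = nn.Conv2d(1, 3, kernel_size=3)
# Layer parameters (nn.Parameter) are created with requires_grad=True
print(all(p.requires_grad for p in fc.parameters()))    # True
print(all(p.requires_grad for p in conv.parameters()))  # True

# Manually mark an ordinary tensor (in place)
t.requires_grad_()
print(t.requires_grad)  # True
```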

```python
a = torch.tensor([1., 2., 3.])
print('a:', a.requires_grad)

b = torch.tensor([1., 4., 2.], requires_grad=True)
print('b:', b.requires_grad)

print('sum of a and b:', (a + b).requires_grad)
```

Output:

```
a: False
b: True
sum of a and b: True
```

2. Computation Graph

By design, every forward pass that produces pred also builds a computation graph for gradient backpropagation. The graph stores the intermediate results that back propagation needs, and once .backward() has been called, these saved results are freed from memory (unless retain_graph=True is passed).
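The freeing behaviour can be observed directly. In this sketch, the multiplication saves its inputs for the backward pass; after the first backward() those saved tensors are released, so a second call on the same graph raises a RuntimeError. Passing retain_graph=True keeps the graph usable, and the second call then adds to the existing .grad instead of overwriting it.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

y = (x * x).sum()
y.backward()            # saved intermediate results are freed here
try:
    y.backward()        # second pass through the same graph
    second_pass_ok = True
except RuntimeError:
    second_pass_ok = False
print(second_pass_ok)   # False

x.grad = None           # reset the gradient before the next experiment
y = (x * x).sum()
y.backward(retain_graph=True)   # keep the graph for another pass
y.backward()                    # allowed; gradients accumulate
print(x.grad)           # 2 * (2x) = tensor([4., 8.])
```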

The graph records the history of the computation and everything needed to compute gradients: the output is the root node, and the inputs and all parameters are the leaf nodes.

We only retain the grad of leaf nodes with requires_grad=True.
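The leaf-only rule in practice: in this sketch h is an intermediate (non-leaf) result, so its .grad stays None after backward() (PyTorch also emits a UserWarning when it is accessed), while the leaf x gets its gradient. Calling retain_grad() on an intermediate tensor before the backward pass, as done for g here, is the standard way to keep its gradient as well.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf node
h = x * 2                                              # non-leaf node
g = x * 3                                              # non-leaf node whose grad we keep
g.retain_grad()

out = (h * h).sum() + g.sum()
out.backward()

print(x.is_leaf, h.is_leaf)   # True False
print(x.grad)                 # d(out)/dx = 8x + 3 = tensor([11., 19., 27.])
print(h.grad)                 # None (non-leaf, not retained; triggers a UserWarning)
print(g.grad)                 # tensor([1., 1., 1.])
```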

So by the time the forward computation finishes, PyTorch has also built a graph made of the functions needed to compute the gradients.
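The graph is recorded during the forward pass itself, as each result's grad_fn shows. This sketch also uses torch.no_grad() (a standard PyTorch context manager, not mentioned in the article) to show that when recording is switched off, no graph is built at all.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

y = w * x
print(y.grad_fn)                  # <MulBackward0 ...>: recorded during the forward pass
print(y.grad_fn.next_functions)   # edges back toward w (AccumulateGrad) and x (None)

with torch.no_grad():
    z = w * x
print(z.grad_fn, z.requires_grad) # None False: nothing was recorded
```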

Inputs taken from the dataset have requires_grad=False, so only the parameters' gradients are kept, and the parameters are then optimized using those gradients.

In PyTorch, a standard train-from-scratch loop for a multi-task setup looks like this:
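A sketch of that last step, using a hypothetical single nn.Linear layer, random stand-in data for one batch, and plain SGD with an arbitrarily chosen learning rate: the batch tensor never receives a gradient, the parameters do, and optimizer.step() applies p ← p - lr * p.grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
xs = torch.randn(8, 4)   # stand-in for a batch from the dataset (requires_grad=False)
ys = torch.randn(8, 1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(xs), ys)
loss.backward()

print(xs.grad)                                     # None: the input is not tracked
print([p.grad.shape for p in model.parameters()])  # gradients exist for weight and bias

expected = model.weight.detach() - 0.1 * model.weight.grad
optimizer.step()                                   # plain SGD update: p -= lr * p.grad
print(torch.allclose(model.weight.detach(), expected))  # True
```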

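```python
# train_loader, model1, model2, loss_fn1, loss_fn2 are defined elsewhere;
# optimizer must be built over the parameters of both model1 and model2
for idx, data in enumerate(train_loader):
    xs, ys = data
    optimizer.zero_grad()

    # compute d(l1)/d(params)
    pred1 = model1(xs)            # builds graph1
    loss1 = loss_fn1(pred1, ys)
    loss1.backward()              # frees graph1

    # compute d(l2)/d(params)
    pred2 = model2(xs)            # builds graph2
    loss2 = loss_fn2(pred2, ys)
    loss2.backward()              # frees graph2; gradients add to those from loss1

    # optimize using d(l1)/d(params) + d(l2)/d(params)
    optimizer.step()
```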
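Because backward() adds into .grad rather than overwriting it, the two backward calls in the training loop leave the optimizer holding d(l1)/dθ + d(l2)/dθ. This self-contained sketch checks that claim with a hypothetical shared backbone and two task heads (layer sizes, losses, and data are all made up): two separate forward/backward passes give the same gradients as a single backward on loss1 + loss2. If both heads instead shared one forward pass of the backbone, the first backward call would need retain_graph=True, or the losses could simply be summed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Linear(4, 8)                       # hypothetical shared layers
head1, head2 = nn.Linear(8, 1), nn.Linear(8, 1)  # hypothetical task heads
params = list(backbone.parameters())
xs, ys1, ys2 = torch.randn(16, 4), torch.randn(16, 1), torch.randn(16, 1)
mse = nn.functional.mse_loss

def backbone_grads():
    return [p.grad.clone() for p in params]

def zero_all():
    for m in (backbone, head1, head2):
        m.zero_grad()

# Two forward passes, two graphs, two backward calls
zero_all()
loss1 = mse(head1(backbone(xs)), ys1)
loss1.backward()                     # frees graph 1; grads hold d(l1)
loss2 = mse(head2(backbone(xs)), ys2)
loss2.backward()                     # frees graph 2; grads now hold d(l1) + d(l2)
accumulated = backbone_grads()

# Reference: a single backward on the summed loss
zero_all()
feats = backbone(xs)
(mse(head1(feats), ys1) + mse(head2(feats), ys2)).backward()
summed = backbone_grads()

print(all(torch.allclose(a, b) for a, b in zip(accumulated, summed)))  # True
```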

A computation graph is essentially a graph of operations: every node is an operation, and the parameters those operations use enter the graph as leaf nodes.

Take the following model as an example, visualized with the torchviz library:
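The "parameters enter as leaf nodes" point can be checked by walking the graph from an output back through grad_fn.next_functions, which is similar to what torchviz does when drawing its picture. In this sketch (the collect_nodes walker is a hypothetical helper written for illustration), every node reached is a backward operation, and the only AccumulateGrad nodes, where gradients get written into leaf tensors, belong to the layer's weight and bias.

```python
import torch
import torch.nn as nn

def collect_nodes(fn, seen=None):
    """Hypothetical helper: depth-first walk over the backward graph starting at fn."""
    if seen is None:
        seen = set()
    if fn is None or fn in seen:
        return seen
    seen.add(fn)
    for next_fn, _ in fn.next_functions:
        collect_nodes(next_fn, seen)
    return seen

layer = nn.Linear(3, 2)
x = torch.randn(4, 3)             # data input: no gradient, so no node for it
out = layer(x).sum()

nodes = collect_nodes(out.grad_fn)
names = sorted(type(n).__name__ for n in nodes)
leaf_vars = [n.variable for n in nodes if type(n).__name__ == "AccumulateGrad"]
print(names)                                       # operation nodes plus two AccumulateGrad
print(sorted(tuple(v.shape) for v in leaf_vars))   # [(2,), (2, 3)]: bias and weight
```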

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchviz import make_dot  # pip install torchviz; rendering also needs Graphviz


class Conv_Classifier(nn.Module):
    def __init__(self):
        super(Conv_Classifier, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 5)    # 1x28x28 -> 5x24x24
        self.pool1 = nn.MaxPool2d(2)       # -> 5x12x12
        self.conv2 = nn.Conv2d(5, 16, 5)   # -> 16x8x8
        self.pool2 = nn.MaxPool2d(2)       # -> 16x4x4 = 256 features
        self.fc1 = nn.Linear(256, 20)
        self.fc2 = nn.Linear(20, 10)

    def forward(self, x):
        x = F.relu(self.pool1(self.conv1(x)))
        x = F.relu(self.pool2(self.conv2(x)))
        x = F.dropout2d(x, training=self.training)
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x


Mnist_Classifier = Conv_Classifier()

input_sample = torch.rand((1, 1, 28, 28))
# make_dot returns a graphviz.Digraph: it displays inline in Jupyter, or call .render() to save it
make_dot(Mnist_Classifier(input_sample), params=dict(Mnist_Classifier.named_parameters()))
```

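The corresponding graph used to compute gradients (the computation graph) is shown below.

[Figure: computation graph of Mnist_Classifier generated by make_dot]

As the graph shows, every operation involving the leaf nodes is recorded so that gradients can later be propagated back to them.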
