
torch Basics: Study Notes

Published: 2024/8/5


1. What is the difference between nn and nn.functional in PyTorch?

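The two are largely equivalent: nn provides wrapped classes, while nn.functional provides functions that can be called directly. This is easiest to see in the implementation of the two modules, taking the convolution Conv1d as an example. The listings below come from an older PyTorch release, so the current source differs in detail. Conv1d in torch.nn: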

class Conv1d(_ConvNd):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, bias=True):
        kernel_size = _single(kernel_size)
        stride = _single(stride)
        padding = _single(padding)
        dilation = _single(dilation)
        super(Conv1d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _single(0), groups, bias)

    def forward(self, input):
        return F.conv1d(input, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

conv1d in torch.nn.functional:
def conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1,
           groups=1):
    if input is not None and input.dim() != 3:
        raise ValueError("Expected 3D tensor as input, got {}D tensor instead.".format(input.dim()))

    f = ConvNd(_single(stride), _single(padding), _single(dilation), False,
               _single(0), groups, torch.backends.cudnn.benchmark,
               torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
    return f(input, weight, bias)

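As the forward method shows, nn.Conv1d simply calls F.conv1d with the weight and bias stored on the module. In practice the two styles are usually mixed, for example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(512, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # F.dropout defaults to training=True, so pass self.training explicitly;
        # otherwise model.eval() does not switch dropout off.
        x = F.relu(F.dropout(self.fc2(x), 0.5, training=self.training))
        x = F.dropout(self.fc3(x), 0.5, training=self.training)
        return x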

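The state that has to be kept is mainly the three linear transformations, so three nn.Linear objects are defined when the Module is constructed, while relu, dropout and other operations that store no state are called directly as functions during the forward computation. Note that dropout has a pitfall: F.dropout does not follow the module's mode on its own, so its training flag has to be set explicitly (training=self.training).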

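Which style to prefer is a matter of taste. The commonly cited guidance is to use the nn.Xxx form for layers with learnable parameters (for example conv2d, linear, batch_norm), and to choose freely between nn.functional.xxx and nn.Xxx for operations without learnable parameters (for example maxpool, loss functions, activation functions). For dropout, however, I strongly recommend the nn.Xxx form, because dropout is normally applied only during training and never during evaluation. When dropout is defined as an nn.Dropout layer, calling model.eval() turns off every dropout layer in the model; when it is applied through nn.functional.dropout, calling model.eval() does not turn it off unless training=self.training is passed explicitly.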
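The difference is easy to check on a small tensor. The following is a minimal sketch, not taken from the original post; the class names ModuleDropout and FunctionalDropout are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModuleDropout(nn.Module):
    """Hypothetical example: dropout as an nn.Dropout layer."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.drop(x)


class FunctionalDropout(nn.Module):
    """Hypothetical example: F.dropout without the training flag."""

    def forward(self, x):
        # training defaults to True, so this ignores model.eval()
        return F.dropout(x, p=0.5)


x = torch.ones(1000)

module_out = ModuleDropout().eval()(x)
functional_out = FunctionalDropout().eval()(x)

# nn.Dropout is the identity in eval mode
print(torch.equal(module_out, x))
# F.dropout still zeroes elements (and rescales the rest by 1 / (1 - p))
print(bool((functional_out == 0).any()))
```

Passing training=self.training to F.dropout makes the functional version behave like the layer version.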

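Everything in nn.functional is a stateless function without side effects: it holds no Variable (no learnable tensor) internally and is purely a transformation from input to output. The items under nn may or may not carry state; they are generally subclasses of nn.Module and can rely on the methods inherited from nn.Module to manage the variables (state) they need.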
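To make the stateless/stateful split concrete, the sketch below (an illustration added here, with arbitrarily chosen shapes) calls F.conv1d with the weight and bias held by an nn.Conv1d module and checks that both paths give the same result.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# The module owns its parameters (state).
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
x = torch.randn(2, 4, 16)  # (batch, channels, length); shapes are arbitrary

module_out = conv(x)

# The function is stateless: weight and bias must be passed in explicitly.
functional_out = F.conv1d(x, conv.weight, conv.bias,
                          stride=conv.stride, padding=conv.padding,
                          dilation=conv.dilation, groups=conv.groups)

print(tuple(module_out.shape))
print(torch.allclose(module_out, functional_out))
```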

2. Several ways to initialize weights in PyTorch

1) Initialize every layer from the model's constructor by walking net.modules():

class discriminator(nn.Module):
    def __init__(self, dataset='mnist'):
        super(discriminator, self).__init__()
        ...
        self.conv = nn.Sequential(
            nn.Conv2d(self.input_dim, 64, 4, 2, 1),
            nn.ReLU(),
        )
        ...
        self.fc = nn.Sequential(
            nn.Linear(32, 64 * (self.input_height // 2) * (self.input_width // 2)),
            nn.BatchNorm1d(64 * (self.input_height // 2) * (self.input_width // 2)),
            nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, self.output_dim, 4, 2, 1),
            # nn.Sigmoid(),  # EBGAN does not work well when using Sigmoid().
        )
        utils.initialize_weights(self)

    def forward(self, input):
        ...

def initialize_weights(net):
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.data.normal_(0, 0.02)
            m.bias.data.zero_()
        elif isinstance(m, nn.ConvTranspose2d):
            m.weight.data.normal_(0, 0.02)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            m.weight.data.normal_(0, 0.02)
            m.bias.data.zero_()

2) Pass an initialization function to net.apply(), which calls it on every submodule:

def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        m.weight.data.fill_(1.0)
        print(m.weight)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

3) Select layers by class name inside the function passed to net.apply():

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

net.apply(weights_init)
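The same initializations can also be written with the functions in torch.nn.init instead of editing .data directly. The sketch below is an illustration rather than the original author's code: the small network and the name init_with_nn_init are made up, and the None check on bias is an added safeguard for layers created with bias=False.

```python
import torch
import torch.nn as nn


def init_with_nn_init(m):
    """Hypothetical helper: N(0, 0.02) weights and zero biases."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:  # bias is None when the layer uses bias=False
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)


torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 14 * 14, 10),
)
net.apply(init_with_nn_init)  # apply() visits every submodule, then net itself

print(net[4].bias.abs().sum().item())        # bias of the Linear layer
print(round(net[4].weight.std().item(), 3))  # empirical std of its weights
```

Using isinstance rather than matching on the class name avoids accidentally selecting unrelated classes whose names happen to contain "Conv".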

3. PyTorch notes:

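1) Running a model in parallel on several GPUs (here this usually means one machine with multiple cards) can speed up processing; there are roughly two ways to do it.
Running a Module on a single GPU is very simple and takes only two steps:
model = model.cuda(): moves all of the model's parameters to the GPU
input = input.cuda(): places the input data on the GPU as well (for tensors, .cuda() returns a copy, so the result must be assigned)
For parallel computation across several GPUs, PyTorch provides two entry points that make data-parallel execution straightforward:
① nn.parallel.data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None)
② class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)
Their parameters are very similar: device_ids specifies which GPUs to run on, and output_device specifies the GPU on which the output is gathered. The only difference is that the former directly computes the result in parallel on several GPUs, whereas the latter returns a new module that automatically runs in parallel on several GPUs.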
# method 1
new_net = nn.DataParallel(net, device_ids=[0, 1])
output = new_net(input)
# method 2 (pass the original module, not the DataParallel wrapper)
output = nn.parallel.data_parallel(net, input, device_ids=[0, 1])
# method 3
# "parallel" is a third-party module, not part of PyTorch; it must be importable locally.
from parallel import DataParallelModel, DataParallelCriterion

parallel_model = DataParallelModel(model)             # Encapsulate the model
parallel_loss = DataParallelCriterion(loss_function)  # Encapsulate the loss function

predictions = parallel_model(inputs)       # Parallel forward pass
                                           # "predictions" is a tuple of n_gpu tensors
loss = parallel_loss(predictions, labels)  # Compute loss function in parallel
loss.backward()                            # Backward pass
optimizer.step()                           # Optimizer step

DataParallel splits each input batch evenly into several chunks, sends each chunk to its GPU for computation, and accumulates the gradients obtained on the individual GPUs. All data associated with the Module is also replicated by shallow copy, so attributes of the module should be treated as read-only.
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model.cuda()
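The batch split described above can be reproduced on the CPU with torch.chunk, which is a reasonable stand-in for how a batch of 30 is divided over 3 devices along dim 0. The sketch below is illustrative only: the layer sizes are arbitrary, and the DataParallel wrapper is applied only when more than one GPU is actually present, so the code also runs on a machine without GPUs.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(5, 2)  # arbitrary stand-in for Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate on all visible GPUs
model = model.to(device)

batch = torch.randn(30, 5, device=device)

# Illustration of the split along dim 0: 30 samples -> 3 chunks of 10
chunks = torch.chunk(batch, 3, dim=0)
print([tuple(c.shape) for c in chunks])

# The wrapped model is called exactly like the original one;
# the outputs are gathered back into a single batch of 30.
output = model(batch)
print(tuple(output.shape))
```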

4. Adjusting the learning rate dynamically in PyTorch

1) Change the learning rate with a custom function of the epoch.
def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
Example:
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
for epoch in range(10):
    adjust_learning_rate(optimizer, epoch)
    train(...)
    validate(...)
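The same decay rule is available as a built-in scheduler. The sketch below assumes that torch.optim.lr_scheduler.StepLR with step_size=30 and gamma=0.1 is an acceptable replacement for the hand-written function; the one-element parameter and the initial rate of 0.1 are placeholders, and the training step is reduced to a bare optimizer.step().

```python
import torch
from torch.optim.lr_scheduler import StepLR

# Placeholder parameter so the optimizer has something to manage
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

lr_per_epoch = []
for epoch in range(90):
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # stands in for train(...); no gradients, so nothing changes
    scheduler.step()   # call once per epoch, after the optimizer step

# Learning rate used in the first epoch of each 30-epoch block
print(lr_per_epoch[0], lr_per_epoch[30], lr_per_epoch[60])
```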
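2) Set different learning rates for different layers of the model.
import torch
import torchvision

# newer torchvision releases use the weights= argument instead of pretrained=True
model = torchvision.models.resnet101(pretrained=True)
# "params" must contain the parameter tensors themselves; the ids are only used for filtering
large_lr_ids = set(map(id, model.fc.parameters()))
large_lr_layers = list(model.fc.parameters())
small_lr_layers = [p for p in model.parameters() if id(p) not in large_lr_ids]
optimizer = torch.optim.SGD([
    {"params": large_lr_layers},
    {"params": small_lr_layers, "lr": 1e-4},
], lr=1e-2, momentum=0.9)
Note: large_lr_layers uses a learning rate of 1e-2 and small_lr_layers uses 1e-4; both groups share the same momentum.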
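The grouping can be checked without downloading a pretrained network. In the sketch below a tiny nn.Sequential stands in for ResNet-101 (an assumption made purely for illustration: its last layer plays the role of model.fc) and the resulting param_groups are inspected.

```python
import torch
import torch.nn as nn

# Tiny stand-in network; the final Linear layer plays the role of model.fc
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)
head = model[2]

head_ids = {id(p) for p in head.parameters()}
head_params = list(head.parameters())
backbone_params = [p for p in model.parameters() if id(p) not in head_ids]

optimizer = torch.optim.SGD(
    [
        {"params": head_params},                  # falls back to the default lr
        {"params": backbone_params, "lr": 1e-4},  # overrides the default lr
    ],
    lr=1e-2,
    momentum=0.9,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"], group["momentum"])
```

Any option that a group does not set itself, such as momentum here, is taken from the keyword arguments given to the optimizer.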

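The optimizer can also be restricted to the parameters that still require gradients, which is how part of a model is frozen:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # freeze everything defined so far
        for p in self.parameters():
            p.requires_grad = False
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

model = Net()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-5)
This freezes the parameters created above the for loop and trains only those created below it (fc1, fc2 and fc3 here; f, g, h, gamma, fc, etc. in the original model). Note that the optimizer should then be given only the trainable parameters, via filter(lambda p: p.requires_grad, model.parameters()); older PyTorch releases raise an error when handed parameters that do not require gradients.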
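A quick way to confirm what is frozen is to list the tensors that still require gradients and compare them with what the optimizer received. This is a minimal sketch using the same layer sizes as above; the class name FrozenConvNet is invented for the example, and no forward pass is defined because only the parameters matter here.

```python
import torch.nn as nn
from torch import optim


class FrozenConvNet(nn.Module):
    """Hypothetical example: conv layers frozen, linear layers trainable."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        for p in self.parameters():  # only conv1 and conv2 exist at this point
            p.requires_grad = False
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)


model = FrozenConvNet()
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]

optimizer = optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)

print(frozen)
print(trainable)
print(len(optimizer.param_groups[0]["params"]))
```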

3) Manually set the intervals at which the LR decays.

Example:
def adjust_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

for epoch in range(60):
    lr = 30e-5
    if epoch > 25:
        lr = 15e-5
    if epoch > 30:
        lr = 7.5e-5
    if epoch > 35:
        lr = 3e-5
    if epoch > 40:
        lr = 1e-5
    adjust_learning_rate(optimizer, lr)
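The chain of if statements can also be expressed as a lookup table. The sketch below is one possible rewrite using bisect from the standard library; the names BOUNDARIES, RATES and lr_for_epoch are made up here, and the values are copied from the example above.

```python
from bisect import bisect_left

# The rate changes once the epoch exceeds each boundary (epoch > 25, > 30, ...)
BOUNDARIES = [25, 30, 35, 40]
RATES = [30e-5, 15e-5, 7.5e-5, 3e-5, 1e-5]


def lr_for_epoch(epoch):
    """Return the learning rate for a given epoch (hypothetical helper)."""
    # bisect_left returns how many boundaries are strictly below epoch,
    # which matches the "epoch > boundary" tests in the if-chain
    return RATES[bisect_left(BOUNDARIES, epoch)]


schedule = [lr_for_epoch(epoch) for epoch in range(60)]
print(schedule[25], schedule[26], schedule[31], schedule[36], schedule[41])
```

The value returned by lr_for_epoch(epoch) can be passed to adjust_learning_rate(optimizer, lr) exactly as before.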
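4) Cosine annealing
Example:
from torch import optim
from torch.optim import lr_scheduler

epochs = 60
optimizer = optim.SGD(model.parameters(), lr=config.lr, momentum=0.9, weight_decay=1e-4)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=(epochs // 9) + 1)
for epoch in range(epochs):
    train(...)
    scheduler.step()  # step once per epoch, after training; passing the epoch is deprecated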
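For reference, with minimum rate eta_min (0 by default) and initial rate eta_max, CosineAnnealingLR follows eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T_max)) / 2. The sketch below compares the scheduler with this closed form over the first T_max epochs; the initial rate 0.1 is a placeholder for config.lr, and the dummy parameter exists only so that an optimizer can be built.

```python
import math

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

epochs = 60
t_max = (epochs // 9) + 1   # 7, as in the example above
base_lr = 0.1               # placeholder for config.lr

optimizer = SGD([torch.nn.Parameter(torch.zeros(1))], lr=base_lr, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=t_max)

scheduled = []
for epoch in range(t_max + 1):
    scheduled.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # stands in for train(...)
    scheduler.step()

# Closed form with eta_min = 0
expected = [base_lr * (1 + math.cos(math.pi * t / t_max)) / 2
            for t in range(t_max + 1)]

for t, (got, want) in enumerate(zip(scheduled, expected)):
    print(t, round(got, 6), round(want, 6))
```

Because T_max is much smaller than the number of epochs, the rate does not stay at its minimum: after epoch T_max it rises again and keeps oscillating with period 2 * T_max. Setting T_max to the total number of epochs gives a single monotone decay.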
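5) Decay at fixed epochs (milestones).
from torch.optim.lr_scheduler import MultiStepLR

scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()  # since PyTorch 1.1.0, step the scheduler after the epoch's optimizer steps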
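The effect of the milestones can be seen by recording the rate used in each epoch. This is a small sketch with a placeholder parameter and an assumed initial rate of 0.1; optimizer.step() stands in for the real training loop.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

optimizer = SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)  # placeholder setup
scheduler = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

lr_per_epoch = []
for epoch in range(100):
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # stands in for train(...)
    scheduler.step()   # once per epoch, after the optimizer step

# Rate in force just before and just after each milestone
for epoch in (29, 30, 79, 80):
    print(epoch, round(lr_per_epoch[epoch], 6))
```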
