
Model Training Acceleration Methods

Published: 2025/4/5


  • Learning rate setting (linear scaling with total batch size)

    • lr = 0.00125*num_gpu*samples_per_gpu
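This is the linear scaling rule: the learning rate grows with the total batch size (`num_gpu * samples_per_gpu`). A minimal sketch, assuming the 0.00125 base value quoted above (the base value is a rule of thumb, not a universal constant):

```python
def scaled_lr(num_gpu, samples_per_gpu, base_lr=0.00125):
    """Linear scaling rule: lr is proportional to the total batch size."""
    return base_lr * num_gpu * samples_per_gpu

# 8 GPUs x 2 samples each and 1 GPU x 16 samples give the same total batch,
# so they get the same learning rate.
print(scaled_lr(8, 2))   # 0.02
print(scaled_lr(1, 16))  # 0.02
```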

  Data loading acceleration

  • Data prefetch (the approach provided in NVIDIA Apex)

    ```python
    # pip install prefetch_generator
    from torch.utils.data import DataLoader
    from prefetch_generator import BackgroundGenerator

    # Use DataLoaderX in place of DataLoader
    class DataLoaderX(DataLoader):
        def __iter__(self):
            return BackgroundGenerator(super().__iter__())
    ```
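For intuition, `BackgroundGenerator` runs the wrapped iterator in a worker thread and hands items over through a bounded queue, so the training step never waits on Python-side loading. A stdlib-only sketch of that idea (not the actual `prefetch_generator` implementation; the class name here is illustrative):

```python
import threading
import queue

class BackgroundIterator:
    """Consume an iterable in a worker thread and hand items over via a
    bounded queue -- the core idea behind BackgroundGenerator."""
    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self, iterable, max_prefetch=2):
        self._queue = queue.Queue(maxsize=max_prefetch)
        self._thread = threading.Thread(
            target=self._worker, args=(iterable,), daemon=True)
        self._thread.start()

    def _worker(self, iterable):
        for item in iterable:
            self._queue.put(item)  # blocks when the queue is full
        self._queue.put(self._DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is self._DONE:
            raise StopIteration
        return item

print(list(BackgroundIterator(range(5))))  # [0, 1, 2, 3, 4]
```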
  • cuda.Stream to overlap the host-to-device copy with computation

    """ 該代碼是在使用amp半精度計(jì)算的條件下:否則加 if args.fp16:self.mean = self.mean.half()self.std = self.std.half() """ class DataPrefetcher():def __init__(self, loader, opt):self.loader = iter(loader)self.opt = optself.stream = torch.cuda.Stream()self.preload()def preload(self):try:self.batch = next(self.loader)except StopIteration:self.batch = Nonereturnwith torch.cuda.stream(self.stream):for k in self.batch:if k != 'meta':self.batch[k] = self.batch[k].to(device=self.opt.device, non_blocking=True)def next(self):torch.cuda.current_stream().wait_stream(self.stream)batch = self.batchself.preload()return batchclass data_prefetcher():def __init__(self, loader):self.loader = iter(loader)self.stream = torch.cuda.Stream()self.mean = torch.tensor([0.485 * 255, 0.456 * 255, 0.406 * 255]).cuda().view(1,3,1,1)self.std = torch.tensor([0.229 * 255, 0.224 * 255, 0.225 * 255]).cuda().view(1,3,1,1)# With Amp, it isn't necessary to manually convert data to half.# if args.fp16:# self.mean = self.mean.half()# self.std = self.std.half()self.preload()def preload(self):try:self.next_input, self.next_target = next(self.loader)except StopIteration:self.next_input = Noneself.next_target = Nonereturnwith torch.cuda.stream(self.stream):self.next_input = self.next_input.cuda(non_blocking=True)self.next_target = self.next_target.cuda(non_blocking=True)# With Amp, it isn't necessary to manually convert data to half.# if args.fp16:# self.next_input = self.next_input.half()# else:self.next_input = self.next_input.float()self.next_input = self.next_input.sub_(self.mean).div_(self.std)def next(self):torch.cuda.current_stream().wait_stream(self.stream)input = self.next_inputtarget = self.next_targetself.preload()return input, target# 加入前數(shù)據(jù)加載: for iter_id, batch in enumerate(data_loader):if iter_id >= num_iters:breakfor k in batch:if k != 'meta':batch[k] = batch[k].to(device=opt.device, non_blocking=True)run_step()# 加入后加載數(shù)據(jù) prefetcher = DataPrefetcher(data_loader, opt) batch = 
prefetcher.next() iter_id = 0 while batch is not None:iter_id += 1if iter_id >= num_iters:breakrun_step()batch = prefetcher.next()
  • OCR model training tricks

    • Punctuation: when building the dataset, convert Chinese punctuation such as [,.'";:] to the English equivalents (or vice versa), so the label set does not contain two copies of the same symbol. Since both attention-based OCR and CTC effectively treat characters as glyphs, the model sees a Chinese semicolon and an English semicolon as the same thing and will confuse them.

    • Training set: because ctc_loss takes a sequence_length, sorting samples by the length of their text labels improves data-distribution consistency and CTC efficiency. For example, the first 100 samples have labels shorter than 5 characters, the next 100 shorter than 10, the next 100 shorter than 15, and so on.

    • Independent across batches, equal within a batch: images in the same batch must share the same size, but with a fully convolutional network different batches can be independent, so each batch can choose its own target size. For example, the first batch reads labels shorter than 5 and resizes images to 100*32; the second reads labels shorter than 10 and resizes to 200*32.

    • Two-tailed training set: for balance, remove samples whose labels occur extremely rarely or extremely often, keeping every character's frequency moderate.
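The length-sorting and per-batch sizing tricks above can be sketched as a simple bucketing step. The bucket width of 5 and the `(image_path, label)` sample format are assumptions for illustration:

```python
from collections import defaultdict

def bucket_by_label_length(samples, bucket_width=5):
    """Group (image_path, label) pairs into buckets by label length
    (0-4, 5-9, 10-14, ...) so each batch holds similar-length labels
    and can pick its own resize target."""
    buckets = defaultdict(list)
    for sample in samples:
        _, label = sample
        key = (len(label) // bucket_width) * bucket_width
        buckets[key].append(sample)
    return dict(sorted(buckets.items()))

samples = [("a.png", "hi"), ("b.png", "hello!"),
           ("c.png", "short"), ("d.png", "a much longer line")]
print(bucket_by_label_length(samples))
# keys 0, 5, 15 -- each bucket can then be resized independently,
# e.g. width = 100 for the shortest bucket, 200 for the next, etc.
```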

  • Handling class imbalance in PyTorch

    ```python
    # Data side: re-balance sampling with WeightedRandomSampler
    import math
    import torch
    import pandas as pd
    from collections import Counter
    from torch.utils.data.dataset import random_split
    from torch.utils.data import DataLoader, WeightedRandomSampler

    def load_data(self, sample):
        train_set, val_set = random_split(
            train_full,
            [math.floor(len(train_full) * 0.8), math.ceil(len(train_full) * 0.2)])
        self.train_classes = [label for _, label in train_set]
        if sample:
            # Need to get a weight for every image in the dataset
            class_count = Counter(self.train_classes)
            class_weights = torch.Tensor(
                [len(self.train_classes) / c
                 for c in pd.Series(class_count).sort_index().values])
            # Can't iterate over class_count directly because the dict is unordered
            sample_weights = [0] * len(train_set)
            for idx, (image, label) in enumerate(train_set):
                sample_weights[idx] = class_weights[label]
            sampler = WeightedRandomSampler(weights=sample_weights,
                                            num_samples=len(train_set),
                                            replacement=True)
            train_loader = DataLoader(train_set, batch_size=self.batch_size,
                                      sampler=sampler)
        else:
            train_loader = DataLoader(train_set, batch_size=self.batch_size,
                                      shuffle=True)
        val_loader = DataLoader(val_set, batch_size=self.batch_size)
        return train_loader, val_loader


    # Model side: weight the loss per class
    def load_model(self, arch='resnet'):
        if arch == 'resnet':
            self.model = torchvision.models.resnet50(pretrained=True)
            if self.freeze_backbone:
                for param in self.model.parameters():
                    param.requires_grad = False
            self.model.fc = nn.Linear(in_features=self.model.fc.in_features,
                                      out_features=self.num_classes)
        elif arch == 'efficient-net':
            self.model = EfficientNet.from_pretrained('efficientnet-b7')
            if self.freeze_backbone:
                for param in self.model.parameters():
                    param.requires_grad = False
            self.model._fc = nn.Linear(in_features=self.model._fc.in_features,
                                       out_features=self.num_classes)
        self.model = self.model.to(self.device)
        self.optimizer = torch.optim.Adam(self.model.parameters(), self.lr)
        if self.loss_weights:
            class_count = Counter(self.train_classes)
            class_weights = torch.Tensor(
                [len(self.train_classes) / c
                 for c in pd.Series(class_count).sort_index().values])
            class_weights = class_weights.to(self.device)
            self.criterion = nn.CrossEntropyLoss(class_weights)
        else:
            self.criterion = nn.CrossEntropyLoss()
    ```
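The inverse-frequency weighting used on both the sampler side and the loss side above (`len(train_classes) / count` per class) can be isolated in a short torch-free sketch:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: each class gets total/count, so rare
    classes receive larger weights (the scheme used for both
    WeightedRandomSampler and the weighted CrossEntropyLoss above)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in sorted(counts.items())}

# Class 0 appears 3x, class 1 once: the rare class gets the larger weight.
print(class_weights([0, 0, 0, 1]))  # {0: 1.3333333333333333, 1: 4.0}
```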
  • Early stopping

    ```python
    # Callbacks
    # Early stopping
    import torch

    class EarlyStopping:
        def __init__(self, patience=1, delta=0, path='checkpoint.pt'):
            self.patience = patience
            self.delta = delta
            self.path = path
            self.counter = 0
            self.best_score = None
            self.early_stop = False

        def __call__(self, val_loss, model):
            if self.best_score is None:
                self.best_score = val_loss
                self.save_checkpoint(model)
            elif val_loss > self.best_score - self.delta:
                # No improvement beyond delta: count towards patience
                self.counter += 1
                if self.counter >= self.patience:
                    self.early_stop = True
            else:
                self.best_score = val_loss
                self.save_checkpoint(model)
                self.counter = 0

        def save_checkpoint(self, model):
            torch.save(model.state_dict(), self.path)
    ```
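To see the patience counter in action, here is the same stopping logic with checkpointing stubbed out so it runs without torch (a torch-free illustration, not the callback itself):

```python
class SimpleEarlyStopping:
    """Same stopping logic as the EarlyStopping callback above,
    with the checkpoint write stubbed out."""
    def __init__(self, patience=2, delta=0.0):
        self.patience = patience
        self.delta = delta
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_score is None or val_loss < self.best_score - self.delta:
            self.best_score = val_loss  # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1           # no improvement: count towards patience
            if self.counter >= self.patience:
                self.early_stop = True

stopper = SimpleEarlyStopping(patience=2)
# Loss improves for two epochs, then plateaus: training stops after
# two consecutive non-improving epochs.
for epoch, loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.73]):
    stopper(loss)
    if stopper.early_stop:
        print(f"stopping at epoch {epoch}")  # stopping at epoch 3
        break
```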
