
Lesson 12.1 Creating and Using Dataset Generation Functions for Deep Learning Modeling Experiments

Published: 2025/4/5 pytorch 41 豆豆


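To make the upcoming exercises easier, we will build our own data generators that produce datasets satisfying chosen conditions and having chosen properties. Compared with traditional machine learning, deep learning datasets tend to be more complex, and in most cases they cannot be laid out as a table for inspection. In practice we usually design the model architecture, train it directly, and judge the result only through a handful of metrics. Since the inside of a complex neural network is also a "black box", we can essentially only control the workflow, feed in data, and observe results, so calling ourselves "alchemists" is not much of an exaggeration. While learning, however, and especially while learning optimization algorithms, we would like to observe the data and the modeling process from more angles. That requires creating some data ourselves as raw material for experiments, using those experiments to understand how models work, and moving one step from "alchemist" toward "chemist".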

Import the relevant packages

    # Random module
    import random

    # Plotting modules
    import matplotlib as mpl
    import matplotlib.pyplot as plt

    # numpy
    import numpy as np

    # pytorch
    import torch
    from torch import nn, optim
    import torch.nn.functional as F
    from torch.utils.data import Dataset, TensorDataset, DataLoader

All of the packages above have been used before; any new package will be imported and introduced when it is first needed.

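I. Creating Regression Datasets

1. Generating data manually

In data for regression models, both the features and the labels are continuous values.

Normally, we choose a regression model because the label to predict is continuous. Here we generate the data first and model afterwards, so we call data that can be used to train a regression model "regression data"; the same goes for classification data.

Data generation

Generate a dataset with two features and an intercept, in which the independent and dependent variables are linearly related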

    num_inputs = 2         # two features
    num_examples = 1000    # one thousand samples in total

Then use a linear equation to fix the true relationship between the independent and dependent variables

    torch.manual_seed(420)       # set the random seed
    # <torch._C.Generator at 0x25d6ababcd0>

    # Coefficients of the linear equation
    w_true = torch.tensor([2., -1]).reshape(2, 1)
    b_true = torch.tensor(1.)

    # Feature and label values
    features = torch.randn(num_examples, num_inputs)
    labels_true = torch.mm(features, w_true) + b_true
    labels = labels_true + torch.randn(size = labels_true.shape) * 0.01

All data here is set to floating point.
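The floating-point remark matters because `torch.mm` does not mix dtypes. The following is a small sketch, not part of the original lesson, of what happens when the coefficients are written as Python integers; the names `w_int`, `w_float` and `mm_failed` are illustrative assumptions.

```python
import torch

w_int = torch.tensor([2, -1]).reshape(2, 1)      # int64, inferred from Python ints
w_float = torch.tensor([2., -1]).reshape(2, 1)   # float32, because of the "2."
x = torch.randn(4, 2)                            # float32

print(w_int.dtype, w_float.dtype, x.dtype)

try:
    torch.mm(x, w_int)           # mixed float32 / int64 operands
    mm_failed = False
except RuntimeError:
    mm_failed = True             # torch.mm expects both operands to share a dtype

y = torch.mm(x, w_float)         # both float32, so this works
print(mm_failed, y.shape)
```

Writing one coefficient as `2.` is enough to make the whole tensor floating point, which is presumably why the lesson's code does so.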

Note that labels_true and features satisfy an exact linear relationship:
$y = 2x_1-x_2+1$

The labels we actually use, however, add a noise term to labels_true, namely torch.randn(size = labels_true.shape) * 0.01. This matches how data is usually obtained: the real world may follow some rule, but the data we collect contains errors for all kinds of reasons and cannot describe that rule perfectly. This is one source of model error (the other is the model's own limited ability to capture the rule). Here $y=2x_1-x_2+1$ is the rule the data truly follows, created by us from a god's-eye view, while the noise term plays the role of measurement error introduced artificially.

Generating data from a known rule and then adding an artificial noise term is also a general way of creating data in mathematics.
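To make the role of the noise term concrete, here is a minimal sketch that regenerates the data and checks that ordinary least squares recovers coefficients close to the true rule. It uses `torch.linalg.lstsq`, which the lesson itself does not use; the names `X`, `w_hat` and `noise_std` are illustrative assumptions.

```python
import torch

torch.manual_seed(420)
n = 1000
w_true = torch.tensor([2., -1.]).reshape(2, 1)
b_true = torch.tensor(1.)

features = torch.randn(n, 2)
labels_true = torch.mm(features, w_true) + b_true
labels = labels_true + torch.randn(size=labels_true.shape) * 0.01

# Append a column of ones so the intercept is estimated together with the weights.
X = torch.cat((features, torch.ones(n, 1)), 1)
w_hat = torch.linalg.lstsq(X, labels).solution   # shape (3, 1)

# Empirical standard deviation of the added noise (should be near 0.01).
noise_std = (labels - labels_true).std()
print(w_hat.flatten(), noise_std)
```

With noise this small the estimate should sit very close to (2, -1, 1); a larger noise scale would loosen the fit.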

Data exploration

    features[: 10]
    # tensor([[-0.0070,  0.5044],
    #         [ 0.6704, -0.3829],
    #         [ 0.0302,  0.3826],
    #         [-0.5131,  0.7104],
    #         [ 1.8092,  0.4352],
    #         [ 2.6453,  0.2654],
    #         [ 0.9235, -0.4376],
    #         [ 2.0182,  1.3498],
    #         [-0.2523, -0.0355],
    #         [-0.0646, -0.5918]])

    labels[: 10]
    # tensor([[ 0.4735],
    #         [ 2.7285],
    #         [ 0.6764],
    #         [-0.7537],
    #         [ 4.1722],
    #         [ 6.0236],
    #         [ 3.2936],
    #         [ 3.6706],
    #         [ 0.5282],
    #         [ 1.4557]])

    plt.subplot(121)
    plt.scatter(features[:, 0], labels)    # first feature vs. labels
    plt.subplot(122)
    plt.scatter(features[:, 1], labels)    # second feature vs. labels

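It is easy to see that both features have a roughly linear relationship with the labels, and that how visible it is depends strongly on the absolute value of each feature's coefficient. To make the data harder for a linear model, increase the scale of the noise term, which weakens the linear relationship.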
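The weakening of the linear relationship can also be quantified. The sketch below compares the Pearson correlation between the first feature and the labels at two noise scales; the helper `corr_with_first_feature` is a hypothetical name introduced for illustration. For independent standard normal features, the correlation should be roughly 2/√5 ≈ 0.89 at a noise scale of 0.01 and roughly 2/3 at a noise scale of 2.

```python
import torch

torch.manual_seed(420)
features = torch.randn(1000, 2)
labels_true = 2 * features[:, 0] - features[:, 1] + 1

def corr_with_first_feature(delta):
    """Pearson correlation between x1 and labels built with noise scale delta."""
    labels = labels_true + torch.randn(1000) * delta
    stacked = torch.stack((features[:, 0], labels))   # shape (2, 1000), one variable per row
    return torch.corrcoef(stacked)[0, 1].item()

corr_small = corr_with_first_feature(0.01)
corr_large = corr_with_first_feature(2.0)
print(corr_small, corr_large)
```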

    # Set the random seed
    torch.manual_seed(420)

    # Modify the dependent variable
    labels1 = labels_true + torch.randn(size = labels_true.shape) * 2

    # Visualization
    # Small noise
    plt.subplot(221)
    plt.scatter(features[:, 0], labels)         # first feature vs. labels
    plt.subplot(222)
    plt.plot(features[:, 1], labels, 'ro')      # second feature vs. labels

    # Large noise
    plt.subplot(223)
    plt.scatter(features[:, 0], labels1)        # first feature vs. labels
    plt.subplot(224)
    plt.plot(features[:, 1], labels1, 'yo')     # second feature vs. labels

Of course, we can also generate datasets with a non-linear relationship. Here we create a dataset that follows $y=2x^2+1$, applied elementwise to each feature (the coefficient in the code is 2).

    # Set the random seed
    torch.manual_seed(420)

    num_inputs = 2         # two features
    num_examples = 1000    # one thousand samples in total

    # Equation coefficients
    w_true = torch.tensor(2.)
    b_true = torch.tensor(1.)

    # Feature and label values
    features = torch.randn(num_examples, num_inputs)
    labels_true = torch.pow(features, 2) * w_true + b_true
    labels = labels_true + torch.randn(size = labels_true.shape) * 0.1

    # Visualization
    plt.scatter(features, labels)

2. Creating a function that generates regression data

For later convenience, we wrap the process above in a function

Define the function

    def tensorGenReg(num_examples = 1000, w = [2, -1, 1], bias = True, delta = 0.01, deg = 1):
        """Create a regression dataset.

        :param num_examples: number of samples to create
        :param w: coefficient vector, including the intercept (if any) as its last entry
        :param bias: whether the equation has an intercept
        :param delta: scale of the noise term
        :param deg: degree of the equation
        :return: the generated feature tensor and label tensor
        """
        if bias:
            num_inputs = len(w) - 1                                  # number of features
            features_true = torch.randn(num_examples, num_inputs)   # features without the all-ones column
            w_true = torch.tensor(w[:-1]).reshape(-1, 1).float()    # feature coefficients
            b_true = torch.tensor(w[-1]).float()                    # intercept
            if num_inputs == 1:
                # With a single feature, use an elementwise product instead of torch.mm
                labels_true = torch.pow(features_true, deg) * w_true + b_true
            else:
                labels_true = torch.mm(torch.pow(features_true, deg), w_true) + b_true
            # Append an all-ones column to the feature tensor
            features = torch.cat((features_true, torch.ones(len(features_true), 1)), 1)
            labels = labels_true + torch.randn(size = labels_true.shape) * delta
        else:
            num_inputs = len(w)
            features = torch.randn(num_examples, num_inputs)
            w_true = torch.tensor(w).reshape(-1, 1).float()
            if num_inputs == 1:
                labels_true = torch.pow(features, deg) * w_true
            else:
                labels_true = torch.mm(torch.pow(features, deg), w_true)
            labels = labels_true + torch.randn(size = labels_true.shape) * delta
        return features, labels

Note: the function above cannot create equations with interaction terms
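If an interaction term is needed, one option is to build the labels by hand, in the same way as the manual example earlier. The sketch below is only an illustration: the rule $y = 2x_1 - x_2 + 0.5x_1x_2 + 1$ and its coefficient 0.5 are hypothetical choices, not something the lesson defines.

```python
import torch

torch.manual_seed(420)
num_examples = 1000
features = torch.randn(num_examples, 2)
x1, x2 = features[:, 0:1], features[:, 1:2]     # slices keep the column shape (n, 1)

# Hypothetical rule with an interaction term: y = 2*x1 - x2 + 0.5*x1*x2 + 1
labels_true = 2 * x1 - x2 + 0.5 * x1 * x2 + 1
labels = labels_true + torch.randn(size=labels_true.shape) * 0.01
print(features.shape, labels.shape)
```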

Testing the function

First, look at the data when the noise term is small

    # Set the random seed
    torch.manual_seed(420)

    # Noise scale of 0.01
    f, l = tensorGenReg(delta=0.01)
    f
    # tensor([[-0.0070,  0.5044,  1.0000],
    #         [ 0.6704, -0.3829,  1.0000],
    #         [ 0.0302,  0.3826,  1.0000],
    #         ...,
    #         [-0.9164, -0.6087,  1.0000],
    #         [ 0.7815,  1.2865,  1.0000],
    #         [ 1.4819,  1.1390,  1.0000]])

    # Plot the result
    plt.subplot(223)
    plt.scatter(f[:, 0], l)    # first feature vs. labels
    plt.subplot(224)
    plt.scatter(f[:, 1], l)    # second feature vs. labels

Then look at the data when the noise term is large

    # Set the random seed
    torch.manual_seed(420)

    # Noise scale of 2
    f, l = tensorGenReg(delta=2)

    # Plot the result
    plt.subplot(223)
    plt.scatter(f[:, 0], l)    # first feature vs. labels
    plt.subplot(224)
    plt.scatter(f[:, 1], l)    # second feature vs. labels

How the data looks when the features and labels have a second-order relationship

    # Set the random seed
    torch.manual_seed(420)

    # Second-order equation
    f, l = tensorGenReg(deg=2)

    # Plot the result
    plt.subplot(223)
    plt.scatter(f[:, 0], l)    # first feature vs. labels
    plt.subplot(224)
    plt.scatter(f[:, 1], l)    # second feature vs. labels

How the data looks with a single feature

    # Set the random seed
    torch.manual_seed(420)

    # Second-order equation
    f, l = tensorGenReg(w=[1], deg=2, bias=False)
    plt.scatter(f, l)

II. Creating Classification Datasets

Unlike regression data, the labels of classification data are discrete values.

1. Creating a classification dataset manually

Data generation

Before creating a classification dataset, let us first review how torch.normal creates random numbers that follow a given normal distribution.

    torch.randn(4, 2)
    # tensor([[ 1.4000,  0.3924],
    #         [-0.0695, -1.7610],
    #         [ 0.3227,  1.7285],
    #         [-0.1107, -1.6273]])

    torch.normal(4, 2, size=(10,2))
    # tensor([[4.8092, 0.9773],
    #         [4.4092, 3.3987],
    #         [1.7446, 6.2281],
    #         [3.0095, 4.2286],
    #         [7.8873, 6.5354],
    #         [3.9286, 4.0315],
    #         [2.0309, 4.5259],
    #         [3.6491, 0.7394],
    #         [3.6549, 5.4767],
    #         [8.5935, 3.0440]])

Next we create a three-class dataset with two features and 500 samples per class. Both features of the first class follow a normal distribution with mean 4 and standard deviation 2, both features of the second class follow one with mean -2 and standard deviation 2, and both features of the third class follow one with mean -6 and standard deviation 2:

    # Set the random seed
    torch.manual_seed(420)

    # Initial settings
    num_inputs = 2
    num_examples = 500

    # Create the feature clusters
    data0 = torch.normal(4, 2, size=(num_examples, num_inputs))
    data1 = torch.normal(-2, 2, size=(num_examples, num_inputs))
    data2 = torch.normal(-6, 2, size=(num_examples, num_inputs))

    # Create the labels
    label0 = torch.zeros(500)
    label1 = torch.ones(500)
    label2 = torch.full_like(label1, 2)

    # Concatenate into the final data
    features = torch.cat((data0, data1, data2)).float()
    labels = torch.cat((label0, label1, label2)).long().reshape(-1, 1)

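Points to note here:

The mean and standard deviation arguments of normal both accept multi-dimensional tensors, in which case the output is a tensor of the same shape;
By convention, class labels in a multi-class problem start at 0 and increase by one;
PyTorch requires classification labels to be of the default integer type (int64, i.e. long).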
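The first and third notes can be illustrated with a short sketch. It assumes `F.cross_entropy` as the consumer of the labels, which this lesson does not use yet, and it passes the targets as a 1-D tensor of class indices; the tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(420)

# Tensor-valued mean and std: one (mean, std) pair per output element.
means = torch.tensor([4., -2., -6.])
stds = torch.tensor([2., 2., 2.])
samples = torch.normal(means, stds)          # same shape as means: (3,)
print(samples.shape)

# Class labels as int64 ("long") class indices starting at 0.
logits = torch.randn(5, 3)                   # 5 samples, 3 classes
targets = torch.tensor([0, 1, 2, 0, 1])      # Python ints give dtype torch.int64
loss = F.cross_entropy(logits, targets)
print(targets.dtype, loss.item())
```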
Data exploration

    features[: 10]
    # tensor([[3.9859, 5.0089],
    #         [5.3407, 3.2343],
    #         [4.0605, 4.7653],
    #         [2.9738, 5.4208],
    #         [7.6183, 4.8705],
    #         [9.2907, 4.5307],
    #         [5.8470, 3.1249],
    #         [8.0364, 6.6997],
    #         [3.4954, 3.9290],
    #         [3.8709, 2.8165]])

    labels[: 10]
    # tensor([[0],
    #         [0],
    #         [0],
    #         [0],
    #         [0],
    #         [0],
    #         [0],
    #         [0],
    #         [0],
    #         [0]])

    # Visualization
    plt.scatter(features[:, 0], features[:, 1], c = labels)

As the plot shows, the classes overlap very little, so a classifier should do well on this dataset. To make classification harder, move the class means closer together and increase the variance, so that the classes overlap more in the 2-D plot.

    # Set the random seed
    torch.manual_seed(420)

    # Initial settings
    num_inputs = 2
    num_examples = 500

    # Create the feature clusters
    data0 = torch.normal(3, 2, size=(num_examples, num_inputs))
    data1 = torch.normal(0, 2, size=(num_examples, num_inputs))
    data2 = torch.normal(-3, 2, size=(num_examples, num_inputs))

    # Create the labels
    label0 = torch.zeros(500)
    label1 = torch.ones(500)
    label2 = torch.full_like(label1, 2)

    # Concatenate into the final data
    features1 = torch.cat((data0, data1, data2)).float()
    labels1 = torch.cat((label0, label1, label2)).long().reshape(-1, 1)

    # Visualization
    plt.subplot(121)
    plt.scatter(features[:, 0], features[:, 1], c = labels)
    plt.subplot(122)
    plt.scatter(features1[:, 0], features1[:, 1], c = labels1)

2. Creating a function that generates classification data

Likewise, we wrap the process of creating classification data in a function. One point to note: we want a variable that controls the overall dispersion of the data, that is, how hard the later modeling will be. We define it this way: if the class centers are close together and the within-class variance is large, the dataset as a whole is more dispersed; otherwise it is less dispersed. When writing the function we also want parameters that make this easy to tune.

Define the function

    def tensorGenCla(num_examples = 500, num_inputs = 2, num_class = 3, deg_dispersion = [4, 2], bias = False):
        """Create a classification dataset.

        :param num_examples: number of samples per class
        :param num_inputs: number of features
        :param num_class: total number of label classes
        :param deg_dispersion: dispersion parameters, given as a list whose first entry is the
            reference value for the class means and whose second entry is the standard deviation
        :param bias: whether a logistic regression model built on the data includes an intercept
        :return: the feature tensor (2-D, floating point) and the label tensor (2-D, long integer)
        """
        cluster_l = torch.empty(num_examples, 1)    # shape template for each class's label tensor
        mean_ = deg_dispersion[0]                   # reference value for each class's feature mean
        std_ = deg_dispersion[1]                    # standard deviation of each class's features
        lf = []                                     # list holding each class's feature tensor
        ll = []                                     # list holding each class's label tensor
        # Offset that centers the class means around 0
        # (parts of the video use +1 here; it should be -1)
        k = mean_ * (num_class - 1) / 2

        for i in range(num_class):
            data_temp = torch.normal(i*mean_-k, std_, size=(num_examples, num_inputs))    # features of class i
            lf.append(data_temp)
            labels_temp = torch.full_like(cluster_l, i)                                   # labels of class i
            ll.append(labels_temp)

        features = torch.cat(lf).float()
        labels = torch.cat(ll).long()

        if bias:
            # Append an all-ones column to the feature tensor
            features = torch.cat((features, torch.ones(len(features), 1)), 1)
        return features, labels

    # Class means for mean_ = 4: before centering -> after subtracting k -> in units of mean_
    # [0, 4, 8] -> [-4, 0, 4] -> [-1, 0, 1]
    # [0, 4, 8, 12, 16] -> [-8, -4, 0, 4, 8] -> [-2, -1, 0, 1, 2]

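The function's overall structure is simple, and every method it uses is a common tensor method introduced earlier. The only point that needs attention is how it controls dispersion. The internal variable k = mean_ * (num_class - 1) / 2 grows with both the mean reference value and the number of classes; it is half of the largest uncentered class mean, (num_class - 1) * mean_. Meanwhile i*mean_ increases steadily with i, so subtracting k from it yields class means placed symmetrically around 0, and the features as a whole are spread evenly around 0.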
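The centering arithmetic can be checked in isolation. The helper `class_means` below is a hypothetical name introduced only for this illustration; it reproduces the expression `i*mean_-k` from the function for every class index.

```python
def class_means(mean_, num_class):
    """Centers used for each class: i * mean_ - k for i = 0 .. num_class - 1."""
    k = mean_ * (num_class - 1) / 2
    return [i * mean_ - k for i in range(num_class)]

print(class_means(4, 3))   # [-4.0, 0.0, 4.0]
print(class_means(4, 5))   # [-8.0, -4.0, 0.0, 4.0, 8.0]
print(class_means(6, 2))   # [-3.0, 3.0]
```

In every case the centers sum to zero, which is what places the dataset as a whole around the origin.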

Testing the function

When using the function, the first dispersion value can be read as the approximate spacing between clusters, and the second as the dispersion within each cluster.

    # Set the random seed
    torch.manual_seed(420)

    # Create the data
    f, l = tensorGenCla(deg_dispersion = [6, 2])      # lower dispersion
    f1, l1 = tensorGenCla(deg_dispersion = [6, 4])    # higher dispersion

    # Plot the result
    plt.subplot(121)
    plt.scatter(f[:, 0], f[:, 1], c = l)
    plt.subplot(122)
    plt.scatter(f1[:, 0], f1[:, 1], c = l1)
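Besides plotting, the difficulty of the two settings can be estimated numerically. The sketch below regenerates clusters with the same centering rule and measures how often a point lies closest to its own class center. This nearest-center rule is a stand-in chosen for illustration, not a classifier from the lesson, and `nearest_center_accuracy` is a hypothetical helper that assumes two features.

```python
import torch

def nearest_center_accuracy(mean_, std_, num_examples=500, num_class=3):
    """Share of points whose nearest class center is their own class center."""
    k = mean_ * (num_class - 1) / 2
    centers = torch.tensor([i * mean_ - k for i in range(num_class)])
    feats, labs = [], []
    for i in range(num_class):
        feats.append(torch.normal(centers[i].item(), std_, size=(num_examples, 2)))
        labs.append(torch.full((num_examples,), i))
    feats, labs = torch.cat(feats), torch.cat(labs)
    # Every class center is the point (c, c), so compare squared distances to each (c, c).
    dists = ((feats.unsqueeze(1) - centers.reshape(1, -1, 1)) ** 2).sum(dim=2)
    return (dists.argmin(dim=1) == labs).float().mean().item()

torch.manual_seed(420)
acc_tight = nearest_center_accuracy(6, 2)   # same setting as deg_dispersion = [6, 2]
acc_loose = nearest_center_accuracy(6, 4)   # same setting as deg_dispersion = [6, 4]
print(acc_tight, acc_loose)
```

The larger standard deviation should give a visibly lower score, which matches the heavier overlap in the second plot.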

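III. Creating a Mini-Batch Splitting Function

In deep learning, gradient descent is the most commonly used optimization method for solving the objective function, and different variants are used for objective functions of different types and properties. So far, we regard mini-batch gradient descent (MBGD) as a fairly "general-purpose" optimizer: like stochastic gradient descent (SGD) it can escape local minima, and like batch gradient descent (BGD) it converges relatively quickly (though slightly slower than BGD). Mini-batch gradient descent requires splitting the data into batches, so before implementing the basic deep learning algorithms by hand we need to define a function that splits a dataset into mini-batches.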
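To show where batch splitting fits, here is a minimal sketch of mini-batch gradient descent on a linear regression problem. The gradient is written out by hand and the shuffling uses `torch.randperm`; the learning rate, batch size and number of epochs are assumptions picked for illustration, not values from the lesson.

```python
import torch

torch.manual_seed(420)
n, batch_size, lr, epochs = 1000, 10, 0.05, 5

# Data following y = 2*x1 - x2 + 1 plus small noise; the last column is all ones.
X = torch.cat((torch.randn(n, 2), torch.ones(n, 1)), 1)
w_true = torch.tensor([[2.], [-1.], [1.]])
y = torch.mm(X, w_true) + torch.randn(n, 1) * 0.01

w = torch.zeros(3, 1)
for epoch in range(epochs):
    perm = torch.randperm(n)                      # reshuffle the sample order every epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of the batch mean squared error with respect to w.
        grad = 2 * torch.mm(Xb.t(), torch.mm(Xb, w) - yb) / len(idx)
        w = w - lr * grad
    mse = ((torch.mm(X, w) - y) ** 2).mean().item()
    print(epoch, mse)

print(w.flatten())
```

Each parameter update uses only ten samples, so one pass over the data performs one hundred updates instead of the single update of batch gradient descent.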

In addition, the cross-validation procedure covered later also requires splitting the data.

The shuffle step: put the original sequence in random order

    l = list(range(5))
    l
    # [0, 1, 2, 3, 4]

    random.shuffle(l)
    l
    # [3, 2, 0, 1, 4]

The goal of the batch-splitting function is to split the original dataset randomly and evenly according to the chosen batch size. It can be implemented as follows:

    def data_iter(batch_size, features, labels):
        """Split a dataset into shuffled mini-batches.

        :param batch_size: number of samples in each sub-dataset
        :param features: input feature tensor
        :param labels: input label tensor
        :return l: a list with one [features, labels] pair per mini-batch
        """
        num_examples = len(features)
        indices = list(range(num_examples))
        random.shuffle(indices)
        l = []
        for i in range(0, num_examples, batch_size):
            j = torch.tensor(indices[i: min(i + batch_size, num_examples)])
            l.append([torch.index_select(features, 0, j), torch.index_select(labels, 0, j)])
        return l

    for i in range(0, 5, 2):
        print(i)
    # 0
    # 2
    # 4

    # Set the random seed
    torch.manual_seed(420)

    # Generate a classification dataset (three classes by default)
    features, labels = tensorGenCla()
    features[:5]
    # tensor([[-4.0141, -2.9911],
    #         [-2.6593, -4.7657],
    #         [-3.9395, -3.2347],
    #         [-5.0262, -2.5792],
    #         [-0.3817, -3.1295]])

    # l is still the shuffled list [3, 2, 0, 1, 4] from above
    torch.tensor(l[0:2])
    # tensor([3, 2])

    torch.index_select(features, 0, torch.tensor(l[0:2]))
    # tensor([[-5.0262, -2.5792],
    #         [-3.9395, -3.2347]])

    labels
    # tensor([[0],
    #         [0],
    #         [0],
    #         ...,
    #         [2],
    #         [2],
    #         [2]])

    l = data_iter(10, features, labels)
    l[0]    # the first sub-dataset after splitting
    # [tensor([[ 0.7901,  2.4304],
    #          [ 4.0788,  3.7885],
    #          [-1.1552, -0.8829],
    #          [ 1.3738,  2.3689],
    #          [-2.1479, -6.6638],
    #          [-2.5418, -7.9962],
    #          [-1.0777, -0.7594],
    #          [ 5.6215,  3.9071],
    #          [ 3.5896,  3.3644],
    #          [ 1.2458,  0.0179]]),
    #  tensor([[1],
    #          [2],
    #          [1],
    #          [1],
    #          [0],
    #          [0],
    #          [1],
    #          [2],
    #          [2],
    #          [1]])]

    plt.scatter(l[0][0][:, 0], l[0][0][:, 1], c = l[0][1])

Here we again used an empty list to store data. In classical machine learning we often collect processed data in a list, which lets us see the actual data very clearly, but this is not standard practice in deep learning. There the data volume is often very large, and the data may even be stored in a distributed fashion; loading all of it for inspection costs a lot of storage and a fair amount of compute. PyTorch's Dataset and DataLoader therefore store the data in iterable or mapped form. We will discuss data generators later. Since this is a hands-on experiment, keeping the fully processed data in a list makes it easy to call and inspect, which suits beginners better.
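For comparison, the following is a minimal sketch of the same kind of batch splitting done with `TensorDataset` and `DataLoader`, both imported at the top of the lesson. The random features and the placeholder labels from `torch.randint` are assumptions used only to keep the example self-contained.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(420)
features = torch.randn(1500, 2)
labels = torch.randint(0, 3, (1500, 1))          # placeholder integer labels

dataset = TensorDataset(features, labels)        # pairs features[i] with labels[i]
loader = DataLoader(dataset, batch_size=10, shuffle=True)

print(len(dataset), len(loader))                 # 1500 150
xb, yb = next(iter(loader))                      # batches are built on demand
print(xb.shape, yb.shape)
```

Unlike the list returned by data_iter, the loader does not hold all batches at once; it assembles each batch when it is requested.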

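IV. Writing a Python Module

As laid out in the course plan, the functions defined in this section will be used frequently in later lessons, so we package them into a module for easy reuse. There are a few basic ways to do this:

Open a text editor, paste in the written and tested functions, and change the file extension to .py;
Copy the functions into Spyder or PyCharm and save them as a .py file;

Then save the file in the Jupyter home directory under the name torchLearning, after which it can be used via import torchLearning. JupyterLab users can also write the module as follows:

Step 1. Open the file browser tab on the left and click New

Step 2. In the New menu, choose Text File

Step 3. Enter the code in the text editor that opens
The functions to save are:

tensorGenReg function
tensorGenCla function
data_iter function

Step 4. Save and exit, then rename the file to torchLearning.py

The module can then be imported from other .ipynb files; see the next section for how to call it.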
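The save-then-import workflow can be tried out without touching the real module. The sketch below writes a small stand-in file to a temporary directory and imports it; the module name `torchLearning_demo` and its helper `batch_starts` are hypothetical, and the use of `tempfile` and `sys.path` is an assumption made so the example is self-contained (the lesson instead saves the file in the Jupyter home directory).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for torchLearning.py, with one small helper function.
module_source = '''
def batch_starts(num_examples, batch_size):
    """Start index of every mini-batch."""
    return list(range(0, num_examples, batch_size))
'''

module_dir = tempfile.mkdtemp()
Path(module_dir, "torchLearning_demo.py").write_text(module_source)

sys.path.insert(0, module_dir)        # make the directory importable
importlib.invalidate_caches()
torchLearning_demo = importlib.import_module("torchLearning_demo")

print(torchLearning_demo.batch_starts(5, 2))   # [0, 2, 4]
```

A file saved in the notebook's working directory needs no `sys.path` change, because that directory is already on the import path.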
