
[Algorithm Competition Notes] Weather and Ocean Forecasting - Task 4: Building a TCNN+RNN Model

Published: 2023/12/15

Weather and Ocean Forecasting - Task 4: Building a TCNN+RNN Model

The model used in this solution is TCNN+RNN.

In Task 3 we studied the CNN+LSTM model. The LSTM layer, however, has a large number of parameters, which causes two problems: first, a high-parameter model overfits easily when data is scarce; second, to limit overfitting we cannot build a deeper model on a limited dataset, so it is hard to mine richer information. By contrast, a CNN's parameter count depends only on its filter size, and CNNs perform well across many kinds of tasks, so we can consider using convolutions to mine temporal information as well. But if we used 3-D convolutions to mine temporal and spatial information jointly, a filter of size (T_f, H_f, W_f) would contribute T_f×H_f×W_f parameters per filter, which is still fairly large. To further reduce the parameters of each layer and allow a deeper model, the TOP solution studied here convolves over time and space separately: a TCN unit mines temporal information, and its output is fed into a CNN unit that mines spatial information. This serial TCN-unit + CNN-unit structure is called a TCNN layer, and stacking multiple TCNN layers extracts temporal and spatial information alternately. In addition, since spatio-temporal information at different time scales may influence the prediction differently, the solution uses three RNN layers to extract features at three time scales, concatenates them, and predicts the Niño 3.4 index through a fully connected layer.
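To make the parameter-count argument concrete, here is a minimal sketch (the channel count C is a made-up value for illustration, not taken from the solution) comparing a joint space-time 3-D convolution with a factored temporal 1-D + spatial 2-D pair:

```python
# Rough parameter-count comparison: joint 3-D conv vs. factored 1-D + 2-D convs.
# C = 64 is a hypothetical channel count chosen only for this illustration.
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

C = 64
# Joint space-time filter of size (3, 3, 3)
conv3d = nn.Conv3d(C, C, kernel_size=(3, 3, 3))
# Factored: a temporal 1-D conv followed by a spatial 2-D conv
conv1d = nn.Conv1d(C, C, kernel_size=3)
conv2d = nn.Conv2d(C, C, kernel_size=3)

print(n_params(conv3d))                      # C*C*27 + C  = 110656
print(n_params(conv1d) + n_params(conv2d))   # C*C*(3+9) + 2*C = 49280
```

Per channel pair, the factored design grows roughly as k_t + k_h·k_w instead of k_t·k_h·k_w, which is what lets the solution stack more layers within the same parameter budget.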

Learning Objectives

  • Learn the TOP solution's approach to model construction

Contents

  • Data processing
    • Data flattening
    • Null filling
    • Dataset construction
  • Model building
    • Evaluation function
    • Model definition
    • Model training
    • Model evaluation
  • Summary

Data Processing

The data processing in this TOP solution has three parts:

  • Data flattening
  • Null filling
  • Dataset construction

Apart from not constructing any new features, the data processing is essentially the same as in Task 3, so we will not repeat the details here.

import netCDF4 as nc
import random
import os
from tqdm import tqdm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

from sklearn.metrics import mean_squared_error

# Fix the random seed
SEED = 22

def seed_everything(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True

seed_everything(SEED)

# Check whether a GPU is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available. Training on CPU ...')
else:
    print('CUDA is available! Training on GPU ...')

CUDA is available! Training on GPU ...

# Read the data

# Data directory
path = '/kaggle/input/ninoprediction/'
soda_train = nc.Dataset(path + 'SODA_train.nc')
soda_label = nc.Dataset(path + 'SODA_label.nc')
cmip_train = nc.Dataset(path + 'CMIP_train.nc')
cmip_label = nc.Dataset(path + 'CMIP_label.nc')

Data Flattening

The dataset is built with a sliding window.

def make_flatted(train_ds, label_ds, info, start_idx=0):
    keys = ['sst', 't300', 'ua', 'va']
    label_key = 'nino'
    # Number of years
    years = info[1]
    # Number of models
    models = info[2]

    train_list = []
    label_list = []

    # Concatenate the data belonging to the same model
    for model_i in range(models):
        blocks = []

        # For each feature, take the first 12 months of every record and concatenate
        for key in keys:
            block = train_ds[key][start_idx + model_i * years: start_idx + (model_i + 1) * years, :12].reshape(-1, 24, 72, 1).data
            blocks.append(block)

        # Concatenate all features along the last dimension
        train_flatted = np.concatenate(blocks, axis=-1)

        # Concatenate the labels for months 12-23; note that the last 12 months of labels
        # of the final year are appended (together with the final year's months 12-23 they
        # form the prediction targets for the final year's first 12 months)
        label_flatted = np.concatenate([
            label_ds[label_key][start_idx + model_i * years: start_idx + (model_i + 1) * years, 12: 24].reshape(-1).data,
            label_ds[label_key][start_idx + (model_i + 1) * years - 1, 24: 36].reshape(-1).data
        ], axis=0)

        train_list.append(train_flatted)
        label_list.append(label_flatted)

    return train_list, label_list

soda_info = ('soda', 100, 1)
cmip6_info = ('cmip6', 151, 15)
cmip5_info = ('cmip5', 140, 17)

soda_trains, soda_labels = make_flatted(soda_train, soda_label, soda_info)
cmip6_trains, cmip6_labels = make_flatted(cmip_train, cmip_label, cmip6_info)
cmip5_trains, cmip5_labels = make_flatted(cmip_train, cmip_label, cmip5_info, cmip6_info[1]*cmip6_info[2])

# The flattened data has shape (models, sequence length, lat, lon, features),
# where sequence length = years x 12
np.shape(soda_trains), np.shape(cmip6_trains), np.shape(cmip5_trains)

((1, 1200, 24, 72, 4), (15, 1812, 24, 72, 4), (17, 1680, 24, 72, 4))

Null Filling

Null values are filled with 0.

# Fill the null values in the SODA data
soda_trains = np.array(soda_trains)
soda_trains_nan = np.isnan(soda_trains)
soda_trains[soda_trains_nan] = 0
print('Number of null in soda_trains after fillna:', np.sum(np.isnan(soda_trains)))

Number of null in soda_trains after fillna: 0

# Fill the null values in the CMIP6 data
cmip6_trains = np.array(cmip6_trains)
cmip6_trains_nan = np.isnan(cmip6_trains)
cmip6_trains[cmip6_trains_nan] = 0
print('Number of null in cmip6_trains after fillna:', np.sum(np.isnan(cmip6_trains)))

Number of null in cmip6_trains after fillna: 0

# Fill the null values in the CMIP5 data
cmip5_trains = np.array(cmip5_trains)
cmip5_trains_nan = np.isnan(cmip5_trains)
cmip5_trains[cmip5_trains_nan] = 0
print('Number of null in cmip5_trains after fillna:', np.sum(np.isnan(cmip5_trains)))

Number of null in cmip5_trains after fillna: 0

Dataset Construction

Build the training and validation sets.

# Build the training set
X_train = []
y_train = []

# Draw 100 samples from each of the 17 CMIP5 models
for model_i in range(17):
    samples = np.random.choice(cmip5_trains.shape[1]-12, size=100)
    for ind in samples:
        X_train.append(cmip5_trains[model_i, ind: ind+12])
        y_train.append(cmip5_labels[model_i][ind: ind+24])

# Draw 100 samples from each of the 15 CMIP6 models
for model_i in range(15):
    samples = np.random.choice(cmip6_trains.shape[1]-12, size=100)
    for ind in samples:
        X_train.append(cmip6_trains[model_i, ind: ind+12])
        y_train.append(cmip6_labels[model_i][ind: ind+24])

X_train = np.array(X_train)
y_train = np.array(y_train)

# Build the validation set
X_valid = []
y_valid = []
samples = np.random.choice(soda_trains.shape[1]-12, size=100)
for ind in samples:
    X_valid.append(soda_trains[0, ind: ind+12])
    y_valid.append(soda_labels[0][ind: ind+24])
X_valid = np.array(X_valid)
y_valid = np.array(y_valid)

# Check the dataset shapes
X_train.shape, y_train.shape, X_valid.shape, y_valid.shape

((3200, 12, 24, 72, 4), (3200, 24), (100, 12, 24, 72, 4), (100, 24))

# Save the datasets
np.save('X_train_sample.npy', X_train)
np.save('y_train_sample.npy', y_train)
np.save('X_valid_sample.npy', X_valid)
np.save('y_valid_sample.npy', y_valid)

Model Building

In this part we take a close look at the solution's model structure.

# Load the datasets
X_train = np.load('../input/ai-earth-task04-samples/X_train_sample.npy')
y_train = np.load('../input/ai-earth-task04-samples/y_train_sample.npy')
X_valid = np.load('../input/ai-earth-task04-samples/X_valid_sample.npy')
y_valid = np.load('../input/ai-earth-task04-samples/y_valid_sample.npy')
X_train.shape, y_train.shape, X_valid.shape, y_valid.shape

((3200, 12, 24, 72, 4), (3200, 24), (100, 12, 24, 72, 4), (100, 24))

# Build the data pipeline
class AIEarthDataset(Dataset):
    def __init__(self, data, label):
        self.data = torch.tensor(data, dtype=torch.float32)
        self.label = torch.tensor(label, dtype=torch.float32)

    def __len__(self):
        return len(self.label)

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

batch_size = 32

trainset = AIEarthDataset(X_train, y_train)
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)

validset = AIEarthDataset(X_valid, y_valid)
# Keep shuffle=False so the predictions stay aligned with y_valid when scoring
validloader = DataLoader(validset, batch_size=batch_size, shuffle=False)

Evaluation Function

def rmse(y_true, y_preds):
    return np.sqrt(mean_squared_error(y_pred=y_preds, y_true=y_true))

# Evaluation function
def score(y_true, y_preds):
    # Accumulated correlation skill score
    accskill_score = 0
    # RMSE
    rmse_scores = 0
    a = [1.5] * 4 + [2] * 7 + [3] * 7 + [4] * 6
    y_true_mean = np.mean(y_true, axis=0)
    y_pred_mean = np.mean(y_preds, axis=0)
    for i in range(24):
        # Numerator and denominator of the per-month correlation coefficient
        fenzi = np.sum((y_true[:, i] - y_true_mean[i]) * (y_preds[:, i] - y_pred_mean[i]))
        fenmu = np.sqrt(np.sum((y_true[:, i] - y_true_mean[i])**2) * np.sum((y_preds[:, i] - y_pred_mean[i])**2))
        cor_i = fenzi / fenmu
        accskill_score += a[i] * np.log(i+1) * cor_i
        rmse_score = rmse(y_true[:, i], y_preds[:, i])
        rmse_scores += rmse_score
    return 2/3.0 * accskill_score - rmse_scores

Model Definition

This TOP solution composes a TCNN layer from a TCN unit and a CNN unit connected in series, stacks multiple TCNN layers to alternately extract temporal and spatial information, and then uses RNNs to extract feature representations at three different time scales from the resulting spatio-temporal features.

• TCN unit

TCN stands for Temporal Convolutional Network; like the RNN, it is a sequence model. TCN builds on the CNN and adapts it to sequence problems through three modifications:

  • Causal convolution
  • TCN handles sequence problems where the output has the same length as the input: each hidden layer has as many nodes as there are input steps, and the value of a node at time t depends only on the previous layer's nodes at time t and earlier. In other words, TCN derives the current result by tracing back its causes (the values at and before time t), hence the name causal convolution.

  • Dilated convolution
  • A traditional CNN's receptive field is limited by its kernel size, and enlarging it requires pooling layers, which lose information. To solve this, TCN enlarges the receptive field with dilated convolution, capturing information over longer time spans. Dilated convolution samples the input at intervals controlled by a dilation factor d, defined as:

    F(s) = (X *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot X_{s - d \cdot i}

    where X is the current layer's input, k is the current layer's kernel size, and s is the current time step. That is, for a hidden layer with dilation factor d and kernel size k, the previous layer's output is sampled every d steps, taking k samples in total as the input for time s. The receptive field of TCN is thus jointly determined by the kernel size k and the dilation factor d, so it can capture longer-range temporal dependencies.

  • Residual connection
  • The more layers a network has, the richer the features it can extract, but depth also brings vanishing or exploding gradients. An effective remedy is the residual connection. TCN's residual block contains two convolution operations and uses WeightNorm and Dropout for regularization.
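As a sketch of how a standard TCN combines the first two ideas (this is illustrative only, not the simplified TCN unit used later in this solution), a causal dilated convolution can be implemented in PyTorch by left-padding the sequence with (k-1)·d zeros:

```python
# Causal, dilated 1-D convolution: left-only padding ensures the output at
# step t depends only on inputs at steps <= t, and the output length equals
# the input length. Kernel size and dilation are toy values.
import torch
import torch.nn.functional as F

k, d = 3, 2                  # kernel size and dilation factor
x = torch.randn(1, 1, 12)    # (batch, channels, time)
w = torch.randn(1, 1, k)     # one filter

pad = (k - 1) * d            # causal (left-only) padding
y = F.conv1d(F.pad(x, (pad, 0)), w, dilation=d)
print(y.shape)  # torch.Size([1, 1, 12])
```

The effective receptive field of one such layer is (k-1)·d + 1 steps, which is why stacking layers with growing d captures long histories cheaply.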

In summary, TCN adapts the convolution operation to sequence problems. It inherits CNN's advantage of few parameters, which allows deeper networks; unlike RNNs it is not prone to vanishing or exploding gradients; and its receptive field is flexible, adapting to different tasks. Comparisons on many datasets show TCN outperforming sequence models such as RNN, LSTM, and GRU.

To dig deeper into TCN, see:

    • Paper: https://arxiv.org/pdf/1803.01271.pdf
    • GitHub: https://github.com/locuslab/tcn

The TCN unit built in this solution is not a standard TCN layer: it uses only a single convolution layer, with BatchNormalization applied before and after the convolution to improve generalization. Note that the convolution here operates along the time dimension, so the input shape must be converted, and to match the subsequent network layers the output is converted back to the input's (N, T, C, H, W) form.

# Build the TCN unit
class TCNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(in_channels)
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn2 = nn.BatchNorm1d(out_channels)
        if in_channels == out_channels and stride == 1:
            self.res = lambda x: x
        else:
            self.res = nn.Conv1d(in_channels, out_channels, kernel_size=1, stride=stride)

    def forward(self, x):
        # Reshape the input: fold the spatial dimensions into the batch so the
        # 1-D convolution runs along the time axis
        N, T, C, H, W = x.shape
        x = x.permute(0, 3, 4, 2, 1).contiguous()
        x = x.view(N*H*W, C, T)
        # Residual branch
        res = self.res(x)
        res = self.bn2(res)
        # Main branch
        x = F.relu(self.bn1(x))
        x = self.conv(x)
        x = self.bn2(x)
        x = x + res
        # Reshape the output back to (N, T, C, H, W)
        _, C_new, T_new = x.shape
        x = x.view(N, H, W, C_new, T_new)
        x = x.permute(0, 4, 3, 1, 2).contiguous()
        return x
• CNN unit

The CNN unit's structure is similar to the TCN unit's: it has a single convolution layer and uses BatchNormalization to improve generalization. Like the TCN unit, it also includes a residual connection.

# Build the CNN unit
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if (in_channels == out_channels) and (stride == 1):
            self.res = lambda x: x
        else:
            self.res = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)

    def forward(self, x):
        # Reshape the input: fold the time dimension into the batch so the
        # 2-D convolution runs over the spatial axes
        N, T, C, H, W = x.shape
        x = x.view(N*T, C, H, W)
        # Residual branch
        res = self.res(x)
        res = self.bn2(res)
        # Main branch
        x = F.relu(self.bn1(x))
        x = self.conv(x)
        x = self.bn2(x)
        x = x + res
        # Reshape the output back to (N, T, C, H, W)
        _, C_new, H_new, W_new = x.shape
        x = x.view(N, T, C_new, H_new, W_new)
        return x
• TCNN layer

Connecting a TCN unit and a CNN unit in series forms one TCNN layer.

class TCNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride_tcn, stride_cnn, padding):
        super().__init__()
        self.tcn = TCNBlock(in_channels, out_channels, kernel_size, stride_tcn, padding)
        self.cnn = CNNBlock(out_channels, out_channels, kernel_size, stride_cnn, padding)

    def forward(self, x):
        x = self.tcn(x)
        x = self.cnn(x)
        return x
• TCNN+RNN model

The overall model consists of the following parts:

  • TCNN part
  • The TCNN part is structured much like a conventional CNN: very regular, with the channel count increasing step by step to extract richer feature representations. Note that the input data format is (N, T, H, W, C); to match the convolution layers' expected input, it must be converted to (N, T, C, H, W).

  • GAP layer
  • GAP stands for Global Average Pooling. It takes the global average of each channel's feature map: if the TCNN part outputs data of format (N, T, C, H, W), the GAP layer averages all values of each channel's H×W feature map, so the output format becomes (N, T, C). GAP first appeared in the paper "Network in Network" (https://arxiv.org/pdf/1312.4400.pdf) as a replacement for the fully connected layers of conventional CNNs, and many subsequent experiments have shown that it can indeed improve CNN performance.

    那么GAP層為什么可以代替全連接層呢?在傳統CNN中,經過多層卷積和池化的操作后,會由Flatten層將特征圖拉伸成一列,然后經過全連接層,那么對于形狀為(C,H,W)的一條數據,經Flatten層拉伸后的長度為C×H×W,此時假設全連接層節點數為U,全連接層的參數量就是C×H×W×U,這么大的參數量很容易使得模型過擬合。相比之下,GAP層不引入新的參數,因此可以有效減少過擬合問題,并且模型參數少也能加快訓練速度。另一方面,全連接層是一個黑箱子,我們很難解釋多分類的信息是怎樣傳回卷積層的,而GAP層就很容易理解,每個通道的值就代表了經過多層卷積操作后所提取出來的特征。更詳細的理解可以參考https://www.zhihu.com/question/373188099

PyTorch has no built-in layer named GAP, so adaptive_avg_pool2d can be used instead: this function pools a feature map down to a given output shape, and setting the output_size parameter to (1, 1) is equivalent to the GAP operation. For details on the function see https://pytorch.org/docs/stable/generated/torch.nn.functional.adaptive_avg_pool2d.html?highlight=adaptive_avg_pool2d#torch.nn.functional.adaptive_avg_pool2d
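A minimal sketch of GAP via adaptive_avg_pool2d (the shapes below are toy values, not the model's actual sizes):

```python
# GAP: average each (H, W) feature map to a single value per channel,
# turning (N, C, H, W) into (N, C) with no learned parameters.
import torch
import torch.nn.functional as F

x = torch.randn(6, 256, 3, 9)  # toy (batch, channels, H, W)
gap = F.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)
print(gap.shape)               # torch.Size([6, 256])

# Equivalent to averaging over the spatial dimensions directly:
print(torch.allclose(gap, x.mean(dim=(2, 3)), atol=1e-5))  # True
```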

  • RNN part
  • So far we have worked only with sequences of length 12, where each time step carries one month of information. Sequences at different time scales carry different information: for example, if a year of SST values is represented by a sequence of length 6, each step carries two months of SST information, and the yearly SST trends such a sequence reflects are not quite the same as those of a length-12 sequence. So, to mine as much information as possible, this TOP solution uses MaxPool layers to obtain sequences at three different time scales and RNN layers to extract a feature representation from each. RNNs are well suited to automatic feature extraction from linear sequences: for an input of shape (T, C1), an RNN layer with C2 units extracts a vector of length C2, and because the RNN passes information forward step by step, the extracted vector captures the dependencies within the sequence well.

At this point each of the three time scales has been summarized as one vector; concatenating the three vectors and passing them through a fully connected layer yields the 24-month prediction sequence.
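The multi-scale RNN idea can be sketched as follows (the channel size is a toy value, not the solution's 256-unit RNN):

```python
# Multi-scale RNN sketch: run an RNN, keep its final hidden state, halve the
# time axis with MaxPool1d, and repeat; concatenate the three hidden states.
import torch
from torch import nn

C = 8
rnn = nn.RNN(C, C, batch_first=True)
pool = nn.MaxPool1d(2)

x = torch.randn(4, 12, C)            # (N, T, C), T = 12 months
hidden = []
for _ in range(3):
    x, h = rnn(x)                    # h: (1, N, C) final hidden state
    hidden.append(h.squeeze(0))
    # Halve the time axis: T goes 12 -> 6 -> 3 (MaxPool1d pools the last dim)
    x = pool(x.transpose(1, 2)).transpose(1, 2)

feats = torch.cat(hidden, dim=1)     # (N, 3*C)
print(feats.shape)                   # torch.Size([4, 24])
```

A final linear layer on `feats` would then produce the 24-month prediction, mirroring the structure described above.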

# Build the model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
        self.tcnn1 = TCNNBlock(64, 64, 3, 1, 1, 1)
        self.tcnn2 = TCNNBlock(64, 128, 3, 1, 2, 1)
        self.tcnn3 = TCNNBlock(128, 128, 3, 1, 1, 1)
        self.tcnn4 = TCNNBlock(128, 256, 3, 1, 2, 1)
        self.tcnn5 = TCNNBlock(256, 256, 3, 1, 1, 1)
        self.rnn = nn.RNN(256, 256, batch_first=True)
        self.maxpool = nn.MaxPool1d(2)
        self.fc = nn.Linear(256*3, 24)

    def forward(self, x):
        # Convert the input shape
        N, T, H, W, C = x.shape
        x = x.permute(0, 1, 4, 2, 3).contiguous()
        x = x.view(N*T, C, H, W)
        # Initial convolution layer
        x = self.conv(x)
        _, C_new, H_new, W_new = x.shape
        x = x.view(N, T, C_new, H_new, W_new)
        # TCNN part
        for i in range(3):
            x = self.tcnn1(x)
        x = self.tcnn2(x)
        for i in range(2):
            x = self.tcnn3(x)
        x = self.tcnn4(x)
        for i in range(2):
            x = self.tcnn5(x)
        # Global average pooling
        x = F.adaptive_avg_pool2d(x, (1, 1)).squeeze()
        # RNN part: extract features at three time scales (T, T/2, T/4);
        # note the shape conversions around the RNN output
        hidden_state = []
        for i in range(3):
            x, h = self.rnn(x)
            h = h.squeeze()
            hidden_state.append(h)
            x = self.maxpool(x.transpose(1, 2)).transpose(1, 2)
        x = torch.cat(hidden_state, dim=1)
        x = self.fc(x)
        return x

model = Model()
print(model)

Model(
  (conv): Conv2d(4, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
  (tcnn1): TCNNBlock(
    (tcn): TCNBlock(
      (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv1d(64, 64, kernel_size=(3,), stride=(1,), padding=(1,))
      (bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (cnn): CNNBlock(
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (tcnn2): TCNNBlock(
    (tcn): TCNBlock(
      (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv1d(64, 128, kernel_size=(3,), stride=(1,), padding=(1,))
      (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (res): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
    )
    (cnn): CNNBlock(
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (res): Conv2d(128, 128, kernel_size=(1, 1), stride=(2, 2))
    )
  )
  (tcnn3): TCNNBlock(
    (tcn): TCNBlock(
      (bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv1d(128, 128, kernel_size=(3,), stride=(1,), padding=(1,))
      (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (cnn): CNNBlock(
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (tcnn4): TCNNBlock(
    (tcn): TCNBlock(
      (bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv1d(128, 256, kernel_size=(3,), stride=(1,), padding=(1,))
      (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (res): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
    )
    (cnn): CNNBlock(
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (res): Conv2d(256, 256, kernel_size=(1, 1), stride=(2, 2))
    )
  )
  (tcnn5): TCNNBlock(
    (tcn): TCNBlock(
      (bn1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
      (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (cnn): CNNBlock(
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (rnn): RNN(256, 256, batch_first=True)
  (maxpool): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc): Linear(in_features=768, out_features=24, bias=True)
)

Model Training

# Use RMSE as the loss function
def RMSELoss(y_pred, y_true):
    loss = torch.sqrt(torch.mean((y_pred-y_true)**2, dim=0)).sum()
    return loss

model_weights = './task04_model_weights.pth'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Model().to(device)
criterion = RMSELoss
optimizer = optim.Adam(model.parameters(), lr=1e-3)
epochs = 10

train_losses, valid_losses = [], []
scores = []
best_score = float('-inf')
preds = np.zeros((len(y_valid), 24))

for epoch in range(epochs):
    print('Epoch: {}/{}'.format(epoch+1, epochs))

    # Train
    model.train()
    losses = 0
    for data, labels in tqdm(trainloader):
        data = data.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        pred = model(data)
        loss = criterion(pred, labels)
        losses += loss.cpu().detach().numpy()
        loss.backward()
        optimizer.step()
    train_loss = losses / len(trainloader)
    train_losses.append(train_loss)
    print('Training Loss: {:.3f}'.format(train_loss))

    # Validate
    model.eval()
    losses = 0
    with torch.no_grad():
        for i, data in tqdm(enumerate(validloader)):
            data, labels = data
            data = data.to(device)
            labels = labels.to(device)
            pred = model(data)
            loss = criterion(pred, labels)
            losses += loss.cpu().detach().numpy()
            preds[i*batch_size:(i+1)*batch_size] = pred.detach().cpu().numpy()
    valid_loss = losses / len(validloader)
    valid_losses.append(valid_loss)
    print('Validation Loss: {:.3f}'.format(valid_loss))

    s = score(y_valid, preds)
    scores.append(s)
    print('Score: {:.3f}'.format(s))

    # Save the best model weights
    if s > best_score:
        best_score = s
        checkpoint = {'best_score': s, 'state_dict': model.state_dict()}
        torch.save(checkpoint, model_weights)

Training log (progress bars omitted):

Epoch: 1/10   Training Loss: 18.099   Validation Loss: 16.756   Score: -4.320
Epoch: 2/10   Training Loss: 16.955   Validation Loss: 17.657   Score: -32.332
Epoch: 3/10   Training Loss: 16.639   Validation Loss: 19.156   Score: -25.483
Epoch: 4/10   Training Loss: 16.173   Validation Loss: 18.130   Score: -15.470
Epoch: 5/10   Training Loss: 15.818   Validation Loss: 17.367   Score: -14.745
Epoch: 6/10   Training Loss: 15.464   Validation Loss: 18.289   Score: -4.441
Epoch: 7/10   Training Loss: 15.175   Validation Loss: 18.604   Score: -21.144
Epoch: 8/10   Training Loss: 15.004   Validation Loss: 18.593   Score: -27.508
Epoch: 9/10   Training Loss: 14.578   Validation Loss: 18.264   Score: -19.113
Epoch: 10/10  Training Loss: 14.330   Validation Loss: 17.739   Score: -18.628

# Plot the training/validation curves
def training_vis(train_losses, valid_losses):
    fig = plt.figure(figsize=(8, 4))
    # subplot loss
    ax1 = fig.add_subplot(121)
    ax1.plot(train_losses, label='train_loss')
    ax1.plot(valid_losses, label='val_loss')
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss')
    ax1.set_title('Loss on Training and Validation Data')
    ax1.legend()
    plt.tight_layout()

training_vis(train_losses, valid_losses)

Model Evaluation

Evaluate the model on the test set.

# Load the best model weights
checkpoint = torch.load('../input/ai-earth-model-weights/task04_model_weights.pth')
model = Model()
model.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>

# Test set path
test_path = '../input/ai-earth-tests/'
# Test set label path
test_label_path = '../input/ai-earth-tests-labels/'

import os

# Read the test data and its labels
files = os.listdir(test_path)
X_test = []
y_test = []
for file in files:
    X_test.append(np.load(test_path + file))
    y_test.append(np.load(test_label_path + file))

X_test = np.array(X_test)
y_test = np.array(y_test)
X_test.shape, y_test.shape

((103, 12, 24, 72, 4), (103, 24))

testset = AIEarthDataset(X_test, y_test)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)

# Evaluate the model on the test set
model.eval()
model.to(device)
preds = np.zeros((len(y_test), 24))
for i, data in tqdm(enumerate(testloader)):
    data, labels = data
    data = data.to(device)
    labels = labels.to(device)
    pred = model(data)
    preds[i*batch_size:(i+1)*batch_size] = pred.detach().cpu().numpy()
s = score(y_test, preds)
print('Score: {:.3f}'.format(s))

4it [00:00, 12.75it/s]
Score: 20.274

Summary

    • The solution takes full account of the small dataset and sparse features: it convolves over time and space separately, alternately extracting temporal and spatial information, and uses a GAP layer to reduce the dimensionality of the extracted information, keeping the per-layer parameter count as low as possible so that more layers can be stacked to extract richer features.
    • Recognizing that sequences at different time scales carry different information, the solution uses pooling layers to change the time scale and RNNs to extract information, combining sequence information at three time scales into the final prediction.
    • The solution again designs its own model, taking the dataset and the problem background fully into account and flexibly applying various network layers to the specific problem. This approach to model design requires a fairly deep understanding of what each network layer does, and the way the layers are used here is well worth studying and borrowing from.

References

  • Top-1 solution write-up: https://tianchi.aliyun.com/forum/postDetail?spm=5176.12586969.1002.6.561d482cp7CFlx&postId=210391
