
DGL Tutorial [1]: Node Classification on the Cora Dataset

Published: 2024/9/18

This tutorial demonstrates how to build a GNN for semi-supervised node classification. The task uses a small dataset, Cora, a citation network in which papers are nodes and citation relationships are edges.

The task is to predict the category of a given paper. Each paper carries a word-count vector as its feature.

First, install DGL:

pip install dgl -i https://pypi.douban.com/simple/

Load the Cora dataset:

import dgl.data

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)

This automatically downloads the Cora dataset and extracts it to C:\Users\vincent\.dgl\cora_v2\. The output looks like this:

Downloading C:\Users\vincent\.dgl\cora_v2.zip from https://data.dgl.ai/dataset/cora_v2.zip...
Extracting file to C:\Users\vincent\.dgl\cora_v2
Finished data loading and preprocessing.
  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done saving data into cached files.
Number of categories: 7
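These statistics highlight the semi-supervised nature of the task: only a small fraction of the 2,708 nodes carry training labels. A quick sanity check on the numbers printed above:

```python
# Split sizes as reported by the dataset loader above.
num_nodes = 2708
num_train, num_val, num_test = 140, 500, 1000

# Only about 5% of nodes are labeled for training.
train_fraction = num_train / num_nodes
print(f"{train_fraction:.1%}")  # roughly 5.2%

# The remaining nodes are unlabeled but still participate in
# message passing, which is what makes the setting semi-supervised.
unlabeled = num_nodes - num_train - num_val - num_test
print(unlabeled)  # 1068
```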

A DGL dataset may contain multiple graphs, but the Cora dataset contains only one:

g = dataset[0]

A DGL graph stores node attributes in the dictionary-like ndata and edge attributes in edata. In the DGL Cora dataset, the graph contains the following node features:

  • train_mask: a boolean tensor indicating whether a node belongs to the training set
  • val_mask: a boolean tensor indicating whether a node belongs to the validation set
  • test_mask: a boolean tensor indicating whether a node belongs to the test set
  • label: the node's category label
  • feat: the node's feature vector
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)

Output:

Node features
{'feat': tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 'label': tensor([3, 4, 4, ..., 3, 3, 3]),
 'test_mask': tensor([False, False, False, ..., True, True, True]),
 'train_mask': tensor([True, True, True, ..., False, False, False]),
 'val_mask': tensor([False, False, False, ..., False, False, False])}
Edge features
{}
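The boolean masks shown above are used later to index tensors: an expression like `labels[train_mask]` selects only the entries belonging to training nodes. Here is a minimal pure-Python sketch of that boolean-mask indexing (the tutorial code does the same thing with PyTorch tensors):

```python
# Toy stand-ins for g.ndata['label'] and g.ndata['train_mask'].
labels = [3, 4, 4, 0, 2]
train_mask = [True, True, False, False, True]

# Boolean-mask indexing: keep only the entries where the mask is True.
train_labels = [lab for lab, keep in zip(labels, train_mask) if keep]
print(train_labels)  # [3, 4, 2]
```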

Defining a GNN

We will build a two-layer GCN, where each layer computes a node representation by aggregating information from the node's neighbors.
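To make "aggregating neighbor information" concrete, here is a minimal pure-Python sketch of one aggregation round on a toy undirected graph. This is only an illustration: the real GraphConv layer additionally applies a learned linear transform and symmetric degree normalization.

```python
# Toy graph as an adjacency list over 4 nodes.
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# One scalar feature per node.
feat = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

# One aggregation step: each node averages its neighbors' features.
new_feat = {
    v: sum(feat[u] for u in nbrs) / len(nbrs)
    for v, nbrs in neighbors.items()
}
print(new_feat)  # {0: 2.5, 1: 1.0, 2: 2.5, 3: 3.0}
```

Stacking two such layers means each node's final representation depends on its two-hop neighborhood.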

To build such a multi-layer GCN, we can simply stack dgl.nn.GraphConv modules; our network class inherits from torch.nn.Module.

import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data
from dgl.nn.pytorch import GraphConv

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)
g = dataset[0]
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
print(model)

DGL implements many popular neighbor-aggregation modules, each usable with a single line of code.

Training the GCN

Training a GCN with DGL is similar to training any other PyTorch neural network:

import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data
from dgl.nn.pytorch import GraphConv

dataset = dgl.data.CoraGraphDataset()
print('Number of categories:', dataset.num_classes)
g = dataset[0]
print('Node features')
print(g.ndata)
print('Edge features')
print(g.edata)

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0

    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']

    for e in range(100):
        # Forward
        logits = model(g, features)

        # Compute prediction
        pred = logits.argmax(1)

        # Compute loss
        # Note that you should only compute the losses of the nodes in the training set.
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Compute accuracy on training/validation/test
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()

        # Save the best validation accuracy and the corresponding test accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if e % 5 == 0:
            print('In epoch {}, loss: {:.3f}, val acc: {:.3f} (best {:.3f}), test acc: {:.3f} (best {:.3f})'.format(
                e, loss, val_acc, best_val_acc, test_acc, best_test_acc))

# Create the model with given dimensions
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
print(model)
train(g, model)

Output:

In epoch 0, loss: 1.946, val acc: 0.134 (best 0.134), test acc: 0.138 (best 0.138)
In epoch 5, loss: 1.892, val acc: 0.506 (best 0.522), test acc: 0.499 (best 0.539)
In epoch 10, loss: 1.806, val acc: 0.600 (best 0.612), test acc: 0.633 (best 0.636)
In epoch 15, loss: 1.698, val acc: 0.594 (best 0.612), test acc: 0.626 (best 0.636)
In epoch 20, loss: 1.567, val acc: 0.632 (best 0.632), test acc: 0.653 (best 0.653)
In epoch 25, loss: 1.417, val acc: 0.712 (best 0.712), test acc: 0.700 (best 0.700)
In epoch 30, loss: 1.251, val acc: 0.738 (best 0.738), test acc: 0.737 (best 0.737)
In epoch 35, loss: 1.079, val acc: 0.746 (best 0.746), test acc: 0.751 (best 0.751)
In epoch 40, loss: 0.909, val acc: 0.746 (best 0.748), test acc: 0.758 (best 0.756)
In epoch 45, loss: 0.751, val acc: 0.738 (best 0.748), test acc: 0.766 (best 0.756)
In epoch 50, loss: 0.612, val acc: 0.744 (best 0.748), test acc: 0.767 (best 0.756)
In epoch 55, loss: 0.494, val acc: 0.752 (best 0.752), test acc: 0.773 (best 0.773)
In epoch 60, loss: 0.399, val acc: 0.762 (best 0.762), test acc: 0.776 (best 0.776)
In epoch 65, loss: 0.322, val acc: 0.762 (best 0.766), test acc: 0.776 (best 0.776)
In epoch 70, loss: 0.262, val acc: 0.764 (best 0.768), test acc: 0.778 (best 0.775)
In epoch 75, loss: 0.215, val acc: 0.766 (best 0.768), test acc: 0.778 (best 0.775)
In epoch 80, loss: 0.178, val acc: 0.766 (best 0.768), test acc: 0.779 (best 0.775)
In epoch 85, loss: 0.149, val acc: 0.766 (best 0.768), test acc: 0.780 (best 0.775)
In epoch 90, loss: 0.126, val acc: 0.768 (best 0.768), test acc: 0.779 (best 0.775)
In epoch 95, loss: 0.107, val acc: 0.768 (best 0.768), test acc: 0.776 (best 0.775)
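The val acc and test acc values in this log come from `pred = logits.argmax(1)` followed by a masked comparison against the labels. Here is a pure-Python sketch of that computation on toy data (the training loop does the same thing with tensors):

```python
# Toy logits for 4 nodes over 3 classes.
logits = [[0.1, 2.0, 0.3],
          [1.5, 0.2, 0.1],
          [0.0, 0.1, 3.0],
          [0.9, 0.8, 0.7]]
labels = [1, 0, 2, 1]
test_mask = [True, True, True, False]

# argmax over the class dimension: the predicted class per node.
pred = [row.index(max(row)) for row in logits]

# Accuracy restricted to the masked (test) nodes.
hits = [p == y for p, y, m in zip(pred, labels, test_mask) if m]
acc = sum(hits) / len(hits)
print(acc)  # 1.0
```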

Training on a GPU

Training on a GPU requires moving both the model and the graph onto the GPU with the to() method:

g = g.to('cuda')
model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
train(g, model)
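Since `.to('cuda')` fails on machines without a CUDA-capable GPU, a common pattern (my addition, not part of the original tutorial) is to pick the device dynamically and fall back to the CPU:

```python
# Select 'cuda' only when PyTorch is present and a GPU is available;
# otherwise fall back to 'cpu'.
try:
    import torch
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
except ImportError:
    device = 'cpu'

print(device)
# Then move everything with the chosen device:
#   g = g.to(device)
#   model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to(device)
#   train(g, model)
```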

Summary

That concludes DGL Tutorial [1]: Node Classification on the Cora Dataset. I hope this article helps you solve the problems you are facing.
