
[Paper Reading] A Gentle Introduction to Graph Neural Networks (7)

  • Into the Weeds
    • Other types of graphs (multigraphs, hypergraphs, hypernodes, hierarchical graphs)
    • Sampling Graphs and Batching in GNNs
    • Inductive biases
    • Comparing aggregation operations
    • GCN as subgraph function approximators
    • Edges and the Graph Dual
    • Graph convolutions as matrix multiplications, and matrix multiplications as walks on a graph
    • Graph Attention Networks
    • Graph explanations and attributions
    • Generative modelling
  • Final thoughts
    • References
  • Citation

Into the Weeds


Next, we have a few sections on a myriad of graph-related topics that are relevant for GNNs.

Other types of graphs (multigraphs, hypergraphs, hypernodes, hierarchical graphs)


While we only described graphs with vectorized information for each attribute, graph structures are more flexible and can accommodate other types of information. Fortunately, the message passing framework is flexible enough that often adapting GNNs to more complex graph structures is about defining how information is passed and updated by new graph attributes.

For example, we can consider multi-edge graphs or multigraphs [32], where a pair of nodes can share multiple types of edges; this happens when we want to model the interactions between nodes differently based on their type. For example, with a social network we can specify edge types based on the type of relationships (acquaintance, friend, family). A GNN can be adapted by having different types of message passing steps for each edge type. We can also consider nested graphs, where for example a node represents a graph, also called a hypernode graph. [33] Nested graphs are useful for representing hierarchical information. For example, we can consider a network of molecules, where a node represents a molecule and an edge is shared between two molecules if we have a way (reaction) of transforming one to the other [34] [35]. In this case, we can learn on a nested graph by having a GNN that learns representations at the molecule level and another at the reaction network level, and alternate between them during training.
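As a rough sketch of what "different message passing steps per edge type" could look like, here is a minimal NumPy example; the edge types, weight matrices, and function name are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

# Hypothetical toy multigraph: 4 nodes with 8-dim features and three edge types.
rng = np.random.default_rng(0)
node_feats = rng.normal(size=(4, 8))
edges = {  # edge type -> list of (source, target) pairs
    "acquaintance": [(0, 1), (1, 0)],
    "friend":       [(1, 2), (2, 1)],
    "family":       [(2, 3), (3, 2)],
}
# One (randomly initialized) linear map per edge type stands in for learned weights.
weights = {etype: rng.normal(size=(8, 8)) for etype in edges}

def multigraph_message_pass(node_feats, edges, weights):
    """One round of message passing with a separate transform per edge type."""
    messages = np.zeros_like(node_feats)
    for etype, edge_list in edges.items():
        W = weights[etype]
        for src, dst in edge_list:
            messages[dst] += node_feats[src] @ W  # sum-aggregate, per edge type
    return node_feats + messages  # simple residual update

updated = multigraph_message_pass(node_feats, edges, weights)
print(updated.shape)  # (4, 8)
```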

Another type of graph is a hypergraph [36] , where an edge can be connected to multiple nodes instead of just two. For a given graph, we can build a hypergraph by identifying communities of nodes and assigning a hyper-edge that is connected to all nodes in a community.
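To make this concrete, a hypergraph built from node communities can be stored as a node-by-hyperedge incidence matrix. The following minimal sketch assumes hypothetical community assignments; it is only an illustration of the data structure, not a method from the article.

```python
import numpy as np

# Hypothetical community assignment for 6 nodes (e.g. from any community-detection step).
communities = {0: [0, 1, 2], 1: [2, 3], 2: [4, 5]}  # hyperedge id -> member nodes
n_nodes, n_hyperedges = 6, len(communities)

# A hypergraph is conveniently stored as a node-by-hyperedge incidence matrix.
incidence = np.zeros((n_nodes, n_hyperedges), dtype=int)
for h, members in communities.items():
    incidence[members, h] = 1

print(incidence)
# Each column is one hyperedge connecting all nodes of a community;
# node 2 belongs to two hyperedges, which a single ordinary edge cannot express.
```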


Figure 1. Schematic of some more complex graphs. On the left is an example of a multigraph with three edge types, including a directed edge. On the right is a three-level hierarchical graph, where the intermediate-level nodes are hypernodes.


How to train and design GNNs that have multiple types of graph attributes is a current area of research [37] , [38] .

Sampling Graphs and Batching in GNNs


A common practice for training neural networks is to update network parameters with gradients calculated on randomized constant size (batch size) subsets of the training data (mini-batches). This practice presents a challenge for graphs due to the variability in the number of nodes and edges adjacent to each other, meaning that we cannot have a constant batch size. The main idea for batching with graphs is to create subgraphs that preserve essential properties of the larger graph. This graph sampling operation is highly dependent on context and involves sub-selecting nodes and edges from a graph. These operations might make sense in some contexts (citation networks) and in others, these might be too strong of an operation (molecules, where a subgraph simply represents a new, smaller molecule). How to sample a graph is an open research question. [39] If we care about preserving structure at a neighborhood level, one way would be to randomly sample a uniform number of nodes, our node-set. Then add neighboring nodes of distance k adjacent to the node-set, including their edges. [40] Each neighborhood can be considered an individual graph and a GNN can be trained on batches of these subgraphs. The loss can be masked to only consider the node-set since all neighboring nodes would have incomplete neighborhoods. A more efficient strategy might be to first randomly sample a single node, expand its neighborhood to distance k, and then pick the other node within the expanded set. These operations can be terminated once a certain number of nodes, edges, or subgraphs are constructed. If the context allows, we can build constant size neighborhoods by picking an initial node-set and then sub-sampling a constant number of nodes (e.g. randomly, or via a random walk or Metropolis algorithm [41]).
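Below is a minimal, illustrative Python sketch of the neighborhood-sampling idea described above (sample a node-set, expand to distance k, and mask the loss to the node-set). The toy graph and function name are hypothetical, not from the original article.

```python
import numpy as np

def sample_khop_subgraph(adj_list, seed_nodes, k):
    """Expand a sampled node-set to its k-hop neighborhood (illustrative sketch)."""
    node_set = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(k):
        frontier = {nbr for n in frontier for nbr in adj_list[n]} - node_set
        node_set |= frontier
    # Keep only edges whose endpoints both fall inside the sampled node set.
    sub_edges = [(u, v) for u in node_set for v in adj_list[u] if v in node_set]
    return sorted(node_set), sub_edges

# Hypothetical toy graph as an adjacency list.
adj_list = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3, 5], 5: [4]}
rng = np.random.default_rng(0)
seeds = rng.choice(len(adj_list), size=2, replace=False)      # the node-set
nodes, edges = sample_khop_subgraph(adj_list, seeds.tolist(), k=1)

# Mask the loss so only the seed nodes contribute, since the added neighbors
# have incomplete neighborhoods inside the subgraph.
loss_mask = np.isin(nodes, seeds)
print(nodes, edges, loss_mask)
```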

Figure 2. Four different ways of sampling the same graph. The choice of sampling strategy is highly context-dependent, since each produces a different distribution of graph statistics (# nodes, # edges, etc.). For highly connected graphs, edges can also be subsampled.

(Translator's note: readers unfamiliar with some of these sampling methods can consult the related material on sampling algorithms.)


Sampling a graph is particularly relevant when a graph is large enough that it cannot fit in memory, inspiring new architectures and training strategies such as Cluster-GCN [42] and GraphSAINT [43]. We expect graph datasets to continue growing in size in the future.

Inductive biases


When building a model to solve a problem on a specific kind of data, we want to specialize our models to leverage the characteristics of that data. When this is done successfully, we often see better predictive performance, lower training time, fewer parameters and better generalization.

When labeling on images, for example, we want to take advantage of the fact that a dog is still a dog whether it is in the top-left or bottom-right corner of an image. Thus, most image models use convolutions, which are translation invariant. For text, the order of the tokens is highly important, so recurrent neural networks process data sequentially. Further, the presence of one token (e.g. the word ‘not’) can affect the meaning of the rest of a sentence, and so we need components that can ‘attend’ to other parts of the text, which transformer models like BERT and GPT-3 can do. These are some examples of inductive biases, where we are identifying symmetries or regularities in the data and adding modelling components that take advantage of these properties.

In the case of graphs, we care about how each graph component (edge, node, global) is related to each other so we seek models that have a relational inductive bias. [19] A model should preserve explicit relationships between entities (adjacency matrix) and preserve graph symmetries (permutation invariance). We expect problems where the interaction between entities is important will benefit from a graph structure. Concretely, this means designing transformation on sets: the order of operation on nodes or edges should not matter and the operation should work on a variable number of inputs.

Comparing aggregation operations


Pooling information from neighboring nodes and edges is a critical step in any reasonably powerful GNN architecture. Because each node has a variable number of neighbors, and because we want a differentiable method of aggregating this information, we want to use a smooth aggregation operation that is invariant to node ordering and the number of nodes provided.

Selecting and designing optimal aggregation operations is an open research topic. [44] A desirable property of an aggregation operation is that similar inputs provide similar aggregated outputs, and vice-versa. Some very simple candidate permutation-invariant operations are sum, mean, and max. Summary statistics like variance also work. All of these take a variable number of inputs, and provide an output that is the same, no matter the input ordering. Let’s explore the difference between these operations.

Figure 3. No single pooling type can always distinguish between graph pairs, such as max pooling on the left and sum/mean pooling on the right.


There is no operation that is uniformly the best choice. The mean operation can be useful when nodes have a highly-variable number of neighbors or you need a normalized view of the features of a local neighborhood. The max operation can be useful when you want to highlight single salient features in local neighborhoods. Sum provides a balance between these two, by providing a snapshot of the local distribution of features, but because it is not normalized, can also highlight outliers. In practice, sum is commonly used.
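A small NumPy example comparing the three aggregations on a hypothetical set of neighbor features, and checking the permutation invariance mentioned above; it is an illustration only, not the article's code.

```python
import numpy as np

# Features of a variable number of neighbors (hypothetical: 3 neighbors, 4-dim features).
neighbors = np.array([[1.0, 0.0, 2.0, 1.0],
                      [0.5, 1.0, 0.0, 3.0],
                      [2.0, 0.5, 1.0, 0.0]])

aggregations = {
    "sum":  neighbors.sum(axis=0),
    "mean": neighbors.mean(axis=0),
    "max":  neighbors.max(axis=0),
}
for name, value in aggregations.items():
    print(name, value)

# All three are permutation invariant: shuffling neighbor rows leaves the result unchanged.
shuffled = neighbors[[2, 0, 1]]
assert np.allclose(shuffled.sum(axis=0), aggregations["sum"])
```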

Designing aggregation operations is an open research problem that intersects with machine learning on sets. [45] New approaches such as Principal Neighborhood aggregation [27] take into account several aggregation operations by concatenating them and adding a scaling function that depends on the degree of connectivity of the entity to aggregate. Meanwhile, domain specific aggregation operations can also be designed. One example lies with the “Tetrahedral Chirality” aggregation operators [46] .

GCN as subgraph function approximators


Another way to see GCN (and MPNN) of k-layers with a 1-degree neighbor lookup is as a neural network that operates on learned embeddings of subgraphs of size k. [47] [44]

When focusing on one node, after k-layers, the updated node representation has a limited viewpoint of all neighbors up to k-distance, essentially a subgraph representation. Same is true for edge representations.

So a GCN is collecting all possible subgraphs of size k and learning vector representations from the vantage point of one node or edge. The number of possible subgraphs can grow combinatorially, so enumerating these subgraphs from the beginning vs building them dynamically as in a GCN, might be prohibitive.
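To make the "limited viewpoint" concrete, here is a small illustrative Python sketch (not from the original article) that computes which nodes can influence a given node after k rounds of 1-hop message passing; the toy adjacency list is hypothetical.

```python
def k_hop_receptive_field(adj_list, node, k):
    """Nodes whose features can influence `node` after k rounds of 1-hop message passing."""
    reachable = {node}
    frontier = {node}
    for _ in range(k):
        frontier = {nbr for n in frontier for nbr in adj_list[n]}
        reachable |= frontier
    return reachable

# Hypothetical path graph 0-1-2-3-4: after 2 layers, node 2 "sees" the whole path.
adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_receptive_field(adj_list, node=2, k=2))  # {0, 1, 2, 3, 4}
```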

Figure 4. N-gram GNN.

Edges and the Graph Dual


One thing to note is that edge predictions and node predictions, while seemingly different, often reduce to the same problem: an edge prediction task on a graph $G$ can be phrased as a node-level prediction on $G$'s dual.

To obtain $G$'s dual, we can convert nodes to edges (and edges to nodes). A graph and its dual contain the same information, just expressed in a different way. Sometimes this property makes solving problems easier in one representation than another, like frequencies in Fourier space. In short, to solve an edge classification problem on $G$, we can think about doing graph convolutions on $G$'s dual (which is the same as learning edge representations on $G$); this idea was developed with Dual-Primal Graph Convolutional Networks. [48]
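As an illustration of the node/edge swap, here is a minimal Python sketch that builds this dual (line-graph) construction for a small, hypothetical edge list; the function name is ours, not from the paper.

```python
from itertools import combinations

def line_graph(edges):
    """Build the edge-to-node 'dual' (line graph): each original edge becomes a node,
    and two such nodes are connected if the original edges share an endpoint."""
    dual_nodes = list(edges)
    dual_edges = [(e1, e2) for e1, e2 in combinations(dual_nodes, 2)
                  if set(e1) & set(e2)]
    return dual_nodes, dual_edges

# Hypothetical triangle graph with a pendant node.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
dual_nodes, dual_edges = line_graph(edges)
print(dual_nodes)   # edge classification on G becomes node classification on these
print(dual_edges)
```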

Graph convolutions as matrix multiplications, and matrix multiplications as walks on a graph


We’ve talked a lot about graph convolutions and message passing, and of course, this raises the question of how do we implement these operations in practice? For this section, we explore some of the properties of matrix multiplication, message passing, and its connection to traversing a graph.

The first point we want to illustrate is that the matrix multiplication of an adjacency matrix $A$ of size $n_{nodes} \times n_{nodes}$ with a node feature matrix $X$ of size $n_{nodes} \times node_{dim}$ implements a simple message passing with a summation aggregation. Letting $B = AX$, we can observe that any entry $B_{ij}$ can be expressed as $\langle A_{row_i}, X_{column_j} \rangle = A_{i,1}X_{1,j} + A_{i,2}X_{2,j} + \dots + A_{i,n}X_{n,j} = \sum_{A_{i,k} > 0} X_{k,j}$. Because $A_{i,k}$ is a binary entry that is non-zero only when an edge exists between $node_i$ and $node_k$, the inner product is essentially "gathering" all node feature values of dimension $j$ that share an edge with $node_i$. It should be noted that this message passing is not updating the representation of the node features, just pooling neighboring node features. But this can be easily adapted by passing $X$ through your favorite differentiable transformation (e.g. MLP) before or after the matrix multiply.

From this view, we can appreciate the benefit of using adjacency lists. Due to the expected sparsity of $A$, we don't have to sum all values where $A_{i,j}$ is zero. As long as we have an operation to gather values based on an index, we should be able to just retrieve positive entries. Additionally, this matrix multiply-free approach frees us from using summation as an aggregation operation.
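A quick NumPy check of the equivalence described above, using a small hypothetical graph: one sum-aggregation message passing step computed densely as $AX$ and sparsely by gathering neighbor features from an adjacency list.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])          # adjacency matrix, n_nodes x n_nodes
X = rng.normal(size=(4, 3))           # node features, n_nodes x node_dim

# Dense view: one step of sum-aggregation message passing.
B_dense = A @ X

# Sparse view: gather neighbor features from an adjacency list and sum them.
adj_list = {i: np.nonzero(A[i])[0] for i in range(len(A))}
B_gather = np.stack([X[adj_list[i]].sum(axis=0) for i in range(len(A))])

assert np.allclose(B_dense, B_gather)  # the two formulations agree
```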

We can imagine that applying this operation multiple times allows us to propagate information at greater distances. In this sense, matrix multiplication is a form of traversing over a graph. This relationship is also apparent when we look at powers $A^K$ of the adjacency matrix. If we consider the matrix $A^2$, the term $A^2_{ij}$ counts all walks of length 2 from $node_i$ to $node_j$ and can be expressed as the inner product $\langle A_{row_i}, A_{column_j} \rangle = A_{i,1}A_{1,j} + A_{i,2}A_{2,j} + \dots + A_{i,n}A_{n,j}$. The intuition is that the first term $A_{i,1}A_{1,j}$ is only positive under two conditions: there is an edge that connects $node_i$ to $node_1$ and another edge that connects $node_1$ to $node_j$. In other words, both edges form a path of length 2 that goes from $node_i$ to $node_j$ passing by $node_1$. Due to the summation, we are counting over all possible intermediate nodes. This intuition carries over when we consider $A^3 = A A^2$, and so on up to $A^k$.
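Using the same small adjacency matrix, the sketch below verifies that entries of $A^2$ count walks of length 2 between node pairs.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

A2 = A @ A
# A2[i, j] counts walks of length 2 from node i to node j:
# the number of intermediate nodes m with edges (i, m) and (m, j).
walks_0_to_3 = sum(A[0, m] * A[m, 3] for m in range(len(A)))
assert A2[0, 3] == walks_0_to_3
print(A2)
```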

There are deeper connections on how we can view matrices as graphs to explore [49] [50] [51] .

Graph Attention Networks


Another way of communicating information between graph attributes is via attention. [52] For example, when we consider the sum-aggregation of a node and its 1-degree neighboring nodes, we could also consider using a weighted sum. The challenge then is to associate weights in a permutation invariant fashion. One approach is to consider a scalar scoring function that assigns weights based on pairs of nodes ($f(node_i, node_j)$). In this case, the scoring function can be interpreted as a function that measures how relevant a neighboring node is in relation to the center node. Weights can be normalized, for example with a softmax function, to focus most of the weight on the neighbor most relevant for a node in relation to a task. This concept is the basis of Graph Attention Networks (GAT) [53] and Set Transformers [54]. Permutation invariance is preserved, because scoring works on pairs of nodes. A common scoring function is the inner product, and nodes are often transformed before scoring into query and key vectors via a linear map to increase the expressivity of the scoring mechanism. Additionally, for interpretability, the scoring weights can be used as a measure of the importance of an edge in relation to a task.
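Here is a minimal NumPy sketch of this scoring idea (inner products of query/key vectors, softmax-normalized into attention weights over a node's neighbors). The dimensions and weight matrices are hypothetical, and this is a generic dot-product variant rather than the exact GAT formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, d_k = 8, 4
center = rng.normal(size=d)                 # features of the center node
neighbors = rng.normal(size=(3, d))         # features of its 3 neighbors

# Hypothetical learned linear maps producing query / key vectors.
W_q, W_k = rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k))

scores = (neighbors @ W_k) @ (center @ W_q)     # one scalar score per neighbor
alpha = softmax(scores)                          # normalized attention weights
pooled = alpha @ neighbors                       # weighted sum replaces plain sum-aggregation

# Permutation invariance: reordering the neighbors permutes the weights identically.
perm = [2, 0, 1]
assert np.allclose(softmax((neighbors[perm] @ W_k) @ (center @ W_q)) @ neighbors[perm], pooled)
```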

Figure 5. Schematic of attention over one node with respect to its adjacent nodes. For each edge a score is computed, normalized, and used to weight node embeddings.


Additionally, transformers can be viewed as GNNs with an attention mechanism [55]. Under this view, the transformer models several elements (e.g. character tokens) as nodes in a fully connected graph, and the attention mechanism assigns edge embeddings to each node-pair which are used to compute attention weights. The difference lies in the assumed pattern of connectivity between entities: a GNN assumes a sparse pattern, while the Transformer models all connections.

Graph explanations and attributions


When deploying GNNs in the wild we might care about model interpretability for building credibility, debugging or scientific discovery. The graph concepts that we care to explain vary from context to context. For example, with molecules we might care about the presence or absence of particular subgraphs [56], while in a citation network we might care about the degree of connectedness of an article. Due to the variety of graph concepts, there are many ways to build explanations. GNNExplainer [57] casts this problem as extracting the most relevant subgraph that is important for a task. Attribution techniques [58] assign ranked importance values to parts of a graph that are relevant for a task. Because realistic and challenging graph problems can be generated synthetically, GNNs can serve as a rigorous and repeatable testbed for evaluating attribution techniques [59].

Figure 6. Schematic of some explainability techniques on graphs. Attributions assign ranked values to graph attributes. Rankings can serve as the basis for extracting connected subgraphs that may be relevant to a task.

Generative modelling


Besides learning predictive models on graphs, we might also care about learning a generative model for graphs. With a generative model we can generate new graphs by sampling from a learned distribution or by completing a graph given a starting point. A relevant application is in the design of new drugs, where novel molecular graphs with specific properties are desired as candidates to treat a disease.

A key challenge with graph generative models lies in modelling the topology of a graph, which can vary dramatically in size and has $N_{nodes}^2$ terms. One solution lies in modelling the adjacency matrix directly, like an image, with an autoencoder framework. [60] The prediction of the presence or absence of an edge is treated as a binary classification task. The $N_{nodes}^2$ term can be avoided by only predicting known edges and a subset of the edges that are not present. The graphVAE learns to model positive patterns of connectivity and some patterns of non-connectivity in the adjacency matrix.
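As an illustration of the decoder side of this idea, here is a minimal sketch that scores every node pair with an inner product followed by a sigmoid and treats edge presence as binary classification; the latent embeddings, edge lists, and negative samples are random, hypothetical stand-ins for learned quantities.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_nodes, latent_dim = 5, 2
Z = rng.normal(size=(n_nodes, latent_dim))   # stand-in for learned latent node embeddings

# Inner-product decoder: probability of an edge between every node pair.
edge_probs = sigmoid(Z @ Z.T)

# Training would compare these probabilities against known edges (label 1)
# and a sampled subset of absent edges (label 0), avoiding the full n^2 term.
known_edges = [(0, 1), (1, 2)]
negative_samples = [(0, 3), (2, 4)]
for (i, j), label in [(e, 1) for e in known_edges] + [(e, 0) for e in negative_samples]:
    print(f"edge ({i},{j}): p={edge_probs[i, j]:.2f}, label={label}")
```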

Another approach is to build a graph sequentially, by starting with a graph and iteratively applying discrete actions such as the addition or removal of nodes and edges. To avoid estimating a gradient for discrete actions we can use a policy gradient. This has been done via an auto-regressive model, such as an RNN [61], or in a reinforcement learning scenario. [62] Furthermore, sometimes graphs can be modeled as just sequences with grammar elements. [63] [64]

Final thoughts


Graphs are a powerful and rich structured data type that have strengths and challenges that are very different from those of images and text. In this article, we have outlined some of the milestones that researchers have come up with in building neural network based models that process graphs. We have walked through some of the important design choices that must be made when using these architectures, and hopefully the GNN playground can give an intuition on what the empirical results of these design choices are. The success of GNNs in recent years creates a great opportunity for a wide range of new problems, and we are excited to see what the field will bring.


References

[19] Relational inductive biases, deep learning, and graph networks Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. and Pascanu, R., 2018.

[27] Principal Neighbourhood Aggregation for Graph Nets Corso, G., Cavalleri, L., Beaini, D., Lio, P. and Velickovic, P., 2020.

[32] Graph Theory Harary, F., 1969.

[33] A nested-graph model for the representation and manipulation of complex objects Poulovassilis, A. and Levene, M., 1994. ACM Transactions on Information Systems, Vol 12(1), pp. 35--68.

[34] Modeling polypharmacy side effects with graph convolutional networks Zitnik, M., Agrawal, M. and Leskovec, J., 2018. Bioinformatics, Vol 34(13), pp. i457--i466.

[35] Machine learning in chemical reaction space Stocker, S., Csanyi, G., Reuter, K. and Margraf, J.T., 2020. Nat. Commun., Vol 11(1), pp. 5505.

[36] Graphs and Hypergraphs Berge, C., 1976. Elsevier.

[37] HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs Yadati, N., Nimishakavi, M., Yadav, P., Nitin, V., Louis, A. and Talukdar, P., 2018.

[38] Hierarchical Message-Passing Graph Neural Networks Zhong, Z., Li, C. and Pang, J., 2020.

[39] Little Ball of Fur Rozemberczki, B., Kiss, O. and Sarkar, R., 2020. Proceedings of the 29th ACM International Conference on Information & Knowledge Management.

[40] Sampling from large graphs Leskovec, J. and Faloutsos, C., 2006. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06.

[41] Metropolis Algorithms for Representative Subgraph Sampling Hubler, C., Kriegel, H., Borgwardt, K. and Ghahramani, Z., 2008. 2008 Eighth IEEE International Conference on Data Mining.

[42] Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks Chiang, W., Liu, X., Si, S., Li, Y., Bengio, S. and Hsieh, C., 2019.

[43] GraphSAINT: Graph Sampling Based Inductive Learning Method Zeng, H., Zhou, H., Srivastava, A., Kannan, R. and Prasanna, V., 2019.

[44] How Powerful are Graph Neural Networks? Xu, K., Hu, W., Leskovec, J. and Jegelka, S., 2018.

[45] Rep the Set: Neural Networks for Learning Set Representations Skianis, K., Nikolentzos, G., Limnios, S. and Vazirgiannis, M., 2019.

[46] Message Passing Networks for Molecules with Tetrahedral Chirality Pattanaik, L., Ganea, O., Coley, I., Jensen, K.F., Green, W.H. and Coley, C.W., 2020.

[47] N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules Liu, S., Demirel, M.F. and Liang, Y., 2018.

[48] Dual-Primal Graph Convolutional Networks Monti, F., Shchur, O., Bojchevski, A., Litany, O., Gunnemann, S. and Bronstein, M.M., 2018.

[49] Viewing matrices & probability as graphs Bradley, T..

[50] Graphs and Matrices Bapat, R.B., 2014. Springer.

[51] Modern Graph Theory Bollobas, B., 2013. Springer Science & Business Media.

[52] Attention Is All You Need Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I., 2017.

[53] Graph Attention Networks Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P. and Bengio, Y., 2017.

[54] Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks Lee, J., Lee, Y., Kim, J., Kosiorek, A.R., Choi, S. and Teh, Y.W., 2018.

[55] Transformers are Graph Neural Networks Joshi, C., 2020. NTU Graph Deep Learning Lab.

[56] Using Attribution to Decode Dataset Bias in Neural Network Models for Chemistry McCloskey, K., Taly, A., Monti, F., Brenner, M.P. and Colwell, L., 2018.

[57] GNNExplainer: Generating Explanations for Graph Neural Networks Ying, Z., Bourgeois, D., You, J., Zitnik, M. and Leskovec, J., 2019. Advances in Neural Information Processing Systems, Vol 32, pp. 9244--9255. Curran Associates, Inc.

[58] Explainability Methods for Graph Convolutional Neural Networks Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E. and Hoffmann, H., 2019. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59] Evaluating Attribution for Graph Neural Networks Sanchez-Lengeling, B., Wei, J., Lee, B., Reif, E., Qian, W., Wang, Y., McCloskey, K.J., Colwell, L. and Wiltschko, A.B., 2020. Advances in Neural Information Processing Systems 33.

[60] Variational Graph Auto-Encoders Kipf, T.N. and Welling, M., 2016.

[61] GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models You, J., Ying, R., Ren, X., Hamilton, W.L. and Leskovec, J., 2018.

[62] Optimization of Molecules via Deep Reinforcement Learning Zhou, Z., Kearnes, S., Li, L., Zare, R.N. and Riley, P., 2019. Sci. Rep., Vol 9(1), pp. 1--10. Nature Publishing Group.

[63] Self-Referencing Embedded Strings (SELFIES): A 100% robust molecular string representation Krenn, M., Hase, F., Nigam, A., Friederich, P. and Aspuru-Guzik, A., 2019.

[64] GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation Goyal, N., Jain, H.V. and Ranu, S., 2020.

Citation




For attribution in academic contexts, please cite this work as

Sanchez-Lengeling, et al., "A Gentle Introduction to Graph Neural Networks", Distill, 2021.

BibTeX citation

@article{sanchez-lengeling2021a,
  author  = {Sanchez-Lengeling, Benjamin and Reif, Emily and Pearce, Adam and Wiltschko, Alexander B.},
  title   = {A Gentle Introduction to Graph Neural Networks},
  journal = {Distill},
  year    = {2021},
  note    = {https://distill.pub/2021/gnn-intro},
  doi     = {10.23915/distill.00033}
}
