[Paper Reading] A Gentle Introduction to Graph Neural Networks (5)
Graph Neural Networks
Now that the graph’s description is in a matrix format that is permutation invariant, we will describe using graph neural networks (GNNs) to solve graph prediction tasks. A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global-context) that preserves graph symmetries (permutation invariances). We’re going to build GNNs using the “message passing neural network” framework proposed by Gilmer et al. [18], using the Graph Nets architecture schematics introduced by Battaglia et al. [19]. GNNs adopt a “graph-in, graph-out” architecture, meaning that these model types accept a graph as input, with information loaded into its nodes, edges and global-context, and progressively transform these embeddings, without changing the connectivity of the input graph.
The simplest GNN
With the numerical representation of graphs that we’ve constructed above (with vectors instead of scalars), we are now ready to build a GNN. We will start with the simplest GNN architecture, one where we learn new embeddings for all graph attributes (nodes, edges, global), but where we do not yet use the connectivity of the graph.
For simplicity, the previous diagrams used scalars to represent graph attributes; in practice feature vectors, or embeddings, are much more useful.
You could also call it a GNN block. Because it contains multiple operations/layers (like a ResNet block).
This GNN uses a separate multilayer perceptron (MLP) (or your favorite differentiable model) on each component of a graph; we call this a GNN layer. For each node vector, we apply the MLP and get back a learned node-vector. We do the same for each edge, learning a per-edge embedding, and also for the global-context vector, learning a single embedding for the entire graph.
A single layer of a simple GNN. A graph is the input, and each component (V,E,U) gets updated by a MLP to produce a new graph. Each function subscript indicates a separate function for a different graph attribute at the n-th layer of a GNN model.
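A minimal sketch of such a layer in numpy (the dimensions and the tiny `mlp` helper are illustrative assumptions, not part of the original article): each attribute matrix is transformed by its own independent MLP, and connectivity is never touched.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """A tiny one-hidden-layer MLP, returned as a closure over its weights."""
    W1 = rng.standard_normal((in_dim, 16)) * 0.1
    W2 = rng.standard_normal((16, out_dim)) * 0.1
    return lambda x: np.maximum(x @ W1, 0) @ W2  # ReLU hidden layer

# Hypothetical feature sizes: 4-dim nodes, 3-dim edges, 5-dim global context.
f_V, f_E, f_U = mlp(4, 8), mlp(3, 8), mlp(5, 8)

def simplest_gnn_layer(V, E, U):
    """Update each graph attribute independently of the others."""
    return f_V(V), f_E(E), f_U(U)

V = rng.standard_normal((6, 4))   # 6 nodes
E = rng.standard_normal((9, 3))   # 9 edges
U = rng.standard_normal((1, 5))   # one global-context vector
V2, E2, U2 = simplest_gnn_layer(V, E, U)
```

Stacking several such layers just means feeding `(V2, E2, U2)` into the next layer's three MLPs.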
As is common with neural networks modules or layers, we can stack these GNN layers together.
Because a GNN does not update the connectivity of the input graph, we can describe the output graph of a GNN with the same adjacency list and the same number of feature vectors as the input graph. But, the output graph has updated embeddings, since the GNN has updated each of the node, edge and global-context representations.
GNN Predictions by Pooling Information
We have built a simple GNN, but how do we make predictions in any of the tasks we described above?
We will consider the case of binary classification, but this framework can easily be extended to the multi-class or regression case. If the task is to make binary predictions on nodes, and the graph already contains node information, the approach is straightforward: for each node embedding, apply a linear classifier.
We could imagine a social network, where we wish to anonymize user data (nodes) by not using them, and only using relational data (edges). One instance of such a scenario is the node task we specified in the Node-level task subsection. In the Karate club example, this would be just using the number of meetings between people to determine the alliance to Mr. Hi or John H.
However, it is not always so simple. For instance, you might have information in the graph stored in edges, but no information in nodes, but still need to make predictions on nodes. We need a way to collect information from edges and give them to nodes for prediction. We can do this by pooling. Pooling proceeds in two steps:
1. For each item to be pooled, gather each of their embeddings and concatenate them into a matrix.
2. The gathered embeddings are then aggregated, usually via a sum operation.
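A minimal sketch of these two steps in numpy, assuming each edge stores the indices of its two endpoint nodes (the representation as an `edge_index` array is our own convention for the example):

```python
import numpy as np

def pool_edges_to_node(edge_emb, edge_index, node):
    """Gather the embeddings of edges incident to `node`, then aggregate by sum.

    edge_emb:   (num_edges, d) array of edge embeddings
    edge_index: (num_edges, 2) array of (source, target) node indices
    """
    # Step 1: gather the incident edges into one matrix.
    incident = edge_emb[(edge_index == node).any(axis=1)]
    # Step 2: aggregate, here with a sum.
    return incident.sum(axis=0)

edge_emb = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
edge_index = np.array([[0, 1], [1, 2], [0, 2]])
pooled = pool_edges_to_node(edge_emb, edge_index, node=0)  # edges 0 and 2
```

The resulting per-node vector can then be fed to the linear classifier described above.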
For a more in-depth discussion on aggregation operations go to the Comparing aggregation operations section.
We represent the pooling operation by the letter ρ, and denote that we are gathering information from edges to nodes as ρ_{Eₙ→Vₙ}.
Hover over a node (black node) to visualize which edges are gathered and aggregated to produce an embedding for that target node.
So if we only have edge-level features, and are trying to predict binary node information, we can use pooling to route (or pass) information to where it needs to go. The model looks like this.
If we only have node-level features, and are trying to predict binary edge-level information, the model looks like this.
One example of such a scenario is the edge task we specified in Edge level task sub section. Nodes can be recognized as image entities, and we are trying to predict if the entities share a relationship (binary edges).
If we only have node-level features, and need to predict a binary global property, we need to gather all available node information together and aggregate them. This is similar to Global Average Pooling layers in CNNs. The same can be done for edges.
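A minimal sketch of such a graph-level readout (the function name is our own; mean-pooling is one of several valid aggregations, as discussed above):

```python
import numpy as np

def graph_readout(node_emb):
    """Pool all node embeddings into a single graph-level vector by
    averaging, analogous to global average pooling in a CNN."""
    return node_emb.mean(axis=0)

node_emb = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
graph_vec = graph_readout(node_emb)  # one vector for the whole graph
```

A linear classifier on `graph_vec` then yields the binary global prediction; replacing `mean` with `sum` over edge embeddings gives the edge variant.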
This is a common scenario for predicting molecular properties. For example, we have atomic information, connectivity and we would like to know the toxicity of a molecule (toxic/not toxic), or if it has a particular odor (rose/not rose).
In our examples, the classification model c can easily be replaced with any differentiable model, or adapted to multi-class classification using a generalized linear model.
Now we’ve demonstrated that we can build a simple GNN model, and make binary predictions by routing information between different parts of the graph. This pooling technique will serve as a building block for constructing more sophisticated GNN models. If we have new graph attributes, we just have to define how to pass information from one attribute to another.
Note that in this simplest GNN formulation, we’re not using the connectivity of the graph at all inside the GNN layer. Each node is processed independently, as is each edge, as well as the global context. We only use connectivity when pooling information for prediction.
Passing messages between parts of the graph
We could make more sophisticated predictions by using pooling within the GNN layer, in order to make our learned embeddings aware of graph connectivity. We can do this using message passing[18], where neighboring nodes or edges exchange information and influence each other’s updated embeddings.
Message passing works in three steps:
1. For each node in the graph, gather all the neighboring node embeddings (or messages), which is the g function described above.
2. Aggregate all messages via an aggregate function (like sum).
3. All pooled messages are passed through an update function, usually a learned neural network.
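The three steps above can be sketched as follows; the adjacency-matrix representation, the single linear update, and all dimensions are illustrative assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def message_passing_step(V, adj, W):
    """One message-passing update on node embeddings.

    V:   (n, d) node embeddings
    adj: (n, n) binary adjacency matrix
    W:   (2*d, d_out) weights of the update function (a single ReLU layer here)
    """
    # Steps 1-2: gather each node's neighbor embeddings and sum them;
    # the matrix product adj @ V performs both at once.
    messages = adj @ V
    # Step 3: update each node from its own embedding concatenated with
    # the pooled messages, via a learned transform.
    return np.maximum(np.concatenate([V, messages], axis=1) @ W, 0)

# A 3-node path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
V = rng.standard_normal((3, 4))
W = rng.standard_normal((8, 4)) * 0.1
V_new = message_passing_step(V, adj, W)
```

Applying this step k times lets information travel up to k hops, which is the stacking behavior described below.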
You could also 1) gather messages, 3) update them and 2) aggregate them and still have a permutation invariant operation.[20]
Just as pooling can be applied to either nodes or edges, message passing can occur between either nodes or edges.
These steps are key for leveraging the connectivity of graphs. We will build more elaborate variants of message passing in GNN layers that yield GNN models of increasing expressiveness and power.
Hover over a node, to highlight adjacent nodes and visualize the adjacent embedding that would be pooled, updated and stored.
This sequence of operations, when applied once, is the simplest type of message-passing GNN layer.
This is reminiscent of standard convolution: in essence, message passing and convolution are operations to aggregate and process the information of an element’s neighbors in order to update the element’s value. In graphs, the element is a node, and in images, the element is a pixel. However, the number of neighboring nodes in a graph can be variable, unlike in an image where each pixel has a set number of neighboring elements.
By stacking message passing GNN layers together, a node can eventually incorporate information from across the entire graph: after three layers, a node has information about the nodes three steps away from it.
We can update our architecture diagram to include this new source of information for nodes:
Schematic for a GCN architecture, which updates node representations of a graph by pooling neighboring nodes at a distance of one degree.
Learning edge representations
Our dataset does not always contain all types of information (node, edge, and global context). When we want to make a prediction on nodes, but our dataset only has edge information, we showed above how to use pooling to route information from edges to nodes, but only at the final prediction step of the model. We can share information between nodes and edges within the GNN layer using message passing.
We can incorporate the information from neighboring edges in the same way we used neighboring node information earlier, by first pooling the edge information, transforming it with an update function, and storing it.
However, the node and edge information stored in a graph are not necessarily the same size or shape, so it is not immediately clear how to combine them. One way is to learn a linear mapping from the space of edges to the space of nodes, and vice versa. Alternatively, one may concatenate them together before the update function.
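Both options can be sketched as follows; the dimensions and weight names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_edge, d_node = 3, 4

# A learned linear map from edge space to node space (and one going back).
W_ev = rng.standard_normal((d_edge, d_node)) * 0.1
W_ve = rng.standard_normal((d_node, d_edge)) * 0.1

pooled_edges = rng.standard_normal((5, d_edge))  # per-node pooled edge info
node_emb = rng.standard_normal((5, d_node))

# Option 1: project the edge information into node space, then add.
combined_add = node_emb + pooled_edges @ W_ev
# Option 2: concatenate and let the update function handle the mixed sizes.
combined_cat = np.concatenate([node_emb, pooled_edges], axis=1)
```

Option 1 keeps the node dimension fixed across layers; option 2 grows the input to the update function instead.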
Architecture schematic for Message Passing layer. The first step “prepares” a message composed of information from an edge and its connected nodes and then “passes” the message to the node.
Which graph attributes we update and in which order we update them is one design decision when constructing GNNs. We could choose whether to update node embeddings before edge embeddings, or the other way around. This is an open area of research with a variety of solutions; for example, we could update in a ‘weave’ fashion [21] where we have four updated representations that get combined into new node and edge representations: node to node (linear), edge to edge (linear), node to edge (edge layer), edge to node (node layer).
Some of the different ways we might combine edge and node representation in a GNN layer.
Adding global representations
There is one flaw with the networks we have described so far: nodes that are far away from each other in the graph may never be able to efficiently transfer information to one another, even if we apply message passing several times. For one node, If we have k-layers, information will propagate at most k-steps away. This can be a problem for situations where the prediction task depends on nodes, or groups of nodes, that are far apart. One solution would be to have all nodes be able to pass information to each other. Unfortunately for large graphs, this quickly becomes computationally expensive (although this approach, called ‘virtual edges’, has been used for small graphs such as molecules). [18]
One solution to this problem is by using the global representation of a graph (U) which is sometimes called a master node[19] [18] or context vector. This global context vector is connected to all other nodes and edges in the network, and can act as a bridge between them to pass information, building up a representation for the graph as a whole. This creates a richer and more complex representation of the graph than could have otherwise been learned.
Schematic of a Graph Nets architecture leveraging global representations.
In this view all graph attributes have learned representations, so we can leverage them during pooling by conditioning the information of our attribute of interest with respect to the rest. For example, for one node we can consider information from neighboring nodes, connected edges and the global information. To condition the new node embedding on all these possible sources of information, we can simply concatenate them. Additionally we may also map them to the same space via a linear map and add them, or apply a feature-wise modulation layer [22], which can be considered a type of feature-wise attention mechanism.
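A sketch of the simple concatenation variant (names and sizes are illustrative; the linear-map-and-add and FiLM-style alternatives would replace the `concatenate` call):

```python
import numpy as np

def condition_node(node_emb, nbr_pool, edge_pool, global_emb):
    """Combine all information sources for one node by concatenation
    before the update function is applied."""
    return np.concatenate([node_emb, nbr_pool, edge_pool, global_emb])

node_emb   = np.ones(4)       # the node's own embedding
nbr_pool   = np.ones(4) * 2   # pooled neighboring nodes
edge_pool  = np.ones(4) * 3   # pooled connected edges
global_emb = np.ones(4) * 4   # the graph's global context vector
h = condition_node(node_emb, nbr_pool, edge_pool, global_emb)
```

The update function then maps this 16-dimensional input back to the node embedding size.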
Schematic for conditioning the information of one node based on three other embeddings (adjacent nodes, adjacent edges, global). This step corresponds to the node operations in the Graph Nets Layer.
References
[18] Neural Message Passing for Quantum Chemistry Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O. and Dahl, G.E., 2017. Proceedings of the 34th International Conference on Machine Learning, Vol 70, pp. 1263--1272. PMLR.
[19] Relational inductive biases, deep learning, and graph networks Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. and Pascanu, R., 2018.
[20] Deep Sets Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. and Smola, A., 2017.
[21] Molecular graph convolutions: moving beyond fingerprints Kearnes, S., McCloskey, K., Berndl, M., Pande, V. and Riley, P., 2016. J. Comput. Aided Mol. Des., Vol 30(8), pp. 595--608.
[22] Feature-wise transformations Dumoulin, V., Perez, E., Schucher, N., Strub, F., Vries, H.d., Courville, A. and Bengio, Y., 2018. Distill, Vol 3(7), pp. e11.