[Paper Reading] A Gentle Introduction to Graph Neural Networks (6)



  • GNN playground
    • Some empirical GNN design lessons
    • References

GNN playground


We’ve described a wide range of GNN components here, but how do they actually differ in practice? This GNN playground allows you to see how these different components and architectures contribute to a GNN’s ability to learn a real task.

Our playground shows a graph-level prediction task with small molecular graphs. We use the Leffingwell Odor Dataset [23] [24], which is composed of molecules with associated odor percepts (labels). Predicting the relation of a molecular structure (graph) to its smell is a 100-year-old problem straddling chemistry, physics, neuroscience, and machine learning.

To simplify the problem, we consider only a single binary label per molecule, classifying whether a molecular graph smells "pungent" or not, as labeled by a professional perfumer. We say a molecule has a "pungent" scent if it has a strong, striking smell. For example, garlic and mustard, which may contain the molecule allyl alcohol, have this quality. The molecule piperitone, often used in peppermint-flavored candy, is also described as having a pungent smell.

We represent each molecule as a graph, where atoms are nodes carrying a one-hot encoding of their atomic identity (Carbon, Nitrogen, Oxygen, Fluorine) and bonds are edges carrying a one-hot encoding of their bond type (single, double, triple, or aromatic).
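As a concrete sketch of this encoding (my own illustration, with hydrogens left implicit), the allyl alcohol example above could be featurized like this in Python:

import numpy as np

ATOM_TYPES = ["C", "N", "O", "F"]                        # atomic identity vocabulary
BOND_TYPES = ["single", "double", "triple", "aromatic"]  # bond type vocabulary

def one_hot(item, vocab):
    """Return a one-hot vector marking item's position in vocab."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    vec[vocab.index(item)] = 1.0
    return vec

# Allyl alcohol, hydrogens implicit: C=C-C-O
atoms = ["C", "C", "C", "O"]
bonds = [(0, 1, "double"), (1, 2, "single"), (2, 3, "single")]

node_features = np.stack([one_hot(a, ATOM_TYPES) for a in atoms])        # shape (4, 4)
edge_features = np.stack([one_hot(t, BOND_TYPES) for _, _, t in bonds])  # shape (3, 4)
edge_index = [(i, j) for i, j, _ in bonds]                               # connectivity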

Our general modeling template for this problem will be built up using sequential GNN layers, followed by a linear model with a sigmoid activation for classification. The design space for our GNN has many levers that can customize the model (a toy sketch follows the list):

1. The number of GNN layers, also called the depth.

2. The dimensionality of each attribute when updated. The update function is a 1-layer MLP with a ReLU activation and a layer norm for normalizing activations.

3. The aggregation function used in pooling: max, mean, or sum.

4. The graph attributes that get updated, or styles of message passing: nodes, edges, and global representation. We control these via boolean toggles (on or off). A baseline model would be a graph-independent GNN (all message passing off), which aggregates all data at the end into a single global attribute. Toggling on all message-passing functions yields a GraphNets architecture.

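To make these levers concrete, here is a minimal, untrained forward-pass sketch of this design space in NumPy. It is an illustration only, not the playground's implementation (which runs in tfjs): Config, gnn_forward, the random weights, and the simplified node-only message passing are all assumptions.

import numpy as np
from dataclasses import dataclass

AGGREGATIONS = {
    "sum": lambda x: x.sum(axis=0),
    "mean": lambda x: x.mean(axis=0),
    "max": lambda x: x.max(axis=0),
}

@dataclass
class Config:
    depth: int = 3                   # 1. number of GNN layers
    node_dim: int = 50               # 2. embedding dimensionality per attribute
    aggregation: str = "sum"         # 3. pooling function: sum / mean / max
    pass_node_messages: bool = True  # 4. message-passing toggle (nodes only here)

def layer_norm(x):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)

def gnn_forward(node_feats, edges, cfg, rng):
    """Sequential GNN layers, then a linear model with a sigmoid for classification."""
    n = node_feats.shape[0]
    neighbors = [[j for a, b in edges for s, j in ((a, b), (b, a)) if s == i]
                 for i in range(n)]
    h = node_feats @ (rng.standard_normal((node_feats.shape[1], cfg.node_dim)) * 0.1)
    agg = AGGREGATIONS[cfg.aggregation]
    for _ in range(cfg.depth):
        if cfg.pass_node_messages:   # off = graph-independent baseline
            pooled = np.stack([agg(h[nb]) if nb else np.zeros(cfg.node_dim)
                               for nb in neighbors])
            w = rng.standard_normal((2 * cfg.node_dim, cfg.node_dim)) * 0.1
            # 1-layer MLP update with ReLU and layer norm, as described above
            h = layer_norm(np.maximum(np.concatenate([h, pooled], axis=-1) @ w, 0.0))
    graph_embedding = agg(h)         # readout: pool nodes into a global vector
    w_out = rng.standard_normal(cfg.node_dim) * 0.1
    return 1.0 / (1.0 + np.exp(-(graph_embedding @ w_out)))  # P("pungent")

# Example: a 4-atom molecule with one-hot node features and three bonds.
rng = np.random.default_rng(0)
x = np.eye(4, dtype=np.float32)
print(gnn_forward(x, [(0, 1), (1, 2), (2, 3)], Config(), rng))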


To better understand how a GNN is learning a task-optimized representation of a graph, we also look at the penultimate layer activations of the GNN. These ‘graph embeddings’ are the outputs of the GNN model right before prediction. Since we are using a generalized linear model for prediction, a linear mapping is enough to allow us to see how we are learning representations around the decision boundary.

Since these are high dimensional vectors, we reduce them to 2D via principal component analysis (PCA). A perfect model would visibly separate labeled data, but since we are reducing dimensionality and also have imperfect models, this boundary might be harder to see.
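A compact way to reproduce that projection, assuming we have collected the penultimate-layer activations into a matrix (the 300x50 placeholder below is made up):

import numpy as np

def pca_2d(embeddings):
    """Project graph embeddings onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T        # shape: (num_graphs, 2)

embeddings = np.random.default_rng(0).standard_normal((300, 50))  # placeholder data
coords = pca_2d(embeddings)           # 2D points to plot, colored by the pungent label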

Play around with different model architectures to build your intuition. For example, see if you can edit the molecule on the left to make the model prediction increase. Do the same edits have the same effects for different model architectures?

This playground runs live in the browser via tfjs.

Figure 1: Edit the molecule to see how the prediction changes, or change the model params to load a different model. Select a different molecule in the scatter plot.


The playground exposes several parameters you can vary: Depth (the number of GNN layers), the Aggregation function, the Node embedding size, the Edge embedding size, and the Global embedding size.

The Aggregation function plays a role similar to the pooling function in a convolutional neural network's pooling layer, except that Sum aggregation is uncommon in CNN pooling layers.

Node embedding size can be understood as the length of the vector representing a node, and likewise for Edge embedding size and Global embedding size. Each of these three can also be left empty, i.e., the model predicts without that attribute's representation.

The scatter plot on the right shows, for every molecular graph in the dataset, how well the prediction matches the true label. The hollow outer ring of a point is the Ground Truth, i.e., the ring encodes that molecular graph's label; the solid center is the Model Prediction. Red means pungent; blue means not pungent. So if a point's ring and center share the same color, the model classified that molecule's pungency correctly; otherwise, the prediction failed. MODEL AUC summarizes the model's predictive quality (area under the ROC curve), and Pungent gives the predicted degree of pungency.

Below are a successful and a failed prediction for molecules in the dataset. The correctly predicted molecule (Figure 2) has a Model Prediction of 4% pungent against a Ground Truth of not pungent: a predicted pungency of 4% matches its non-pungent label, so the prediction succeeds. The incorrectly predicted molecule (Figure 3) has a Model Prediction of 64% pungent against a Ground Truth of not pungent: 64% amounts to calling it pungent, which contradicts the label, so the prediction fails. Both come from a model with 3 GNN layers, Sum aggregation, a node embedding size of 50, an edge embedding size of 10, and a global embedding size of 50, which reaches a MODEL AUC of 0.75.

Figure 2: A correctly predicted molecular graph


Figure 3: An incorrectly predicted molecular graph


Of course, you can also build a custom molecular graph on the left and feed it to the model to predict whether it is pungent. In Figure 4 we entered such a molecule; the prediction is 9% pungent, i.e., not pungent. Since the graph is our own creation, its label is unknown.

Figure 4: A custom molecular graph and its prediction

Some empirical GNN design lessons


When exploring the architecture choices above, you might have found some models have better performance than others. Are there some clear GNN design choices that will give us better performance? For example, do deeper GNN models perform better than shallower ones? Or is there a clear choice between aggregation functions? The answers are going to depend on the data [25] [26], and even different ways of featurizing and constructing graphs can give different answers.

With the following interactive figure, we explore the space of GNN architectures and the performance of this task across a few major design choices: Style of message passing, the dimensionality of embeddings, number of layers, and aggregation operation type.

Each point in the scatter plot represents a model: the x axis is the number of trainable variables, and the y axis is the performance. Hover over a point to see the GNN architecture parameters.
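To make that x axis concrete, here is a back-of-the-envelope trainable-variable count for the toy sketch from earlier. The formula is an assumption about that sketch, not the playground's exact accounting; it just shows how depth and embedding size drive the total.

def count_parameters(depth, node_dim, input_dim=4):
    """Rough trainable-variable count for the toy GNN sketch above."""
    encoder = input_dim * node_dim          # input encoding matrix
    per_layer = (2 * node_dim) * node_dim   # [h, pooled] -> h update MLP
    classifier = node_dim                   # final linear layer
    return encoder + depth * per_layer + classifier

print(count_parameters(depth=1, node_dim=25))   # 1375: a few-thousand-parameter model
print(count_parameters(depth=4, node_dim=100))  # 80500: two orders of magnitude more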

Figure 5: Scatter plot of each model's performance versus its number of trainable variables. Hover over a point to see the GNN architecture parameters.


Figure 6: The parameters of the highest-scoring model


The first thing to notice is that, surprisingly, a higher number of parameters does correlate with higher performance. GNNs are a very parameter-efficient model type: for even a small number of parameters (3k) we can already find models with high performance.

Next, we can look at the distributions of performance aggregated based on the dimensionality of the learned representations for different graph attributes.

Figure 7: Model performance aggregated across different node, edge, and global embedding dimensionalities.


We can notice that models with higher dimensionality tend to have better mean and lower-bound performance, but the same trend is not found for the maximum. Some of the top-performing models can be found for smaller dimensions. Since higher dimensionality is also going to involve a higher number of parameters, these observations go hand in hand with the previous figure.

In other words, the point of this passage is that larger node, edge, and global embedding dimensions do not by themselves make a model stronger; performance is tied to many parameters at once.

Next we can see the breakdown of performance based on the number of GNN layers.

Figure 8: Box plot of layer count versus model performance, and a scatter plot of model performance versus number of parameters. Each point is colored by the number of layers. Hover over a point to see the GNN architecture parameters.


The box plot shows a similar trend: while the mean performance tends to increase with the number of layers, the best-performing models have not three or four layers, but two. Furthermore, the lower bound for performance decreases with four layers. This effect has been observed before: GNNs with more layers broadcast information over a greater distance and risk having their node representations 'diluted' by many successive iterations [27].
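A toy illustration of that dilution effect (my own construction, not from the paper): repeatedly averaging neighbors on a small cycle graph makes initially distinct node states collapse toward one another.

import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]        # a 5-node cycle
neighbors = [[j for a, b in edges for s, j in ((a, b), (b, a)) if s == i]
             for i in range(5)]
h = np.eye(5)                                            # distinct one-hot node states

for layer in range(1, 9):
    h = np.stack([h[nb].mean(axis=0) for nb in neighbors])  # pure mean passing
    spread = np.linalg.norm(h - h.mean(axis=0))          # how distinguishable nodes are
    print(f"layer {layer}: node spread = {spread:.3f}")
# The spread shrinks toward zero: after many layers every node carries
# nearly the same representation.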

Does our dataset have a preferred aggregation operation? Our following figure breaks down performance in terms of aggregation type.

Figure 9: Box plot of aggregation type versus model performance, and a scatter plot of model performance versus number of parameters. Each point is colored by aggregation type. Hover over a point to see the GNN architecture parameters.


Overall it appears that sum gives a very slight improvement in mean performance, but max or mean can give equally good models. This is useful context when weighing the discriminatory/expressive capabilities of aggregation operations.
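A small illustration of that expressiveness point (a standard argument, sketched here with made-up feature multisets): each aggregator has pairs of different neighborhoods it cannot tell apart.

import numpy as np

aggs = {"sum": np.sum, "mean": np.mean, "max": np.max}
# For each aggregator, a pair of distinct neighborhoods it maps to the same value.
blind_spots = {
    "max":  ([1.0, 1.0, 3.0], [1.0, 3.0, 3.0]),   # max sees 3.0 for both
    "mean": ([1.0, 3.0], [2.0, 2.0, 2.0]),        # mean sees 2.0 for both
    "sum":  ([1.0, 3.0], [4.0]),                  # sum sees 4.0 for both
}
for name, (a, b) in blind_spots.items():
    outputs = {k: (round(f(a), 2), round(f(b), 2)) for k, f in aggs.items()}
    print(f"{name} confuses {a} and {b}: {outputs}")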

The previous explorations have given mixed messages. We can find mean trends where more complexity gives better performance, but we can also find clear counterexamples where models with fewer parameters, fewer layers, or lower dimensionality perform better. One trend that is much clearer concerns the number of attributes that are passing information to each other.

Here we break down performance based on the style of message passing. At the two extremes, we consider models that do not communicate between graph entities ("none") and models that pass messages between nodes, edges, and globals.

Figure 10: Box plot of message-passing style versus model performance, and a scatter plot of model performance versus number of parameters. Each point is colored by message-passing style. Hover over a point to see the GNN architecture parameters.


Overall we see that the more graph attributes are communicating, the better the performance of the average model. Our task is centered on global representations, so explicitly learning this attribute also tends to improve performance. Our node representations also seem to be more useful than edge representations, which makes sense since more information is loaded in these attributes.

There are many directions you could go from here to get better performance. We wish to highlight two general directions, one related to more sophisticated graph algorithms and another towards the graph itself.

Up until now, our GNN is based on a neighborhood-based pooling operation. There are some graph concepts that are harder to express in this way, for example a linear graph path (a connected chain of nodes). Designing new mechanisms in which graph information can be extracted, executed and propagated in a GNN is one current research area [28] , [29] , [30] , [31] .

One of the frontiers of GNN research is not making new models and architectures, but "how to construct graphs": more precisely, imbuing graphs with additional structure or relations that can be leveraged. As we loosely saw, the more graph attributes are communicating, the more we tend to have better models. In this particular case, we could consider making molecular graphs more feature-rich by adding additional spatial relationships between nodes, adding edges that are not bonds, or adding explicit learnable relationships between subgraphs.
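As a sketch of one such enrichment, we could densify a molecular graph with non-bond edges between spatially nearby atoms. The coordinates and the 2.5-unit distance cutoff below are invented for illustration.

import numpy as np

coords = np.array([[0.0, 0.0, 0.0],    # toy 3D positions for four atoms
                   [1.4, 0.0, 0.0],
                   [2.1, 1.2, 0.0],
                   [0.5, 1.9, 0.0]])
bond_edges = {(0, 1), (1, 2), (2, 3)}  # the chemical bonds

extra_edges = set()                    # spatial (non-bond) relations
for i in range(len(coords)):
    for j in range(i + 1, len(coords)):
        if (i, j) not in bond_edges and np.linalg.norm(coords[i] - coords[j]) < 2.5:
            extra_edges.add((i, j))

print(sorted(extra_edges))             # e.g. [(0, 2), (0, 3), (1, 3)]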

See more in Other types of graphs.


References

[23] Leffingwell Odor Dataset. Sanchez-Lengeling, B., Wei, J.N., Lee, B.K., Gerkin, R.C., Aspuru-Guzik, A. and Wiltschko, A.B., 2020.

[24] Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules. Sanchez-Lengeling, B., Wei, J.N., Lee, B.K., Gerkin, R.C., Aspuru-Guzik, A. and Wiltschko, A.B., 2019.

[25] Benchmarking Graph Neural Networks. Dwivedi, V.P., Joshi, C.K., Laurent, T., Bengio, Y. and Bresson, X., 2020.

[26] Design Space for Graph Neural Networks. You, J., Ying, R. and Leskovec, J., 2020.

[27] Principal Neighbourhood Aggregation for Graph Nets. Corso, G., Cavalleri, L., Beaini, D., Lio, P. and Velickovic, P., 2020.

[28] Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning. Markowitz, E., Balasubramanian, K., Mirtaheri, M., Abu-El-Haija, S., Perozzi, B., Ver Steeg, G. and Galstyan, A., 2021.

[29] Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels. Du, S.S., Hou, K., Poczos, B., Salakhutdinov, R., Wang, R. and Xu, K., 2019.

[30] Representation Learning on Graphs with Jumping Knowledge Networks. Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K. and Jegelka, S., 2018.

[31] Neural Execution of Graph Algorithms. Velickovic, P., Ying, R., Padovano, M., Hadsell, R. and Blundell, C., 2019.
