Paper: "Graph Structure of Neural Networks" — Translation and Interpretation
Contents
"Graph Structure of Neural Networks" — Translation and Interpretation
Abstract
1. Introduction
2. Neural Networks as Relational Graphs
2.1. Message Exchange over Graphs
2.2. Fixed-width MLPs as Relational Graphs
2.3. General Neural Networks as Relational Graphs
3. Exploring Relational Graphs
3.1. Selection of Graph Measures
3.2. Design of Graph Generators
3.3. Controlling Computational Budget
4. Experimental Setup
4.1. Base Architectures
4.2. Exploration with Relational Graphs
5. Results
5.1. A Sweet Spot for Top Neural Networks
5.2. Neural Network Performance as a Smooth Function over Graph Measures
5.3. Consistency across Architectures
5.4. Quickly Identifying a Sweet Spot
5.5. Network Science and Neuroscience Connections
6. Related Work
7. Discussions
8. Conclusion
Acknowledgments
"Graph Structure of Neural Networks" — Translation and Interpretation
Original paper:
https://arxiv.org/pdf/2007.06559.pdf
https://arxiv.org/abs/2007.06559
| Comments: | ICML 2020 [Submitted on 13 Jul 2020] |
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI); Machine Learning (stat.ML) |
| Cite as: | arXiv:2007.06559 [cs.LG] (or arXiv:2007.06559v1 [cs.LG] for this version) |
Abstract
Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of a neural network and its predictive performance. Here we systematically investigate how the graph structure of neural networks affects their predictive performance. To this end, we develop a novel graph-based representation of neural networks called a relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) a neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to that of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding of neural networks in general.
1. Introduction
Deep neural networks consist of neurons organized into layers and connections between them. The architecture of a neural network can be captured by its "computational graph", where neurons are represented as nodes and directed edges link neurons in different layers. Such a graphical representation demonstrates how the network passes and transforms information from its input neurons, through the hidden layers, all the way to the output neurons (McClelland et al., 1986). While it has been widely observed that the performance of neural networks depends on their architecture (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016), there is currently little systematic understanding of the relation between a neural network's accuracy and its underlying graph structure. This is especially important for neural architecture search, which today exhaustively searches over all possible connectivity patterns (Ying et al., 2019). From this perspective, several open questions arise:
Establishing such a relation is both scientifically and practically important because it would have direct consequences for designing more efficient and more accurate architectures. It would also inform the design of new hardware architectures that execute neural networks. Understanding the graph structures that underlie neural networks would also advance the science of deep learning. However, establishing the relation between a network architecture and its accuracy is nontrivial, because it is unclear how to map a neural network to a graph (and vice versa). The natural choice would be the computational graph representation, but it has many limitations: (1) Lack of generality: computational graphs are constrained in the graph properties they allow, e.g., these graphs have to be directed and acyclic (DAGs), bipartite at the layer level, and single-in-single-out at the network level (Xie et al., 2019). This limits the use of the rich tools developed for general graphs. (2) Disconnection from biology/neuroscience: biological neural networks have a much richer and less templatized structure (Fornito et al., 2013). There are information exchanges, rather than just single-directional flows, in brain networks (Stringer et al., 2018). Such biological or neurological models cannot simply be represented by directed acyclic graphs.
Here we systematically study the relationship between the graph structure of a neural network and its predictive performance. We develop a new way of representing a neural network as a graph, which we call a relational graph. Our key insight is to focus on message exchange, rather than just on directed data flow. As a simple example, for a fixed-width fully-connected layer, we can represent one input channel and one output channel together as a single node, and an edge in the relational graph represents the message exchange between the two nodes (Figure 1(a)). Under this formulation, using an appropriate message exchange definition, we show that the relational graph can represent many types of neural network layers (a fully-connected layer, a convolutional layer, etc.), while getting rid of many constraints of computational graphs (such as directed, acyclic, bipartite, single-in-single-out). One neural network layer corresponds to one round of message exchange over a relational graph; to obtain deep networks, we perform message exchange over the same graph for several rounds. Our new representation enables us to build neural networks that are richer and more diverse, and to analyze them using the well-established tools of network science (Barabási & Pósfai, 2016). We then design a graph generator named WS-flex that allows us to systematically explore the design space of neural networks (i.e., relational graphs). Based on insights from neuroscience, we characterize neural networks by the clustering coefficient and average path length of their relational graphs (Figure 1(c)). Furthermore, our framework is flexible and general, as we can translate relational graphs into diverse neural architectures, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), ResNets, etc., with controlled computational budgets (Figure 1(d)).
Using the standard image classification datasets CIFAR-10 and ImageNet, we conduct a systematic study of how the architecture of neural networks affects their predictive performance. We make several important empirical observations:

Our results have implications for designing neural network architectures, advancing the science of deep learning and improving our understanding of neural networks in general.
Figure 1: Overview of our approach. (a) A layer of a neural network can be viewed as a relational graph where we connect nodes that exchange messages. (b) More examples of neural network layers and relational graphs. (c) We explore the design space of relational graphs according to their graph measures, including average path length and clustering coefficient, where the complete graph corresponds to a fully-connected layer. (d) We translate these relational graphs to neural networks and study how their predictive performance depends on the graph measures of their corresponding relational graphs.
2. Neural Networks as Relational Graphs
To explore the graph structure of neural networks, we first introduce the concept of our relational graph representation and its instantiations. We demonstrate how our representation can capture diverse neural network architectures under a unified framework. Using the language of graphs in the context of deep learning helps bring the two worlds together and establishes a foundation for our study.
2.1. Message Exchange over Graphs
We start by revisiting the definition of a neural network from the graph perspective. We define a graph G = (V, E) by its node set V = {v1, ..., vn} and edge set E ⊆ {(vi, vj) | vi, vj ∈ V}. We assume each node v has a node feature scalar/vector xv.

Table 1: Diverse neural architectures expressed in the language of relational graphs. These architectures are usually implemented as complete relational graphs, while we systematically explore more graph structures for these architectures.

Figure 2: Example of translating a 4-node relational graph to a 4-layer 65-dim MLP. We highlight the message exchange for node x1. Using different definitions of xi, fi(·), AGG(·) and R (those defined in Table 1), relational graphs can be translated to diverse neural architectures.
We call a graph G a relational graph when it is associated with message exchanges between neurons. Specifically, a message exchange is defined by a message function, whose input is a node's feature and output is a message, and an aggregation function, whose input is a set of messages and output is the updated node feature. At each round of message exchange, each node sends messages to its neighbors and aggregates the incoming messages from its neighbors. Each message is transformed at each edge through a message function f(·), then the messages are aggregated at each node via an aggregation function AGG(·). Suppose we conduct R rounds of message exchange; then the r-th round of message exchange for a node v can be described as

x_v^(r+1) = AGG^(r)({ f_v^(r)(x_u^(r)), ∀ u ∈ N(v) })     (1)

where N(v) denotes the neighborhood of v (including v itself, via a self-edge). Equation 1 provides a general definition for message exchange. In the remainder of this section, we discuss how this general message exchange definition can be instantiated as different neural architectures. We summarize the different instantiations in Table 1, and provide a concrete example of instantiating a 4-layer 65-dim MLP in Figure 2.
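To make one round of message exchange concrete, here is a minimal Python sketch of Equation 1. The identity message function and sum aggregation are illustrative placeholders of my own choosing; the paper's actual instantiations (Table 1) use trained transforms.

```python
def message_exchange_round(features, edges, f=lambda x: x, agg=sum):
    """One round of Equation 1: every node aggregates messages from N(v).

    features: dict node -> scalar feature x_v
    edges:    set of undirected pairs (u, v)
    """
    # Build neighborhoods N(v); each node includes itself via a self-edge.
    neighbors = {v: {v} for v in features}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Transform each incoming feature with f, then aggregate with agg.
    return {v: agg(f(features[u]) for u in neighbors[v]) for v in features}

# 4-node example on a ring graph: each node sums itself and its two neighbors.
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
ring = {(0, 1), (1, 2), (2, 3), (3, 0)}
x = message_exchange_round(x, ring)
```

Stacking several calls to `message_exchange_round` over the same graph corresponds to the deep networks described above.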
2.2. Fixed-width MLPs as Relational Graphs
A Multilayer Perceptron (MLP) consists of layers of computation units (neurons), where each neuron performs a weighted sum over scalar inputs and outputs, followed by some non-linearity. Suppose the r-th layer of an MLP takes x^(r) as input and x^(r+1) as output; then a neuron computes

x_i^(r+1) = σ(Σ_j w_ij^(r) x_j^(r))     (2)

where w_ij^(r) is the trainable weight, x_j^(r) is the j-th dimension of the input, x_i^(r+1) is the i-th dimension of the output, and σ is the non-linearity.
The above discussion reveals that a fixed-width MLP can be viewed as a complete relational graph with a special message exchange function. Therefore, a fixed-width MLP is a special case under a much more general model family, where the message function, aggregation function, and, most importantly, the relational graph structure can vary. This insight allows us to generalize fixed-width MLPs from using the complete relational graph to any general relational graph G. Based on the general definition of message exchange in Equation 1, we have

x_v^(r+1) = σ(Σ_{u ∈ N(v)} w_uv^(r) x_u^(r))     (3)
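As a sanity check on this generalization, the sketch below applies one such layer in plain Python; the weights and the tiny 3-node graph are toy values of my own choosing, and ReLU stands in for σ.

```python
def relational_mlp_layer(x, w, edges, sigma=lambda z: max(z, 0.0)):
    """One round of the generalized MLP layer: x[v] is node v's scalar
    feature, w[u][v] the weight on edge (u, v); the neighborhood N(v)
    contains v itself plus its graph neighbors."""
    n = len(x)

    def in_nbr(u, v):
        return u == v or (u, v) in edges or (v, u) in edges

    return [sigma(sum(w[u][v] * x[u] for u in range(n) if in_nbr(u, v)))
            for v in range(n)]

# With the complete graph this reduces to an ordinary dense layer; with
# edges {(0, 1)} only, node 2 receives nothing but its self-loop message.
y = relational_mlp_layer([1.0, 2.0, 4.0],
                         [[1.0] * 3 for _ in range(3)],
                         {(0, 1)})
```

With all-ones weights, nodes 0 and 1 each sum their own feature and the other's, while node 2 keeps only its own.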
2.3. General Neural Networks as Relational Graphs
The graph viewpoint in Equation 3 lays the foundation for representing fixed-width MLPs as relational graphs. In this section, we discuss how we can further generalize relational graphs to general neural networks.

Variable-width MLPs as relational graphs. An important design consideration for general neural networks is that the layer width often varies throughout the network. For example, in CNNs, a common practice is to double the layer width (number of feature channels) after spatial down-sampling.
Note that under this definition, the maximum number of nodes of a relational graph is bounded by the width of the narrowest layer in the corresponding neural network (since the feature dimension for each node must be at least 1).
Modern neural architectures as relational graphs. Finally, we generalize relational graphs to represent modern neural architectures with more sophisticated designs. For example, to represent a ResNet (He et al., 2016), we keep the residual connections between layers unchanged. To represent neural networks with a bottleneck transform (He et al., 2016), a relational graph alternately applies message exchange with 3×3 and 1×1 convolutions; similarly, in the efficient computing setup, the widely used separable convolution (Howard et al., 2017; Chollet, 2017) can be viewed as alternately applying message exchange with 3×3 depth-wise convolution and 1×1 convolution. Overall, relational graphs provide a general representation for neural networks. With proper definitions of node features and message exchange, relational graphs can represent diverse neural architectures, as summarized in Table 1.
3. Exploring Relational Graphs
In this section, we describe in detail how we design and explore the space of relational graphs defined in Section 2, in order to study the relationship between the graph structure of neural networks and their predictive performance. Three main components are needed to make progress: (1) graph measures that characterize graph structural properties, (2) graph generators that can generate diverse graphs, and (3) a way to control the computational budget, so that the differences in performance of different neural networks are due to their diverse relational graph structures.
3.1. Selection of Graph Measures
Given the complex nature of graph structure, graph measures are often used to characterize graphs. In this paper, we focus on one global graph measure, average path length, and one local graph measure, clustering coefficient. Notably, these two measures are widely used in network science (Watts & Strogatz, 1998) and neuroscience (Sporns, 2003; Bassett & Bullmore, 2006). Specifically, average path length measures the average shortest path distance between any pair of nodes; clustering coefficient measures the proportion of edges between the nodes within a given node's neighborhood, divided by the number of edges that could possibly exist between them, averaged over all the nodes. Other graph measures that can be used for analysis are included in the Appendix.
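Both measures follow directly from their standard network-science definitions; the following is a small self-contained sketch (plain BFS, assuming an undirected, connected graph given as an adjacency dict).

```python
from collections import deque
from itertools import combinations

def average_path_length(adj):
    """Mean shortest-path distance over all ordered node pairs (BFS from
    every source; assumes the graph is connected)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(adj) - 1
    return total / pairs

def clustering_coefficient(adj):
    """Average local clustering: fraction of a node's neighbor pairs that
    are themselves connected, averaged over all nodes."""
    coeffs = []
    for v in adj:
        k = len(adj[v])
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)
```

For the complete graph both measures equal 1, which is why the complete graph sits at the lower-right corner of the (L, C) plane in Figure 1(c).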
3.2. Design of Graph Generators
Given the selected graph measures, we aim to generate diverse graphs that can cover a large span of graph measures, using a graph generator. However, such a goal requires careful generator design: classic graph generators can only generate a limited class of graphs, while recent learning-based graph generators are designed to imitate given exemplar graphs (Kipf & Welling, 2017; Li et al., 2018b; You et al., 2018a;b; 2019a).

Limitations of existing graph generators. To illustrate the limitations of existing graph generators, we investigate the following classic graph generators: (1) the Erdős–Rényi (ER) model, which samples graphs with a given node and edge number uniformly at random (Erdős & Rényi, 1960); (2) the Watts–Strogatz (WS) model, which can generate graphs with small-world properties (Watts & Strogatz, 1998); (3) the Barabási–Albert (BA) model, which can generate scale-free graphs (Albert & Barabási, 2002); (4) the Harary model, which can generate graphs with maximum connectivity (Harary, 1962); (5) regular ring lattice graphs (ring graphs); (6) complete graphs. For all types of graph generators, we fix the number of nodes at 64, enumerate all possible discrete parameters and grid search over all continuous parameters of each graph generator. We generate 30 random graphs with different random seeds under each parameter setting. In total, we generate 486,000 WS graphs, 53,000 ER graphs, 8,000 BA graphs, 1,800 Harary graphs, 54 ring graphs and 1 complete graph (more details are provided in the Appendix). In Figure 3, we can observe that graphs generated by those classic graph generators have a limited span in the space of average path length and clustering coefficient.
WS-flex graph generator. Here we propose the WS-flex graph generator, which can generate graphs with a wide coverage of graph measures; notably, WS-flex graphs almost encompass all the graphs generated by the classic random generators mentioned above, as shown in Figure 3. The WS-flex generator generalizes the WS model by relaxing the constraint that all the nodes have the same degree before random rewiring. Specifically, the WS-flex generator is parametrized by node count n, average degree k and rewiring probability p. The number of edges is determined as e = ⌊n·k/2⌋. The WS-flex generator first creates a ring graph in which each node connects to ⌊e/n⌋ neighboring nodes; then the generator randomly picks e mod n nodes and connects each of them to one closest neighboring node; finally, all the edges are randomly rewired with probability p. We use the WS-flex generator to smoothly sample within the space of clustering coefficient and average path length, then sub-sample 3942 graphs for our experiments, as shown in Figure 1(c).
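A sketch of this procedure follows. Details the text leaves open, such as which neighbor the e mod n extra edges attach to and how collisions during rewiring are resolved, are my own assumptions.

```python
import random

def ws_flex(n, k, p, seed=0):
    """WS-flex sketch: n nodes, average degree k (may be non-integer),
    rewiring probability p. Returns a set of undirected edges (u, v), u < v."""
    rng = random.Random(seed)
    e = int(n * k // 2)                       # e = floor(n * k / 2)
    edges = set()

    def add(u, v):
        if u != v:
            edges.add((min(u, v), max(u, v)))

    # 1) Ring lattice: each node links to its floor(e/n) clockwise neighbors.
    for v in range(n):
        for d in range(1, e // n + 1):
            add(v, (v + d) % n)
    # 2) Distribute the remaining e mod n edges: each picked node connects to
    #    its closest not-yet-connected neighbor (distance floor(e/n) + 1).
    for v in rng.sample(range(n), e % n):
        add(v, (v + e // n + 1) % n)
    # 3) Rewire each edge with probability p to a uniformly random endpoint,
    #    avoiding self-loops and duplicate edges.
    for u, v in list(edges):
        if rng.random() < p:
            edges.remove((u, v))
            w = rng.randrange(n)
            while w == u or (min(u, w), max(u, w)) in edges:
                w = rng.randrange(n)
            add(u, w)
    return edges
```

Sweeping k and p (as the paper does for its 3942 samples) moves the generated graphs across the clustering-coefficient / average-path-length plane.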
3.3. Controlling Computational Budget
To compare the neural networks translated from these diverse graphs, it is important to ensure that all networks have approximately the same complexity, so that the differences in performance are due to their relational graph structures. We use FLOPS (number of multiply-adds) as the metric. We first compute the FLOPS of our baseline network instantiations (i.e., complete relational graphs), and use them as the reference complexity in each experiment. As described in Section 2.3, a relational graph structure can be instantiated as a neural network with variable width, by partitioning dimensions or channels into disjoint sets of node features. Therefore, we can conveniently adjust the width of a neural network to match the reference complexity (within 0.5% of the baseline FLOPS) without changing the relational graph structure. We provide more details in the Appendix.
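The width-matching step can be illustrated with a toy FLOP model. The cost model below (a d×d transform per directed edge plus one per node's self-loop, for per-node feature dimension d) is a simplifying assumption of mine, not the paper's exact accounting.

```python
def layer_flops(num_nodes, num_edges, d):
    """Multiply-adds for one round of message exchange when every node holds
    a d-dim feature: one d x d transform per directed edge and per self-loop."""
    return (2 * num_edges + num_nodes) * d * d

def match_width(num_nodes, num_edges, baseline_flops, tol=0.005):
    """Pick the per-node dimension d whose FLOPs land closest to the baseline;
    return None if no d gets within the 0.5% tolerance."""
    best = min(range(1, 2049),
               key=lambda d: abs(layer_flops(num_nodes, num_edges, d)
                                 - baseline_flops))
    gap = abs(layer_flops(num_nodes, num_edges, best) - baseline_flops)
    return best if gap <= tol * baseline_flops else None

# Under this model, a complete graph on 64 nodes with d = 512 / 64 = 8
# exactly matches a dense 512-wide layer (512 * 512 multiply-adds).
d = match_width(64, 64 * 63 // 2, 512 * 512)
```

Sparser relational graphs have fewer edges, so the search returns a larger d, widening the network until the budget is met.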
4. Experimental Setup
Considering the large number of candidate graphs (3942 in total) that we want to explore, we first investigate the graph structure of MLPs on the CIFAR-10 dataset (Krizhevsky, 2009), which has 50K training images and 10K validation images. We then further study the larger and more complex task of ImageNet classification (Russakovsky et al., 2015), which consists of 1K image classes, 1.28M training images and 50K validation images.
4.1. Base Architectures
For CIFAR-10 experiments, we use a 5-layer MLP with 512 hidden units as the baseline architecture. The input of the MLP is a 3072-d flattened vector of the (32×32×3) image; the output is a 10-d prediction. Each MLP layer has a ReLU non-linearity and a BatchNorm layer (Ioffe & Szegedy, 2015). We train the model for 200 epochs with batch size 128, using a cosine learning rate schedule (Loshchilov & Hutter, 2016) with an initial learning rate of 0.1 (annealed to 0, no restarting). We train all MLP models with 5 different random seeds and report the averaged results.
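The cosine schedule used here (initial learning rate 0.1, annealed to 0 without restarts) reduces to a one-line formula; this sketch follows the standard cosine-annealing form.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_init=0.1):
    """Cosine annealing without restarts: lr_init at epoch 0, 0 at the end."""
    return 0.5 * lr_init * (1 + math.cos(math.pi * epoch / total_epochs))
```

For example, the learning rate starts at 0.1, passes through 0.05 at the halfway point (epoch 100), and decays to 0 at epoch 200.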
For ImageNet experiments, we use three ResNet-family architectures: (1) ResNet-34, which only consists of basic blocks of 3×3 convolutions (He et al., 2016); (2) ResNet-34-sep, a variant where we replace all 3×3 dense convolutions in ResNet-34 with 3×3 separable convolutions (Chollet, 2017); (3) ResNet-50, which consists of bottleneck blocks (He et al., 2016) of 1×1, 3×3, 1×1 convolutions. Additionally, we use the EfficientNet-B0 architecture (Tan & Le, 2019), which achieves good performance in the small-computation regime. Finally, we use a simple 8-layer CNN with 3×3 convolutions. The model has 3 stages with [64, 128, 256] hidden units. Stride-2 convolutions are used for down-sampling. The stem and head layers are the same as in a ResNet. We train all the ImageNet models for 100 epochs using a cosine learning rate schedule with an initial learning rate of 0.1. The batch size is 256 for ResNet-family models and 512 for EfficientNet-B0. We train all ImageNet models with 3 random seeds and report the averaged performance. All the baseline architectures have a complete relational graph structure. The reference computational complexity is 2.89e6 FLOPS for the MLP, 3.66e9 FLOPS for ResNet-34, 0.55e9 FLOPS for ResNet-34-sep, 4.09e9 FLOPS for ResNet-50, 0.39e9 FLOPS for EfficientNet-B0, and 0.17e9 FLOPS for the 8-layer CNN. Training an MLP model takes roughly 5 minutes on an NVIDIA Tesla V100 GPU, and training a ResNet model on ImageNet takes roughly a day on 8 Tesla V100 GPUs with data parallelism. We provide more details in the Appendix.
Figure 4: Key results. The computational budgets of all the experiments are rigorously controlled. Each visualized result is averaged over at least 3 random seeds. A complete graph with C = 1 and L = 1 (lower right corner) is regarded as the baseline. (a)(c) Graph measures vs. neural network performance. The best graphs significantly outperform the baseline complete graphs. (b)(d) Single graph measure vs. neural network performance. Relational graphs that fall within the given range are shown as grey points. The overall smooth function is indicated by the blue regression line. (e) Consistency across architectures. Correlations of the performance of the same set of 52 relational graphs when translated to different neural architectures are shown. (f) Summary of all the experiments. The best relational graphs (the red crosses) consistently outperform the baseline complete graphs across different settings. Moreover, we highlight the "sweet spots" (red rectangular regions), in which relational graphs are not statistically worse than the best relational graphs (bins with red crosses). Bin values of the 5-layer MLP on CIFAR-10 are averaged over all the relational graphs whose C and L fall into the given bin.
4.2. Exploration with Relational Graphs
| For all the architectures, we instantiate each sampled relational graph as a neural network, using the corresponding definitions outlined in Table 1. Specifically, we replace all the dense layers (linear layers, 3×3 and 1×1 convolution layers) with their relational graph counterparts. We leave the input and output layer unchanged and keep all the other designs (such as down-sampling, skip-connections, etc.) intact. We then match the reference computational complexity for all the models, as discussed in Section 3.3. | 對(duì)于所有架構(gòu),我們使用表1中列出的相應(yīng)定義,將每個(gè)采樣得到的關(guān)系圖實(shí)例化為一個(gè)神經(jīng)網(wǎng)絡(luò)。具體來(lái)說(shuō),我們將所有稠密層(線性層、3×3和1×1卷積層)替換為其關(guān)系圖對(duì)應(yīng)版本。我們保持輸入層和輸出層不變,并保留所有其他設(shè)計(jì)(如下采樣、跳躍連接等)。然后,我們?yōu)樗心P推ヅ鋮⒖加?jì)算復(fù)雜度,如3.3節(jié)所述。 |
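Replacing a dense layer with its relational-graph counterpart amounts to masking out the weights of non-edges. A minimal NumPy sketch, assuming one channel per node for simplicity (the paper groups multiple channels per node); the function name `relational_linear` is ours:

```python
import numpy as np

def relational_linear(x, weight, adj):
    """Sketch of a dense layer turned into its relational-graph counterpart:
    node j only receives messages from its neighbours, implemented here by
    masking the weight matrix with the graph's adjacency plus self-loops.
    x: (batch, n_nodes) features, weight: (n_nodes, n_nodes), adj: 0/1 matrix."""
    mask = adj + np.eye(adj.shape[0])   # keep self connections
    return x @ (weight * mask)          # zero out weights of non-edges
```

With an empty adjacency only the self-loop weights survive, which illustrates how sparser graphs keep fewer of the original dense weights.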
| For CIFAR-10 MLP experiments, we study 3942 sampled relational graphs of 64 nodes as described in Section 3.2. For ImageNet experiments, due to high computational cost, we sub-sample 52 graphs uniformly from the 3942 graphs. Since EfficientNet-B0 is a small model with a layer that has only 16 channels, we cannot reuse the 64-node graphs sampled for other setups. We re-sample 48 relational graphs with 16 nodes following the same procedure in Section 3. | 對(duì)于CIFAR-10 MLP實(shí)驗(yàn),我們研究了3942個(gè)采樣得到的64節(jié)點(diǎn)關(guān)系圖,如3.2節(jié)所述。對(duì)于ImageNet實(shí)驗(yàn),由于計(jì)算成本高,我們從3942個(gè)圖中均勻地抽取52個(gè)圖。由于EfficientNet-B0是一個(gè)小模型,其中有一層只有16個(gè)通道,我們無(wú)法重用為其他設(shè)置采樣的64節(jié)點(diǎn)圖,因此按照第3節(jié)相同的步驟重新采樣了48個(gè)16節(jié)點(diǎn)的關(guān)系圖。 |
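The graph measures used throughout (clustering coefficient C and average path length L) can be computed with off-the-shelf tools. A sketch using networkx, with an ordinary connected Watts-Strogatz sample standing in for the paper's WS-flex generator (the generator and parameters here are illustrative, not the paper's exact sampler):

```python
import networkx as nx

def graph_measures(g):
    """Return (C, L): average clustering coefficient and average shortest
    path length of a candidate relational graph."""
    return nx.average_clustering(g), nx.average_shortest_path_length(g)

# One illustrative sample: a 64-node ring lattice rewired with probability p.
g = nx.connected_watts_strogatz_graph(n=64, k=8, p=0.3, seed=0)
C, L = graph_measures(g)
```

A complete graph gives C = 1 and L = 1, which is exactly the baseline point in Figure 4.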
5. Results
| In this section, we summarize the results of our experiments and discuss our key findings. We collect top-1 errors for all the sampled relational graphs on different tasks and architectures, and also record the graph measures (average path length L and clustering coefficient C) for each sampled graph. We present these results as heat maps of graph measures vs. predictive performance (Figure 4(a)(c)(f)). | 在本節(jié)中,我們總結(jié)實(shí)驗(yàn)結(jié)果并討論主要發(fā)現(xiàn)。我們收集了不同任務(wù)和架構(gòu)上所有采樣關(guān)系圖的top-1錯(cuò)誤率,并記錄每個(gè)采樣圖的圖度量(平均路徑長(zhǎng)度L和聚類(lèi)系數(shù)C)。我們以圖度量與預(yù)測(cè)性能的熱圖形式呈現(xiàn)這些結(jié)果(圖4(a)(c)(f))。 |
5.1. A Sweet Spot for Top Neural Networks
| Overall, the heat maps of graph measures vs. predictive performance (Figure 4(f)) show that there exist graph structures that can outperform the complete graph (the pixel on bottom right) baselines. The best performing relational graph can outperform the complete graph baseline by 1.4% top-1 error on CIFAR-10, and 0.5% to 1.2% for models on ImageNet. Notably, we discover that top-performing graphs tend to cluster into a sweet spot in the space defined by C and L (red rectangles in Figure 4(f)). We follow these steps to identify a sweet spot: (1) we downsample and aggregate the 3942 graphs in Figure 4(a) into a coarse resolution of 52 bins, where each bin records the performance of graphs that fall into the bin; (2) we identify the bin with best average performance (red cross in Figure 4(f)); (3) we conduct one-tailed t-test over each bin against the best-performing bin, and record the bins that are not significantly worse than the best-performing bin (p-value 0.05 as threshold). The minimum area rectangle that covers these bins is visualized as a sweet spot. For 5-layer MLP on CIFAR-10, the sweet spot is C ∈ [0.10, 0.50], L ∈ [1.82, 2.75]. | 總的來(lái)說(shuō),圖度量與預(yù)測(cè)性能的熱圖(圖4(f))表明,存在能夠優(yōu)于完全圖基線(右下角的像素)的圖結(jié)構(gòu)。表現(xiàn)最好的關(guān)系圖在CIFAR-10上的top-1錯(cuò)誤率比完全圖基線低1.4%,在ImageNet上的各模型則低0.5%到1.2%。值得注意的是,我們發(fā)現(xiàn)表現(xiàn)最好的圖往往聚集在由C和L定義的空間中的一個(gè)“甜蜜點(diǎn)”(圖4(f)中的紅色矩形)。我們按以下步驟確定甜蜜點(diǎn):(1)將圖4(a)中的3942個(gè)圖下采樣并匯總為52個(gè)粗分辨率的bin,每個(gè)bin記錄落入其中的圖的性能;(2)確定平均性能最佳的bin(圖4(f)中的紅色叉號(hào));(3)對(duì)每個(gè)bin與性能最佳的bin進(jìn)行單尾t檢驗(yàn),記錄不顯著差于最佳bin的那些bin(以p值0.05為閾值)。覆蓋這些bin的最小面積矩形即可視化為甜蜜點(diǎn)。對(duì)于CIFAR-10上的5層MLP,甜蜜點(diǎn)為C ∈ [0.10, 0.50],L ∈ [1.82, 2.75]。 |
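Step (3) of the procedure above can be sketched as follows. `sweet_spot_bins` is a hypothetical helper of ours; we use Welch's unequal-variance t-test with the two-sided p-value halved into a one-tailed one, since the paper does not specify the exact test variant:

```python
import numpy as np
from scipy import stats

def sweet_spot_bins(bin_errors, alpha=0.05):
    """Sketch of the sweet-spot test. `bin_errors` maps a bin id to the
    top-1 errors of the graphs falling into it. We keep every bin that is
    not significantly worse than the best (lowest mean error) bin."""
    means = {b: np.mean(v) for b, v in bin_errors.items()}
    best = min(means, key=means.get)
    keep = []
    for b, v in bin_errors.items():
        t, p_two = stats.ttest_ind(v, bin_errors[best], equal_var=False)
        # One-tailed p for H1: this bin's mean error > best bin's mean error.
        p_one = p_two / 2 if t > 0 else 1 - p_two / 2
        if b == best or p_one >= alpha:
            keep.append(b)
    return best, keep
```

The minimum-area rectangle over the kept bins would then be drawn as the sweet spot.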
5.2. Neural Network Performance as a Smooth Function over Graph Measures
| In Figure 4(f), we observe that neural network’s predictive performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph. Keeping one graph measure fixed in a small range (C ∈ [0.4, 0.6], L ∈ [2, 2.5]), we visualize network performances against the other measure (shown in Figure 4(b)(d)). We use second degree polynomial regression to visualize the overall trend. We observe that both clustering coefficient and average path length are indicative of neural network performance, demonstrating a smooth U-shape correlation. | 在圖4(f)中,我們觀察到神經(jīng)網(wǎng)絡(luò)的預(yù)測(cè)性能近似是其關(guān)系圖的聚類(lèi)系數(shù)和平均路徑長(zhǎng)度的平滑函數(shù)。將其中一個(gè)圖度量固定在一個(gè)小范圍內(nèi)(C ∈ [0.4, 0.6],L ∈ [2, 2.5]),我們將網(wǎng)絡(luò)性能相對(duì)于另一個(gè)度量進(jìn)行可視化(如圖4(b)(d)所示)。我們使用二次多項(xiàng)式回歸來(lái)展示總體趨勢(shì)。我們觀察到,聚類(lèi)系數(shù)和平均路徑長(zhǎng)度都能指示神經(jīng)網(wǎng)絡(luò)性能,呈現(xiàn)出平滑的U形相關(guān)關(guān)系。 |
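The second-degree polynomial regression can be reproduced with `numpy.polyfit`. The data below is synthetic, purely to illustrate fitting a U-shape and reading off its minimum (not the paper's measurements):

```python
import numpy as np

# Toy U-shape: top-1 error as a quadratic in the clustering coefficient C,
# with its minimum placed at C = 0.45 for illustration.
C = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
err = (C - 0.45) ** 2 * 10 + 33.0

coeffs = np.polyfit(C, err, deg=2)      # [a, b, c] of a*C^2 + b*C + c
C_best = -coeffs[1] / (2 * coeffs[0])   # vertex of the fitted parabola
```

On real measurements the fit is noisy, but the vertex of the parabola gives a rough estimate of where the sweet spot lies along one measure.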
5.3. Consistency across Architectures
| Figure 5: Quickly identifying a sweet spot. Left: The correlation between sweet spots identified using fewer samples of relational graphs and using all 3942 graphs. Right: The correlation between sweet spots identified at the intermediate training epochs and the final epoch (100 epochs). | 圖5:快速確定甜蜜點(diǎn)。左圖:使用較少的關(guān)系圖樣本與使用全部3942個(gè)圖識(shí)別出的甜蜜點(diǎn)之間的相關(guān)性。右圖:在中間訓(xùn)練epoch與最終epoch(第100個(gè)epoch)識(shí)別出的甜蜜點(diǎn)之間的相關(guān)性。 |
| Given that relational graph defines a shared design space across various neural architectures, we observe that relational graphs with certain graph measures may consistently perform well regardless of how they are instantiated. Qualitative consistency. We visually observe in Figure 4(f) that the sweet spots are roughly consistent across different architectures. Specifically, if we take the union of the sweet spots across architectures, we have C ∈ [0.43, 0.50], L ∈ [1.82, 2.28], which is the consistent sweet spot across architectures. Moreover, the U-shape trends between graph measures and corresponding neural network performance, shown in Figure 4(b)(d), are also visually consistent. Quantitative consistency. To further quantify this consistency across tasks and architectures, we select the 52 bins in the heat map in Figure 4(f), where the bin value indicates the average performance of relational graphs whose graph measures fall into the bin range. We plot the correlation of the 52 bin values across different pairs of tasks, shown in Figure 4(e). We observe that the performance of relational graphs with certain graph measures correlates across different tasks and architectures. For example, even though a ResNet-34 has much higher complexity than a 5-layer MLP, and ImageNet is a much more challenging dataset than CIFAR-10, a fixed set of relational graphs would perform similarly in both settings, indicated by a Pearson correlation of 0.658 (p-value < 10⁻⁸).
| 鑒于關(guān)系圖定義了跨各種神經(jīng)架構(gòu)的共享設(shè)計(jì)空間,我們觀察到,具有特定圖度量的關(guān)系圖無(wú)論如何被實(shí)例化,都可能始終表現(xiàn)良好。定性一致性。在圖4(f)中我們可以直觀地看到,不同架構(gòu)的甜蜜點(diǎn)大致一致。具體來(lái)說(shuō),如果取各架構(gòu)甜蜜點(diǎn)的并集,可得C ∈ [0.43, 0.50]、L ∈ [1.82, 2.28],這就是跨架構(gòu)一致的甜蜜點(diǎn)。此外,圖4(b)(d)所示的圖度量與對(duì)應(yīng)神經(jīng)網(wǎng)絡(luò)性能之間的U形趨勢(shì),在視覺(jué)上也是一致的。定量一致性。為了進(jìn)一步量化跨任務(wù)和架構(gòu)的一致性,我們選取圖4(f)熱圖中的52個(gè)bin,其中bin值表示圖度量落入該bin范圍內(nèi)的關(guān)系圖的平均性能。我們繪制了52個(gè)bin值在不同任務(wù)對(duì)之間的相關(guān)性,如圖4(e)所示。我們觀察到,具有特定圖度量的關(guān)系圖的性能在不同任務(wù)和架構(gòu)之間是相關(guān)的。例如,盡管ResNet-34的復(fù)雜度遠(yuǎn)高于5層MLP,ImageNet也是比CIFAR-10更具挑戰(zhàn)性的數(shù)據(jù)集,但同一組固定的關(guān)系圖在兩種設(shè)置下的表現(xiàn)相似,其Pearson相關(guān)系數(shù)為0.658(p值 < 10⁻⁸)。 |
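The quantitative consistency check reduces to a Pearson correlation between two settings' 52 bin values. A self-contained sketch (`bin_correlation` is our name for the helper):

```python
import numpy as np

def bin_correlation(errs_a, errs_b):
    """Pearson correlation between the bin values of two settings,
    e.g. 5-layer MLP on CIFAR-10 vs. ResNet-34 on ImageNet."""
    a, b = np.asarray(errs_a, float), np.asarray(errs_b, float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

A value near 1 means graphs that do well in one setting tend to do well in the other, which is what Figure 4(e) reports.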
5.4. Quickly Identifying a Sweet Spot
| Training thousands of relational graphs until convergence might be computationally prohibitive. Therefore, we quantitatively show that a sweet spot can be identified with much less computational cost, e.g., by sampling fewer graphs and training for fewer epochs. How many graphs are needed? Using the 5-layer MLP on CIFAR-10 as an example, we consider the heat map over 52 bins in Figure 4(f) which is computed using 3942 graph samples. We investigate if a similar heat map can be produced with much fewer graph samples. Specifically, we sub-sample the graphs in each bin while making sure each bin has at least one graph. We then compute the correlation between the 52 bin values computed using all 3942 graphs and using sub-sampled fewer graphs, as is shown in Figure 5 (left). We can see that bin values computed using only 52 samples have a high 0.90 Pearson correlation with the bin values computed using full 3942 graph samples. This finding suggests that, in practice, much fewer graphs are needed to conduct a similar analysis. | 將成千上萬(wàn)個(gè)關(guān)系圖訓(xùn)練到收斂,計(jì)算開(kāi)銷(xiāo)可能難以承受。因此,我們定量地展示,可以用低得多的計(jì)算成本來(lái)確定甜蜜點(diǎn),例如采樣更少的圖、訓(xùn)練更少的epoch。需要多少個(gè)圖?以CIFAR-10上的5層MLP為例,我們考慮圖4(f)中52個(gè)bin上的熱圖,它是用3942個(gè)圖樣本計(jì)算的。我們研究是否可以用少得多的圖樣本得到類(lèi)似的熱圖。具體來(lái)說(shuō),我們對(duì)每個(gè)bin中的圖進(jìn)行子采樣,同時(shí)確保每個(gè)bin至少有一個(gè)圖。然后,我們計(jì)算用全部3942個(gè)圖與用子采樣后較少的圖所得到的52個(gè)bin值之間的相關(guān)性,如圖5(左)所示。可以看到,僅用52個(gè)樣本計(jì)算的bin值與用全部3942個(gè)圖樣本計(jì)算的bin值之間的Pearson相關(guān)系數(shù)高達(dá)0.90。這一發(fā)現(xiàn)表明,在實(shí)踐中只需少得多的圖就能進(jìn)行類(lèi)似的分析。 |
5.5. Network Science and Neuroscience Connections
| Network science. The average path length that we measure characterizes how well information is exchanged across the network (Latora & Marchiori, 2001), which aligns with our definition of relational graph that consists of rounds of message exchange. Therefore, the U-shape correlation in Figure 4(b)(d) might indicate a trade-off between message exchange efficiency (Sengupta et al., 2013) and capability of learning distributed representations (Hinton, 1984). Neuroscience. The best-performing relational graph that we discover surprisingly resembles biological neural networks, as is shown in Table 2 and Figure 6. The similarities are two-fold: (1) the graph measures (L and C) of top artificial neural networks are highly similar to biological neural networks; (2) with the relational graph representation, we can translate biological neural networks to 5-layer MLPs, and found that these networks also outperform the baseline complete graphs. While our findings are preliminary, our approach opens up new possibilities for interdisciplinary research in network science, neuroscience and deep learning. | 網(wǎng)絡(luò)科學(xué)。我們測(cè)量的平均路徑長(zhǎng)度表征了信息在網(wǎng)絡(luò)中交換的效率(Latora & Marchiori, 2001),這與我們將關(guān)系圖定義為多輪消息交換的做法一致。因此,圖4(b)(d)中的U形相關(guān)性可能表明,消息交換效率(Sengupta et al., 2013)與學(xué)習(xí)分布式表示的能力(Hinton, 1984)之間存在權(quán)衡。神經(jīng)科學(xué)。我們發(fā)現(xiàn)的性能最好的關(guān)系圖與生物神經(jīng)網(wǎng)絡(luò)驚人地相似,如表2和圖6所示。相似之處有兩方面:(1)頂級(jí)人工神經(jīng)網(wǎng)絡(luò)的圖度量(L和C)與生物神經(jīng)網(wǎng)絡(luò)高度相似;(2)借助關(guān)系圖表示,我們可以將生物神經(jīng)網(wǎng)絡(luò)轉(zhuǎn)換為5層MLP,并發(fā)現(xiàn)這些網(wǎng)絡(luò)同樣優(yōu)于基線完全圖。
雖然我們的發(fā)現(xiàn)還處于初步階段,但我們的方法為網(wǎng)絡(luò)科學(xué)、神經(jīng)科學(xué)和深度學(xué)習(xí)領(lǐng)域的跨學(xué)科研究開(kāi)辟了新的可能性。 |
6. Related Work
| Neural network connectivity. The design of neural network connectivity patterns has been focused on computational graphs at different granularity: the macro structures, i.e. connectivity across layers (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016; Huang et al., 2017; Tan & Le, 2019), and the micro structures, i.e. connectivity within a layer (LeCun et al., 1998; Xie et al., 2017; Zhang et al., 2018; Howard et al., 2017; Dao et al., 2019; Alizadeh et al., 2019). Our current exploration focuses on the latter, but the same methodology can be extended to the macro space. Deep Expander Networks (Prabhu et al., 2018) adopt expander graphs to generate bipartite structures. RandWire (Xie et al., 2019) generates macro structures using existing graph generators. However, the statistical relationships between graph structure measures and network predictive performances were not explored in those works. Another related work is Cross-channel Communication Networks (Yang et al., 2019) which aims to encourage the neuron communication through message passing, where only a complete graph structure is considered. | 神經(jīng)網(wǎng)絡(luò)連通性。神經(jīng)網(wǎng)絡(luò)連通性模式的設(shè)計(jì)一直集中在不同粒度的計(jì)算圖上:宏觀結(jié)構(gòu),即跨層連通性(LeCun et al., 1998;Krizhevsky et al., 2012;Simonyan & Zisserman, 2015;Szegedy et al., 2015;He et al., 2016;Huang et al., 2017;Tan & Le, 2019),以及微觀結(jié)構(gòu),即層內(nèi)連通性(LeCun et al., 1998;Xie et al., 2017;Zhang et al., 2018;Howard et al., 2017;Dao et al., 2019;Alizadeh et al., 2019)。我們目前的研究重點(diǎn)是后者,但同樣的方法可以擴(kuò)展到宏觀空間。深度擴(kuò)展器網(wǎng)絡(luò)(Prabhu et al., 2018)采用擴(kuò)展圖生成二部圖結(jié)構(gòu)。RandWire(Xie et al., 2019)使用現(xiàn)有的圖生成器生成宏觀結(jié)構(gòu)。然而,這些工作并未探索圖結(jié)構(gòu)度量與網(wǎng)絡(luò)預(yù)測(cè)性能之間的統(tǒng)計(jì)關(guān)系。另一項(xiàng)相關(guān)工作是跨通道通信網(wǎng)絡(luò)(Yang et al., 2019),旨在通過(guò)消息傳遞促進(jìn)神經(jīng)元通信,但其中只考慮了完全圖結(jié)構(gòu)。 |
| Neural architecture search. Efforts on learning the connectivity patterns at micro (Ahmed & Torresani, 2018; Wortsman et al., 2019; Yang et al., 2018), or macro (Zoph & Le, 2017; Zoph et al., 2018) level mostly focus on improving learning/search algorithms (Liu et al., 2018; Pham et al., 2018; Real et al., 2019; Liu et al., 2019). NAS-Bench101 (Ying et al., 2019) defines a graph search space by enumerating DAGs with constrained sizes (≤ 7 nodes, cf. 64-node graphs in our work). Our work points to a new path: instead of exhaustively searching over all the possible connectivity patterns, certain graph generators and graph measures could define a smooth space where the search cost could be significantly reduced. | 神經(jīng)架構(gòu)搜索。在微觀層面(Ahmed & Torresani, 2018;Wortsman et al., 2019;Yang et al., 2018)或宏觀層面(Zoph & Le, 2017;Zoph et al., 2018)學(xué)習(xí)連通性模式的工作,大多集中于改進(jìn)學(xué)習(xí)/搜索算法(Liu et al., 2018;Pham et al., 2018;Real et al., 2019;Liu et al., 2019)。NAS-Bench101(Ying et al., 2019)通過(guò)枚舉大小受限的DAG(≤7個(gè)節(jié)點(diǎn),而我們的工作中是64節(jié)點(diǎn)圖)來(lái)定義圖搜索空間。我們的工作指出了一條新路徑:無(wú)需對(duì)所有可能的連通性模式進(jìn)行窮舉搜索,特定的圖生成器和圖度量可以定義一個(gè)平滑的空間,從而顯著降低搜索成本。 |
7. Discussions
| Hierarchical graph structure of neural networks. As the first step in this direction, our work focuses on graph structures at the layer level. Neural networks are intrinsically hierarchical graphs (from connectivity of neurons to that of layers, blocks, and networks) which constitute a more complex design space than what is considered in this paper. Extensive exploration in that space will be computationally prohibitive, but we expect our methodology and findings to generalize. Efficient implementation. Our current implementation uses standard CUDA kernels thus relies on weight masking, which leads to worse wall-clock time performance compared with baseline complete graphs. However, the practical adoption of our discoveries is not far-fetched. Complementary to our work, there are ongoing efforts such as block-sparse kernels (Gray et al., 2017) and fast sparse ConvNets (Elsen et al., 2019) which could close the gap between theoretical FLOPS and real-world gains. Our work might also inform the design of new hardware architectures, e.g., biologically-inspired ones with spike patterns (Pei et al., 2019). | 神經(jīng)網(wǎng)絡(luò)的層次圖結(jié)構(gòu)。作為這一方向的第一步,我們的工作集中在層這一層次的圖結(jié)構(gòu)上。神經(jīng)網(wǎng)絡(luò)本質(zhì)上是層次化的圖(從神經(jīng)元的連通性到層、塊和網(wǎng)絡(luò)的連通性),它構(gòu)成了比本文所考慮的更復(fù)雜的設(shè)計(jì)空間。在那個(gè)空間中進(jìn)行廣泛探索在計(jì)算上難以承受,但我們期望我們的方法和發(fā)現(xiàn)能夠推廣。高效實(shí)現(xiàn)。我們目前的實(shí)現(xiàn)使用標(biāo)準(zhǔn)CUDA內(nèi)核,因此依賴(lài)于權(quán)重掩碼(weight masking),這導(dǎo)致其實(shí)際運(yùn)行時(shí)間(wall-clock time)比基線完全圖更差。然而,實(shí)際應(yīng)用我們的發(fā)現(xiàn)并不遙遠(yuǎn)。作為我們工作的補(bǔ)充,塊稀疏核(Gray et al., 2017)和快速稀疏卷積網(wǎng)絡(luò)(Elsen et al., 2019)等正在進(jìn)行的工作,有望縮小理論FLOPS與實(shí)際收益之間的差距。我們的工作也可能為新硬件架構(gòu)的設(shè)計(jì)提供參考,例如受生物學(xué)啟發(fā)的、帶有脈沖(spike)模式的架構(gòu)(Pei et al., 2019)。 |
| Prior vs. Learning. We currently utilize the relational graph representation as a structural prior, i.e., we hard-wire the graph structure on neural networks throughout training. It has been shown that deep ReLU neural networks can automatically learn sparse representations (Glorot et al., 2011). A further question arises: without imposing graph priors, does any graph structure emerge from training a (fully-connected) neural network? Figure 7: Prior vs. Learning. Results for 5-layer MLPs on CIFAR-10. We highlight the best-performing graph when used as a structural prior. Additionally, we train a fully-connected MLP, and visualize the learned weights as a relational graph (different points are graphs under different thresholds). The learned graph structure moves towards the “sweet spot” after training but does not close the gap. | 先驗(yàn)與學(xué)習(xí)。我們目前將關(guān)系圖表示用作結(jié)構(gòu)先驗(yàn),即在整個(gè)訓(xùn)練過(guò)程中將圖結(jié)構(gòu)固定(hard-wire)在神經(jīng)網(wǎng)絡(luò)上。已有研究表明,深度ReLU神經(jīng)網(wǎng)絡(luò)可以自動(dòng)學(xué)習(xí)稀疏表示(Glorot et al., 2011)。由此產(chǎn)生一個(gè)進(jìn)一步的問(wèn)題:在不施加圖先驗(yàn)的情況下,訓(xùn)練一個(gè)(全連接)神經(jīng)網(wǎng)絡(luò)是否會(huì)涌現(xiàn)出某種圖結(jié)構(gòu)?圖7:先驗(yàn)與學(xué)習(xí)。5層MLP在CIFAR-10上的結(jié)果。我們標(biāo)出了用作結(jié)構(gòu)先驗(yàn)時(shí)表現(xiàn)最佳的圖。此外,我們訓(xùn)練了一個(gè)全連接MLP,并將學(xué)到的權(quán)重可視化為關(guān)系圖(不同的點(diǎn)是不同閾值下的圖)。學(xué)到的圖結(jié)構(gòu)在訓(xùn)練后向“甜蜜點(diǎn)”移動(dòng),但并未消除差距。 |
| As a preliminary exploration, we “reverse-engineer” a trained neural network and study the emerged relational graph structure. Specifically, we train a fully-connected 5-layer MLP on CIFAR-10 (the same setup as in previous experiments). We then try to infer the underlying relational graph structure of the network via the following steps: (1) to get nodes in a relational graph, we stack the weights from all the hidden layers and group them into 64 nodes, following the procedure described in Section 2.2; (2) to get undirected edges, the weights are summed by their transposes; (3) we compute the Frobenius norm of the weights as the edge value; (4) we get a sparse graph structure by binarizing edge values with a certain threshold. We show the extracted graphs under different thresholds in Figure 7. As expected, the extracted graphs at initialization follow the patterns of E-R graphs (Figure 3(left)), since weight matrices are randomly i.i.d. initialized. Interestingly, after training to convergence, the extracted graphs are no longer E-R random graphs and move towards the sweet spot region we found in Section 5. Note that there is still a gap between these learned graphs and the best-performing graph imposed as a structural prior, which might explain why a fully-connected MLP has inferior performance. In our experiments, we also find that there are a few special cases where learning the graph structure can be superior (i.e., when the task is simple and the network capacity is abundant). We provide more discussions in the Appendix. Overall, these results further demonstrate that studying the graph structure of a neural network is crucial for understanding its predictive performance. |
作為初步探索,我們對(duì)一個(gè)訓(xùn)練好的神經(jīng)網(wǎng)絡(luò)進(jìn)行“逆向工程”,研究其中涌現(xiàn)出的關(guān)系圖結(jié)構(gòu)。具體來(lái)說(shuō),我們?cè)贑IFAR-10上訓(xùn)練一個(gè)全連接的5層MLP(與之前實(shí)驗(yàn)相同的設(shè)置),然后嘗試通過(guò)以下步驟推斷網(wǎng)絡(luò)的底層關(guān)系圖結(jié)構(gòu):(1)為了得到關(guān)系圖中的節(jié)點(diǎn),我們將所有隱藏層的權(quán)重疊加,并按照2.2節(jié)描述的步驟將其分組為64個(gè)節(jié)點(diǎn);(2)為了得到無(wú)向邊,將權(quán)重與其轉(zhuǎn)置相加;(3)計(jì)算權(quán)重的Frobenius范數(shù)作為邊的取值;(4)以一定閾值對(duì)邊值二值化,得到稀疏圖結(jié)構(gòu)。我們?cè)趫D7中展示了不同閾值下提取的圖。正如預(yù)期,初始化時(shí)提取的圖遵循E-R圖的模式(圖3左),因?yàn)闄?quán)重矩陣是隨機(jī)獨(dú)立同分布(i.i.d.)初始化的。有趣的是,訓(xùn)練到收斂后,提取的圖不再是E-R隨機(jī)圖,而是向我們?cè)诘?節(jié)發(fā)現(xiàn)的甜蜜點(diǎn)區(qū)域移動(dòng)。請(qǐng)注意,這些學(xué)習(xí)到的圖與作為結(jié)構(gòu)先驗(yàn)施加的最優(yōu)圖之間仍然存在差距,這或許可以解釋為什么全連接MLP的性能較差。在實(shí)驗(yàn)中,我們還發(fā)現(xiàn)少數(shù)特殊情況下學(xué)習(xí)圖結(jié)構(gòu)反而更優(yōu)(即任務(wù)簡(jiǎn)單且網(wǎng)絡(luò)容量充足時(shí))。我們?cè)诟戒浿刑峁└嘤懻摗?偟膩?lái)說(shuō),這些結(jié)果進(jìn)一步表明,研究神經(jīng)網(wǎng)絡(luò)的圖結(jié)構(gòu)對(duì)理解其預(yù)測(cè)性能至關(guān)重要。 |
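Steps (1)-(4) of the reverse-engineering procedure can be sketched as below. We assume the per-layer weights have already been grouped into node-by-node blocks, simplified here to one scalar per node pair, with an absolute value standing in for the Frobenius norm of a block; `extract_graph` is our name:

```python
import numpy as np

def extract_graph(layer_weights, threshold):
    """Sketch of inferring a relational graph from trained MLP weights:
    (1) stack the (already node-grouped) weights across hidden layers;
    (2) symmetrize with the transpose to get undirected edges;
    (3) use a per-edge norm (here |.| of the scalar) as the edge value;
    (4) binarize at `threshold` to get a sparse adjacency matrix."""
    w = sum(np.abs(m) for m in layer_weights)   # (1) stack layers
    w = w + w.T                                  # (2) undirected edges
    adj = (w > threshold).astype(int)            # (4) binarize edge values
    np.fill_diagonal(adj, 0)                     # drop self-loops from the graph
    return adj
```

Sweeping `threshold` yields the family of graphs plotted as different points in Figure 7.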
| Unified view of Graph Neural Networks (GNNs) and general neural architectures. The way we define neural networks as a message exchange function over graphs is partly inspired by GNNs (Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018). Under the relational graph representation, we point out that GNNs are a special class of general neural architectures where: (1) graph structure is regarded as the input instead of part of the neural architecture; consequently, (2) message functions are shared across all the edges to respect the invariance properties of the input graph. Concretely, recall how we define general neural networks as relational graphs. Therefore, our work offers a unified view of GNNs and general neural architecture design, which we hope can bridge the two communities and inspire new innovations. On one hand, successful techniques in general neural architectures can be naturally introduced to the design of GNNs, such as separable convolution (Howard et al., 2017), group normalization (Wu & He, 2018) and Squeeze-and-Excitation block (Hu et al., 2018); on the other hand, novel GNN architectures (You et al., 2019b; Chen et al., 2019) beyond the commonly used paradigm (i.e., Equation 6) may inspire more advanced neural architecture designs.
| 圖神經(jīng)網(wǎng)絡(luò)(GNN)與一般神經(jīng)架構(gòu)的統(tǒng)一視角。我們將神經(jīng)網(wǎng)絡(luò)定義為圖上消息交換函數(shù)的方式,部分受到GNN的啟發(fā)(Kipf & Welling, 2017;Hamilton et al., 2017;Veličković et al., 2018)。在關(guān)系圖表示下,我們指出GNN是一類(lèi)特殊的一般神經(jīng)架構(gòu),其中:(1)圖結(jié)構(gòu)被視為輸入,而不是神經(jīng)架構(gòu)的一部分;因此,(2)消息函數(shù)在所有邊上共享,以保持輸入圖的不變性。具體地,回想我們是如何將一般神經(jīng)網(wǎng)絡(luò)定義為關(guān)系圖的。因此,我們的工作為GNN與一般神經(jīng)架構(gòu)設(shè)計(jì)提供了統(tǒng)一視角,我們希望它能搭建兩個(gè)社區(qū)之間的橋梁并激發(fā)新的創(chuàng)新。一方面,一般神經(jīng)架構(gòu)中的成功技術(shù)可以自然地引入GNN的設(shè)計(jì),如可分離卷積(Howard et al., 2017)、組歸一化(Wu & He, 2018)和Squeeze-and-Excitation模塊(Hu et al., 2018);另一方面,超越常用范式(即公式6)的新型GNN架構(gòu)(You et al., 2019b;Chen et al., 2019)可能啟發(fā)更先進(jìn)的神經(jīng)架構(gòu)設(shè)計(jì)。 |
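The generic message-exchange round underlying both views can be sketched as follows. In a GNN the same message function would be shared across all edges, whereas a general neural network assigns each edge its own weight, as in this toy version (our simplification: scalar node features, sum aggregation):

```python
import numpy as np

def message_round(x, adj, weight):
    """One round of message exchange over a relational graph: each node v
    aggregates (here: sums) transformed messages from itself and its
    neighbours. Per-edge weights make this a general neural network layer;
    sharing one message function across edges would make it GNN-like."""
    n = len(x)
    out = np.zeros(n)
    for v in range(n):
        for u in range(n):
            if u == v or adj[u, v]:
                out[v] += weight[u, v] * x[u]   # message from u to v
    return out
```

Stacking several such rounds corresponds to stacking layers of the neural network.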
8. Conclusion
| In sum, we propose a new perspective of using relational graph representation for analyzing and understanding neural networks. Our work suggests a new transition from studying conventional computation architecture to studying graph structure of neural networks. We show that well-established graph techniques and methodologies offered in other science disciplines (network science, neuroscience, etc.) could contribute to understanding and designing deep neural networks. We believe this could be a fruitful avenue of future research that tackles more complex situations. | 總之,我們提出了一種利用關(guān)系圖表示來(lái)分析和理解神經(jīng)網(wǎng)絡(luò)的新視角。我們的工作提示了一種新的轉(zhuǎn)變:從研究傳統(tǒng)的計(jì)算架構(gòu)轉(zhuǎn)向研究神經(jīng)網(wǎng)絡(luò)的圖結(jié)構(gòu)。我們表明,其他科學(xué)學(xué)科(網(wǎng)絡(luò)科學(xué)、神經(jīng)科學(xué)等)中成熟的圖技術(shù)與方法論有助于理解和設(shè)計(jì)深度神經(jīng)網(wǎng)絡(luò)。我們相信,這可能是未來(lái)研究中處理更復(fù)雜情形的一條富有成效的途徑。 |
Acknowledgments
| This work is done during Jiaxuan You’s internship at Facebook AI Research. Jure Leskovec is a Chan Zuckerberg Biohub investigator. The authors thank Alexander Kirillov, Ross Girshick, Jonathan Gomes Selman, Pan Li for their helpful discussions. | 這項(xiàng)工作是Jiaxuan You在Facebook AI Research實(shí)習(xí)期間完成的。Jure Leskovec是Chan Zuckerberg Biohub的研究員。作者感謝Alexander Kirillov、Ross Girshick、Jonathan Gomes Selman和Pan Li的有益討論。 |