Paper: "Graph Structure of Neural Networks" — Translation and Interpretation
Contents
"Graph Structure of Neural Networks" — Translation and Interpretation
Abstract
1. Introduction
2. Neural Networks as Relational Graphs
2.1. Message Exchange over Graphs
2.2. Fixed-width MLPs as Relational Graphs
2.3. General Neural Networks as Relational Graphs
3. Exploring Relational Graphs
3.1. Selection of Graph Measures
3.2. Design of Graph Generators
3.3. Controlling Computational Budget
4. Experimental Setup
4.1. Base Architectures
4.2. Exploration with Relational Graphs
5. Results
5.1. A Sweet Spot for Top Neural Networks
5.2. Neural Network Performance as a Smooth Function over Graph Measures
5.3. Consistency across Architectures
5.4. Quickly Identifying a Sweet Spot
5.5. Network Science and Neuroscience Connections
6. Related Work
7. Discussions
8. Conclusion
Acknowledgments
"Graph Structure of Neural Networks" — Translation and Interpretation
Original paper:
https://arxiv.org/pdf/2007.06559.pdf
https://arxiv.org/abs/2007.06559
| Comments: | ICML 2020 [Submitted on 13 Jul 2020] |
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI); Machine Learning (stat.ML) |
| Cite as: | arXiv:2007.06559 [cs.LG] (or arXiv:2007.06559v1 [cs.LG] for this version) |
Abstract
Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of a neural network and its predictive performance. Here we systematically investigate how the graph structure of neural networks affects their predictive performance. To this end, we develop a novel graph-based representation of neural networks called a relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) a neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to that of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding of neural networks in general.
1. Introduction
Deep neural networks consist of neurons organized into layers and connections between them. The architecture of a neural network can be captured by its "computational graph", where neurons are represented as nodes and directed edges link neurons in different layers. Such a graphical representation demonstrates how the network passes and transforms information from its input neurons, through the hidden layers, all the way to the output neurons (McClelland et al., 1986). While it has been widely observed that the performance of neural networks depends on their architecture (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016), there is currently little systematic understanding of the relation between a neural network's accuracy and its underlying graph structure. This is especially important for neural architecture search, which today exhaustively searches over all possible connectivity patterns (Ying et al., 2019). From this perspective, several open questions arise:
Establishing such a relation is both scientifically and practically important because it would have direct consequences for designing more efficient and more accurate architectures. It would also inform the design of new hardware architectures that execute neural networks. Understanding the graph structures that underlie neural networks would also advance the science of deep learning. However, establishing the relation between a network architecture and its accuracy is nontrivial, because it is unclear how to map a neural network to a graph (and vice versa). The natural choice would be the computational graph representation, but it has many limitations: (1) Lack of generality: computational graphs are constrained in the graph properties they allow, e.g., these graphs have to be directed and acyclic (DAGs), bipartite at the layer level, and single-in-single-out at the network level (Xie et al., 2019). This limits the use of the rich tools developed for general graphs. (2) Disconnection from biology/neuroscience: biological neural networks have a much richer and less templatized structure (Fornito et al., 2013). There are information exchanges, rather than just single-directional flows, in brain networks (Stringer et al., 2018). Such biological or neurological models cannot simply be represented by directed acyclic graphs.
Here we systematically study the relationship between the graph structure of a neural network and its predictive performance. We develop a new way of representing a neural network as a graph, which we call a relational graph. Our key insight is to focus on message exchange, rather than just on directed data flow. As a simple example, for a fixed-width fully-connected layer, we can represent one input channel and one output channel together as a single node, and an edge in the relational graph represents the message exchange between the two nodes (Figure 1(a)). Under this formulation, using an appropriate message exchange definition, we show that the relational graph can represent many types of neural network layers (a fully-connected layer, a convolutional layer, etc.), while getting rid of many constraints of computational graphs (such as directed, acyclic, bipartite, single-in-single-out). One neural network layer corresponds to one round of message exchange over a relational graph; to obtain deep networks, we perform message exchange over the same graph for several rounds. Our new representation enables us to build neural networks that are richer and more diverse, and to analyze them using the well-established tools of network science (Barabási & Pósfai, 2016). We then design a graph generator named WS-flex that allows us to systematically explore the design space of neural networks (i.e., relational graphs). Based on insights from neuroscience, we characterize neural networks by the clustering coefficient and average path length of their relational graphs (Figure 1(c)). Furthermore, our framework is flexible and general, as we can translate relational graphs into diverse neural architectures, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), ResNets, etc., with controlled computational budgets (Figure 1(d)).
Using the standard image classification datasets CIFAR-10 and ImageNet, we conduct a systematic study of how the architecture of neural networks affects their predictive performance. We make several important empirical observations:

Our results have implications for designing neural network architectures, advancing the science of deep learning and improving our understanding of neural networks in general.
Figure 1: Overview of our approach. (a) A layer of a neural network can be viewed as a relational graph where we connect nodes that exchange messages. (b) More examples of neural network layers and relational graphs. (c) We explore the design space of relational graphs according to their graph measures, including average path length and clustering coefficient, where the complete graph corresponds to a fully-connected layer. (d) We translate these relational graphs to neural networks and study how their predictive performance depends on the graph measures of their corresponding relational graphs.
2. Neural Networks as Relational Graphs
To explore the graph structure of neural networks, we first introduce the concept of our relational graph representation and its instantiations. We demonstrate how our representation can capture diverse neural network architectures under a unified framework. Using the language of graphs in the context of deep learning helps bring the two worlds together and establishes a foundation for our study.
2.1. Message Exchange over Graphs
We start by revisiting the definition of a neural network from the graph perspective. We define a graph G = (V, E) by its node set V = {v1, ..., vn} and edge set E ⊆ {(vi, vj) | vi, vj ∈ V}. We assume each node v has a node feature scalar/vector xv.

Table 1: Diverse neural architectures expressed in the language of relational graphs. These architectures are usually implemented as complete relational graphs, while we systematically explore more graph structures for these architectures.

Figure 2: Example of translating a 4-node relational graph to a 4-layer 65-dim MLP. We highlight the message exchange for node x1. Using different definitions of xi, fi(·), AGG(·) and R (those defined in Table 1), relational graphs can be translated to diverse neural architectures.
We call a graph G a relational graph when it is associated with message exchanges between neurons. Specifically, a message exchange is defined by a message function, whose input is a node's feature and output is a message, and an aggregation function, whose input is a set of messages and output is the updated node feature. At each round of message exchange, each node sends messages to its neighbors and aggregates the incoming messages from its neighbors. Each message is transformed at each edge through a message function f(·), then the messages are aggregated at each node via an aggregation function AGG(·). Suppose we conduct R rounds of message exchange; then the r-th round of message exchange for a node v can be described as

x_v^(r+1) = AGG^(r)({ f_v^(r)(x_u^(r)), ∀ u ∈ N(v) })     (1)

where N(v) denotes the neighborhood of v (including v itself, via a self-edge). Equation 1 provides a general definition for message exchange. In the remainder of this section, we discuss how this general message exchange definition can be instantiated as different neural architectures. We summarize the different instantiations in Table 1, and provide a concrete example of instantiating a 4-layer 65-dim MLP in Figure 2.
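To make one round of message exchange concrete, here is a minimal Python sketch of Equation 1. The identity message function and sum aggregation are illustrative placeholders of my own choosing; the paper's actual instantiations (Table 1) use trained transforms.

```python
def message_exchange_round(features, edges, f=lambda x: x, agg=sum):
    """One round of Equation 1: every node aggregates messages from N(v).

    features: dict node -> scalar feature x_v
    edges:    set of undirected pairs (u, v)
    """
    # Build neighborhoods N(v); each node includes itself via a self-edge.
    neighbors = {v: {v} for v in features}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Transform each incoming feature with f, then aggregate with agg.
    return {v: agg(f(features[u]) for u in neighbors[v]) for v in features}

# 4-node example on a ring graph: each node sums itself and its two neighbors.
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
ring = {(0, 1), (1, 2), (2, 3), (3, 0)}
x = message_exchange_round(x, ring)
```

Stacking several calls to `message_exchange_round` over the same graph corresponds to the deep networks described above.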
2.2. Fixed-width MLPs as Relational Graphs
A Multilayer Perceptron (MLP) consists of layers of computation units (neurons), where each neuron performs a weighted sum over scalar inputs and outputs, followed by some non-linearity. Suppose the r-th layer of an MLP takes x^(r) as input and x^(r+1) as output; then a neuron computes

x_i^(r+1) = σ(Σ_j w_ij^(r) x_j^(r))     (2)

where w_ij^(r) is the trainable weight, x_j^(r) is the j-th dimension of the input, x_i^(r+1) is the i-th dimension of the output, and σ is the non-linearity.
The above discussion reveals that a fixed-width MLP can be viewed as a complete relational graph with a special message exchange function. Therefore, a fixed-width MLP is a special case under a much more general model family, where the message function, aggregation function, and, most importantly, the relational graph structure can vary. This insight allows us to generalize fixed-width MLPs from using the complete relational graph to any general relational graph G. Based on the general definition of message exchange in Equation 1, we have

x_v^(r+1) = σ(Σ_{u ∈ N(v)} w_uv^(r) x_u^(r))     (3)
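As a sanity check on this generalization, the sketch below applies one such layer in plain Python; the weights and the tiny 3-node graph are toy values of my own choosing, and ReLU stands in for σ.

```python
def relational_mlp_layer(x, w, edges, sigma=lambda z: max(z, 0.0)):
    """One round of the generalized MLP layer: x[v] is node v's scalar
    feature, w[u][v] the weight on edge (u, v); the neighborhood N(v)
    contains v itself plus its graph neighbors."""
    n = len(x)

    def in_nbr(u, v):
        return u == v or (u, v) in edges or (v, u) in edges

    return [sigma(sum(w[u][v] * x[u] for u in range(n) if in_nbr(u, v)))
            for v in range(n)]

# With the complete graph this reduces to an ordinary dense layer; with
# edges {(0, 1)} only, node 2 receives nothing but its self-loop message.
y = relational_mlp_layer([1.0, 2.0, 4.0],
                         [[1.0] * 3 for _ in range(3)],
                         {(0, 1)})
```

With all-ones weights, nodes 0 and 1 each sum their own feature and the other's, while node 2 keeps only its own.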
2.3. General Neural Networks as Relational Graphs
The graph viewpoint in Equation 3 lays the foundation for representing fixed-width MLPs as relational graphs. In this section, we discuss how we can further generalize relational graphs to general neural networks.

Variable-width MLPs as relational graphs. An important design consideration for general neural networks is that the layer width often varies throughout the network. For example, in CNNs, a common practice is to double the layer width (number of feature channels) after spatial down-sampling.
Note that under this definition, the maximum number of nodes of a relational graph is bounded by the width of the narrowest layer in the corresponding neural network (since the feature dimension for each node must be at least 1).
Modern neural architectures as relational graphs. Finally, we generalize relational graphs to represent modern neural architectures with more sophisticated designs. For example, to represent a ResNet (He et al., 2016), we keep the residual connections between layers unchanged. To represent neural networks with a bottleneck transform (He et al., 2016), a relational graph alternately applies message exchange with 3×3 and 1×1 convolutions; similarly, in the efficient computing setup, the widely used separable convolution (Howard et al., 2017; Chollet, 2017) can be viewed as alternately applying message exchange with 3×3 depth-wise convolution and 1×1 convolution. Overall, relational graphs provide a general representation for neural networks. With proper definitions of node features and message exchange, relational graphs can represent diverse neural architectures, as summarized in Table 1.
3. Exploring Relational Graphs
In this section, we describe in detail how we design and explore the space of relational graphs defined in Section 2, in order to study the relationship between the graph structure of neural networks and their predictive performance. Three main components are needed to make progress: (1) graph measures that characterize graph structural properties, (2) graph generators that can generate diverse graphs, and (3) a way to control the computational budget, so that the differences in performance of different neural networks are due to their diverse relational graph structures.
3.1. Selection of Graph Measures
Given the complex nature of graph structure, graph measures are often used to characterize graphs. In this paper, we focus on one global graph measure, average path length, and one local graph measure, clustering coefficient. Notably, these two measures are widely used in network science (Watts & Strogatz, 1998) and neuroscience (Sporns, 2003; Bassett & Bullmore, 2006). Specifically, average path length measures the average shortest path distance between any pair of nodes; clustering coefficient measures the proportion of edges between the nodes within a given node's neighborhood, divided by the number of edges that could possibly exist between them, averaged over all the nodes. Other graph measures that can be used for analysis are included in the Appendix.
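Both measures follow directly from their standard network-science definitions; the following is a small self-contained sketch (plain BFS, assuming an undirected, connected graph given as an adjacency dict).

```python
from collections import deque
from itertools import combinations

def average_path_length(adj):
    """Mean shortest-path distance over all ordered node pairs (BFS from
    every source; assumes the graph is connected)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(adj) - 1
    return total / pairs

def clustering_coefficient(adj):
    """Average local clustering: fraction of a node's neighbor pairs that
    are themselves connected, averaged over all nodes."""
    coeffs = []
    for v in adj:
        k = len(adj[v])
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)
```

For the complete graph both measures equal 1, which is why the complete graph sits at the lower-right corner of the (L, C) plane in Figure 1(c).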
3.2. Design of Graph Generators
Given the selected graph measures, we aim to generate diverse graphs that can cover a large span of graph measures, using a graph generator. However, such a goal requires careful generator design: classic graph generators can only generate a limited class of graphs, while recent learning-based graph generators are designed to imitate given exemplar graphs (Kipf & Welling, 2017; Li et al., 2018b; You et al., 2018a;b; 2019a).

Limitations of existing graph generators. To illustrate the limitations of existing graph generators, we investigate the following classic graph generators: (1) the Erdős–Rényi (ER) model, which samples graphs with a given node and edge number uniformly at random (Erdős & Rényi, 1960); (2) the Watts–Strogatz (WS) model, which can generate graphs with small-world properties (Watts & Strogatz, 1998); (3) the Barabási–Albert (BA) model, which can generate scale-free graphs (Albert & Barabási, 2002); (4) the Harary model, which can generate graphs with maximum connectivity (Harary, 1962); (5) regular ring lattice graphs (ring graphs); (6) complete graphs. For all types of graph generators, we fix the number of nodes at 64, enumerate all possible discrete parameters and grid search over all continuous parameters of each graph generator. We generate 30 random graphs with different random seeds under each parameter setting. In total, we generate 486,000 WS graphs, 53,000 ER graphs, 8,000 BA graphs, 1,800 Harary graphs, 54 ring graphs and 1 complete graph (more details are provided in the Appendix). In Figure 3, we can observe that graphs generated by those classic graph generators have a limited span in the space of average path length and clustering coefficient.
WS-flex graph generator. Here we propose the WS-flex graph generator, which can generate graphs with a wide coverage of graph measures; notably, WS-flex graphs almost encompass all the graphs generated by the classic random generators mentioned above, as shown in Figure 3. The WS-flex generator generalizes the WS model by relaxing the constraint that all the nodes have the same degree before random rewiring. Specifically, the WS-flex generator is parametrized by node count n, average degree k and rewiring probability p. The number of edges is determined as e = ⌊n·k/2⌋. The WS-flex generator first creates a ring graph in which each node connects to ⌊e/n⌋ neighboring nodes; then the generator randomly picks e mod n nodes and connects each of them to one closest neighboring node; finally, all the edges are randomly rewired with probability p. We use the WS-flex generator to smoothly sample within the space of clustering coefficient and average path length, then sub-sample 3942 graphs for our experiments, as shown in Figure 1(c).
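A sketch of this procedure follows. Details the text leaves open, such as which neighbor the e mod n extra edges attach to and how collisions during rewiring are resolved, are my own assumptions.

```python
import random

def ws_flex(n, k, p, seed=0):
    """WS-flex sketch: n nodes, average degree k (may be non-integer),
    rewiring probability p. Returns a set of undirected edges (u, v), u < v."""
    rng = random.Random(seed)
    e = int(n * k // 2)                       # e = floor(n * k / 2)
    edges = set()

    def add(u, v):
        if u != v:
            edges.add((min(u, v), max(u, v)))

    # 1) Ring lattice: each node links to its floor(e/n) clockwise neighbors.
    for v in range(n):
        for d in range(1, e // n + 1):
            add(v, (v + d) % n)
    # 2) Distribute the remaining e mod n edges: each picked node connects to
    #    its closest not-yet-connected neighbor (distance floor(e/n) + 1).
    for v in rng.sample(range(n), e % n):
        add(v, (v + e // n + 1) % n)
    # 3) Rewire each edge with probability p to a uniformly random endpoint,
    #    avoiding self-loops and duplicate edges.
    for u, v in list(edges):
        if rng.random() < p:
            edges.remove((u, v))
            w = rng.randrange(n)
            while w == u or (min(u, w), max(u, w)) in edges:
                w = rng.randrange(n)
            add(u, w)
    return edges
```

Sweeping k and p (as the paper does for its 3942 samples) moves the generated graphs across the clustering-coefficient / average-path-length plane.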
3.3. Controlling Computational Budget
To compare the neural networks translated from these diverse graphs, it is important to ensure that all networks have approximately the same complexity, so that the differences in performance are due to their relational graph structures. We use FLOPS (number of multiply-adds) as the metric. We first compute the FLOPS of our baseline network instantiations (i.e., complete relational graphs), and use them as the reference complexity in each experiment. As described in Section 2.3, a relational graph structure can be instantiated as a neural network with variable width, by partitioning dimensions or channels into disjoint sets of node features. Therefore, we can conveniently adjust the width of a neural network to match the reference complexity (within 0.5% of the baseline FLOPS) without changing the relational graph structure. We provide more details in the Appendix.
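The width-matching step can be illustrated with a toy FLOP model. The cost model below (a d×d transform per directed edge plus one per node's self-loop, for per-node feature dimension d) is a simplifying assumption of mine, not the paper's exact accounting.

```python
def layer_flops(num_nodes, num_edges, d):
    """Multiply-adds for one round of message exchange when every node holds
    a d-dim feature: one d x d transform per directed edge and per self-loop."""
    return (2 * num_edges + num_nodes) * d * d

def match_width(num_nodes, num_edges, baseline_flops, tol=0.005):
    """Pick the per-node dimension d whose FLOPs land closest to the baseline;
    return None if no d gets within the 0.5% tolerance."""
    best = min(range(1, 2049),
               key=lambda d: abs(layer_flops(num_nodes, num_edges, d)
                                 - baseline_flops))
    gap = abs(layer_flops(num_nodes, num_edges, best) - baseline_flops)
    return best if gap <= tol * baseline_flops else None

# Under this model, a complete graph on 64 nodes with d = 512 / 64 = 8
# exactly matches a dense 512-wide layer (512 * 512 multiply-adds).
d = match_width(64, 64 * 63 // 2, 512 * 512)
```

Sparser relational graphs have fewer edges, so the search returns a larger d, widening the network until the budget is met.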
4. Experimental Setup
Considering the large number of candidate graphs (3942 in total) that we want to explore, we first investigate the graph structure of MLPs on the CIFAR-10 dataset (Krizhevsky, 2009), which has 50K training images and 10K validation images. We then further study the larger and more complex task of ImageNet classification (Russakovsky et al., 2015), which consists of 1K image classes, 1.28M training images and 50K validation images.
4.1. Base Architectures
For CIFAR-10 experiments, we use a 5-layer MLP with 512 hidden units as the baseline architecture. The input of the MLP is a 3072-d flattened vector of the (32×32×3) image; the output is a 10-d prediction. Each MLP layer has a ReLU non-linearity and a BatchNorm layer (Ioffe & Szegedy, 2015). We train the model for 200 epochs with batch size 128, using a cosine learning rate schedule (Loshchilov & Hutter, 2016) with an initial learning rate of 0.1 (annealed to 0, no restarting). We train all MLP models with 5 different random seeds and report the averaged results.
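The cosine schedule used here (initial learning rate 0.1, annealed to 0 without restarts) reduces to a one-line formula; this sketch follows the standard cosine-annealing form.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_init=0.1):
    """Cosine annealing without restarts: lr_init at epoch 0, 0 at the end."""
    return 0.5 * lr_init * (1 + math.cos(math.pi * epoch / total_epochs))
```

For example, the learning rate starts at 0.1, passes through 0.05 at the halfway point (epoch 100), and decays to 0 at epoch 200.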
For ImageNet experiments, we use three ResNet-family architectures: (1) ResNet-34, which only consists of basic blocks of 3×3 convolutions (He et al., 2016); (2) ResNet-34-sep, a variant where we replace all 3×3 dense convolutions in ResNet-34 with 3×3 separable convolutions (Chollet, 2017); (3) ResNet-50, which consists of bottleneck blocks (He et al., 2016) of 1×1, 3×3, 1×1 convolutions. Additionally, we use the EfficientNet-B0 architecture (Tan & Le, 2019), which achieves good performance in the small-computation regime. Finally, we use a simple 8-layer CNN with 3×3 convolutions. The model has 3 stages with [64, 128, 256] hidden units. Stride-2 convolutions are used for down-sampling. The stem and head layers are the same as in a ResNet. We train all the ImageNet models for 100 epochs using a cosine learning rate schedule with an initial learning rate of 0.1. The batch size is 256 for ResNet-family models and 512 for EfficientNet-B0. We train all ImageNet models with 3 random seeds and report the averaged performance. All the baseline architectures have a complete relational graph structure. The reference computational complexity is 2.89e6 FLOPS for the MLP, 3.66e9 FLOPS for ResNet-34, 0.55e9 FLOPS for ResNet-34-sep, 4.09e9 FLOPS for ResNet-50, 0.39e9 FLOPS for EfficientNet-B0, and 0.17e9 FLOPS for the 8-layer CNN. Training an MLP model takes roughly 5 minutes on an NVIDIA Tesla V100 GPU, and training a ResNet model on ImageNet takes roughly a day on 8 Tesla V100 GPUs with data parallelism. We provide more details in the Appendix.
Figure 4: Key results. The computational budgets of all the experiments are rigorously controlled. Each visualized result is averaged over at least 3 random seeds. A complete graph with C = 1 and L = 1 (lower right corner) is regarded as the baseline. (a)(c) Graph measures vs. neural network performance. The best graphs significantly outperform the baseline complete graphs. (b)(d) Single graph measure vs. neural network performance. Relational graphs that fall within the given range are shown as grey points. The overall smooth function is indicated by the blue regression line. (e) Consistency across architectures. Correlations of the performance of the same set of 52 relational graphs when translated to different neural architectures are shown. (f) Summary of all the experiments. The best relational graphs (the red crosses) consistently outperform the baseline complete graphs across different settings. Moreover, we highlight the "sweet spots" (red rectangular regions), in which relational graphs are not statistically worse than the best relational graphs (bins with red crosses). Bin values of the 5-layer MLP on CIFAR-10 are averaged over all the relational graphs whose C and L fall into the given bin.
4.2. Exploration with Relational Graphs
| For all the architectures, we instantiate each sampled relational graph as a neural network, using the corresponding definitions outlined in Table 1. Specifically, we replace all the dense layers (linear layers, 3×3 and 1×1 convolution layers) with their relational graph counterparts. We leave the input and output layer unchanged and keep all the other designs (such as down-sampling, skip-connections, etc.) intact. We then match the reference computational complexity for all the models, as discussed in Section 3.3. | 對(duì)于所有架構(gòu),我們使用表1中列出的相應(yīng)定義,將每個(gè)采樣得到的關(guān)系圖實(shí)例化為一個(gè)神經(jīng)網(wǎng)絡(luò)。具體來(lái)說(shuō),我們將所有稠密層(線性層、3×3和1×1卷積層)替換為其關(guān)系圖對(duì)應(yīng)版本。我們保持輸入層和輸出層不變,并保留所有其他設(shè)計(jì)(如下采樣、跳躍連接等)。然后,我們?yōu)樗心P推ヅ鋮⒖加?jì)算復(fù)雜度,如3.3節(jié)所述。 |
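Replacing a dense layer with its relational-graph counterpart amounts to masking out the weights of non-edges. A minimal NumPy sketch, assuming one channel per node for simplicity (the paper groups multiple channels per node); the function name `relational_linear` is ours:

```python
import numpy as np

def relational_linear(x, weight, adj):
    """Sketch of a dense layer turned into its relational-graph counterpart:
    node j only receives messages from its neighbours, implemented here by
    masking the weight matrix with the graph's adjacency plus self-loops.
    x: (batch, n_nodes) features, weight: (n_nodes, n_nodes), adj: 0/1 matrix."""
    mask = adj + np.eye(adj.shape[0])   # keep self connections
    return x @ (weight * mask)          # zero out weights of non-edges
```

With an empty adjacency only the self-loop weights survive, which illustrates how sparser graphs keep fewer of the original dense weights.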
| For CIFAR-10 MLP experiments, we study 3942 sampled relational graphs of 64 nodes as described in Section 3.2. For ImageNet experiments, due to high computational cost, we sub-sample 52 graphs uniformly from the 3942 graphs. Since EfficientNet-B0 is a small model with a layer that has only 16 channels, we cannot reuse the 64-node graphs sampled for other setups. We re-sample 48 relational graphs with 16 nodes following the same procedure in Section 3. | 對(duì)于CIFAR-10 MLP實(shí)驗(yàn),我們研究了3942個(gè)采樣得到的64節(jié)點(diǎn)關(guān)系圖,如3.2節(jié)所述。對(duì)于ImageNet實(shí)驗(yàn),由于計(jì)算成本高,我們從3942個(gè)圖中均勻地抽取52個(gè)圖。由于EfficientNet-B0是一個(gè)小模型,其中有一層只有16個(gè)通道,我們無(wú)法重用為其他設(shè)置采樣的64節(jié)點(diǎn)圖,因此按照第3節(jié)相同的步驟重新采樣了48個(gè)16節(jié)點(diǎn)的關(guān)系圖。 |
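The graph measures used throughout (clustering coefficient C and average path length L) can be computed with off-the-shelf tools. A sketch using networkx, with an ordinary connected Watts-Strogatz sample standing in for the paper's WS-flex generator (the generator and parameters here are illustrative, not the paper's exact sampler):

```python
import networkx as nx

def graph_measures(g):
    """Return (C, L): average clustering coefficient and average shortest
    path length of a candidate relational graph."""
    return nx.average_clustering(g), nx.average_shortest_path_length(g)

# One illustrative sample: a 64-node ring lattice rewired with probability p.
g = nx.connected_watts_strogatz_graph(n=64, k=8, p=0.3, seed=0)
C, L = graph_measures(g)
```

A complete graph gives C = 1 and L = 1, which is exactly the baseline point in Figure 4.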
5. Results
| In this section, we summarize the results of our experiments and discuss our key findings. We collect top-1 errors for all the sampled relational graphs on different tasks and architectures, and also record the graph measures (average path length L and clustering coefficient C) for each sampled graph. We present these results as heat maps of graph measures vs. predictive performance (Figure 4(a)(c)(f)). | 在本節(jié)中,我們總結(jié)實(shí)驗(yàn)結(jié)果并討論主要發(fā)現(xiàn)。我們收集了不同任務(wù)和架構(gòu)上所有采樣關(guān)系圖的top-1錯(cuò)誤率,并記錄每個(gè)采樣圖的圖度量(平均路徑長(zhǎng)度L和聚類(lèi)系數(shù)C)。我們以圖度量與預(yù)測(cè)性能的熱圖形式呈現(xiàn)這些結(jié)果(圖4(a)(c)(f))。 |
5.1. A Sweet Spot for Top Neural Networks
| Overall, the heat maps of graph measures vs. predictive performance (Figure 4(f)) show that there exist graph structures that can outperform the complete graph (the pixel on bottom right) baselines. The best performing relational graph can outperform the complete graph baseline by 1.4% top-1 error on CIFAR-10, and 0.5% to 1.2% for models on ImageNet. Notably, we discover that top-performing graphs tend to cluster into a sweet spot in the space defined by C and L (red rectangles in Figure 4(f)). We follow these steps to identify a sweet spot: (1) we downsample and aggregate the 3942 graphs in Figure 4(a) into a coarse resolution of 52 bins, where each bin records the performance of graphs that fall into the bin; (2) we identify the bin with best average performance (red cross in Figure 4(f)); (3) we conduct one-tailed t-test over each bin against the best-performing bin, and record the bins that are not significantly worse than the best-performing bin (p-value 0.05 as threshold). The minimum area rectangle that covers these bins is visualized as a sweet spot. For 5-layer MLP on CIFAR-10, the sweet spot is C ∈ [0.10, 0.50], L ∈ [1.82, 2.75]. | 總的來(lái)說(shuō),圖度量與預(yù)測(cè)性能的熱圖(圖4(f))表明,存在能夠優(yōu)于完全圖基線(右下角的像素)的圖結(jié)構(gòu)。表現(xiàn)最好的關(guān)系圖在CIFAR-10上的top-1錯(cuò)誤率比完全圖基線低1.4%,在ImageNet上的各模型則低0.5%到1.2%。值得注意的是,我們發(fā)現(xiàn)表現(xiàn)最好的圖往往聚集在由C和L定義的空間中的一個(gè)“甜蜜點(diǎn)”(圖4(f)中的紅色矩形)。我們按以下步驟確定甜蜜點(diǎn):(1)將圖4(a)中的3942個(gè)圖下采樣并匯總為52個(gè)粗分辨率的bin,每個(gè)bin記錄落入其中的圖的性能;(2)確定平均性能最佳的bin(圖4(f)中的紅色叉號(hào));(3)對(duì)每個(gè)bin與性能最佳的bin進(jìn)行單尾t檢驗(yàn),記錄不顯著差于最佳bin的那些bin(以p值0.05為閾值)。覆蓋這些bin的最小面積矩形即可視化為甜蜜點(diǎn)。對(duì)于CIFAR-10上的5層MLP,甜蜜點(diǎn)為C ∈ [0.10, 0.50],L ∈ [1.82, 2.75]。 |
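Step (3) of the procedure above can be sketched as follows. `sweet_spot_bins` is a hypothetical helper of ours; we use Welch's unequal-variance t-test with the two-sided p-value halved into a one-tailed one, since the paper does not specify the exact test variant:

```python
import numpy as np
from scipy import stats

def sweet_spot_bins(bin_errors, alpha=0.05):
    """Sketch of the sweet-spot test. `bin_errors` maps a bin id to the
    top-1 errors of the graphs falling into it. We keep every bin that is
    not significantly worse than the best (lowest mean error) bin."""
    means = {b: np.mean(v) for b, v in bin_errors.items()}
    best = min(means, key=means.get)
    keep = []
    for b, v in bin_errors.items():
        t, p_two = stats.ttest_ind(v, bin_errors[best], equal_var=False)
        # One-tailed p for H1: this bin's mean error > best bin's mean error.
        p_one = p_two / 2 if t > 0 else 1 - p_two / 2
        if b == best or p_one >= alpha:
            keep.append(b)
    return best, keep
```

The minimum-area rectangle over the kept bins would then be drawn as the sweet spot.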
5.2. Neural Network Performance as a Smooth Function over Graph Measures
| In Figure 4(f), we observe that neural network’s predictive performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph. Keeping one graph measure fixed in a small range (C ∈ [0.4, 0.6], L ∈ [2, 2.5]), we visualize network performances against the other measure (shown in Figure 4(b)(d)). We use second degree polynomial regression to visualize the overall trend. We observe that both clustering coefficient and average path length are indicative of neural network performance, demonstrating a smooth U-shape correlation. | 在圖4(f)中,我們觀察到神經(jīng)網(wǎng)絡(luò)的預(yù)測(cè)性能近似是其關(guān)系圖的聚類(lèi)系數(shù)和平均路徑長(zhǎng)度的平滑函數(shù)。將其中一個(gè)圖度量固定在一個(gè)小范圍內(nèi)(C ∈ [0.4, 0.6],L ∈ [2, 2.5]),我們將網(wǎng)絡(luò)性能相對(duì)于另一個(gè)度量進(jìn)行可視化(如圖4(b)(d)所示)。我們使用二次多項(xiàng)式回歸來(lái)展示總體趨勢(shì)。我們觀察到,聚類(lèi)系數(shù)和平均路徑長(zhǎng)度都能指示神經(jīng)網(wǎng)絡(luò)性能,呈現(xiàn)出平滑的U形相關(guān)關(guān)系。 |
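The second-degree polynomial regression can be reproduced with `numpy.polyfit`. The data below is synthetic, purely to illustrate fitting a U-shape and reading off its minimum (not the paper's measurements):

```python
import numpy as np

# Toy U-shape: top-1 error as a quadratic in the clustering coefficient C,
# with its minimum placed at C = 0.45 for illustration.
C = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
err = (C - 0.45) ** 2 * 10 + 33.0

coeffs = np.polyfit(C, err, deg=2)      # [a, b, c] of a*C^2 + b*C + c
C_best = -coeffs[1] / (2 * coeffs[0])   # vertex of the fitted parabola
```

On real measurements the fit is noisy, but the vertex of the parabola gives a rough estimate of where the sweet spot lies along one measure.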
5.3. Consistency across Architectures
| Figure 5: Quickly identifying a sweet spot. Left: The correlation between sweet spots identified using fewer samples of relational graphs and using all 3942 graphs. Right: The correlation between sweet spots identified at the intermediate training epochs and the final epoch (100 epochs). | 圖5:快速確定甜蜜點(diǎn)。左圖:使用較少的關(guān)系圖樣本與使用全部3942個(gè)圖識(shí)別出的甜蜜點(diǎn)之間的相關(guān)性。右圖:在中間訓(xùn)練epoch與最終epoch(第100個(gè)epoch)識(shí)別出的甜蜜點(diǎn)之間的相關(guān)性。 |
| Given that relational graph defines a shared design space across various neural architectures, we observe that relational graphs with certain graph measures may consistently perform well regardless of how they are instantiated. Qualitative consistency. We visually observe in Figure 4(f) that the sweet spots are roughly consistent across different architectures. Specifically, if we take the union of the sweet spots across architectures, we have C ∈ [0.43, 0.50], L ∈ [1.82, 2.28], which is the consistent sweet spot across architectures. Moreover, the U-shape trends between graph measures and corresponding neural network performance, shown in Figure 4(b)(d), are also visually consistent. Quantitative consistency. To further quantify this consistency across tasks and architectures, we select the 52 bins in the heat map in Figure 4(f), where the bin value indicates the average performance of relational graphs whose graph measures fall into the bin range. We plot the correlation of the 52 bin values across different pairs of tasks, shown in Figure 4(e). We observe that the performance of relational graphs with certain graph measures correlates across different tasks and architectures. For example, even though a ResNet-34 has much higher complexity than a 5-layer MLP, and ImageNet is a much more challenging dataset than CIFAR-10, a fixed set of relational graphs would perform similarly in both settings, indicated by a Pearson correlation of 0.658 (p-value < 10⁻⁸).
| 鑒于關(guān)系圖定義了跨各種神經(jīng)架構(gòu)的共享設(shè)計(jì)空間,我們觀察到,具有特定圖度量的關(guān)系圖無(wú)論如何被實(shí)例化,都可能始終表現(xiàn)良好。定性一致性。在圖4(f)中我們可以直觀地看到,不同架構(gòu)的甜蜜點(diǎn)大致一致。具體來(lái)說(shuō),如果取各架構(gòu)甜蜜點(diǎn)的并集,可得C ∈ [0.43, 0.50]、L ∈ [1.82, 2.28],這就是跨架構(gòu)一致的甜蜜點(diǎn)。此外,圖4(b)(d)所示的圖度量與對(duì)應(yīng)神經(jīng)網(wǎng)絡(luò)性能之間的U形趨勢(shì),在視覺(jué)上也是一致的。定量一致性。為了進(jìn)一步量化跨任務(wù)和架構(gòu)的一致性,我們選取圖4(f)熱圖中的52個(gè)bin,其中bin值表示圖度量落入該bin范圍內(nèi)的關(guān)系圖的平均性能。我們繪制了52個(gè)bin值在不同任務(wù)對(duì)之間的相關(guān)性,如圖4(e)所示。我們觀察到,具有特定圖度量的關(guān)系圖的性能在不同任務(wù)和架構(gòu)之間是相關(guān)的。例如,盡管ResNet-34的復(fù)雜度遠(yuǎn)高于5層MLP,ImageNet也是比CIFAR-10更具挑戰(zhàn)性的數(shù)據(jù)集,但同一組固定的關(guān)系圖在兩種設(shè)置下的表現(xiàn)相似,其Pearson相關(guān)系數(shù)為0.658(p值 < 10⁻⁸)。 |
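The quantitative consistency check reduces to a Pearson correlation between two settings' 52 bin values. A self-contained sketch (`bin_correlation` is our name for the helper):

```python
import numpy as np

def bin_correlation(errs_a, errs_b):
    """Pearson correlation between the bin values of two settings,
    e.g. 5-layer MLP on CIFAR-10 vs. ResNet-34 on ImageNet."""
    a, b = np.asarray(errs_a, float), np.asarray(errs_b, float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```

A value near 1 means graphs that do well in one setting tend to do well in the other, which is what Figure 4(e) reports.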
5.4. Quickly Identifying a Sweet Spot
| Training thousands of relational graphs until convergence might be computationally prohibitive. Therefore, we quantitatively show that a sweet spot can be identified with much less computational cost, e.g., by sampling fewer graphs and training for fewer epochs. How many graphs are needed? Using the 5-layer MLP on CIFAR-10 as an example, we consider the heat map over 52 bins in Figure 4(f) which is computed using 3942 graph samples. We investigate if a similar heat map can be produced with much fewer graph samples. Specifically, we sub-sample the graphs in each bin while making sure each bin has at least one graph. We then compute the correlation between the 52 bin values computed using all 3942 graphs and using sub-sampled fewer graphs, as is shown in Figure 5 (left). We can see that bin values computed using only 52 samples have a high 0.90 Pearson correlation with the bin values computed using full 3942 graph samples. This finding suggests that, in practice, much fewer graphs are needed to conduct a similar analysis. | 將成千上萬(wàn)個(gè)關(guān)系圖訓(xùn)練到收斂,計(jì)算開(kāi)銷(xiāo)可能難以承受。因此,我們定量地展示,可以用低得多的計(jì)算成本來(lái)確定甜蜜點(diǎn),例如采樣更少的圖、訓(xùn)練更少的epoch。需要多少個(gè)圖?以CIFAR-10上的5層MLP為例,我們考慮圖4(f)中52個(gè)bin上的熱圖,它是用3942個(gè)圖樣本計(jì)算的。我們研究是否可以用少得多的圖樣本得到類(lèi)似的熱圖。具體來(lái)說(shuō),我們對(duì)每個(gè)bin中的圖進(jìn)行子采樣,同時(shí)確保每個(gè)bin至少有一個(gè)圖。然后,我們計(jì)算用全部3942個(gè)圖與用子采樣后較少的圖所得到的52個(gè)bin值之間的相關(guān)性,如圖5(左)所示。可以看到,僅用52個(gè)樣本計(jì)算的bin值與用全部3942個(gè)圖樣本計(jì)算的bin值之間的Pearson相關(guān)系數(shù)高達(dá)0.90。這一發(fā)現(xiàn)表明,在實(shí)踐中只需少得多的圖就能進(jìn)行類(lèi)似的分析。 |
5.5. Network Science and Neuroscience Connections
| Network science. The average path length that we measure characterizes how well information is exchanged across the network (Latora & Marchiori, 2001), which aligns with our definition of relational graph that consists of rounds of message exchange. Therefore, the U-shape correlation in Figure 4(b)(d) might indicate a trade-off between message exchange efficiency (Sengupta et al., 2013) and capability of learning distributed representations (Hinton, 1984). Neuroscience. The best-performing relational graph that we discover surprisingly resembles biological neural networks, as is shown in Table 2 and Figure 6. The similarities are two-fold: (1) the graph measures (L and C) of top artificial neural networks are highly similar to biological neural networks; (2) with the relational graph representation, we can translate biological neural networks to 5-layer MLPs, and found that these networks also outperform the baseline complete graphs. While our findings are preliminary, our approach opens up new possibilities for interdisciplinary research in network science, neuroscience and deep learning. | 網(wǎng)絡(luò)科學(xué)。我們測(cè)量的平均路徑長(zhǎng)度表征了信息在網(wǎng)絡(luò)中交換的效率(Latora & Marchiori, 2001),這與我們將關(guān)系圖定義為多輪消息交換的做法一致。因此,圖4(b)(d)中的U形相關(guān)性可能表明,消息交換效率(Sengupta et al., 2013)與學(xué)習(xí)分布式表示的能力(Hinton, 1984)之間存在權(quán)衡。神經(jīng)科學(xué)。我們發(fā)現(xiàn)的性能最好的關(guān)系圖與生物神經(jīng)網(wǎng)絡(luò)驚人地相似,如表2和圖6所示。相似之處有兩方面:(1)頂級(jí)人工神經(jīng)網(wǎng)絡(luò)的圖度量(L和C)與生物神經(jīng)網(wǎng)絡(luò)高度相似;(2)借助關(guān)系圖表示,我們可以將生物神經(jīng)網(wǎng)絡(luò)轉(zhuǎn)換為5層MLP,并發(fā)現(xiàn)這些網(wǎng)絡(luò)同樣優(yōu)于基線完全圖。
雖然我們的發(fā)現(xiàn)還處于初步階段,但我們的方法為網(wǎng)絡(luò)科學(xué)、神經(jīng)科學(xué)和深度學(xué)習(xí)領(lǐng)域的跨學(xué)科研究開(kāi)辟了新的可能性。 |
6. Related Work
| Neural network connectivity. The design of neural network connectivity patterns has been focused on computational graphs at different granularity: the macro structures, i.e. connectivity across layers (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016; Huang et al., 2017; Tan & Le, 2019), and the micro structures, i.e. connectivity within a layer (LeCun et al., 1998; Xie et al., 2017; Zhang et al., 2018; Howard et al., 2017; Dao et al., 2019; Alizadeh et al., 2019). Our current exploration focuses on the latter, but the same methodology can be extended to the macro space. Deep Expander Networks (Prabhu et al., 2018) adopt expander graphs to generate bipartite structures. RandWire (Xie et al., 2019) generates macro structures using existing graph generators. However, the statistical relationships between graph structure measures and network predictive performances were not explored in those works. Another related work is Cross-channel Communication Networks (Yang et al., 2019) which aims to encourage the neuron communication through message passing, where only a complete graph structure is considered. | 神經(jīng)網(wǎng)絡(luò)連通性。神經(jīng)網(wǎng)絡(luò)連通性模式的設(shè)計(jì)一直集中在不同粒度的計(jì)算圖上:宏觀結(jié)構(gòu),即跨層連通性(LeCun et al., 1998;Krizhevsky et al., 2012;Simonyan & Zisserman, 2015;Szegedy et al., 2015;He et al., 2016;Huang et al., 2017;Tan & Le, 2019),以及微觀結(jié)構(gòu),即層內(nèi)連通性(LeCun et al., 1998;Xie et al., 2017;Zhang et al., 2018;Howard et al., 2017;Dao et al., 2019;Alizadeh et al., 2019)。我們目前的研究重點(diǎn)是后者,但同樣的方法可以擴(kuò)展到宏觀空間。深度擴(kuò)展器網(wǎng)絡(luò)(Prabhu et al., 2018)采用擴(kuò)展圖生成二部圖結(jié)構(gòu)。RandWire(Xie et al., 2019)使用現(xiàn)有的圖生成器生成宏觀結(jié)構(gòu)。然而,這些工作并未探索圖結(jié)構(gòu)度量與網(wǎng)絡(luò)預(yù)測(cè)性能之間的統(tǒng)計(jì)關(guān)系。另一項(xiàng)相關(guān)工作是跨通道通信網(wǎng)絡(luò)(Yang et al., 2019),旨在通過(guò)消息傳遞促進(jìn)神經(jīng)元通信,但其中只考慮了完全圖結(jié)構(gòu)。 |
| Neural architecture search. Efforts on learning the connectivity patterns at micro (Ahmed & Torresani, 2018; Wortsman et al., 2019; Yang et al., 2018), or macro (Zoph & Le, 2017; Zoph et al., 2018) level mostly focus on improving learning/search algorithms (Liu et al., 2018; Pham et al., 2018; Real et al., 2019; Liu et al., 2019). NAS-Bench101 (Ying et al., 2019) defines a graph search space by enumerating DAGs with constrained sizes (≤ 7 nodes, cf. 64-node graphs in our work). Our work points to a new path: instead of exhaustively searching over all the possible connectivity patterns, certain graph generators and graph measures could define a smooth space where the search cost could be significantly reduced. | 神經(jīng)架構(gòu)搜索。在微觀層面(Ahmed & Torresani, 2018;Wortsman et al., 2019;Yang et al., 2018)或宏觀層面(Zoph & Le, 2017;Zoph et al., 2018)學(xué)習(xí)連通性模式的工作,大多集中于改進(jìn)學(xué)習(xí)/搜索算法(Liu et al., 2018;Pham et al., 2018;Real et al., 2019;Liu et al., 2019)。NAS-Bench101(Ying et al., 2019)通過(guò)枚舉大小受限的DAG(≤7個(gè)節(jié)點(diǎn),而我們的工作中是64節(jié)點(diǎn)圖)來(lái)定義圖搜索空間。我們的工作指出了一條新路徑:無(wú)需對(duì)所有可能的連通性模式進(jìn)行窮舉搜索,特定的圖生成器和圖度量可以定義一個(gè)平滑的空間,從而顯著降低搜索成本。 |
7. Discussions
| Hierarchical graph structure of neural networks. As the first step in this direction, our work focuses on graph structures at the layer level. Neural networks are intrinsically hierarchical graphs (from connectivity of neurons to that of layers, blocks, and networks) which constitute a more complex design space than what is considered in this paper. Extensive exploration in that space will be computationally prohibitive, but we expect our methodology and findings to generalize. Efficient implementation. Our current implementation uses standard CUDA kernels thus relies on weight masking, which leads to worse wall-clock time performance compared with baseline complete graphs. However, the practical adoption of our discoveries is not far-fetched. Complementary to our work, there are ongoing efforts such as block-sparse kernels (Gray et al., 2017) and fast sparse ConvNets (Elsen et al., 2019) which could close the gap between theoretical FLOPS and real-world gains. Our work might also inform the design of new hardware architectures, e.g., biologically-inspired ones with spike patterns (Pei et al., 2019). | 神經(jīng)網(wǎng)絡(luò)的層次圖結(jié)構(gòu)。作為這一方向的第一步,我們的工作集中在層這一層次的圖結(jié)構(gòu)上。神經(jīng)網(wǎng)絡(luò)本質(zhì)上是層次化的圖(從神經(jīng)元的連通性到層、塊和網(wǎng)絡(luò)的連通性),它構(gòu)成了比本文所考慮的更復(fù)雜的設(shè)計(jì)空間。在那個(gè)空間中進(jìn)行廣泛探索在計(jì)算上難以承受,但我們期望我們的方法和發(fā)現(xiàn)能夠推廣。高效實(shí)現(xiàn)。我們目前的實(shí)現(xiàn)使用標(biāo)準(zhǔn)CUDA內(nèi)核,因此依賴(lài)于權(quán)重掩碼(weight masking),這導(dǎo)致其實(shí)際運(yùn)行時(shí)間(wall-clock time)比基線完全圖更差。然而,實(shí)際應(yīng)用我們的發(fā)現(xiàn)并不遙遠(yuǎn)。作為我們工作的補(bǔ)充,塊稀疏核(Gray et al., 2017)和快速稀疏卷積網(wǎng)絡(luò)(Elsen et al., 2019)等正在進(jìn)行的工作,有望縮小理論FLOPS與實(shí)際收益之間的差距。我們的工作也可能為新硬件架構(gòu)的設(shè)計(jì)提供參考,例如受生物學(xué)啟發(fā)的、帶有脈沖(spike)模式的架構(gòu)(Pei et al., 2019)。 |
| Prior vs. Learning. We currently utilize the relational graph representation as a structural prior, i.e., we hard-wire the graph structure on neural networks throughout training. It has been shown that deep ReLU neural networks can automatically learn sparse representations (Glorot et al., 2011). A further question arises: without imposing graph priors, does any graph structure emerge from training a (fully-connected) neural network? Figure 7: Prior vs. Learning. Results for 5-layer MLPs on CIFAR-10. We highlight the best-performing graph when used as a structural prior. Additionally, we train a fully-connected MLP, and visualize the learned weights as a relational graph (different points are graphs under different thresholds). The learned graph structure moves towards the “sweet spot” after training but does not close the gap. | 先驗(yàn)與學(xué)習(xí)。我們目前將關(guān)系圖表示用作結(jié)構(gòu)先驗(yàn),即在整個(gè)訓(xùn)練過(guò)程中將圖結(jié)構(gòu)固定(hard-wire)在神經(jīng)網(wǎng)絡(luò)上。已有研究表明,深度ReLU神經(jīng)網(wǎng)絡(luò)可以自動(dòng)學(xué)習(xí)稀疏表示(Glorot et al., 2011)。由此產(chǎn)生一個(gè)進(jìn)一步的問(wèn)題:在不施加圖先驗(yàn)的情況下,訓(xùn)練一個(gè)(全連接)神經(jīng)網(wǎng)絡(luò)是否會(huì)涌現(xiàn)出某種圖結(jié)構(gòu)?圖7:先驗(yàn)與學(xué)習(xí)。5層MLP在CIFAR-10上的結(jié)果。我們標(biāo)出了用作結(jié)構(gòu)先驗(yàn)時(shí)表現(xiàn)最佳的圖。此外,我們訓(xùn)練了一個(gè)全連接MLP,并將學(xué)到的權(quán)重可視化為關(guān)系圖(不同的點(diǎn)是不同閾值下的圖)。學(xué)到的圖結(jié)構(gòu)在訓(xùn)練后向“甜蜜點(diǎn)”移動(dòng),但并未消除差距。 |
| As a preliminary exploration, we “reverse-engineer” a trained neural network and study the emerged relational graph structure. Specifically, we train a fully-connected 5-layer MLP on CIFAR-10 (the same setup as in previous experiments). We then try to infer the underlying relational graph structure of the network via the following steps: (1) to get nodes in a relational graph, we stack the weights from all the hidden layers and group them into 64 nodes, following the procedure described in Section 2.2; (2) to get undirected edges, the weights are summed by their transposes; (3) we compute the Frobenius norm of the weights as the edge value; (4) we get a sparse graph structure by binarizing edge values with a certain threshold. We show the extracted graphs under different thresholds in Figure 7. As expected, the extracted graphs at initialization follow the patterns of E-R graphs (Figure 3(left)), since weight matrices are randomly i.i.d. initialized. Interestingly, after training to convergence, the extracted graphs are no longer E-R random graphs and move towards the sweet spot region we found in Section 5. Note that there is still a gap between these learned graphs and the best-performing graph imposed as a structural prior, which might explain why a fully-connected MLP has inferior performance. In our experiments, we also find that there are a few special cases where learning the graph structure can be superior (i.e., when the task is simple and the network capacity is abundant). We provide more discussions in the Appendix. Overall, these results further demonstrate that studying the graph structure of a neural network is crucial for understanding its predictive performance. |
作為初步探索,我們對(duì)一個(gè)訓(xùn)練好的神經(jīng)網(wǎng)絡(luò)進(jìn)行“逆向工程”,研究其中涌現(xiàn)出的關(guān)系圖結(jié)構(gòu)。具體來(lái)說(shuō),我們?cè)贑IFAR-10上訓(xùn)練一個(gè)全連接的5層MLP(與之前實(shí)驗(yàn)相同的設(shè)置),然后嘗試通過(guò)以下步驟推斷網(wǎng)絡(luò)的底層關(guān)系圖結(jié)構(gòu):(1)為了得到關(guān)系圖中的節(jié)點(diǎn),我們將所有隱藏層的權(quán)重疊加,并按照2.2節(jié)描述的步驟將其分組為64個(gè)節(jié)點(diǎn);(2)為了得到無(wú)向邊,將權(quán)重與其轉(zhuǎn)置相加;(3)計(jì)算權(quán)重的Frobenius范數(shù)作為邊的取值;(4)以一定閾值對(duì)邊值二值化,得到稀疏圖結(jié)構(gòu)。我們?cè)趫D7中展示了不同閾值下提取的圖。正如預(yù)期,初始化時(shí)提取的圖遵循E-R圖的模式(圖3左),因?yàn)闄?quán)重矩陣是隨機(jī)獨(dú)立同分布(i.i.d.)初始化的。有趣的是,訓(xùn)練到收斂后,提取的圖不再是E-R隨機(jī)圖,而是向我們?cè)诘?節(jié)發(fā)現(xiàn)的甜蜜點(diǎn)區(qū)域移動(dòng)。請(qǐng)注意,這些學(xué)習(xí)到的圖與作為結(jié)構(gòu)先驗(yàn)施加的最優(yōu)圖之間仍然存在差距,這或許可以解釋為什么全連接MLP的性能較差。在實(shí)驗(yàn)中,我們還發(fā)現(xiàn)少數(shù)特殊情況下學(xué)習(xí)圖結(jié)構(gòu)反而更優(yōu)(即任務(wù)簡(jiǎn)單且網(wǎng)絡(luò)容量充足時(shí))。我們?cè)诟戒浿刑峁└嘤懻摗?偟膩?lái)說(shuō),這些結(jié)果進(jìn)一步表明,研究神經(jīng)網(wǎng)絡(luò)的圖結(jié)構(gòu)對(duì)理解其預(yù)測(cè)性能至關(guān)重要。 |
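Steps (1)-(4) of the reverse-engineering procedure can be sketched as below. We assume the per-layer weights have already been grouped into node-by-node blocks, simplified here to one scalar per node pair, with an absolute value standing in for the Frobenius norm of a block; `extract_graph` is our name:

```python
import numpy as np

def extract_graph(layer_weights, threshold):
    """Sketch of inferring a relational graph from trained MLP weights:
    (1) stack the (already node-grouped) weights across hidden layers;
    (2) symmetrize with the transpose to get undirected edges;
    (3) use a per-edge norm (here |.| of the scalar) as the edge value;
    (4) binarize at `threshold` to get a sparse adjacency matrix."""
    w = sum(np.abs(m) for m in layer_weights)   # (1) stack layers
    w = w + w.T                                  # (2) undirected edges
    adj = (w > threshold).astype(int)            # (4) binarize edge values
    np.fill_diagonal(adj, 0)                     # drop self-loops from the graph
    return adj
```

Sweeping `threshold` yields the family of graphs plotted as different points in Figure 7.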
| Unified view of Graph Neural Networks (GNNs) and general neural architectures. The way we define neural networks as a message exchange function over graphs is partly inspired by GNNs (Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018). Under the relational graph representation, we point out that GNNs are a special class of general neural architectures where: (1) graph structure is regarded as the input instead of part of the neural architecture; consequently, (2) message functions are shared across all the edges to respect the invariance properties of the input graph. Concretely, recall how we define general neural networks as relational graphs. Therefore, our work offers a unified view of GNNs and general neural architecture design, which we hope can bridge the two communities and inspire new innovations. On one hand, successful techniques in general neural architectures can be naturally introduced to the design of GNNs, such as separable convolution (Howard et al., 2017), group normalization (Wu & He, 2018) and Squeeze-and-Excitation block (Hu et al., 2018); on the other hand, novel GNN architectures (You et al., 2019b; Chen et al., 2019) beyond the commonly used paradigm (i.e., Equation 6) may inspire more advanced neural architecture designs.
| 圖神經(jīng)網(wǎng)絡(luò)(GNN)與一般神經(jīng)架構(gòu)的統(tǒng)一視角。我們將神經(jīng)網(wǎng)絡(luò)定義為圖上消息交換函數(shù)的方式,部分受到GNN的啟發(fā)(Kipf & Welling, 2017;Hamilton et al., 2017;Veličković et al., 2018)。在關(guān)系圖表示下,我們指出GNN是一類(lèi)特殊的一般神經(jīng)架構(gòu),其中:(1)圖結(jié)構(gòu)被視為輸入,而不是神經(jīng)架構(gòu)的一部分;因此,(2)消息函數(shù)在所有邊上共享,以保持輸入圖的不變性。具體地,回想我們是如何將一般神經(jīng)網(wǎng)絡(luò)定義為關(guān)系圖的。因此,我們的工作為GNN與一般神經(jīng)架構(gòu)設(shè)計(jì)提供了統(tǒng)一視角,我們希望它能搭建兩個(gè)社區(qū)之間的橋梁并激發(fā)新的創(chuàng)新。一方面,一般神經(jīng)架構(gòu)中的成功技術(shù)可以自然地引入GNN的設(shè)計(jì),如可分離卷積(Howard et al., 2017)、組歸一化(Wu & He, 2018)和Squeeze-and-Excitation模塊(Hu et al., 2018);另一方面,超越常用范式(即公式6)的新型GNN架構(gòu)(You et al., 2019b;Chen et al., 2019)可能啟發(fā)更先進(jìn)的神經(jīng)架構(gòu)設(shè)計(jì)。 |
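The generic message-exchange round underlying both views can be sketched as follows. In a GNN the same message function would be shared across all edges, whereas a general neural network assigns each edge its own weight, as in this toy version (our simplification: scalar node features, sum aggregation):

```python
import numpy as np

def message_round(x, adj, weight):
    """One round of message exchange over a relational graph: each node v
    aggregates (here: sums) transformed messages from itself and its
    neighbours. Per-edge weights make this a general neural network layer;
    sharing one message function across edges would make it GNN-like."""
    n = len(x)
    out = np.zeros(n)
    for v in range(n):
        for u in range(n):
            if u == v or adj[u, v]:
                out[v] += weight[u, v] * x[u]   # message from u to v
    return out
```

Stacking several such rounds corresponds to stacking layers of the neural network.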
8. Conclusion
| In sum, we propose a new perspective of using relational graph representation for analyzing and understanding neural networks. Our work suggests a new transition from studying conventional computation architecture to studying graph structure of neural networks. We show that well-established graph techniques and methodologies offered in other science disciplines (network science, neuroscience, etc.) could contribute to understanding and designing deep neural networks. We believe this could be a fruitful avenue of future research that tackles more complex situations. | 總之,我們提出了一種利用關(guān)系圖表示來(lái)分析和理解神經(jīng)網(wǎng)絡(luò)的新視角。我們的工作提示了一種新的轉(zhuǎn)變:從研究傳統(tǒng)的計(jì)算架構(gòu)轉(zhuǎn)向研究神經(jīng)網(wǎng)絡(luò)的圖結(jié)構(gòu)。我們表明,其他科學(xué)學(xué)科(網(wǎng)絡(luò)科學(xué)、神經(jīng)科學(xué)等)中成熟的圖技術(shù)與方法論有助于理解和設(shè)計(jì)深度神經(jīng)網(wǎng)絡(luò)。我們相信,這可能是未來(lái)研究中處理更復(fù)雜情形的一條富有成效的途徑。 |
Acknowledgments
| This work is done during Jiaxuan You’s internship at Facebook AI Research. Jure Leskovec is a Chan Zuckerberg Biohub investigator. The authors thank Alexander Kirillov, Ross Girshick, Jonathan Gomes Selman, Pan Li for their helpful discussions. | 這項(xiàng)工作是Jiaxuan You在Facebook AI Research實(shí)習(xí)期間完成的。Jure Leskovec是Chan Zuckerberg Biohub的研究員。作者感謝Alexander Kirillov、Ross Girshick、Jonathan Gomes Selman和Pan Li的有益討論。 |