
Evaluating the Computational Cost of Network Models


Contents

Computational cost

Memory access


Computational cost

Compute performance metric:

● FLOPS: floating point operations per second

Computation (workload) metric:

● MACCs or MADDs: multiply-accumulate operations

The difference between FLOPS and FLOPs:

FLOPS: all uppercase, short for floating point operations per second, i.e. the number of floating-point operations performed per second. It measures computation speed and is a metric of hardware performance.

FLOPs: note the lowercase s, short for floating point operations (the s marks the plural), i.e. the number of floating-point operations. It measures the amount of computation and can be used to gauge the complexity of an algorithm or model.

Note: a MACC is one multiply-accumulate; FLOPs count the multiplications and additions separately, so FLOPs is the sum of both.

Example: dot product

y = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + ... + w[n-1]*x[n-1]

Each w[i]*x[i] together with its accumulation counts as 1 MACC, so the sum above is n MACCs.

The expression contains n floating-point multiplications and n - 1 floating-point additions, so it is 2n - 1 FLOPs.

One MACC is roughly two FLOPs.

Note: strictly speaking, there are only n - 1 additions, one fewer than the number of multiplications. The MACC count is therefore an approximation, much like Big-O notation is an approximation of an algorithm's complexity.
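As a quick check, the counts above can be written as a minimal Python sketch (the helper name is illustrative, not from any library):

def dot_product_cost(n: int):
    """Cost of y = w[0]*x[0] + ... + w[n-1]*x[n-1] (illustrative helper).

    MACCs: each w[i]*x[i] together with its accumulation is one
    multiply-accumulate, so the dot product is n MACCs.
    FLOPs: n multiplications and n - 1 additions, i.e. 2n - 1,
    which is roughly 2 * MACCs for large n.
    """
    maccs = n
    flops = 2 * n - 1
    return maccs, flops

print(dot_product_cost(1000))  # (1000, 1999)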

Computational cost of a real convolution:

For the details, see the paper "Pruning Convolutional Neural Networks for Resource Efficient Inference" (ICLR 2017).

Assuming the convolution is implemented as a sliding window and the cost of the nonlinearity is ignored, the FLOPs of a convolution layer are:

FLOPs = 2 × Hout × Wout × (Cin × K × K + 1) × Cout    (the +1 accounts for the bias)

FLOPs and MACCs of common neural-network layers:

● Fully Connected Layer: multiplying a vector of length I with an I × J matrix to get a vector of length J takes I × J MACCs or (2I - 1) × J FLOPs.

● Activation Layer: we do not measure these in MACCs but in FLOPs, because they're not dot products.

● Convolution Layer: K × K × Cin × Hout × Wout × Cout MACCs

● Depthwise-Separable Layer: (K × K × Cin × Hout × Wout) + (Cin × Hout × Wout × Cout) MACCs

= Cin × Hout × Wout × (K × K + Cout) MACCs

● The reduction factor relative to a standard convolution is K × K × Cout / (K × K + Cout); see the sketch after this list.
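As referenced above, here is a short Python sketch of the per-layer MACC formulas (the function names are illustrative, not from any framework); it also checks the reduction factor with the sizes used in the example further down (Cin = 256, Cout = 512, H = W = 28, K = 3):

def fc_maccs(i: int, j: int) -> int:
    # Fully connected: an I x J matrix times a length-I vector
    return i * j

def conv_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    # Standard convolution: K * K * Cin MACCs per output value
    return k * k * c_in * h_out * w_out * c_out

def dw_separable_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    # Depthwise (K*K*Cin*Hout*Wout) plus pointwise (Cin*Hout*Wout*Cout)
    return c_in * h_out * w_out * (k * k + c_out)

k, c_in, h, w, c_out = 3, 256, 28, 28, 512
print(conv_maccs(k, c_in, h, w, c_out))           # 924,844,032
print(dw_separable_maccs(k, c_in, h, w, c_out))   # 104,566,784
print(k * k * c_out / (k * k + c_out))            # reduction factor, about 8.84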


Memory access

Computational cost is only one side of running speed; another important side is memory bandwidth, which can matter even more than the computation itself.

On a modern computer, a single access to main memory is much slower than a single computation, by a factor of about 100 or more!

How many memory accesses does a network perform? For each layer, they include:

1. Reading the layer's input

2. Computing the result, which includes loading the weights

3. Writing the layer's output

Memory for weights

● Fully Connected: with input size I and output size J, the total is (I + 1) × J weights (the +1 is the bias term).

● Convolutional layers have fewer weights than fully-connected layers: K × K × Cin × Cout weights (plus Cout biases).

Because memory accesses are so slow, a large number of them can dominate a network's running time, even more than the amount of computation.
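A small sketch of the weight counts above (illustrative helper names; the concrete sizes are only examples):

def fc_weights(i: int, j: int) -> int:
    # Fully connected: (I + 1) * J, i.e. the weight matrix plus one bias per output
    return (i + 1) * j

def conv_weights(k: int, c_in: int, c_out: int) -> int:
    # Convolution: K * K * Cin * Cout weights plus Cout biases
    return k * k * c_in * c_out + c_out

print(fc_weights(4096, 4096))     # 16,781,312
print(conv_weights(3, 256, 512))  # 1,180,160, matches the example below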


Feature maps and intermediate results

● Convolution Layer:

(the weights here are negligible)

input = Hin × Win × Cin × K × K × Cout

output = Hout × Wout × Cout

weights = K × K × Cin × Cout + Cout

Example: Cin = 256, Cout = 512, H = W = 28, K = 3, S = 1


1. Normal convolution layer

    input = 28 × 28 × 256 × 3 × 3 × 512 = 924,844,032

    output = 28 × 28 × 512 = 401,408

    weights = 3 × 3 × 256 × 512 + 512 = 1,180,160

    total = 926,425,600


2. Depthwise layer + pointwise layer

    1) Depthwise layer

        input = 28 × 28 × 256 × 3 × 3 = 1,806,336

        output = 28 × 28 × 256 = 200,704

        weights = 3 × 3 × 256 + 256 = 2,560

        total = 2,009,600

    2) Pointwise layer

        input = 28 × 28 × 256 × 1 × 1 × 512 = 102,760,448

        output = 28 × 28 × 512 = 401,408

        weights = 1 × 1 × 256 × 512 + 512 = 131,584

        total = 103,293,440

        total of both layers = 105,303,040
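The two worked examples can be reproduced with a short Python sketch (assuming stride 1 and "same" padding so that Hout = Wout = 28, and following the same accounting convention as the counts above; the function names are illustrative):

def conv_mem_accesses(h, w, c_in, k, c_out):
    # Standard convolution: every output position re-reads a K x K x Cin
    # window of the input for each of the Cout output channels.
    inp = h * w * c_in * k * k * c_out
    out = h * w * c_out
    wts = k * k * c_in * c_out + c_out
    return inp + out + wts

def dw_pw_mem_accesses(h, w, c_in, k, c_out):
    # Depthwise layer followed by a 1x1 pointwise layer.
    dw = h * w * c_in * k * k + h * w * c_in + (k * k * c_in + c_in)
    pw = h * w * c_in * c_out + h * w * c_out + (c_in * c_out + c_out)
    return dw + pw

print(conv_mem_accesses(28, 28, 256, 3, 512))   # 926,425,600
print(dw_pw_mem_accesses(28, 28, 256, 3, 512))  # 105,303,040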


Case study

● Input dimension: 126x224

MobileNet V1 parameters (multiplier = 1.0): 1.6M

MobileNet V2 parameters (multiplier = 1.0): 0.5M

MobileNet V2 parameters (multiplier = 1.4): 1.0M

MobileNet V1 MACCs (multiplier = 1.0): 255M

MobileNet V2 MACCs (multiplier = 1.0): 111M

MobileNet V2 MACCs (multiplier = 1.4): 214M

MobileNet V1 memory accesses (multiplier = 1.0): 283M

MobileNet V2 memory accesses (multiplier = 1.0): 159M

MobileNet V2 memory accesses (multiplier = 1.4): 286M

MobileNet V2 (multiplier = 1.4) is slightly slower than MobileNet V1 (multiplier = 1.0)

This provides some proof for my hypothesis that the amount of memory accesses is the primary factor for determining the speed of the neural net.


Conclusion

“I hope this shows that all these things — number of computations, number of parameters, and number of memory accesses — are deeply related. A model that works well on mobile needs to carefully balance those factors.”
