
Evaluating the Computational Cost of Network Models


Contents

Computational cost

Memory access


Computational cost

Compute performance metric:

● FLOPS: floating point operations per second

Computation (workload) metric:

● MACCs or MADDs: multiply-accumulate operations

The difference between FLOPS and FLOPs:

FLOPS: all uppercase, short for floating point operations per second, i.e. the number of floating-point operations performed per second. It measures computation speed and is a metric of hardware performance.

FLOPs: note the lowercase s, short for floating point operations (the s marks the plural), i.e. the number of floating-point operations. It measures the amount of computation and can be used to gauge the complexity of an algorithm or model.

Note: a MACC is one multiply-accumulate; FLOPs count the multiplications and additions separately, so FLOPs is the sum of both.

Example: dot product

y = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + ... + w[n-1]*x[n-1]

Each w[i]*x[i] together with its accumulation counts as 1 MACC, so the sum above is n MACCs.

The expression contains n floating-point multiplications and n - 1 floating-point additions, so it is 2n - 1 FLOPs.

One MACC is roughly two FLOPs.

Note: strictly speaking, there are only n - 1 additions, one fewer than the number of multiplications. The MACC count is therefore an approximation, much like Big-O notation is an approximation of an algorithm's complexity.
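As a quick check, the counts above can be written as a minimal Python sketch (the helper name is illustrative, not from any library):

def dot_product_cost(n: int):
    """Cost of y = w[0]*x[0] + ... + w[n-1]*x[n-1] (illustrative helper).

    MACCs: each w[i]*x[i] together with its accumulation is one
    multiply-accumulate, so the dot product is n MACCs.
    FLOPs: n multiplications and n - 1 additions, i.e. 2n - 1,
    which is roughly 2 * MACCs for large n.
    """
    maccs = n
    flops = 2 * n - 1
    return maccs, flops

print(dot_product_cost(1000))  # (1000, 1999)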

Computational cost of a real convolution:

For the details, see the paper "Pruning Convolutional Neural Networks for Resource Efficient Inference" (ICLR 2017).

Assuming the convolution is implemented as a sliding window and the cost of the nonlinearity is ignored, the FLOPs of a convolution layer are:

FLOPs = 2 × Hout × Wout × (Cin × K × K + 1) × Cout    (the +1 accounts for the bias)

FLOPs and MACCs of common neural-network layers:

● Fully Connected Layer: multiplying a vector of length I with an I × J matrix to get a vector of length J takes I × J MACCs or (2I - 1) × J FLOPs.

● Activation Layer: we do not measure these in MACCs but in FLOPs, because they're not dot products.

● Convolution Layer: K × K × Cin × Hout × Wout × Cout MACCs

● Depthwise-Separable Layer: (K × K × Cin × Hout × Wout) + (Cin × Hout × Wout × Cout) MACCs

= Cin × Hout × Wout × (K × K + Cout) MACCs

● The reduction factor relative to a standard convolution is K × K × Cout / (K × K + Cout); see the sketch after this list.
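As referenced above, here is a short Python sketch of the per-layer MACC formulas (the function names are illustrative, not from any framework); it also checks the reduction factor with the sizes used in the example further down (Cin = 256, Cout = 512, H = W = 28, K = 3):

def fc_maccs(i: int, j: int) -> int:
    # Fully connected: an I x J matrix times a length-I vector
    return i * j

def conv_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    # Standard convolution: K * K * Cin MACCs per output value
    return k * k * c_in * h_out * w_out * c_out

def dw_separable_maccs(k: int, c_in: int, h_out: int, w_out: int, c_out: int) -> int:
    # Depthwise (K*K*Cin*Hout*Wout) plus pointwise (Cin*Hout*Wout*Cout)
    return c_in * h_out * w_out * (k * k + c_out)

k, c_in, h, w, c_out = 3, 256, 28, 28, 512
print(conv_maccs(k, c_in, h, w, c_out))           # 924,844,032
print(dw_separable_maccs(k, c_in, h, w, c_out))   # 104,566,784
print(k * k * c_out / (k * k + c_out))            # reduction factor, about 8.84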


Memory access

Computational cost is only one side of running speed; another important side is memory bandwidth, which can matter even more than the computation itself.

On a modern computer, a single access to main memory is much slower than a single computation, by a factor of about 100 or more!

How many memory accesses does a network perform? For each layer, they include:

1. Reading the layer's input

2. Computing the result, which includes loading the weights

3. Writing the layer's output

Memory for weights

● Fully Connected: with input size I and output size J, the total is (I + 1) × J weights (the +1 is the bias term).

● Convolutional layers have fewer weights than fully-connected layers: K × K × Cin × Cout weights (plus Cout biases).

Because memory accesses are so slow, a large number of them can dominate a network's running time, even more than the amount of computation.
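A small sketch of the weight counts above (illustrative helper names; the concrete sizes are only examples):

def fc_weights(i: int, j: int) -> int:
    # Fully connected: (I + 1) * J, i.e. the weight matrix plus one bias per output
    return (i + 1) * j

def conv_weights(k: int, c_in: int, c_out: int) -> int:
    # Convolution: K * K * Cin * Cout weights plus Cout biases
    return k * k * c_in * c_out + c_out

print(fc_weights(4096, 4096))     # 16,781,312
print(conv_weights(3, 256, 512))  # 1,180,160, matches the example below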


Feature maps and intermediate results

● Convolution Layer:

(the weights here are negligible)

input = Hin × Win × Cin × K × K × Cout

output = Hout × Wout × Cout

weights = K × K × Cin × Cout + Cout

Example: Cin = 256, Cout = 512, H = W = 28, K = 3, S = 1


1. Normal convolution layer

    input = 28 × 28 × 256 × 3 × 3 × 512 = 924,844,032

    output = 28 × 28 × 512 = 401,408

    weights = 3 × 3 × 256 × 512 + 512 = 1,180,160

    total = 926,425,600


2. Depthwise layer + pointwise layer

    1) Depthwise layer

        input = 28 × 28 × 256 × 3 × 3 = 1,806,336

        output = 28 × 28 × 256 = 200,704

        weights = 3 × 3 × 256 + 256 = 2,560

        total = 2,009,600

    2) Pointwise layer

        input = 28 × 28 × 256 × 1 × 1 × 512 = 102,760,448

        output = 28 × 28 × 512 = 401,408

        weights = 1 × 1 × 256 × 512 + 512 = 131,584

        total = 103,293,440

        total of both layers = 105,303,040
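The two worked examples can be reproduced with a short Python sketch (assuming stride 1 and "same" padding so that Hout = Wout = 28, and following the same accounting convention as the counts above; the function names are illustrative):

def conv_mem_accesses(h, w, c_in, k, c_out):
    # Standard convolution: every output position re-reads a K x K x Cin
    # window of the input for each of the Cout output channels.
    inp = h * w * c_in * k * k * c_out
    out = h * w * c_out
    wts = k * k * c_in * c_out + c_out
    return inp + out + wts

def dw_pw_mem_accesses(h, w, c_in, k, c_out):
    # Depthwise layer followed by a 1x1 pointwise layer.
    dw = h * w * c_in * k * k + h * w * c_in + (k * k * c_in + c_in)
    pw = h * w * c_in * c_out + h * w * c_out + (c_in * c_out + c_out)
    return dw + pw

print(conv_mem_accesses(28, 28, 256, 3, 512))   # 926,425,600
print(dw_pw_mem_accesses(28, 28, 256, 3, 512))  # 105,303,040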


Case study

● Input dimension: 126x224

MobileNet V1 parameters (multiplier = 1.0): 1.6M

MobileNet V2 parameters (multiplier = 1.0): 0.5M

MobileNet V2 parameters (multiplier = 1.4): 1.0M

MobileNet V1 MACCs (multiplier = 1.0): 255M

MobileNet V2 MACCs (multiplier = 1.0): 111M

MobileNet V2 MACCs (multiplier = 1.4): 214M

MobileNet V1 memory accesses (multiplier = 1.0): 283M

MobileNet V2 memory accesses (multiplier = 1.0): 159M

MobileNet V2 memory accesses (multiplier = 1.4): 286M

MobileNet V2 (multiplier = 1.4) is slightly slower than MobileNet V1 (multiplier = 1.0)

This provides some proof for my hypothesis that the amount of memory accesses is the primary factor for determining the speed of the neural net.


Conclusion

“I hope this shows that all these things — number of computations, number of parameters, and number of memory accesses — are deeply related. A model that works well on mobile needs to carefully balance those factors.”
