Winograd Algorithm for Floating-Point Convolution

Published: 2024/4/18

Contents

Introduction to the Winograd algorithm

Further reading

Winograd code example walkthrough


Introduction to the Winograd algorithm

The Winograd algorithm in use today goes back to 1980, when Shmuel Winograd proposed a method for reducing the amount of computation in FIR filters.

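Shmuel Winograd showed that an FIR filter producing m outputs with r filter parameters (taps) does not need m*r multiplications. It only needs:

\[ \mu(F(m, r)) = m + r - 1 \]

multiplications.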

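Below is an F(2,3) example, that is, m=2 outputs and r=3 parameters:

\[
F(2,3) =
\begin{bmatrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{bmatrix}
\begin{bmatrix} g_0 \\ g_1 \\ g_2 \end{bmatrix}
=
\begin{bmatrix} m_1 + m_2 + m_3 \\ m_2 - m_3 - m_4 \end{bmatrix}
\]

\[
m_1 = (d_0 - d_2)\, g_0, \qquad
m_2 = (d_1 + d_2)\, \frac{g_0 + g_1 + g_2}{2}, \qquad
m_3 = (d_2 - d_1)\, \frac{g_0 - g_1 + g_2}{2}, \qquad
m_4 = (d_1 - d_3)\, g_2
\]

Only 4 multiplications are needed, one each for m1/m2/m3/m4, plus 8 additions (4 on the data d and 4 to combine the products). Note that all arithmetic involving only g can be done once at initialization, so it does not count toward the per-output cost.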

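Extending to two dimensions:

\[ Y = A^{T} \left[ \left( G\, g\, G^{T} \right) \odot \left( B^{T} d\, B \right) \right] A \]

In the formula above, g is the kernel (parameters), d is the input data, G, B and A are the transform matrices, Y is the output, and \odot denotes element-wise multiplication.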

For the full derivation, I recommend reading the paper directly.

The Winograd transform matrices are different for each combination of kernel size and output size. This article focuses on conv3x3_s1 (3x3 kernel, stride 1).
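To get a feel for why the output size matters, the following sketch counts multiplications per output tile for a few F(m x m, r x r) choices. It assumes the nested 2D count (m + r - 1)^2 and counts only the element-wise product stage, ignoring the cost of the transforms themselves. The function names are made up for this illustration.

```cpp
#include <cstdio>

// Multiplications for a direct r x r convolution producing an m x m output tile.
constexpr int direct_muls_2d(int m, int r) { return m * m * r * r; }

// Multiplications in the element-wise stage of nested F(m x m, r x r):
// one per element of the (m + r - 1) x (m + r - 1) transformed tile.
constexpr int winograd_muls_2d(int m, int r) { return (m + r - 1) * (m + r - 1); }

// Prints the per-tile counts for output tiles of size 2, 4 and 6.
void print_savings(int r) {
    const int tile_sizes[] = {2, 4, 6};
    for (int m : tile_sizes) {
        const int direct = direct_muls_2d(m, r);
        const int wino = winograd_muls_2d(m, r);
        std::printf("F(%dx%d, %dx%d): direct=%d winograd=%d ratio=%.2f\n",
                    m, m, r, r, direct, wino, static_cast<double>(direct) / wino);
    }
}
```

For a 3x3 kernel this gives 16 vs 36, 36 vs 144 and 64 vs 324 multiplications for 2x2, 4x4 and 6x6 output tiles. Larger tiles save more multiplications, but their transform matrices contain larger constants, so floating-point error tends to grow.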


Further reading

Zhihu article: https://zhuanlan.zhihu.com/p/72149270

Winograd paper: https://arxiv.org/abs/1509.09308

GitHub example code (Python): https://github.com/Sejudyblues/cs194-winograd/blob/master/winograd.py


Winograd code example walkthrough

#include <stdio.h>

// Input tile d (8x8). With a 3x3 kernel and stride 1 it yields a 6x6 output tile.
const float x[8][8] = {
    { 1.0,  0.0,  0.5,  0.0,  5.0,  0.4,  0.4,  0.8},
    {-2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0},
    {-2.0,  2.0,  2.1,  3.5,  6.0,  2.0,  4.4,  2.0},
    { 1.0,  1.0,  1.0,  1.0,  1.0,  6.5,  1.4,  1.0},
    { 1.0, -1.0, -1.2, -1.5, -7.6, -1.0, -1.2, -1.2},
    { 1.0,  1.0,  1.0,  1.7,  1.6,  4.4,  1.6,  1.5},
    { 1.0, -1.0, -3.5, -6.0, -1.3, -1.0, -1.6, -7.0},
    { 0.0,  0.0,  0.0,  0.6,  0.2,  9.0,  0.4,  0.3}
};

// 3x3 kernel g (weights), stored row-major
const float weight[9] = {
    1.2, 0.63, 0.4,
    0.2, 0.5,  0.56,
    1.5, 0.3,  0.74
};

// Kernel transform matrix G (8x3), used in U = G g G^T
const float ktm[8][3] = {
    {   1.0f,       0.0f,       0.0f},
    {-2.0f/9,    -2.0f/9,    -2.0f/9},
    {-2.0f/9,     2.0f/9,    -2.0f/9},
    {1.0f/90,    1.0f/45,    2.0f/45},
    {1.0f/90,   -1.0f/45,    2.0f/45},
    {1.0f/45,    1.0f/90,   1.0f/180},
    {1.0f/45,   -1.0f/90,   1.0f/180},
    {   0.0f,       0.0f,       1.0f}
};

int main()
{
    // Rows of the kernel g
    const float* k0 = weight;
    const float* k1 = weight + 3;
    const float* k2 = weight + 6;

    // Kernel transform, stored transposed: U^T = G.g^T.G^T
    // Step 1: tmp = G.g^T (8x3)
    float tmp[8][3];
    for (int i = 0; i < 8; i++)
    {
        tmp[i][0] = k0[0] * ktm[i][0] + k0[1] * ktm[i][1] + k0[2] * ktm[i][2];
        tmp[i][1] = k1[0] * ktm[i][0] + k1[1] * ktm[i][1] + k1[2] * ktm[i][2];
        tmp[i][2] = k2[0] * ktm[i][0] + k2[1] * ktm[i][1] + k2[2] * ktm[i][2];
    }

    // Step 2: U^T = tmp.G^T (8x8)
    float U[64];
    for (int j = 0; j < 8; j++)
    {
        float* tmpp = &tmp[j][0];
        for (int i = 0; i < 8; i++)
        {
            U[j*8 + i] = tmpp[0] * ktm[i][0] + tmpp[1] * ktm[i][1] + tmpp[2] * ktm[i][2];
        }
    }

    // Input transform matrix B^T (8x8), kept here for reference:
    // const float itm[8][8] = {
    //     {1.0f,  0.0f, -5.25f,  0.00f,  5.25f,  0.00f, -1.0f, 0.0f},
    //
    //     {0.0f,  1.0f,  1.00f, -4.25f, -4.25f,  1.00f,  1.0f, 0.0f},
    //     {0.0f, -1.0f,  1.00f,  4.25f, -4.25f, -1.00f,  1.0f, 0.0f},
    //
    //     {0.0f,  0.5f,  0.25f, -2.50f, -1.25f,  2.00f,  1.0f, 0.0f},
    //     {0.0f, -0.5f,  0.25f,  2.50f, -1.25f, -2.00f,  1.0f, 0.0f},
    //
    //     {0.0f,  2.0f,  4.00f, -2.50f, -5.00f,  0.50f,  1.0f, 0.0f},
    //     {0.0f, -2.0f,  4.00f,  2.50f, -5.00f, -0.50f,  1.0f, 0.0f},
    //
    //     {0.0f, -1.0f,  0.00f,  5.25f,  0.00f, -5.25f,  0.0f, 1.0f}
    // };

    // The rows of B^T expand to the following expressions:
    // 0 = r00 - r06 + (r04 - r02) * 5.25
    // 7 = r07 - r01 + (r03 - r05) * 5.25

    // 1 = (r02 + r06 - r04 * 4.25) + (r01 - r03 * 4.25 + r05)
    // 2 = (r02 + r06 - r04 * 4.25) - (r01 - r03 * 4.25 + r05)

    // 3 = (r06 + r02 * 0.25 - r04 * 1.25) + (r01 * 0.5 - r03 * 2.5 + r05 * 2)
    // 4 = (r06 + r02 * 0.25 - r04 * 1.25) - (r01 * 0.5 - r03 * 2.5 + r05 * 2)

    // reuse r04 * 1.25
    // reuse r03 * 2.5
    // 5 = (r06 + (r02 - r04 * 1.25) * 4) + (r01 * 2 - r03 * 2.5 + r05 * 0.5)
    // 6 = (r06 + (r02 - r04 * 1.25) * 4) - (r01 * 2 - r03 * 2.5 + r05 * 0.5)

    // Input transform, stored transposed: V^T = B^T.d^T.B
    // Step 1: tempv = B^T.d^T (apply B^T to each row of x)
    float tempv[8][8];
    for (int m = 0; m < 8; m++)
    {
        const float* r0 = &x[m][0];

        tempv[0][m] = r0[0] - r0[6] + (r0[4] - r0[2]) * 5.25f;
        tempv[7][m] = r0[7] - r0[1] + (r0[3] - r0[5]) * 5.25f;

        float tmp12a = (r0[2] + r0[6] - r0[4] * 4.25f);
        float tmp12b = (r0[1] + r0[5] - r0[3] * 4.25f);
        tempv[1][m] = tmp12a + tmp12b;
        tempv[2][m] = tmp12a - tmp12b;

        float tmp34a = (r0[6] + r0[2] * 0.25f - r0[4] * 1.25f);
        float tmp34b = (r0[1] * 0.5f - r0[3] * 2.5f + r0[5] * 2.f);
        tempv[3][m] = tmp34a + tmp34b;
        tempv[4][m] = tmp34a - tmp34b;

        float tmp56a = (r0[6] + (r0[2] - r0[4] * 1.25f) * 4.f);
        float tmp56b = (r0[1] * 2.f - r0[3] * 2.5f + r0[5] * 0.5f);
        tempv[5][m] = tmp56a + tmp56b;
        tempv[6][m] = tmp56a - tmp56b;
    }

    // Step 2: V^T = tempv.B (apply B^T to each row of tempv)
    float V[64];
    for (int m = 0; m < 8; m++)
    {
        const float* tmp0 = tempv[m];

        V[m*8]   = tmp0[0] - tmp0[6] + (tmp0[4] - tmp0[2]) * 5.25f;
        V[m*8+7] = tmp0[7] - tmp0[1] + (tmp0[3] - tmp0[5]) * 5.25f;

        float tmp12a = (tmp0[2] + tmp0[6] - tmp0[4] * 4.25f);
        float tmp12b = (tmp0[1] - tmp0[3] * 4.25f + tmp0[5]);
        V[m*8+1] = tmp12a + tmp12b;
        V[m*8+2] = tmp12a - tmp12b;

        float tmp34a = (tmp0[6] + tmp0[2] * 0.25f - tmp0[4] * 1.25f);
        float tmp34b = (tmp0[1] * 0.5f - tmp0[3] * 2.5f + tmp0[5] * 2.f);
        V[m*8+3] = tmp34a + tmp34b;
        V[m*8+4] = tmp34a - tmp34b;

        float tmp56a = (tmp0[6] + (tmp0[2] - tmp0[4] * 1.25f) * 4.f);
        float tmp56b = (tmp0[1] * 2.f - tmp0[3] * 2.5f + tmp0[5] * 0.5f);
        V[m*8+5] = tmp56a + tmp56b;
        V[m*8+6] = tmp56a - tmp56b;
    }

    // Element-wise product, stored transposed: M^T = U^T .* V^T (64 multiplications)
    float tempResult[64];
    for (int n = 0; n < 64; n++)
    {
        tempResult[n] = V[n] * U[n];
    }

    // Output transform matrix A^T (6x8), kept here for reference:
    // const float otm[6][8] = {
    //     {1.0f, 1.0f,  1.0f,  1.0f,   1.0f, 32.0f,  32.0f, 0.0f},
    //     {0.0f, 1.0f, -1.0f,  2.0f,  -2.0f, 16.0f, -16.0f, 0.0f},
    //     {0.0f, 1.0f,  1.0f,  4.0f,   4.0f,  8.0f,   8.0f, 0.0f},
    //     {0.0f, 1.0f, -1.0f,  8.0f,  -8.0f,  4.0f,  -4.0f, 0.0f},
    //     {0.0f, 1.0f,  1.0f, 16.0f,  16.0f,  2.0f,   2.0f, 0.0f},
    //     {0.0f, 1.0f, -1.0f, 32.0f, -32.0f,  1.0f,  -1.0f, 1.0f}
    // };

    // The rows of A^T expand to the following expressions:
    // 0 = r0 + (r1 + r2) + (r3 + r4)      + (r5 + r6) * 32
    // 1 =      (r1 - r2) + (r3 - r4) * 2  + (r5 - r6) * 16
    // 2 =      (r1 + r2) + (r3 + r4) * 4  + (r5 + r6) * 8
    // 3 =      (r1 - r2) + (r3 - r4) * 8  + (r5 - r6) * 4
    // 4 =      (r1 + r2) + (r3 + r4) * 16 + (r5 + r6) * 2
    // 5 = r7 + (r1 - r2) + (r3 - r4) * 32 + (r5 - r6)

    // Output transform: R = A^T.M.A
    // Step 1: r = A^T.M (6x8), reading M^T one row at a time
    float* pTempR = tempResult;
    float r[6][8];
    for (int m = 0; m < 8; m++)
    {
        float tmp024a = pTempR[1] + pTempR[2];
        float tmp135a = pTempR[1] - pTempR[2];

        float tmp024b = pTempR[3] + pTempR[4];
        float tmp135b = pTempR[3] - pTempR[4];

        float tmp024c = pTempR[5] + pTempR[6];
        float tmp135c = pTempR[5] - pTempR[6];

        r[0][m] = pTempR[0] + tmp024a + tmp024b + tmp024c * 32;
        r[2][m] = tmp024a + tmp024b * 4 + tmp024c * 8;
        r[4][m] = tmp024a + tmp024b * 16 + tmp024c + tmp024c;

        r[1][m] = tmp135a + tmp135b + tmp135b + tmp135c * 16;
        r[3][m] = tmp135a + tmp135b * 8 + tmp135c * 4;
        r[5][m] = pTempR[7] + tmp135a + tmp135b * 32 + tmp135c;

        pTempR += 8;
    }

    // Step 2: result = r.A (6x6, row-major)
    float result[36];
    float* pr = result;
    for (int m = 0; m < 6; m++)
    {
        const float* tmp0 = r[m];

        float tmp024a = tmp0[1] + tmp0[2];
        float tmp135a = tmp0[1] - tmp0[2];

        float tmp024b = tmp0[3] + tmp0[4];
        float tmp135b = tmp0[3] - tmp0[4];

        float tmp024c = tmp0[5] + tmp0[6];
        float tmp135c = tmp0[5] - tmp0[6];

        pr[0+6*m] = tmp0[0] + tmp024a + tmp024b + tmp024c * 32;
        pr[2+6*m] = tmp024a + tmp024b * 4 + tmp024c * 8;
        pr[4+6*m] = tmp024a + tmp024b * 16 + tmp024c + tmp024c;

        pr[1+6*m] = tmp135a + tmp135b + tmp135b + tmp135c * 16;
        pr[3+6*m] = tmp135a + tmp135b * 8 + tmp135c * 4;
        pr[5+6*m] = tmp0[7] + tmp135a + tmp135b * 32 + tmp135c;
    }

    // Reference: direct 3x3 convolution with stride 1 over the same tile
    float output[36];
    int stride_h = 1;
    int stride_w = 1;
    for (int h = 0; h < 6; h++)
    {
        int src_h = stride_h * h;
        for (int w = 0; w < 6; w++)
        {
            float sum = 0;
            int src_w = stride_w * w;
            for (int k_h = 0; k_h < 3; k_h++)
            {
                for (int k_w = 0; k_w < 3; k_w++)
                {
                    sum += x[k_h+src_h][k_w+src_w] * weight[3*k_h+k_w];
                }
            }
            output[h*6+w] = sum;
        }
    }

    // Print direct result next to Winograd result for comparison
    for (int i = 0; i < 36; i++)
        printf("%f==%f \n", output[i], result[i]);

    return 0;
}
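The listing above prints the direct and Winograd results side by side and leaves the comparison to the reader. Since the two paths round differently, an exact match is not expected, so a tolerance-based check may be more practical. The sketch below shows one way to do that. `max_abs_diff` and `all_close` are hypothetical helpers, not part of the original code.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: largest absolute difference between two float arrays of length n.
float max_abs_diff(const float* a, const float* b, std::size_t n) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float diff = std::fabs(a[i] - b[i]);
        if (diff > worst) worst = diff;
    }
    return worst;
}

// Hypothetical helper: true when every pair of elements agrees within tol.
bool all_close(const float* a, const float* b, std::size_t n, float tol) {
    return max_abs_diff(a, b, n) <= tol;
}
```

In the listing, the final printf loop could be followed by something like `all_close(output, result, 36, 1e-3f)`. The tolerance of 1e-3 is only an illustrative guess, not a measured error bound for this transform.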
