
An Old Video Denoising Algorithm (FLT_GradualNoise), Analyzed and Optimized: Up to 400 fps on 1920*1080 YUV Data

Published: 2023/12/8


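This post was written by the blogger Imageshop. To tip the author or read more of their work, head over to Imageshop's blog.

Reposted from: https://www.cnblogs.com/Imageshop/p/14224965.html (will be removed on request in case of infringement)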

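There does not seem to be a paper describing this algorithm. A Baidu search turns up some related material, but it is nothing more than code. That code comes from a project called DScaler, whose complete sources can still be found on GitHub.

See: https://github.com/JohnAdders/DScaler/tree/f7d92b76678e24422c48d4a956c0486ee042786d

The project contains the file FLT_GradualNoise.c. The part of its comments that explains the algorithm is copied below: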

This algorithm is very similar to what Andrew Dowsey came up with in his "Adaptive Temporal Averaging" for his DirectShow filter. The algorithms differ in 1) their block size, 2) their motion estimation (sum of absolute differences versus mean squared error), 3) the addition of a "high tail," in which areas which have changed a lot (but not too much) still cause a small amount of averaging with the previous frame, and 4) rounding.

The algorithm:

This filter gets the sum of absolute differences between a four pixel horizontal block in the current image and the same block in the preceding frame. This isn't the best local motion measure, but it's very fast due to the psadbw SSE instruction.

This difference measure is used to determine the kind of averaging which will be conducted. If it's more than the "noise reduction" parameter, motion is inferred. In that case, we just use the new pixel values. If it's less than the noise reduction, we use the ratio of (difference / noise reduction) to determine the weighting of the old and new values.

Somewhat more formally:

    N = Sum_block(|oldByte - newByte|)

    R = Noise Reduction parameter

    M = (motion evidence) = 1       if N / R >= 1.2
                            0.999   if 1.2 > N / R >= 1
                            N / R   otherwise

    Result pixel = (bytewise) oldPixel * (1 - M) + newPixel * M
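As a point of reference, the formula above can be written as a scalar floating-point sketch. The function names (`block_sad8`, `motion_evidence`, `blend_block8`) are made up for illustration and are not taken from DScaler. The sketch assumes a block of 8 bytes (four pixels of packed YUV 4:2:2 data) and simply truncates the result, without any special rounding rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* N: sum of absolute differences over one 8-byte block
   (assumed layout: Y0 U0 Y1 V0 Y2 U1 Y3 V1). */
static int block_sad8(const uint8_t *old_px, const uint8_t *new_px)
{
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += abs((int)new_px[i] - (int)old_px[i]);
    return sum;
}

/* M: motion evidence as defined in the quoted comment.
   n is the block SAD, r is the noise reduction parameter. */
static double motion_evidence(int n, int r)
{
    assert(r > 0);
    double ratio = (double)n / (double)r;
    if (ratio >= 1.2)
        return 1.0;
    if (ratio >= 1.0)
        return 0.999;
    return ratio;
}

/* Blend one block bytewise: old * (1 - M) + new * M, truncated. */
static void blend_block8(const uint8_t *old_px, const uint8_t *new_px,
                         uint8_t *out, int r)
{
    double m = motion_evidence(block_sad8(old_px, new_px), r);
    for (int i = 0; i < 8; i++) {
        double v = old_px[i] * (1.0 - m) + new_px[i] * m;
        out[i] = (uint8_t)v; /* truncation equals floor for v >= 0 */
    }
}
```

With R = 8 and a block whose only change is one byte going from 100 to 104, N = 4, M = 0.5, and that byte comes out as 102 while the others stay at 100.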

Rounding has a very significant effect on the algorithm. In general, for computational reasons, values are rounded down. An important exception occurs when

    M > 0 and oldPixel != newPixel

but

    oldPixel * (1 - M) + newPixel * M

rounds to oldPixel. In that case, the Result pixel is rounded to one toward the newPixel value. This makes sure that very gradual variation is maintained.
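The rounding exception can be illustrated with a small scalar sketch. It rests on one assumption: the blend is computed as a step of magnitude |newPixel - oldPixel| * M away from oldPixel, with that magnitude rounded down, which is how the notes later in this post describe the implementation. The helper name `blend_byte_rounded` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one byte with weight m in [0, 1].
   The step toward new_b is rounded down, except that a non-zero
   difference with m > 0 always moves at least one level. */
static uint8_t blend_byte_rounded(uint8_t old_b, uint8_t new_b, double m)
{
    assert(m >= 0.0 && m <= 1.0);
    int diff = (int)new_b - (int)old_b;
    int mag = diff < 0 ? -diff : diff;
    int step = (int)(mag * m);          /* rounded down */
    if (m > 0.0 && mag != 0 && step == 0)
        step = 1;                       /* the exception: nudge by one */
    return (uint8_t)(diff < 0 ? old_b - step : old_b + step);
}
```

For example, with M = 0.25 a byte going from 100 to 101 would stay at 100 under plain truncation; with the exception it becomes 101, so slow fades are not frozen.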

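The author provides assembly code for this algorithm, with very detailed comments. It is not ordinary assembly, though: it uses SIMD instructions, which makes it very hard to read. I spent about 10 days working out how it operates, then rewrote and optimized it with intrinsics, which are much easier to understand. Below are some questions that came up while writing the code, along with my reading of them.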

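// Question 1: How does this program handle YUV data?
// Answer: Judging from the original assembly, the Y, U and V components are processed together, with no special distinction between them. The "four pixels" mentioned above are the eight bytes Y0 U0 Y1 V0 Y2 U1 Y3 V1.
//         In both the MMX and the SSE versions, psadbw accumulates the absolute differences of eight bytes at a time (the SSE form simply handles two groups of eight bytes per instruction).
//         Porting the algorithm to RGB data would actually be more work, because the RGB data would first have to be split into separate channels.
//
// Question 2: The comment says rounding is downward by default, yet whenever Src and Prev differ the result should move at least 1 level toward the new pixel to keep the video continuous. How is this implemented?
// Answer: The program tests the data. If Src and Prev differ, the offset is set to at least 1 (either +1 or -1, depending on direction); if they are equal, the offset is of course 0.
//         In addition, when the fixed-point multiplier would exceed 65535 (and is therefore clamped to 65535), the offset is set to AbsDiff directly, because the shift-based calculation would otherwise come out 1 too small: (X * 65535) >> 16 evaluates to X - 1.
//
// Question 3: How is the program optimized?
// Answer: (1) The original comment contains the case "0.999 if 1.2 > N / R >= 1". The author's assembly handles it with some comparisons and shifts, changing NoiseMultiplier to 65534 (once N/R >= 1 it has already been set to 65535).
//             In my code I consider this test unnecessary, because 0.999 affects the result far too little, so I dropped it. The SSE and MMX code provided by the author drops it as well.
//         (2) Fixed-point arithmetic. N/R involves a division. To avoid it, the reciprocal is scaled up by 65536 and then multiplied by AbsDiff, after which the product has to be divided by 65536 again.
//             _mm_mulhi_epu16 does this quickly: no separate shift instruction is needed, and nothing has to be widened to 32 bits.
//             There is an error here, however, because this instruction truncates instead of rounding to nearest. _mm_mulhrs_epi16 is a possible alternative (note that it works on signed 16-bit values with a 2^15 scale, so the multiplier would have to be rescaled).
//             Also note that when N/R * 65536 exceeds 65535, this corresponds to M = 1 in the original formula, so the value is simply clamped to 65535 (again without widening to 32 bits).
//             For example, with AbsDiff_Sum = 24 and NoiseValue = 64, Multiplier is 1024. If some byte has newPixel - oldPixel = 10, the result is (24 * 1024 * 10) >> 16 = 3, whereas the floating-point value is 3.75, so 4 would in theory be the better choice.
//         (3) oldPixel * (1 - M) + newPixel * M can be rearranged into oldPixel + (newPixel - oldPixel) * M. Combined with the sign of newPixel - oldPixel, the final result can then be computed with _mm_adds_epu8 and _mm_subs_epu8.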
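The fixed-point steps in answer 3 can be traced with a scalar sketch that mirrors one 16-bit lane of the SIMD code. This is an illustration under stated assumptions, not the author's implementation: all function names are hypothetical, `mulhi_u16` stands in for a single lane of `_mm_mulhi_epu16`, and the 0.999 case is dropped as described above.

```c
#include <assert.h>
#include <stdint.h>

/* 1/R scaled by 65536, so that N/R becomes a multiply. */
static uint32_t noise_multiplier(uint32_t r)
{
    assert(r > 0);
    return 65536u / r;
}

/* M scaled by 65536, clamped to 65535 (which stands for M = 1). */
static uint16_t block_weight(uint32_t sad, uint32_t multiplier)
{
    uint32_t w = sad * multiplier;
    return (uint16_t)(w > 65535u ? 65535u : w);
}

/* High 16 bits of a 16x16-bit unsigned product (truncating). */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* old + (new - old) * M, using the absolute difference and the sign. */
static uint8_t denoise_byte(uint8_t old_b, uint8_t new_b, uint16_t weight)
{
    uint16_t abs_diff =
        (uint16_t)(new_b > old_b ? new_b - old_b : old_b - new_b);
    uint16_t step = mulhi_u16(abs_diff, weight);
    if (weight == 65535u)
        step = abs_diff;   /* M = 1: (X * 65535) >> 16 would give X - 1 */
    else if (step == 0 && abs_diff != 0)
        step = 1;          /* always move at least one level */
    return (uint8_t)(new_b > old_b ? old_b + step : old_b - step);
}
```

Using the numbers from the text: `noise_multiplier(64)` is 1024, `block_weight(24, 1024)` is 24576, and a byte going from 100 to 110 moves by `mulhi_u16(10, 24576)`, which is 3, giving 103 where the exact value would be 103.75.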

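Overall, the algorithm still works by continually averaging against history frames to reduce video noise. Because it can take full advantage of _mm_sad_epu8, an instruction that quickly sums the absolute differences of 8 bytes, it reaches remarkable efficiency and speed.

In tests, denoising takes about 0.8 ms per frame on average for 1280*720 video and about 1.8 ms per frame for 1920*1080 video (both in YUV422 format).

Since videos cannot be uploaded here, anyone who wants to see the effect of the algorithm can contact me separately and I can provide a test demo (the demo is too large to upload). The two screenshots below give a rough idea of the difference.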
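To make the role of _mm_sad_epu8 concrete, here is a minimal SSE2 sketch that computes the block sums for two adjacent 8-byte blocks with one instruction. The wrapper `sad_two_blocks` is a hypothetical name, and the sketch assumes an x86 target with SSE2 and at least 16 readable bytes behind each pointer.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Sum of absolute differences for the low and high 8-byte halves
   of a 16-byte span, i.e. two four-pixel blocks at once. */
static void sad_two_blocks(const uint8_t *old_px, const uint8_t *new_px,
                           uint16_t *sad_lo, uint16_t *sad_hi)
{
    assert(old_px && new_px && sad_lo && sad_hi);
    __m128i o = _mm_loadu_si128((const __m128i *)old_px);
    __m128i n = _mm_loadu_si128((const __m128i *)new_px);
    /* Each 64-bit half of s holds one sum in its low 16 bits. */
    __m128i s = _mm_sad_epu8(o, n);
    *sad_lo = (uint16_t)_mm_extract_epi16(s, 0);
    *sad_hi = (uint16_t)_mm_extract_epi16(s, 4);
}
```

Each sum is at most 8 * 255 = 2040, so it fits comfortably in 16 bits, which is what allows the later weight computation to stay in 16-bit lanes.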
