
MLIR Operator Quantization

Published: 2023/11/28

This document outlines the design of the quantization system in MLIR. While the term "quantization" is highly overloaded, it is used here for the fairly narrow range of techniques that represent floating point computations with integer math, using adapted variables for inference, as supported by low-bit-depth inference engines (such as TFLite), various accelerator hardware, and many DSPs.
Much of this is inspired by the approach taken in a referenced paper, with many extensions and modifications. It specifically documents MLIR's position on the topic rather than serving as a general reference.
- Uniform quantization
  - Fixed point values
  - Affine values
  - Relation
  - Converting between real and fixed point or affine
- Usage within MLIR
- Quantization Dialect
  - Quantized type
  - Quantized type conversion operations
  - Instrumentation and constraint operations
- Integration with simulated quantization at training time
- TFLite native quantization
  - General algorithm
Uniform quantization
The primary mechanism of quantization supported by MLIR represents fixed point and affine mappings via uniformly spaced points along the real number line.

Further, the scheme can be applied:
- per-layer : applying to every value within the target type.
- per-axis (also called per-channel) : applying individually to each index along a specific axis of a tensor type.
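As a rough sketch of the difference between the two granularities (hypothetical helper names, plain Python lists of lists standing in for 2-D tensors):

```python
def quantize_per_layer(tensor, scale):
    """One scale applied to every element of the tensor (per-layer)."""
    return [[round(v / scale) for v in row] for row in tensor]

def quantize_per_axis(tensor, scales):
    """One scale per index along axis 0: each row gets its own scale (per-axis)."""
    return [[round(v / s) for v in row] for row, s in zip(tensor, scales)]

weights = [[0.5, 1.0], [2.0, 4.0]]
print(quantize_per_layer(weights, 0.5))        # [[1, 2], [4, 8]]
print(quantize_per_axis(weights, [0.5, 2.0]))  # [[1, 2], [1, 2]]
```

Per-axis scales are typically chosen per output channel of a convolution's weights, which reduces quantization error when channel magnitudes differ widely.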
Fixed point values
A fixed point value is a real number divided by a scale; we call the result of that division the scaled value:

$$real\_value = scaled\_value * scale$$

The scale can be interpreted as the distance, in real units, between neighboring scaled values. For example, if the scale is $\pi$, then fixed point values with this scale can only represent multiples of $\pi$, and nothing in between. The maximum rounding error when converting an arbitrary real to a fixed point value with a given scale is $\frac{scale}{2}$. Continuing the example, when $scale = \pi$, the maximum rounding error is $\frac{\pi}{2}$.

Multiplication can be performed on scaled values with different scales, using the same algorithm as multiplication of real values (note that the product scaled value has $scale_{product} = scale_{left\ operand} * scale_{right\ operand}$).

Addition can be performed on scaled values, so long as they share the same scale, using the same algorithm as addition of real values. This makes it convenient to represent scaled values on a computer as signed integers and to perform arithmetic on those integers, since the results will be correct scaled values.
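A small sketch of arithmetic on scaled values (hypothetical helper names, not part of MLIR), using power-of-two scales so the floating point products are exact:

```python
def mul_scaled(a, scale_a, b, scale_b):
    """Multiply two scaled values; the product carries scale_a * scale_b."""
    return a * b, scale_a * scale_b

def add_scaled(a, b, scale):
    """Add two scaled values that share the same scale."""
    return a + b, scale

# (3 * 0.5) * (4 * 0.25) = 1.5 * 1.0 = 1.5, i.e. 12 at scale 0.125.
print(mul_scaled(3, 0.5, 4, 0.25))  # (12, 0.125)
# (3 * 0.5) + (4 * 0.5) = 3.5, i.e. 7 at scale 0.5.
print(add_scaled(3, 4, 0.5))        # (7, 0.5)
```

Note that only the integer scaled values participate in the arithmetic; the scale is tracked alongside, which is exactly what makes integer-only inference kernels possible.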
Affine values
Mathematically, an affine value is the result of adding a zero point to a scaled value. Equivalently, subtracting a zero point from an affine value results in a scaled value:

$$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$$

Essentially, an affine value is a scaled value shifted by some constant. Arithmetic (i.e. addition, subtraction, multiplication, division) cannot, in general, be performed directly on affine values; they must first be converted to the equivalent scaled values.

As alluded to above, the purpose of affine values is to more efficiently represent the real values actually encountered during computation. The real values encountered are frequently not symmetric around the real zero, while real zero is assumed to be encountered during computation and should be exactly representable.

Storing such scaled values as signed integers would be inefficient, because some signed values would never be used; in effect, the bit patterns corresponding to those signed integers would be wasted.

In order to exactly represent the real zero with an integral affine value, the zero point must be an integer between the minimum and maximum affine values (inclusive). For example, given an affine value represented by an 8-bit unsigned integer, we have $0 \leq zero\_point \leq 255$. This matters because the convolution operations of deep neural networks frequently zero-pad inputs and outputs, so zero must be exactly representable or the result will be biased.
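One common recipe for picking parameters that satisfy this constraint (modeled on the TFLite-style scheme; the helper name is hypothetical) nudges the observed real range to include zero and clamps the zero point into the storage range, so real zero always round-trips exactly:

```python
def choose_affine_params(rmin, rmax, qmin=0, qmax=255):
    """Pick (scale, zero_point) mapping the real range [rmin, rmax] onto
    the uint8 range [qmin, qmax] with real zero exactly representable."""
    # Stretch the range to include 0.0 so zero_point lands inside [qmin, qmax].
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))  # clamp into storage range
    return scale, zero_point

scale, zp = choose_affine_params(-1.0, 3.0)
print(scale, zp)  # real zero maps to the integer zp, so (zp - zp) * scale == 0.0
```

Rounding the zero point to an integer (rather than storing it as a fraction) is precisely what guarantees the exact-zero property discussed above.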
Relation
Real values, fixed point values, and affine values are related by the following equation, which demonstrates how to convert one type of number to another:

$$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$$

Note that computers generally store mathematical values using a finite number of bits. While the above conversions are exact, storing the result in a finite number of bits will, in general, require rounding (this applies both to storage in floating point and to storage in fixed point). A full discussion of rounding behavior is outside the scope of this document; unless otherwise stated, it is safe to assume rounding follows the IEEE 754 default of round-to-nearest-even (RNE), where hardware permits.
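As a quick illustration of round-to-nearest-even, Python's built-in `round` already implements it, so exact ties go to the even neighbor rather than always away from zero:

```python
# round() uses round-half-to-even (RNE), the IEEE 754 default rounding mode:
# values exactly halfway between integers go to the even integer.
print(round(2.5))   # 2, not 3
print(round(3.5))   # 4
print(round(-0.5))  # 0
```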
Converting between real and fixed point or affine
To convert a real value to a fixed point value, we must know the scale. To convert a real value to an affine value, we must know the scale and the zero point.
Real to affine
To convert an input tensor of real-valued elements (usually represented by a floating point format, frequently Single precision) to a tensor of affine elements represented by an integral type (e.g. 8-bit unsigned integer), the following conversion can be performed (note that it is not required that all representable values of the integral type be used):

$$affine\_value_{uint8 \, or \, uint16} = clampToTargetSize(roundToNearestInteger(\frac{real\_value_{Single}}{scale_{Single}})_{sint32} + zero\_point_{uint8 \, or \, uint16})$$

In the above, we assume that $real\_value$ is a Single, $scale$ is a Single, $roundToNearestInteger$ returns a signed 32-bit integer, and $zero\_point$ is an unsigned 8-bit or 16-bit integer.

The bit depths and fixed point value counts above reflect common types on typical hardware; the scheme is not constrained to particular bit depths, nor does it require that the entire range of an N-bit integer be used.
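The real-to-affine conversion above can be sketched as follows (hypothetical function name; uint8 clamp bounds assumed):

```python
def real_to_affine(real_value, scale, zero_point, qmin=0, qmax=255):
    """Quantize: divide by the scale, round to nearest integer, add the
    zero point, then clamp to the storage type's range (uint8 here)."""
    scaled = round(real_value / scale)            # roundToNearestInteger -> sint32
    return max(qmin, min(qmax, scaled + zero_point))  # clampToTargetSize

print(real_to_affine(0.0, 0.05, 128))    # 128: real zero maps to the zero point
print(real_to_affine(1.0, 0.05, 128))    # 148
print(real_to_affine(100.0, 0.05, 128))  # 255: out-of-range values saturate
```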
Affine to real
To convert an output tensor of affine elements represented by uint8 or uint16 to a tensor of real-valued elements (usually represented with a floating point format, frequently Single precision), the following conversion can be performed:

$$real\_value_{Single} = roundToNearestFloat((affine\_value_{uint8 \, or \, uint16} - zero\_point_{uint8 \, or \, uint16})_{sint32})_{Single} * scale_{Single}$$

In the above, we assume that the result of the subtraction is in 32-bit signed integer format, and that $roundToNearestFloat$ returns a Single.
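The inverse conversion is a sketch of the same shape (hypothetical function name; Python's float stands in for Single):

```python
def affine_to_real(affine_value, scale, zero_point):
    """Dequantize: subtract the zero point (as a 32-bit integer would),
    convert to float, and multiply by the scale."""
    return float(affine_value - zero_point) * scale

print(affine_to_real(148, 0.05, 128))  # ~1.0: inverse of the example above
print(affine_to_real(128, 0.05, 128))  # 0.0: the zero point maps back to real zero
```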
Affine to fixed point
When the affine and fixed point scales are the same, subtracting the zero point from the affine value results in the equivalent scaled value.

$$scaled\_value = affine\_value_{non\text{-}negative} - zero\_point_{non\text{-}negative}$$

Fixed point to affine
When the affine and fixed point scales are the same, adding the zero point to the scaled value results in the equivalent affine value.

$$affine\_value_{non\text{-}negative} = scaled\_value + zero\_point_{non\text{-}negative}$$
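Both conversions reduce to a single integer add or subtract (hypothetical function names), which is why they are cheap enough to perform inside inner loops of quantized kernels:

```python
def affine_to_fixed_point(affine_value, zero_point):
    """Same scale assumed: subtracting the zero point gives the scaled value."""
    return affine_value - zero_point

def fixed_point_to_affine(scaled_value, zero_point):
    """Same scale assumed: adding the zero point gives the affine value."""
    return scaled_value + zero_point

print(affine_to_fixed_point(200, 128))  # 72
print(fixed_point_to_affine(72, 128))   # 200: the two conversions are inverses
```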
Usage within MLIR
The quantization system being developed within MLIR has several pieces:

- Quantization dialect containing:
  - A family of QuantizedTypes which represent the mapping between expressed values (typically of a floating point computer type) and storage values (typically of an integral computer type).
  - Type conversion ops for converting between types based on a QuantizedType and its expressed and storage sub-types.
  - Instrumentation ops for assigning instrumentation points within the computation where runtime statistics may help guide the quantization process.
- Integration with simulated quantization at training time
- TFLite native quantization
  - The TFLite op-set natively supports uniform-quantized variants.
  - Passes and tools exist to convert directly from the TensorFlow dialect to the TFLite quantized operation set.

Not all of these pieces are used by every quantization application. Specifically, the TensorFlow to TensorFlow Lite conversion uses QuantizedTypes but has its own ops for type conversion and expression of the supporting math.
Quantization Dialect
Quantized type
TODO: Flesh this section out.
- QuantizedType base class
- UniformQuantizedType
Quantized type conversion operations
- qcast : Convert from an expressed type to QuantizedType
- dcast : Convert from a QuantizedType to its expressed type
- scast : Convert between a QuantizedType and its storage type
Instrumentation and constraint operations
- const_fake_quant : Emulates the logic of the historic TensorFlow fake_quant_with_min_max_args operation.
- stats_ref : Declares that statistics should be gathered at this point with a unique key and made available to future passes of the solver.
- stats : Declares inline statistics (per layer and per axis) for the point in the computation. stats_ref ops are generally converted to statistical operations once trial runs have been performed.
- coupled_ref : Declares points in the computation to be coupled from a type inference perspective based on a unique key.
Integration with simulated quantization at training time
TensorFlow has historically used the tf.quantization.fake_quant_* family of operations to simulate the effect of quantization at training time.

As originally implemented, TensorFlow Lite was the primary consumer of such operations at inference time. When quantized inference was enabled, if every eligible tensor passed through an appropriate fake_quant node (the rules of which tensors can have fake_quant applied are somewhat involved), then TensorFlow Lite would use the attributes of the fake_quant operations to make a judgment about how to convert to use kernels from its quantized operation set.

In MLIR-based quantization, fake_quant_* operations are handled by converting them to a sequence of qcast (quantize) followed by dcast (dequantize) with an appropriate UniformQuantizedType as the target of the qcast operation.

This allows subsequent compiler passes to preserve the knowledge that quantization was simulated in a certain way, while giving the compiler flexibility to move the casts as it simplifies the computation and converts it to a form based on integral arithmetic.

This scheme also allows computations that are partially quantized, where the parts that could not be reduced to integral operations remain in floating point, with appropriate conversions at the boundaries.
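The effect of a qcast immediately followed by a dcast can be sketched numerically (hypothetical function name; this is a model of simulated quantization, not the MLIR implementation):

```python
def fake_quant(real_value, scale, zero_point, qmin=0, qmax=255):
    """Simulated quantization: quantize (the qcast step) then immediately
    dequantize (the dcast step). The value stays in floating point but can
    only take on values representable under the quantized type."""
    q = max(qmin, min(qmax, round(real_value / scale) + zero_point))
    return (q - zero_point) * scale

print(fake_quant(0.123, 0.05, 128))  # snaps to 0.1, the nearest multiple of 0.05
```

Training against values filtered this way exposes the model to quantization error, which is exactly the knowledge the qcast/dcast pair preserves for later compiler passes.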
TFLite native quantization
TODO: Flesh this out
General algorithm

  1. Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
  2. In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
  3. Hardcoded logic/propagation needs to happen here.
  4. Run TF constant folding.
  5. In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
  6. Run the quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).
