
Post-Training Quantization with torch.fx

Published: 2024/1/18


Quantization support in torch.fx: FX GRAPH MODE QUANTIZATION

Quantization modes currently supported by torch.fx:

Post Training Quantization
- Weight Only Quantization
- Dynamic Quantization
- Static Quantization
Quantization Aware Training
- Static Quantization

Of these, demos are provided for Static Quantization and Dynamic Quantization under Post Training Quantization.

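Comparison with Eager mode

In short, fx provides a Graph mode:

- It inserts quantization nodes (such as quantize and dequantize) automatically, so the existing network and its forward do not have to be modified by hand.
- In this mode you can see how forward is built automatically, which allows finer-grained adjustments.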
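To make the idea of an automatically built forward more concrete, here is a minimal sketch that traces a small module with torch.fx.symbolic_trace and prints the captured graph and the generated forward code. TinyNet is a hypothetical model made up for this illustration; it is not from the original article. FX Graph Mode Quantization works on a graph of this kind when it inserts observers and quantize/dequantize nodes.

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace


class TinyNet(nn.Module):
    """Hypothetical two-layer model used only for illustration."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


# symbolic_trace records the operations in forward as a graph of nodes.
traced = symbolic_trace(TinyNet())

# The graph lists one node per input, module call, and output.
print(traced.graph)

# The generated Python source for forward, rebuilt from the graph.
print(traced.code)

# The kind of each node, in execution order.
node_ops = [node.op for node in traced.graph.nodes]
print(node_ops)
```

Printing the graph before and after prepare_fx is a simple way to check where observers were placed.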

Graph mode

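Limitation: only the parts of a model that are symbolically traceable can be quantized, and data-dependent control flow is not supported. If some parts of the model cannot be traced symbolically, quantization only applies to the traceable parts and the non-traceable parts are skipped.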
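The limitation can be reproduced with plain torch.fx. The sketch below, built on two hypothetical modules (Branchy and Straight, invented for this illustration), shows that a forward whose branch depends on tensor values raises a TraceError during symbolic tracing, while a forward without such a branch traces normally. The helper is_symbolically_traceable is also an assumption of this example, not a PyTorch API.

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace
from torch.fx.proxy import TraceError


class Branchy(nn.Module):
    """Hypothetical module with data-dependent control flow."""

    def forward(self, x):
        # Which branch runs depends on the values in x, which are
        # unknown while tracing symbolically.
        if x.sum() > 0:
            return x * 2
        return x - 1


class Straight(nn.Module):
    """Hypothetical module without any data-dependent branch."""

    def forward(self, x):
        return torch.relu(x) + 1


def is_symbolically_traceable(module):
    """Return True if torch.fx can trace the module, False on TraceError."""
    try:
        symbolic_trace(module)
        return True
    except TraceError:
        return False


branchy_ok = is_symbolically_traceable(Branchy())
straight_ok = is_symbolically_traceable(Straight())
print("Branchy traceable:", branchy_ok)
print("Straight traceable:", straight_ok)
```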

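If these parts need to be quantized as well, there are two options:

- Rewrite the code so that these parts become symbolically traceable.
- Turn these parts into observed and quantized submodules.

For the concrete steps, see (PROTOTYPE) FX GRAPH MODE QUANTIZATION USER GUIDE.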
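As an illustration of the first option, the sketch below rewrites a data-dependent if statement as a torch.where expression, which torch.fx can trace. Both modules are hypothetical examples, and this is only one possible rewrite: torch.where evaluates both branches, so it fits cheap, side-effect-free branches best. The second option (observed and quantized submodules) is configured through the prepare step and is described in the user guide.

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace


class BranchyEager(nn.Module):
    """Hypothetical original: the Python `if` blocks symbolic tracing."""

    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x - 1


class BranchyTraceable(nn.Module):
    """Hypothetical rewrite: the choice is made inside a tensor op."""

    def forward(self, x):
        # Both candidate results are computed; torch.where picks one
        # based on the (broadcast) scalar condition.
        return torch.where(x.sum() > 0, x * 2, x - 1)


eager = BranchyEager()
traced = symbolic_trace(BranchyTraceable())  # tracing now succeeds

positive = torch.tensor([1.0, -0.5, 2.0])   # sum > 0, first branch
negative = -positive                        # sum < 0, second branch

same_on_positive = torch.equal(traced(positive), eager(positive))
same_on_negative = torch.equal(traced(negative), eager(negative))
print(same_on_positive, same_on_negative)
```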

Trying post-training quantization

Environment setup:

import torch
import copy
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx, fuse_fx

Steps

Prepare the trained weights, the data, and the network model

Initialize the network, load the trained weights (usually keeping the original model intact by working on a copy.deepcopy), and put the model in eval mode:

float_model = load_model(saved_model_dir + float_model_file).to("cpu")
float_model.eval()
model_to_quantize = copy.deepcopy(float_model)
model_to_quantize.eval()

Specify the qconfig_dict for the model to be quantized

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {"": qconfig}

qconfig is an instance of QConfig. The QConfig class simply holds two observers: one used for activations and one used for the weights of an op. The defaults for each backend are:

backend	activation	weight
fbgemm (x86)	HistogramObserver (reduce_range=True)	PerChannelMinMaxObserver (default_per_channel_weight_observer)
qnnpack (arm)	HistogramObserver (reduce_range=False)	MinMaxObserver (default_weight_observer)
default	MinMaxObserver (default_observer)	MinMaxObserver (default_weight_observer)
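To give a feel for what an observer contributes, the following is a simplified pure-Python sketch of min/max-based affine quantization: record the observed range, derive a scale and zero point from it, then map floats to 8-bit integers and back. The function names are made up for this illustration, and the real PyTorch observers handle more cases (reduce_range, per-channel statistics, histogram-based range selection) that are deliberately left out here.

```python
def minmax_qparams(values, qmin=0, qmax=255):
    """Derive (scale, zero_point) from the observed min and max.

    The range is widened to include 0.0 so that zero is exactly
    representable after quantization.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # all observed values were zero
        scale = 1.0
    zero_point = qmin - round(lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer in [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


observed = [-1.0, 0.0, 3.0]  # stand-in for activations seen in calibration
scale, zero_point = minmax_qparams(observed)
roundtrip = [dequantize(quantize(v, scale, zero_point), scale, zero_point)
             for v in observed]
print(scale, zero_point)
print(roundtrip)
```

The round-trip error of each value stays within half a quantization step, which is why a well-chosen observed range matters for accuracy.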

Prepare the model and print it:

# Note: newer PyTorch releases also require an example_inputs argument for prepare_fx.
prepared_model = prepare_fx(model_to_quantize, qconfig_dict)
print(prepared_model.graph)

Calibrate the model

def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)

calibrate(prepared_model, data_loader_test)  # run calibration on sample data

Quantize the model

quantized_model = convert_fx(prepared_model)
print(quantized_model)

Compare the model before and after quantization to evaluate the effect, including model size, performance, and latency.
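One possible way to run the size and latency part of this comparison is sketched below. The helpers model_size_bytes and mean_latency_ms are assumptions made for this example, not PyTorch APIs, and the two nn.Linear models are only stand-ins so that the snippet runs on its own. In the workflow above you would pass float_model and quantized_model instead, together with a representative input.

```python
import io
import time

import torch
import torch.nn as nn


def model_size_bytes(model):
    """Size of the serialized state_dict, measured in memory."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes


def mean_latency_ms(model, example_input, warmup=3, runs=20):
    """Average wall-clock time of one forward pass, in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):  # warm-up runs are not timed
            model(example_input)
        start = time.perf_counter()
        for _ in range(runs):
            model(example_input)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0


# Stand-in models; replace with float_model and quantized_model.
small = nn.Linear(16, 16)
large = nn.Linear(256, 256)

small_size = model_size_bytes(small)
large_size = model_size_bytes(large)
small_latency = mean_latency_ms(small, torch.randn(1, 16))

print(f"small: {small_size} bytes, {small_latency:.4f} ms per forward")
print(f"large: {large_size} bytes")
```

Latency numbers depend heavily on the machine and on thread settings, so they are best compared on the same hardware and with the same input shape.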
