Ascend PyTorch Operator Adaptation Layer Development

Published: 2023/11/28


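Adaptation Method
Find the NPU TBE operator whose functionality matches the PyTorch operator, compute the size of the output Tensor from the operator's semantics, then construct the input/output/attr required by the TBE operator prototype and pass them to ACL, which executes the TBE operator.
Note:
The location of the TBE operator implementation source files depends on how the Toolkit development package was installed:
• Installed as root: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/
• Installed as a non-root user: ~/.local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/
Developers can read an operator's implementation source file to determine what the operator does.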
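The two install locations above can be captured in a small helper. This is only a sketch: the function name tbe_impl_dir and its parameters are hypothetical and not part of the Ascend Toolkit, and the caller is assumed to supply the expanded home directory in place of "~".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an Ascend API): returns the directory that holds
// the built-in TBE operator implementations for the given install mode.
std::string tbe_impl_dir(bool installed_as_root, const std::string& home_dir) {
  const std::string suffix =
      "/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/";
  if (installed_as_root) {
    return "/usr/local" + suffix;
  }
  // Non-root installs live under ~/.local; the caller supplies the home path.
  assert(!home_dir.empty());
  return home_dir + "/.local" + suffix;
}
```

For example, tbe_impl_dir(false, "/home/dev") yields the non-root path rooted at /home/dev/.local.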
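Storage Path and Naming Format
TBE operator adaptation files for the NPU are stored in the pytorch/aten/src/ATen/native/npu directory. File names use UpperCamelCase in the format <operator name> + KernelNpu.cpp, for example AddKernelNpu.cpp.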
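As an illustration of the naming rule, the sketch below derives a file name that matches the AddKernelNpu.cpp example. The helper adaptation_file_name is hypothetical, and it assumes operator names are given in lowercase snake_case, which the text does not state.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: turns an operator name such as "add" or "max_pool"
// into an UpperCamelCase adaptation file name such as "AddKernelNpu.cpp".
// Assumption: the input is lowercase snake_case.
std::string adaptation_file_name(const std::string& op_name) {
  assert(!op_name.empty());
  std::string camel;
  bool upper_next = true;  // capitalize the first letter and any letter after '_'
  for (char c : op_name) {
    if (c == '_') {
      upper_next = true;
      continue;
    }
    const unsigned char uc = static_cast<unsigned char>(c);
    camel += static_cast<char>(upper_next ? std::toupper(uc) : uc);
    upper_next = false;
  }
  return camel + "KernelNpu.cpp";
}
```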
Adaptation Steps
Notice:
The adaptation code is written in C++.

Include the required header files.
#include "ATen/native/npu/utils/CalcuOpUtil.h"
#include "ATen/native/npu/utils/KernelNpuOutputSize.h"
#include "ATen/native/npu/utils/NpuUtils.h"
Note:
"CalcuOpUtil.h" mainly contains functions related to the ACL interfaces.
"KernelNpuOutputSize.h" mainly contains the functions that infer operator output shapes.
"NpuUtils.h" mainly contains common utility functions.
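Define the main adaptation functions for the Add operator.
Based on the dispatch definition of the add operator in native_functions.yaml, the adaptation should contain the following functions:
o add_npu_input: constructs the NPUTensorDesc objects for the inputs
o add_npu_output: constructs the NPUTensorDesc objects for the outputs
o add_npu_attr: constructs the attr properties of the NPU TBE Add operator
o add_out_npu: operator adaptation function (the npu dispatch function in the yaml file that accepts an output tensor); the other parameter may be a Tensor or a Scalar
o add_npu: operator adaptation function (the npu dispatch function in the yaml file); the other parameter may be a Tensor or a Scalar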
Implement the function add_npu_input.
Construct NPUTensorDesc objects from the inputs of the NPU adaptation function (add_npu_input).
// Implementation of add_npu_input when the inputs are "self": "Tensor" and "other": "Tensor"
SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Tensor& other) {
  bool isSelfWrapped = CalcuOpUtil::is_scalar_wrapped_to_tensor(self);
  bool isOtherWrapped = CalcuOpUtil::is_scalar_wrapped_to_tensor(other);
  auto inputs = CalcuOpUtil::create_npu_input_tensor_desc({self, other});

  // 't + 2' to work with any type of tensor, not just LongTensor (which is what
  // integers in Python represent).
  if (isSelfWrapped && (!isOtherWrapped)) {
    inputs[0].scalarType = other.scalar_type();
  } else if (isOtherWrapped && (!isSelfWrapped)) {
    inputs[1].scalarType = self.scalar_type();
  }

  return inputs;
}
// Implementation of add_npu_input when the inputs are "self": "Tensor" and "other": "Scalar"
SmallVector<NPUTensorDesc, N> add_npu_input(const Tensor& self, const Scalar& other) {
  return CalcuOpUtil::create_npu_input_tensor_desc({self});
}
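The dtype handling in the Tensor/Tensor overload can be seen in isolation with stand-in types. The sketch below is not the real ATen or NPU code: MockTensor, MockDesc, ScalarType and make_input_descs are hypothetical names that only mirror the branch structure, under the assumption that a "wrapped" operand is a Python number that was wrapped into a 0-dim tensor.

```cpp
#include <cassert>
#include <vector>

// Stand-in types for illustration only; they are NOT the real ATen/NPU types.
enum class ScalarType { Long, Float, Half };

struct MockTensor {
  ScalarType dtype;
  bool wrapped_scalar;  // true when a plain number was wrapped into a 0-dim tensor
};

struct MockDesc {
  ScalarType scalarType;
};

// Mirrors the branches of add_npu_input: a wrapped scalar adopts the dtype of
// the real tensor operand, so an expression like `t + 2` keeps t's dtype.
std::vector<MockDesc> make_input_descs(const MockTensor& self, const MockTensor& other) {
  std::vector<MockDesc> inputs = {MockDesc{self.dtype}, MockDesc{other.dtype}};
  if (self.wrapped_scalar && !other.wrapped_scalar) {
    inputs[0].scalarType = other.dtype;
  } else if (other.wrapped_scalar && !self.wrapped_scalar) {
    inputs[1].scalarType = self.dtype;
  }
  return inputs;
}
```

With a Float tensor and a wrapped Long scalar, both descriptors end up as Float. When both or neither operand is wrapped, the descriptors keep their own dtypes.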
Implement the function add_npu_output.
Construct an NPUTensorDesc object from the output tensor passed to add_npu_output.
// Implementation of add_npu_output when the output is a "Tensor"
SmallVector<NPUTensorDesc, N> add_npu_output(const Tensor& result) {
  return CalcuOpUtil::create_npu_output_tensor_desc({result});
}
Note:
In general, an operator's output needs no special handling; calling CalcuOpUtil::create_npu_output_tensor_desc directly is enough.
Implement the function add_npu_attr.
Following the attr specification in the NPU TBE operator prototype, convert the parameters into the attr properties that the prototype requires.
// Implementation of add_npu_attr when the inputs are "other": "Tensor" and "alpha": "Scalar"
SmallVector<NPUAttrDesc, N> add_npu_attr(const Tensor& self, const Tensor& other, Scalar alpha) {
  float value = CalcuOpUtil::get_scalar_float_value(alpha);
  NPUAttrDesc npuAttrScalar = NPUAttrDesc("alpha", value);
  SmallVector<NPUAttrDesc, N> attrs = {npuAttrScalar};
  return attrs;
}
// Implementation of adds_npu_attr when the inputs are "other": "Scalar" and "alpha": "Scalar"
SmallVector<NPUAttrDesc, N> adds_npu_attr(const Tensor& self, const Scalar& other, const Scalar& alpha) {
  float otherValue = CalcuOpUtil::get_scalar_float_value(other);
  float alphaValue = CalcuOpUtil::get_scalar_float_value(alpha);
  // Fold other and alpha into a single "value" attr
  float value = otherValue * alphaValue;
  NPUAttrDesc npuAttrValue = NPUAttrDesc("value", value);
  SmallVector<NPUAttrDesc, N> attrs = {npuAttrValue};
  return attrs;
}
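The reason adds_npu_attr multiplies other by alpha can be checked with plain arithmetic. PyTorch's add computes self + alpha * other, so when other is a scalar the product can be folded into one constant. The sketch below uses std::vector<float> as a stand-in for a tensor and assumes the scalar-add TBE operator computes self + value; both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Reference semantics of add with a scalar `other`: out[i] = self[i] + alpha * other.
std::vector<float> add_scalar_reference(const std::vector<float>& self, float other, float alpha) {
  std::vector<float> out;
  out.reserve(self.size());
  for (float x : self) {
    out.push_back(x + alpha * other);
  }
  return out;
}

// Same computation with one pre-folded constant, as adds_npu_attr prepares it.
std::vector<float> add_scalar_folded(const std::vector<float>& self, float other, float alpha) {
  const float value = other * alpha;  // corresponds to the "value" attr
  std::vector<float> out;
  out.reserve(self.size());
  for (float x : self) {
    out.push_back(x + value);
  }
  return out;
}
```

Folding the two scalars on the host means the device-side operator needs only a single attribute instead of two.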
Implement the function add_out_npu.
Tensor& add_out_npu(Tensor& result, const Tensor& self, const Tensor& other, Scalar alpha) {
  if (other.dim() == 0 && !other.is_npu()) {
    // other is a 0-dim tensor that is not on the NPU: take the scalar path
    adds_out_npu(result, self, other.item(), alpha);
  } else if (self.dim() == 0 && !self.is_npu()) {
    // self is a 0-dim tensor that is not on the NPU: swap the operands and take the scalar path
    adds_out_npu(result, other, self.item(), alpha);
  } else {
    // constructs the input and output NPUTensorDesc
    auto inputs = add_npu_input(self, other);
    auto outputs = add_npu_output({result});

    // constructs the attr of the NPUAttrDesc
    auto attrs = add_npu_attr(self, other, alpha);

    // executing the NPU operator
    CalcuOpUtil::execute_npu_operate("Axpy", inputs, outputs, attrs);
  }

  return result;
}
Note:
The difference between add_out_npu and add_npu is that add_out_npu lets the caller explicitly specify the output tensor and writes the result into it.
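The relationship between an "out" variant and its allocating counterpart can be sketched without any NPU types. In the example below, Buffer, add_out_sketch and add_sketch are hypothetical stand-ins, not the adaptation functions themselves, and the element-wise loop takes the place of the TBE operator call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;  // stand-in for a tensor; not an ATen type

// "out" variant: writes into a caller-provided result and returns a reference to it.
Buffer& add_out_sketch(Buffer& result, const Buffer& self, const Buffer& other, float alpha) {
  assert(self.size() == other.size() && result.size() == self.size());
  for (std::size_t i = 0; i < self.size(); ++i) {
    result[i] = self[i] + alpha * other[i];
  }
  return result;
}

// Allocating variant: creates the result first, then delegates to the out variant.
Buffer add_sketch(const Buffer& self, const Buffer& other, float alpha) {
  Buffer result(self.size());
  add_out_sketch(result, self, other, alpha);
  return result;
}
```

Keeping the computation in the out variant means the allocating variant only has to decide the size and layout of the result, which is the split that add_npu follows.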
Implement the function add_npu.
a. Define and implement the operator's shape inference function, which computes the output size from the input parameters.
Naming convention for shape inference functions:
"NPU adaptation function name" + "_" + "output" + "_" + "size", for example add_npu_output_size();
Note:
• Shape inference functions are declared and implemented in pytorch/aten/src/ATen/native/npu/utils, in the header KernelNpuOutputSize.h and the source file KernelNpuOutputSize.cpp.
• In KernelNpuOutputSize.h, functions are ordered by function name.
// Shape inference function when the inputs are "self": "Tensor" and "other": "Tensor"
SmallVector<int64_t, SIZE> add_npu_output_size(const Tensor& self, const Tensor& other) {
  return broadcast_ops_npu_output_size(self, other);  // output size follows the broadcasting rules
}

// Shape inference function when the inputs are "self": "Tensor" and "other": "Scalar"
IntArrayRef add_npu_output_size(const Tensor& self, const Scalar& other) {
  return input_same_output_size(self);
}
Note:
broadcast_ops_npu_output_size applies PyTorch's broadcasting rules: when the two arguments are broadcast-compatible, it produces the common size to which both are expanded.
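For readers unfamiliar with the broadcasting rule, the following sketch computes a broadcast shape from two shapes. It is an independent illustration of PyTorch-style broadcasting, not the implementation of broadcast_ops_npu_output_size; the name broadcast_shape is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative shape-only broadcasting: dimensions are aligned from the
// trailing end, and a dimension of size 1 stretches to match the other side.
std::vector<std::int64_t> broadcast_shape(const std::vector<std::int64_t>& a,
                                          const std::vector<std::int64_t>& b) {
  const std::size_t ndim = std::max(a.size(), b.size());
  std::vector<std::int64_t> out(ndim, 1);
  for (std::size_t i = 0; i < ndim; ++i) {
    // Missing leading dimensions are treated as size 1.
    const std::int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const std::int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) {
      throw std::invalid_argument("shapes are not broadcastable");
    }
    out[ndim - 1 - i] = (da == 1) ? db : da;
  }
  return out;
}
```

Shapes {2, 1, 4} and {3, 1} broadcast to {2, 3, 4}, while {2} and {3} are rejected because neither dimension is 1.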
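b. Call the corresponding shape inference function to compute the output size.
c. Using the output size, call at::empty_with_format to create the output Tensor. The function lets you specify the format of the output Tensor; the default is NCHW.
Note:
The current rule for choosing a format combines two principles: anchor diffusion from heavy operators, and continuity.
• Heavy operators such as convolution and Matmul support only one specific format. The adaptation code explicitly sets the format they require, and that format then spreads to neighboring operators.
• The continuity principle applies to operators that are insensitive to format: set the operator's format to that of its first input tensor.
• Convolution on the NPU supports only the NC1HWC0 format, so NC1HWC0 must be specified explicitly.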
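The format rule in this note can be read as a two-branch decision. The sketch below is a simplified model with hypothetical names (Format, is_heavy_operator, choose_output_format, and the operator label "Conv2D"). It lists only convolution as a heavy operator, because that is the only one whose required format is stated here.

```cpp
#include <cassert>
#include <string>

// Stand-in enum for illustration; the real format constants are defined elsewhere.
enum class Format { NCHW, NC1HWC0 };

// Hypothetical rule table: only convolution is listed, since the note states
// its required format explicitly. "Conv2D" is an illustrative label.
bool is_heavy_operator(const std::string& op_name) {
  return op_name == "Conv2D";
}

Format choose_output_format(const std::string& op_name, Format first_input_format) {
  if (is_heavy_operator(op_name)) {
    return Format::NC1HWC0;  // anchor: the heavy operator pins its own format
  }
  return first_input_format;  // continuity: inherit the first input's format
}
```

Under this model an Add that follows a convolution inherits NC1HWC0 from its first input, which is how the format spreads outward from the heavy operator.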
d. Pass the newly created output Tensor and the remaining parameters to add_out_npu to perform the computation.
// Implementation of add_npu when the inputs are "self": "Tensor" and "other": "Tensor"
Tensor add_npu(const Tensor& self, const Tensor& other, Scalar alpha) {
  // add_dest_output is not shown in this article; its result supplies the options and format of the output
  Tensor outputTensor = add_dest_output(self, other);
  // Call the corresponding shape inference function to compute the output size
  auto outputSize = add_npu_output_size(self, other);

  // Create the output Tensor from the output size with at::empty_with_format; the format can be specified and defaults to NCHW
  Tensor result = at::empty_with_format(outputSize, outputTensor.options(), CalcuOpUtil::get_tensor_npu_format(outputTensor));

  // Pass the output Tensor and the remaining parameters to add_out_npu to perform the computation
  add_out_npu(result, self, other, alpha);
  return result;
}

// Implementation of add_npu when the inputs are "self": "Tensor" and "other": "Scalar"
Tensor add_npu(const Tensor& self, Scalar other, Scalar alpha) {
  // Call the corresponding shape inference function to compute the output size
  auto outputSize = add_npu_output_size(self, other);

  // Create the output Tensor from the output size with at::empty_with_format; the format can be specified and defaults to NCHW
  Tensor result = at::empty_with_format(outputSize, self.options(), CalcuOpUtil::get_tensor_npu_format(self));

  // Pass the output Tensor and the remaining parameters to adds_out_npu to perform the computation
  adds_out_npu(result, self, other, alpha);
  return result;
}
