Implementing a Hardware Backend in TVM

The official array-addition example is as follows.
A straightforward implementation of adding two arrays:
for (int i = 0; i < n; ++i) {
C[i] = A[i] + B[i];
}
How can this be optimized? The additions can be done in parallel, as follows:
for (int bx = 0; bx < ceil(n / 64); ++bx) {
for (int tx = 0; tx < 64; ++tx) {
int i = bx * 64 + tx;
if (i < n) {
C[i] = A[i] + B[i];
}
}
}
In essence, this just splits the loop further: one loop becomes two, so the outer iterations can be mapped to parallel units (such as GPU blocks and threads).
https://docs.tvm.ai/faq.html
FAQ
How do I install TVM?
See Installing TVM.
How do I add a new hardware backend?
If the hardware backend has LLVM support, you can generate code directly by setting the correct target triple in the target.
If the target hardware is a GPU, try the cuda, opencl, or vulkan backends.
If the target hardware is a specialized accelerator, see VTA: Versatile Tensor Accelerator and Bring Your Own Codegen to TVM.
In all of the above cases, you may want to add target-specific optimization templates with AutoTVM; see Auto-tuning with Templates and AutoTVM.
Besides relying on LLVM's vectorization, you can embed micro-kernels to leverage hardware intrinsics; see Use Tensorize to Leverage Hardware Intrinsics.
How does TVM relate to other IR/DSL projects?
There are usually two levels of IR abstraction in deep learning systems. TensorFlow's XLA and Intel's nGraph both use a computation-graph representation. This representation is high level and helpful for performing generic optimizations such as memory reuse, layout transformation, and automatic differentiation.
TVM adopts a low-level representation that explicitly expresses choices of memory layout, parallelization pattern, locality, and hardware primitives. This level of IR is closer to the target hardware. The low-level IR adopts ideas from existing image-processing languages (such as Halide and darkroom) and loop-transformation tools (loop-based and polyhedral analysis). It focuses in particular on expressing deep learning workloads (e.g., recurrence), optimizing for different hardware backends, and embedding into frameworks to provide an end-to-end compilation stack.
How does TVM relate to libDNN and cuDNN?
TVM can incorporate these libraries as external calls. One goal of TVM is to be able to generate high-performance kernels. TVM will evolve incrementally, learning manual kernel-crafting techniques and adding them as primitives in the DSL.
How does TVM differ from other frameworks such as NNVM and XLA?
From TVM's point of view, NNVM and XLA both perform computation-graph-level (high-level) optimizations, focusing on memory reuse, layout transformation, and automatic differentiation.
TVM uses a low-level representation and performs optimizations on memory layout, parallelization pattern, locality, and hardware primitives.
How does TVM relate to libDNN and cuDNN?
TVM calls into these libraries and coexists with them.
NNVM tensor operators
https://docs.tvm.ai/nnvm_top.html
The ops are grouped into 5 levels:
- level 1: basic ops (34)
- level 2: convolution ops (6)
- level 3: other tensor ops (19)
- level 4: broadcast and reduction ops (39)
- level 5: vision ops (5)
VTA: Versatile Tensor Accelerator
The Versatile Tensor Accelerator (VTA) is an open, generic, and customizable deep learning accelerator with a complete TVM-based compiler stack. VTA was designed to expose the most salient and common characteristics of mainstream deep learning accelerators. Together, TVM and VTA form an end-to-end hardware-software deep learning system stack that includes hardware design, drivers, a JIT runtime, and an optimizing compiler stack based on TVM.
VTA has the following key features:
- Generic, modular, open-source hardware.
- A streamlined workflow for deploying to FPGAs.
- Simulator support for prototyping compilation on a regular workstation.
- Pynq-based driver and JIT runtime for both the simulator and the FPGA hardware backends.
- End-to-end TVM stack integration.
Bring Your Own Codegen to TVM
As the number of hardware devices targeted by deep learning workloads keeps increasing, the knowledge required to achieve high performance on all of these devices keeps increasing as well. To free data scientists from worrying about performance when developing new models, hardware backend providers either provide libraries such as MKLDNN or cuDNN with many commonly used deep learning operators, or provide frameworks such as TensorRT that let users describe their models in a certain way to achieve high performance. However, users have to learn a new programming interface when they attempt to work with a new library or device. As a result, the demand for a unified programming interface becomes more and more important, to
1) let all users and hardware backend providers stand on the same page, and
2) provide a feasible solution that allows specialized hardware or a library to support only the widely used operators with extremely high performance, while falling back to general devices such as the CPU/GPU for unsupported operators.
This section demonstrates how you, as a hardware backend provider, can easily implement your own codegen and register it as a Relay backend compiler to support your hardware device/library. It covers two types of codegen based on the graph representation you need:

  1. Generate C code.
    If your hardware already has a well-optimized C/C++ library, such as Intel CBLAS/MKL for CPUs or NVIDIA cuBLAS for GPUs, then this is what you need. Fortunately, C source code modules are fully compatible with the TVM runtime module, and the generated code can be compiled by any C/C++ compiler with proper compilation flags, so the only tasks are to implement a codegen that generates C code for subgraphs and a C source module that integrates into the TVM runtime module. A following section demonstrates how to implement a C code generator for your hardware.

  2. Generate any other graph representation.

Your hardware may require another form of graph representation, such as JSON. In this case, you need to implement not only a codegen but also a customized TVM runtime module, to let the TVM runtime know how this graph representation should be executed. If you already have a complete graph execution engine for your hardware, such as TensorRT for GPUs, this is a solution you can consider.
After you finish the codegen and runtime, you can let your users annotate their models with your customized tag to make use of them. The tutorial for end users to annotate and launch a specific codegen is here (TBA).
Implement a C Code Generator
In this part, we demonstrate how to implement a codegen that generates C code using pre-implemented operator functions. To simplify, our example codegen does not depend on third-party libraries. Instead, we manually implement two macros in C:
#define CSOURCE_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)       \
extern "C" void p_ID_(float* a, float* b, float* out) {   \
  for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
    out[i] = a[i] p_OP_ b[i];                             \
  }                                                       \
}

#define CSOURCE_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)  \
extern "C" void p_ID_(float* a, float* b, float* out) {       \
  for (int64_t i = 0; i < p_DIM1_; ++i) {                     \
    for (int64_t j = 0; j < p_DIM2_; ++j) {                   \
      int64_t k = i * p_DIM2_ + j;                            \
      out[k] = a[k] p_OP_ b[k];                               \
    }                                                         \
  }                                                           \
}
With these two macros, we can generate binary operators for 1-D and 2-D tensors. For example, given a subgraph as follows, assume all inputs are 2-D tensors with shape (10, 10).
c_compiler_input0
|
add <-- c_compiler_input1
|
subtract <-- c_compiler_input2
|
multiply <-- c_compiler_input3
|
out
Our goal is to generate the following compilable code to execute the subgraph:
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/packed_func.h>
#include <dlpack/dlpack.h>
#include <cstdint>
#include <cstring>
#include <iostream>

#define GCC_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)           \
extern "C" void p_ID_(float* a, float* b, float* out) {   \
  for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
    out[i] = a[i] p_OP_ b[i];                             \
  }                                                       \
}

#define GCC_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)  \
extern "C" void p_ID_(float* a, float* b, float* out) {   \
  for (int64_t i = 0; i < p_DIM1_; ++i) {                 \
    for (int64_t j = 0; j < p_DIM2_; ++j) {               \
      int64_t k = i * p_DIM2_ + j;                        \
      out[k] = a[k] p_OP_ b[k];                           \
    }                                                     \
  }                                                       \
}

// Note 1
GCC_BINARY_OP_2D(gcc_0_0, *, 10, 10);
GCC_BINARY_OP_2D(gcc_0_1, -, 10, 10);
GCC_BINARY_OP_2D(gcc_0_2, +, 10, 10);

// Note 2
extern "C" void gcc_0_(float* gcc_input0, float* gcc_input1,
float* gcc_input2, float* gcc_input3, float* out) {
float* buf_0 = (float*)malloc(4 * 100);
float* buf_1 = (float*)malloc(4 * 100);
gcc_0_2(gcc_input0, gcc_input1, buf_0);
gcc_0_1(buf_0, gcc_input2, buf_1);
gcc_0_0(buf_1, gcc_input3, out);
free(buf_0);
free(buf_1);
}

// Note 3
extern "C" int gcc_0_wrapper(DLTensor* arg0, DLTensor* arg1, DLTensor* arg2,
DLTensor* arg3, DLTensor* out) {
gcc_0_(static_cast<float*>(arg0->data), static_cast<float*>(arg1->data),
static_cast<float*>(arg2->data), static_cast<float*>(arg3->data),
static_cast<float*>(out->data));
return 0;
}
TVM_DLL_EXPORT_TYPED_FUNC(gcc_0, gcc_0_wrapper);
Here we highlight the notes marked in the above code:
- Note 1 is the function implementation for the three nodes in the subgraph.
- Note 2 is a function to execute the subgraph by allocating intermediate buffers and invoking corresponding functions.
- Note 3 is a TVM-runtime-compatible wrapper function. It accepts a list of input tensors and one output tensor (the last argument), casts them to the right data type, and invokes the subgraph function described in Note 2. In addition, TVM_DLL_EXPORT_TYPED_FUNC is a TVM macro that generates another function, gcc_0, with unified function arguments by packing all tensors into TVMArgs. As a result, the TVM runtime can directly invoke gcc_0 to execute the subgraph without additional effort. With the above code generated, TVM is able to compile it along with the rest of the graph and export a single library for deployment.
In the rest of this section, we will implement a codegen step by step to generate the above code. Your own codegen has to be located at src/relay/backend/contrib/; in our example, we name our codegen "codegen_c" and put it under src/relay/backend/contrib/codegen_c/. Feel free to check this file for a complete implementation.
Specifically, we are going to implement two classes in this file and here is their relationship:
subgraph subgraph
TVM backend -----------------------------> CSourceCodegen -------------> CodegenC
^ | ^ |
| | | |
---------------------------------------- ------------------------
generated C source runtime module generated C code
When the TVM backend finds a function (subgraph) in a Relay graph annotated with the registered compiler tag (ccompiler in this example), it invokes CSourceCodegen and passes the subgraph. CSourceCodegen's member function CreateCSourceModule will 1) generate C code for the subgraph, and 2) wrap the generated C code into a C source runtime module for the TVM backend to compile and deploy. In particular, the C code generation is delegated to the CodegenC class, which provides many useful utilities to ease the code generation implementation. The following sections implement these two classes in bottom-up order.
Implement CodegenC
In src/relay/backend/contrib/codegen_c/codegen.cc, we first create a codegen class skeleton under the namespace of tvm.relay.contrib:
#include <tvm/relay/expr_functor.h>
#include <tvm/relay/transform.h>
#include <tvm/relay/type.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/object.h>

#include <fstream>
#include <sstream>

#include "codegen_c.h"

namespace tvm {
namespace relay {
namespace contrib {

class CodegenC : public ExprVisitor, public CodegenCBase {
public:
explicit CodegenC(const std::string& id) { this->ext_func_id_ = id; }

void VisitExpr_(const VarNode* node) { ; }
void VisitExpr_(const CallNode* call) final { ; }
std::string JIT() { ; }

 private:
  /*! \brief The function id that represents a C source function. */
  std::string ext_func_id_ = "";
  /*! \brief The index of a wrapped C function. */
  int func_idx = 0;
  /*! \brief The index of allocated buffers. */
  int buf_idx_ = 0;
  /*! \brief The arguments of a C compiler compatible function. */
  std::vector<std::string> ext_func_args_;
  /*! \brief The statements of a C compiler compatible function. */
  std::vector<std::string> ext_func_body;
  /*! \brief The declaration statements of a C compiler compatible function. */
  std::vector<std::string> func_decl_;
  /*! \brief The declaration statements of buffers. */
  std::vector<std::string> buf_decl_;
  /*! \brief The name and index pairs for output. */
  std::vector<std::pair<std::string, int>> out_;
};
The CodegenC class inherits two classes: ExprVisitor provides the ability to traverse subgraphs, collect the required information, and generate subgraph functions such as gcc_0_; CodegenCBase provides the abilities and utilities to generate wrapper functions such as gcc_0 in the above example. As can be seen, we only need to implement three functions in this codegen class to make it work.
Code Generation for Operators
We first implement VisitExpr_(const CallNode* call). This function visits all call nodes when traversing the subgraph. Each call node contains an operator that we want to offload to your hardware. As a result, we need to generate the corresponding C code with correct operators in topological order. We implement this function step by step as follows.

  1. Generate the function declaration
    Example Result: GCC_BINARY_OP_2D(gcc_0_0, *, 10, 10);
    To generate the function declaration, as shown above, we need 1) a function name (e.g., gcc_0_0), 2) the type of operator (e.g., *), and 3) the input tensor shape (e.g., (10, 10)). Fortunately, this information can be obtained easily from CallNode:
    std::ostringstream macro_stream;
    std::ostringstream decl_stream;
    std::ostringstream buf_stream;

// Generate a unique function name you like.
std::string func_name = ext_func_id_ + "_" + std::to_string(func_idx++);

// Make function declaration string.
macro_stream << "CSOURCE_BINARY_OP_" << call->args.size() << "D(" << func_name << ", ";

// Check the operator type.
if (IsOp(call, "add")) {
  macro_stream << "+";
} else if (IsOp(call, "subtract")) {
  macro_stream << "-";
} else if (IsOp(call, "multiply")) {
  macro_stream << "*";
} else {
  LOG(FATAL) << "Unrecognized op";
}

// Extract the input tensor shape.
auto in_shape = GetShape(call->args[0]->checked_type());
for (size_t i = 0; i < in_shape.size(); ++i) {
  macro_stream << ", " << in_shape[i];
}
macro_stream << ");";
func_decl_.push_back(macro_stream.str());
As can be seen, we push the generated code into the class member variable func_decl_. This means that after we finish traversing the entire subgraph, we have collected all required function declarations, and the only thing left to do is to have them compiled by GCC. The rest of the implementation of VisitExpr_(const CallNode* call) also follows this concept.
2. Generate the function call
Example Result: gcc_0_0(buf_1, gcc_input3, out);
After generating the function declaration, we need to generate a function call with proper inputs and outputs. To know which inputs or buffers we should put when calling this function, we have to visit its arguments:
bool first = true;
decl_stream << func_name << "(";
for (size_t i = 0; i < call->args.size(); ++i) {
  VisitExpr(call->args[i]); // Note 1
  for (auto out : out_) {
    if (!first) {
      decl_stream << ", ";
    }
    first = false;
    decl_stream << out.first;
  }
}
// Note 2
// Note 2
Again, we want to highlight the notes in the above code:
Note 1: VisitExpr(call->args[i]) is a recursive call to visit arguments of the current function. An argument could be an output of another node or an input tensor. In our example implementation, we make sure every node updates a class variable out_ before leaving the visitor. Here is an illustration:
 arg_node                  arg_node <- Visit arg (Note 1)    arg_node
    |                         |                                 |
curr_node <- Process      curr_node                         curr_node <- Put "buf_0" as an input buffer

(a) out_ = {}             (b) out_ = {}                    (c) out_ = {("buf_0", 20)}
As can be seen in the figure above, the class variable out_ is empty before visiting the argument node, and it is then filled with the output buffer name and size of arg_node. As a result, when we have finished visiting the argument node, we know the proper input buffer to use by looking at out_. You will find out how we update out_ at the end of this section as well as in the next section.
Note 2: You may notice that we did not close the function call string in this step. The current function call string looks like: gcc_0_0(buf_1, gcc_input3. This is because we have not put the last argument (i.e., the output) into this call. The output of a function call could be either an allocated temporary buffer or the subgraph output tensor. For simplicity, in this example we allocate an output buffer for every call node (next step) and copy the result in the very last buffer to the output tensor.
3. Generate the output buffer
Example Result: float* buf_0 = (float*)malloc(4 * 100);
As mentioned in the previous step, in addition to the subgraph input and output tensors, we may also need buffers to keep the intermediate results. To generate the buffer, we extract the shape information to determine the buffer type and size:
// This example only supports single output.
auto type_node = call->checked_type().as<TensorTypeNode>();
ICHECK(type_node != nullptr && runtime::TypeMatch(type_node->dtype, kDLFloat, 32))
    << "Only support single output tensor with float type";

// Generate a unique buffer name.
std::string out = "buf_" + std::to_string(buf_idx_++);

// Extract the shape to be the buffer size.
auto out_shape = GetShape(call->checked_type());
int out_size = 1;
for (size_t i = 0; i < out_shape.size(); ++i) {
  out_size *= out_shape[i];
}

// Make the buffer allocation and push to the buffer declarations.
buf_stream << "float* " << out << " = (float*)std::malloc(4 * " << out_size << ");";
buf_decl_.push_back(buf_stream.str());
After we have allocated the output buffer, we can now close the function call string and push the generated function call to a class variable ext_func_body.
decl_stream << ", " << out << ");";
ext_func_body.push_back(decl_stream.str());
4. Update output buffer
To let the next node, which accepts the output of the current call node as its input, know which buffer it should take, we need to update the class variable out_ before leaving this visit function:
out_.clear();
out_.push_back({out, out_size});
Congratulations! We have finished the most difficult function in this class. In the next two sections, we just need to fill in some minor missing parts of this function.
Code Generation for Input Variables
Recall that we collected the input buffer information by visiting the arguments of a call node (2nd step in the previous section), and handled the case when its argument is another call node (4th step). In this section, we demonstrate how to handle other nodes by taking VarNode as an example.
VarNode represents input tensors in a model. The only but important information it has is a name hint (e.g., data, weight, etc). When visiting a VarNode, we simply update class variable out_ to pass the name hint so that the descendant call nodes can generate the correct function call.
void VisitExpr_(const VarNode* node) {
ext_func_args_.push_back(node->name_hint());
out_.clear();
out_.push_back({node->name_hint(), 0});
}
Note that in this example we assume the subgraph we are offloading has only call nodes and variable nodes. If your subgraphs contain other types of nodes, such as TupleNode, then you also need to visit them and bypass the output buffer information.
Code Emitting
The final part in this codegen class is a JIT function that emits a C function for the subgraph and uses the C code we just generated as the function body. Remember, in addition to the subgraph function we generated in the previous sections, we also need a wrapper function with a unified argument for TVM runtime to invoke and pass data. Fortunately, the base class we inherited already provides an implementation, JitImpl, to generate the function. For example, we can invoke JitImpl as follows:
JitImpl("gcc_0" /* Subgraph symbol (ID) */,
        {"gcc_input0", "gcc_input1", "gcc_input2", "gcc_input3"} /* Input arguments */,
        {"float* buf_0 = (float*)malloc(4 * 20)", ...} /* Buffer allocations */,
        {"gcc_0_2(gcc_input0, gcc_input1, buf_0);"} /* Function body */,
        {"out"} /* Output */);
The above call will generate three functions (one from the TVM wrapper macro):

  1. The subgraph function gcc_0_ (with one more underscore at the end of the function name) with all the C code we generated to execute the subgraph.
  2. The wrapper function gcc_0__wrapper_ with a list of DLTensor arguments that casts data to the right type and invokes gcc_0_.
  3. The TVM-runtime-compatible function gcc_0 with TVM unified function arguments that unpacks TVM packed tensors and invokes gcc_0__wrapper_.
    Accordingly, the only thing we need in JIT implementation is passing all subgraph function code we generated to JitImpl:
    std::string JIT() {
      // Write function macros
      for (auto decl : func_decl_) {
        code_stream_ << decl << "\n";
      }
      return JitImpl(ext_func_id_, ext_func_args_, buf_decl_, ext_func_body, out_);
    }
    All variables (ext_func_id, etc) we passed are class variables and were filled when we traversed the subgraph.
    Implement CSourceCodegen
    Again, let's create a class skeleton and implement the required functions. Note that it inherits CSourceModuleCodegenBase:
    class CSourceCodegen : public CSourceModuleCodegenBase {
    public:
    // Pass a subgraph function, and generate the C code.
    void GenCFunc(const Function& func) { ; }

// Use GenCFunc to generate the C code and wrap it as a C source module.
runtime::Module CreateCSourceModule(const NodeRef& ref) override { ; }

private:
std::ostringstream code_stream_;
};
Implement GenCFunc
GenCFunc simply uses the CodegenC we just implemented to traverse a Relay function (subgraph) and obtains the generated C code. The builtin function GetExtSymbol retrieves a unique symbol name (e.g., gcc_0) in the Relay function and we must use it as the C function name, because this symbol is going to be used for DSO runtime lookup.
void GenCFunc(const Function& func) {
ICHECK(func.defined()) << "Input error: expect a Relay function.";

// Record the external symbol for runtime lookup.
auto sid = GetExtSymbol(func);

CodegenC builder(sid);
builder.VisitExpr(func->body);
code_stream_ << builder.JIT();
}
Implement CreateCSourceModule
This function creates a runtime module for the external library. In this example, we create a CSourceModule that can be directly compiled and linked together with a TVM generated DSOModule. After you have implemented CodegenC, implementing this function is relatively straightforward:
runtime::Module CreateCSourceModule(const NodeRef& ref) override {
  // Create headers
  code_stream_ << "#include <cstdint>\n";
  code_stream_ << "#include <iostream>\n";
  code_stream_ << "#include <cstdlib>\n";
  code_stream_ << "#include <stdio.h>\n";
  code_stream_ << "#include <cstring>\n";
  code_stream_ << "#include <tvm/runtime/c_runtime_api.h>\n";
  code_stream_ << "#include <dlpack/dlpack.h>\n";

  // Append some common macro for operator definition.
  const char* operator_macro = R"op_macro(
  #define CSOURCE_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)       \
    extern "C" void p_ID_(float* a, float* b, float* out) { \
      for (int64_t i = 0; i < p_DIM1_; ++i) {               \
        out[i] = a[i] p_OP_ b[i];                           \
      }                                                     \
    }

  #define CSOURCE_BINARY_OP_2D(p_ID_, p_OP_, p_DIM1_, p_DIM2_)  \
    extern "C" void p_ID_(float* a, float* b, float* out) {     \
      for (int64_t i = 0; i < p_DIM1_; ++i) {                   \
        for (int64_t j = 0; j < p_DIM2_; ++j) {                 \
          int64_t k = i * p_DIM2_ + j;                          \
          out[k] = a[k] p_OP_ b[k];                             \
        }                                                       \
      }                                                         \
    }
  )op_macro";

  code_stream_ << operator_macro << "\n\n";

  // Generate C code for the subgraph.
  if (ref->IsInstance<FunctionNode>()) {
    GenCFunc(Downcast<Function>(ref));
  } else if (ref->IsInstance<relay::ModuleNode>()) {
    relay::Module mod = Downcast<relay::Module>(ref);
    for (const auto& it : mod->functions) {
      GenCFunc(Downcast<Function>(it.second));
    }
  } else {
    LOG(FATAL) << "The input ref is expected to be a Relay function or module"
               << "\n";
  }

  // Create a CSourceModule
  const auto* pf = runtime::Registry::Get("module.csource_module_create");
  ICHECK(pf != nullptr) << "Cannot find csource module to create the external runtime module";
  return (*pf)(code_stream_.str(), "cc");
}
Register Your Codegen
The last step is registering your codegen to TVM backend. We first implement a simple function to invoke our codegen and generate a runtime module.
runtime::Module CCompiler(const NodeRef& ref) {
CSourceCodegen csource;
return csource.CreateCSourceModule(ref);
}
Finally, we register this function to TVM backend:
TVM_REGISTER_GLOBAL("relay.ext.ccompiler").set_body_typed(CCompiler);
where ccompiler is a customized tag to let TVM know this is the codegen it should use to generate and offload subgraphs when the subgraph is annotated with ccompiler.
In addition, a good practice is to set up a CMake configuration flag so that your compiler is included only for the customers who need it. First, create a cmake file cmake/modules/contrib/CODEGENC.cmake:
if(USE_CODEGENC)
file(GLOB CSOURCE_RELAY_CONTRIB_SRC src/relay/backend/contrib/codegen_c/codegen.cc)
list(APPEND COMPILER_SRCS ${CSOURCE_RELAY_CONTRIB_SRC})
endif(USE_CODEGENC)
So that users can configure whether to include your compiler when configuring TVM using config.cmake:
set(USE_CODEGENC ON)
Implement a Codegen for Your Representation
Although we have demonstrated how to implement a C codegen, your hardware may require other forms of graph representation, such as JSON. In this case, you could modify CodegenC class we have implemented to generate your own graph representation and implement a customized runtime module to let TVM runtime know how this graph representation should be executed.
To simplify, we define a graph representation named "ExampleJSON" in this guide. ExampleJSON is not real JSON but just a simple representation of graphs without control flow. For example, assume we have the following subgraph named subgraph_0:
input0
|
add <-- input1
|
subtract <-- input2
|
multiply <-- input3
|
out
Then the ExampleJSON of this subgraph looks like:
subgraph_0
input 0 10 10
input 1 10 10
input 2 10 10
input 3 10 10
add 4 inputs: 0 1 shape: 10 10
sub 5 inputs: 4 2 shape: 10 10
mul 6 inputs: 5 3 shape: 10 10
The input keyword declares an input tensor with its ID and shape, while the other statements describe computations in "inputs: [input ID] shape: [shape]" syntax.
In this section, our goal is to implement the following customized TVM runtime module to execute ExampleJSON graphs.
runtime::Module ExampleJsonCompiler(const NodeRef& ref) {
  ExampleJsonCodeGen codegen(ref);
  std::string code = codegen.gen(); // Note 1
  const auto* pf = runtime::Registry::Get("module.examplejson_module_create"); // Note 2
  ICHECK(pf != nullptr) << "Cannot find ExampleJson module to create the external runtime module";
  return (*pf)(code);
}
TVM_REGISTER_GLOBAL("relay.ext.examplejsoncompiler").set_body_typed(ExampleJsonCompiler);
Note 1: We will later implement a customized codegen to generate an ExampleJSON code string from a subgraph.
Note 2: This line obtains a pointer to a function for creating the customized runtime module. You can see that it takes the subgraph code in the ExampleJSON format we just generated and initializes a runtime module.
In the following sections, we are going to introduce 1) how to implement ExampleJsonCodeGen and 2) how to implement and register examplejson_module_create.
Implement ExampleJsonCodeGen
Similar to the C codegen, we also derive ExampleJsonCodeGen from ExprVisitor to make use of visitor patterns for subgraph traversing. On the other hand, we do not have to inherit CodegenCBase because we do not need TVM C++ wrappers. The codegen class is implemented as follows:
#include <tvm/relay/expr_functor.h>
#include <tvm/relay/transform.h>
#include <tvm/relay/type.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/object.h>

#include <fstream>
#include <sstream>

namespace tvm {
namespace relay {
namespace contrib {

class ExampleJsonCodeGen : public ExprVisitor {
 public:
  explicit ExampleJsonCodeGen();

  // Note 1
  void VisitExpr_(const VarNode* node) { /* Skip in this example. */ }
  void VisitExpr_(const CallNode* call) final { /* Skip in this example. */ }

  // Note 2
  std::string gen(NodeRef& ref) {
    this->code = "";
    if (ref->IsInstance<FunctionNode>()) {
      this->visit(Downcast<Function>(ref));
    } else if (ref->IsInstance<relay::ModuleNode>()) {
      relay::Module mod = Downcast<relay::Module>(ref);
      for (const auto& it : mod->functions) {
        this->visit(Downcast<Function>(it.second));
      }
    } else {
      LOG(FATAL) << "The input ref is expected to be a Relay function or module";
    }
    return this->code;
  }

 private:
  /*! \brief The generated ExampleJSON code. */
  std::string code;
};
Note 1: We again implement the corresponding visitor functions to generate ExampleJSON code and store it in a class variable code (we skip the visitor function implementations in this example because their concepts are basically the same as the C codegen). After finishing the graph visit, we should have an ExampleJSON graph in code.
Note 2: We define an internal API gen to take a subgraph and generate ExampleJSON code. This API can have an arbitrary name of your preference.
The next step is to implement a customized runtime to make use of the output of ExampleJsonCodeGen.
Implement a Customized Runtime
In this section, we will implement a customized TVM runtime step by step and register it with the TVM runtime modules. The customized runtime should be located at src/runtime/contrib/; in our example, we name our runtime "example_ext_runtime".
Again, we first define a customized runtime class as follows. The class has to be derived from TVM ModuleNode in order to be compatible with other TVM runtime modules.
#include <dmlc/logging.h>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/memory.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/object.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

#include <fstream>
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace tvm {
namespace runtime {
class ExampleJsonModule : public ModuleNode {
public:
explicit ExampleJsonModule(std::string graph_json);

PackedFunc GetFunction(const std::string& name,
                       const ObjectPtr<Object>& sptr_to_self) final;

const char* type_key() const { return "examplejson"; }

void SaveToBinary(dmlc::Stream* stream) final;

static Module LoadFromBinary(void* strm);

static Module Create(const std::string& path);

std::string GetSource(const std::string& format = "");

void Run(int id, const std::vector<int>& inputs, int output);

void ParseJson(const std::string& json);

private:
/*! \brief The json string that represents a computational graph. */
std::string graph_json_;
/*! \brief The subgraph that is being processed. */
std::string curr_subgraph_;
/*! \brief A simple graph from subgraph id to node entries. */
std::map<std::string, std::vector<NodeEntry> > graph_;
/*! \brief A simple pool to contain the tensor for each node in the graph. */
std::vector<NDArray> data_entry_;
/*! \brief A mapping from node id to op name. */
std::vector<std::string> op_id_;
};
In particular, there are some functions derived from ModuleNode that we must implement in ExampleJsonModule:
- Constructor: The constructor of this class should accept a subgraph (in your representation), then process and store it in any format you like. The saved subgraph can be used by the following two functions.
- GetFunction: This is the most important function in this class. When the TVM runtime wants to execute a subgraph with your compiler tag, it invokes this function from your customized runtime module. It provides the function name as well as runtime arguments, and GetFunction should return a packed function implementation for the TVM runtime to execute.
- SaveToBinary and LoadFromBinary: SaveToBinary serializes the runtime module to a binary format for later deployment. This function will be called by TVM when users use the export_library API. On the other hand, since we are now using our own graph representation, we have to make sure that LoadFromBinary is able to construct the same runtime module from the serialized binary generated by SaveToBinary.
- GetSource (optional): If you would like to see the generated ExampleJSON code, you can implement this function to dump it; otherwise you can skip the implementation.
Other functions and class variables will be introduced along with the implementation of the above must-have functions.
Implement Constructor
explicit ExampleJsonModule(std::string graph_json) {
this->graph_json_ = graph_json;
ParseJson(this->graph_json_);
}
Then, we implement ParseJson to parse a subgraph in ExampleJSON format and construct a graph in memory for later use. Since we do not support subgraphs with branches in this example, we simply use an array to store every node of a subgraph in order.
void ParseJson(const std::string& json) {
  std::string line;
  std::string curr_subgraph;
  std::stringstream ss(json);

  while (std::getline(ss, line, '\n')) {
    std::stringstream ss2(line);
    std::string token;
    int id = 0;

    ss2 >> token;
    if (token.find("subgraph_") != std::string::npos) {
      curr_subgraph = token;
      continue;
    }
    ss2 >> id;
    if (op_id_.size() <= static_cast<size_t>(id)) {
      op_id_.resize(id + 1);
      data_entry_.resize(id + 1);
    }
    int64_t total_elements = 1;
    std::vector<int64_t> shape;
    if (token == "input") {
      int64_t size = 0;
      while (ss2 >> size) {
        total_elements *= size;
        shape.push_back(size);
      }
    } else {
      op_id_[id] = token;  // Note 1
      bool shape_data = false;
      NodeEntry entry;
      while (ss2 >> token) {
        if (token == "shape:") {
          shape_data = true;
        } else if (shape_data) {
          total_elements *= std::stoll(token);
          shape.push_back(std::stoll(token));
        } else if (token != "inputs:") {
          entry.inputs.push_back(std::stoi(token));
        }
      }
      entry.id = id;
      entry.output = id;
      graph_[curr_subgraph].push_back(entry);  // Note 2
    }
    DLDevice dev;
    dev.device_type = static_cast<DLDeviceType>(1);
    dev.device_id = 0;
    data_entry_[id] = NDArray::Empty(shape, DLDataType{kDLFloat, 32, 1}, dev);  // Note 3
  }
}
Note 1: We use a class variable op_id_ to map from subgraph node ID to the operator name (e.g., add) so that we can invoke the corresponding operator function in runtime.
Note 2: We use a class variable graph_ to map from subgraph name to an array of nodes. GetFunction will query graph nodes by a subgraph ID in runtime.
Note 3: We use a class variable data_entry_ to map from a subgraph node ID to a tensor data placeholder. We will put inputs and outputs to the corresponding data entry in runtime.
Implement GetFunction
After the construction, we should have the above class variables ready. We then implement GetFunction to provide executable subgraph functions to TVM runtime:
PackedFunc GetFunction(const std::string& name,
                       const ObjectPtr<Object>& sptr_to_self) final {
  if (this->graph_.find(name) != this->graph_.end()) {
    this->curr_subgraph_ = name;
    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {

      // Copy input tensors to corresponding data entries.
      for (auto i = 0; i < args.size(); ++i) {
        ICHECK(args[i].type_code() == kNDArrayContainer || args[i].type_code() == kArrayHandle)
            << "Expect NDArray or DLTensor as inputs\n";
        if (args[i].type_code() == kArrayHandle) {
          DLTensor* arg = args[i];
          this->data_entry_[i].CopyFrom(arg);
        } else {
          NDArray arg = args[i];
          this->data_entry_[i].CopyFrom(arg);
        }
      }

      // Execute the subgraph.
      for (const auto& it : this->graph_[this->curr_subgraph_]) {
        this->Run(it.id, it.inputs, it.output);
      }
      ICHECK_GT(graph_.count(this->curr_subgraph_), 0U);

      // Copy the output from a data entry back to TVM runtime argument.
      auto out_idx = graph_[this->curr_subgraph_].back().output;
      if (args[args.size() - 1].type_code() == kArrayHandle) {
        DLTensor* arg = args[args.size() - 1];
        this->data_entry_[out_idx].CopyTo(arg);
      } else {
        NDArray arg = args[args.size() - 1];
        this->data_entry_[out_idx].CopyTo(arg);
      }
      *rv = data_entry_.back();
    });
  } else {
    LOG(FATAL) << "Unknown subgraph: " << name << "\n";
    return PackedFunc();
  }
}
As can be seen, GetFunction is composed of three major parts. The first part copies data from TVM runtime arguments to the corresponding data entries we assigned in the constructor. The second part executes the subgraph with the Run function (implemented below) and saves the results to another data entry. The third part copies the results from the output data entry back to the corresponding TVM runtime argument for output.
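The copy-in / execute / copy-out structure is easier to see stripped of the TVM types. The following Python sketch models the returned PackedFunc as a closure over the module's state; all names here (graph, data_entry, run) are illustrative stand-ins for the C++ class variables, not TVM APIs:

```python
# Illustrative sketch of GetFunction's three-part structure; graph,
# data_entry, and run stand in for the C++ class variables above.
def get_function(graph, data_entry, run):
    def packed_func(args, ret):
        # Part 1: copy caller arguments into the module's data entries.
        for i, tensor in enumerate(args):
            data_entry[i] = list(tensor)
        # Part 2: execute every node of the subgraph in order.
        for node in graph:
            run(node["id"], node["inputs"], node["output"])
        # Part 3: copy the last node's output entry back to the caller.
        ret[:] = data_entry[graph[-1]["output"]]
    return packed_func

# Toy module state: one "add" node writing entry 2 = entry 0 + entry 1.
data_entry = {}
def run(node_id, inputs, output):
    a, b = data_entry[inputs[0]], data_entry[inputs[1]]
    data_entry[output] = [x + y for x, y in zip(a, b)]

f = get_function([{"id": 0, "inputs": [0, 1], "output": 2}], data_entry, run)
result = []
f(([1.0, 2.0], [3.0, 4.0]), result)
print(result)  # [4.0, 6.0]
```

The closure mirrors how the C++ lambda captures `this`: the subgraph and data entries live in the module, and the caller only passes tensors in and out.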
Implement Run
Now let's implement the Run function. This function accepts 1) a subgraph ID, 2) a list of input data entry indices, and 3) an output data entry index.
void Run(int id, const std::vector<int>& inputs, int output) {
  // Make a list of data entry indices.
  std::vector<int> args(inputs.begin(), inputs.end());
  args.push_back(output);

  // Initialize data holders.
  std::vector<TVMValue> values(args.size());
  std::vector<int> type_codes(args.size());

  // Initialize a TVM arg setter with TVMValue and its type code.
  TVMArgsSetter setter(values.data(), type_codes.data());

  // Set each argument to its corresponding data entry.
  if (op_id_[id] == "add" || op_id_[id] == "sub" || op_id_[id] == "mul") {
    for (size_t i = 0; i < args.size(); i++) {
      setter(i, data_entry_[args[i]]);
    }
  }

  // Invoke the corresponding operator function.
  if (op_id_[id] == "add") {
    Add(values.data(), type_codes.data(), args.size());
  } else if (op_id_[id] == "sub") {
    Sub(values.data(), type_codes.data(), args.size());
  } else if (op_id_[id] == "mul") {
    Mul(values.data(), type_codes.data(), args.size());
  } else {
    LOG(FATAL) << "Unknown op: " << op_id_[id] << "\n";
  }
}
The Run function has two main parts. The first part allocates a list of TVMValue entries and maps the corresponding data entry blocks to them; these become the arguments to our operator functions. The second part then invokes our operator functions. Although we use the same C functions as in the previous example, you can replace Add, Sub, and Mul with your own engine. You only need to make sure your engine stores the results to the last argument so that they can be transferred back to TVM runtime.
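Stripped of the TVMValue packing, the dispatch in Run is a lookup from the node's operator name to a kernel. A hypothetical Python sketch of the same convention (add/sub/mul here are plain Python functions, not the generated C kernels):

```python
# Hypothetical sketch of Run's dispatch: map the operator name recorded
# in op_id to a kernel, gather the input entries, and store the result
# in the output entry -- mirroring the "output goes last" C++ convention.
def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]
def mul(a, b): return [x * y for x, y in zip(a, b)]

KERNELS = {"add": add, "sub": sub, "mul": mul}

def run(op_id, data_entry, node_id, inputs, output):
    name = op_id[node_id]
    if name not in KERNELS:
        raise RuntimeError(f"Unknown op: {name}")
    data_entry[output] = KERNELS[name](*(data_entry[i] for i in inputs))

op_id = {0: "mul"}
data_entry = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: None}
run(op_id, data_entry, 0, [0, 1], 2)
print(data_entry[2])  # [3.0, 8.0]
```

Swapping in a real engine only means replacing the entries of the kernel table; the dispatch and the data-entry convention stay the same.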
With the above functions implemented, our customized codegen and runtime can now execute subgraphs. The last step is to register an API (examplejson_module_create) to create this module:
TVM_REGISTER_GLOBAL("module.examplejson_module_create")
    .set_body_typed([](std::string code) {
      auto n = make_object<ExampleJsonModule>(code);
      return runtime::Module(n);
    });
Implement SaveToBinary and LoadFromBinary
So far we have implemented the main features of a customized runtime, so it can be used like other TVM runtimes. However, when users want to save the built runtime to disk for deployment, TVM has no idea how to save it. This is why we implement SaveToBinary and LoadFromBinary, which tell TVM how this customized runtime should be persisted and restored.
We first implement the SaveToBinary function to allow users to save this module to disk.
void SaveToBinary(dmlc::Stream* stream) final {
stream->Write(this->graph_json_);
}
We can see that this function is pretty simple. Recall that the only argument we took in the constructor is a subgraph representation, meaning that a subgraph representation is all we need to construct/recover this customized runtime module. As a result, SaveToBinary simply writes the subgraph to an output DMLC stream. That is, when users use the export_library API to export the module, the customized module will be an ExampleJSON stream of a subgraph.
Similarly, LoadFromBinary reads the subgraph stream and re-constructs the customized runtime module:
static Module LoadFromBinary(void* strm) {
  dmlc::Stream* stream = static_cast<dmlc::Stream*>(strm);
  std::string graph_json;
  stream->Read(&graph_json);
  auto n = tvm::runtime::make_object<ExampleJsonModule>(graph_json);
  return Module(n);
}
We also need to register this function to enable the corresponding Python API:
TVM_REGISTER_GLOBAL("module.loadbinary_examplejson")
    .set_body_typed(ExampleJsonModule::LoadFromBinary);
The above registration means when users call tvm.runtime.load_module(lib_path) API and the exported library has an ExampleJSON stream, our LoadFromBinary will be invoked to create the same customized runtime module.
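Conceptually, the save/load pair is just a string round trip, since the subgraph string is the module's entire state. A Python sketch of the idea, with io/struct standing in for dmlc::Stream (the 8-byte length-prefix framing is an illustrative assumption, not dmlc's actual wire format):

```python
# Sketch of the SaveToBinary / LoadFromBinary round trip: the only state
# to persist is the subgraph string. The length-prefix framing here is an
# assumption for illustration, not dmlc::Stream's real format.
import io
import struct

def save_to_binary(stream, graph_json):
    data = graph_json.encode("utf-8")
    stream.write(struct.pack("<Q", len(data)))  # 8-byte length prefix
    stream.write(data)

def load_from_binary(stream):
    (n,) = struct.unpack("<Q", stream.read(8))
    return stream.read(n).decode("utf-8")

graph = "subgraph_0\n  add 2 2 inputs: 0 1 shape: 10 10\n"
buf = io.BytesIO()
save_to_binary(buf, graph)
buf.seek(0)
restored = load_from_binary(buf)
print(restored == graph)  # True
```

Whatever the framing, the invariant is the same as in the C++ code: the bytes written by SaveToBinary must be exactly what LoadFromBinary needs to reconstruct an equivalent module.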
In addition, if you want to support module creation directly from an ExampleJSON file, you can also implement a simple function and register a Python API as follows:
static Module Create(const std::string& path) {
  std::ifstream filep;
  filep.open(path, std::ios::in);
  std::string graph_json;
  std::string line;
  while (std::getline(filep, line)) {
    graph_json += line;
    graph_json += "\n";
  }
  filep.close();
  auto n = tvm::runtime::make_object<ExampleJsonModule>(graph_json);
  return Module(n);
}

TVM_REGISTER_GLOBAL("module.loadfile_examplejson")
    .set_body([](TVMArgs args, TVMRetValue* rv) {
      *rv = ExampleJsonModule::Create(args[0]);
    });
This means users can manually write/modify an ExampleJSON file, and use the Python API tvm.runtime.load_module("mysubgraph.examplejson", "examplejson") to construct a customized module.
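The Create helper is nothing more than reading the whole file into the graph string that seeds the module. The same line-by-line accumulation, sketched in Python (the file name and contents below are made up for the demo):

```python
# Mirrors the C++ Create helper above: accumulate the file line by line,
# re-appending the newline stripped during iteration.
import os
import tempfile

def create_from_file(path):
    graph_json = ""
    with open(path) as f:
        for line in f:
            graph_json += line.rstrip("\n") + "\n"
    return graph_json

# Write a throwaway ExampleJSON-style file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".examplejson",
                                 delete=False) as f:
    f.write("subgraph_0\n  input 0 10 10\n")
    path = f.name
content = create_from_file(path)
os.unlink(path)
print(content)
```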
Summary
In summary, here is a checklist for you to refer to:
- A codegen class derived from ExprVisitor and CodegenCBase (only for C codegen) with the following functions:
  - VisitExpr_(const CallNode* call) to collect call node information.
  - Other visitor functions you need to collect subgraph information.
  - JIT to generate subgraph code.
  - Register the codegen.
- A function to create CSourceModule (for C codegen).
- A runtime module class derived from ModuleNode with the following functions (for your graph representation):
  - Constructor.
  - GetFunction to generate a TVM runtime compatible PackedFunc.
  - Run to execute a subgraph.
  - Register a runtime creation API.
  - SaveToBinary and LoadFromBinary to serialize/deserialize the customized runtime module.
  - Register the LoadFromBinary API to support tvm.runtime.load_module(your_module_lib_path).
  - (optional) Create to support customized runtime module construction from a subgraph file in your representation.
- An annotator to annotate a user Relay program to make use of your compiler and runtime (TBA).

References:
https://tvm.apache.org/docs/faq.html
https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html#relay-bring-your-own-codegen
https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html#tutorials-autotvm-sec
https://tvm.apache.org/docs/how_to/work_with_schedules/tensorize.html#tutorials-tensorize
https://www.jianshu.com/p/7a8d93522b07
