Using External Libraries in Relay

Published: 2023/11/28
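This article describes how to use external libraries such as cuDNN or cuBLAS with Relay.
Internally, Relay uses TVM to generate target-specific code. For example, with the cuda backend, TVM generates cuda kernels for every layer in the user-provided network. It is sometimes also helpful to bring external libraries developed by various vendors into Relay. Fortunately, TVM has a mechanism to call into these libraries transparently. For Relay users, all that is needed is to set the target string appropriately.
Before external libraries can be used from Relay, TVM must be built with the libraries you want to use. For example, to use cuDNN, enable the USE_CUDNN option in cmake/config.cmake and, if necessary, specify the cuDNN include and library directories.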
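Before rebuilding TVM, it can help to confirm which options a config.cmake file actually enables. The following is a minimal sketch, not part of TVM: the helper name `cmake_option` and the sample text are hypothetical, and it assumes options are written in the plain `set(NAME VALUE)` form.

```python
import re


def cmake_option(config_text, name):
    """Return the value assigned to `name` by a set(NAME VALUE) line, or None.

    Commented-out lines are ignored; if the option is set more than once,
    the last assignment wins (as it would when CMake reads the file).
    """
    pattern = re.compile(
        r"^\s*set\(\s*" + re.escape(name) + r"\s+([^\s)]+)\s*\)",
        re.MULTILINE,
    )
    matches = pattern.findall(config_text)
    return matches[-1] if matches else None


# Hypothetical excerpt of a config.cmake file, for illustration only.
sample = """
set(USE_CUDA ON)
# set(USE_CUDNN OFF)
set(USE_CUDNN ON)
set(USE_CUBLAS OFF)
"""

print("USE_CUDNN  =", cmake_option(sample, "USE_CUDNN"))
print("USE_CUBLAS =", cmake_option(sample, "USE_CUBLAS"))
```

In practice the text would be read from the config.cmake copy in your build directory.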
First, import Relay and TVM.
import tvm
from tvm import te
import numpy as np
from tvm.contrib import graph_runtime as runtime
from tvm import relay
from tvm.relay import testing
import tvm.testing
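Create a simple network
Let's create a very simple network for demonstration. It consists of a convolution, batch normalization, and a ReLU activation.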
out_channels = 16
batch_size = 1

data = relay.var("data", relay.TensorType((batch_size, 3, 224, 224), "float32"))
weight = relay.var("weight")
bn_gamma = relay.var("bn_gamma")
bn_beta = relay.var("bn_beta")
bn_mmean = relay.var("bn_mean")
bn_mvar = relay.var("bn_var")

simple_net = relay.nn.conv2d(
    data=data, weight=weight, kernel_size=(3, 3), channels=out_channels, padding=(1, 1)
)
simple_net = relay.nn.batch_norm(simple_net, bn_gamma, bn_beta, bn_mmean, bn_mvar)[0]
simple_net = relay.nn.relu(simple_net)
simple_net = relay.Function(relay.analysis.free_vars(simple_net), simple_net)

data_shape = (batch_size, 3, 224, 224)
net, params = testing.create_workload(simple_net)
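To make the batch normalization and ReLU steps concrete, here is a minimal pure-Python sketch of what they compute for a single value in inference mode, and of how the normalization statistics fold into one scale and shift per channel. The function names are hypothetical, and the `eps` default of 1e-5 is an assumption chosen for illustration; none of this is TVM code.

```python
import math


def batch_norm_relu(x, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: inference-mode batch norm followed by ReLU."""
    normalized = (x - mean) / math.sqrt(var + eps)
    return max(0.0, gamma * normalized + beta)


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm parameters into a single (scale, shift) pair."""
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift


def fused_epilogue(x, scale, shift):
    """One multiply-add and a max, applied directly to a convolution output."""
    return max(0.0, x * scale + shift)


# Illustrative per-channel parameters (made-up values).
gamma, beta, mean, var = 1.5, -0.2, 0.1, 0.8
scale, shift = fold_batch_norm(gamma, beta, mean, var)

for conv_out in (-1.0, 0.0, 0.5, 2.0):
    reference = batch_norm_relu(conv_out, gamma, beta, mean, var)
    fused = fused_epilogue(conv_out, scale, shift)
    print(conv_out, reference, fused)
```

Because the folded form is purely elementwise, it can be applied to each convolution output as it is produced, which is what makes these layers good candidates for fusion into a single kernel.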
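Build and run with the cuda backend
We build and run this network with the cuda backend, as usual. With the logging level set to DEBUG, the result of Relay graph compilation is dumped as pseudo code.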
import logging

logging.basicConfig(level=logging.DEBUG) # to dump TVM IR after fusion

target = "cuda"
lib = relay.build_module.build(net, target, params=params)

ctx = tvm.context(target, 0)
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
module = runtime.GraphModule(lib["default"](ctx))
module.set_input("data", data)
module.run()
out_shape = (batch_size, out_channels, 224, 224)
out = module.get_output(0, tvm.nd.empty(out_shape))
out_cuda = out.asnumpy()
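The output shape used above follows from the standard convolution size formula: a 3x3 kernel with padding 1 and stride 1 preserves the 224x224 spatial size. A small sketch of that arithmetic, where the helper name `conv2d_output_size` is hypothetical and not a TVM function:

```python
def conv2d_output_size(in_size, kernel, padding, stride=1, dilation=1):
    """Output length along one spatial axis of a 2-D convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


batch_size, out_channels = 1, 16
height = conv2d_output_size(224, kernel=3, padding=1)
width = conv2d_output_size(224, kernel=3, padding=1)
out_shape = (batch_size, out_channels, height, width)
print(out_shape)  # (1, 16, 224, 224)
```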
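The generated pseudo code should look like the following. Note how bias add, batch normalization, and ReLU activation are fused into the convolution kernel. TVM generates a single, fused kernel from this representation.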
produce tensor {
// attr [iter_var(blockIdx.z, , blockIdx.z)] thread_extent = 1
// attr [compute] storage_scope = "local"
allocate compute[float32 * 32]
// attr [pad_temp.shared] storage_scope = "shared"
allocate pad_temp.shared[float32 * 180]
// attr [placeholder.shared] storage_scope = "shared"
allocate placeholder.shared[float32 * 144]
// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 28
// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 14
// attr [iter_var(threadIdx.z, , threadIdx.z)] thread_extent = 4
// attr [iter_var(threadIdx.y, , threadIdx.y)] thread_extent = 1
// attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 16
produce compute {
compute[0] = 0.000000f
compute[1] = 0.000000f
compute[2] = 0.000000f
compute[3] = 0.000000f
compute[4] = 0.000000f
compute[5] = 0.000000f
compute[6] = 0.000000f
compute[7] = 0.000000f
compute[8] = 0.000000f
compute[9] = 0.000000f
compute[10] = 0.000000f
compute[11] = 0.000000f
compute[12] = 0.000000f
compute[13] = 0.000000f
compute[14] = 0.000000f
compute[15] = 0.000000f
compute[16] = 0.000000f
compute[17] = 0.000000f
compute[18] = 0.000000f
compute[19] = 0.000000f
compute[20] = 0.000000f
compute[21] = 0.000000f
compute[22] = 0.000000f
compute[23] = 0.000000f
compute[24] = 0.000000f
compute[25] = 0.000000f
compute[26] = 0.000000f
compute[27] = 0.000000f
compute[28] = 0.000000f
compute[29] = 0.000000f
compute[30] = 0.000000f
compute[31] = 0.000000f
for (rc.outer, 0, 3) {
produce pad_temp.shared {
// attr [iter_var(threadIdx.z, , threadIdx.z)] thread_extent = 4
// attr [iter_var(threadIdx.y, , threadIdx.y)] thread_extent = 1
// attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 16
if (likely(((threadIdx.z*15) < (60 - threadIdx.x)))) {
if (likely((threadIdx.x < 15))) {
pad_temp.shared[(((((threadIdx.z*15) + threadIdx.x)/60)*180) + ((((((threadIdx.z*15) + threadIdx.x)/6) % 10)*18) + ((((threadIdx.z*3) + threadIdx.x)*3) % 18)))] = tvm_if_then_else((((((1 - ((((threadIdx.z*15) + threadIdx.x)/6) % 10)) <= (blockIdx.y*8)) && ((blockIdx.y*8) < (225 - ((((threadIdx.z*15) + threadIdx.x)/6) % 10)))) && ((1 - ((((threadIdx.z*3) + threadIdx.x)*3) % 18)) <= (blockIdx.x*16))) && ((blockIdx.x*16) < (225 - ((((threadIdx.z*3) + threadIdx.x)*3) % 18)))), placeholder[((((((((blockIdx.y*112) + blockIdx.x) + (rc.outer*3136)) + ((((threadIdx.z*15) + threadIdx.x)/60)*9408))*16) + ((((threadIdx.z*3) + threadIdx.x)*3) % 18)) + (((((threadIdx.z*15) + threadIdx.x)/6) % 10)*224)) + -225)], 0.000000f)
pad_temp.shared[(((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/180)*180) + ((((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/18) % 10)*18) + (((((threadIdx.z*3) + threadIdx.x)*3) + 1) % 18)))] = tvm_if_then_else((((((1 - ((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/18) % 10)) <= (blockIdx.y*8)) && ((blockIdx.y*8) < (225 - ((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/18) % 10)))) && ((1 - (((((threadIdx.z*3) + threadIdx.x)*3) + 1) % 18)) <= (blockIdx.x*16))) && ((blockIdx.x*16) < (225 - (((((threadIdx.z*3) + threadIdx.x)*3) + 1) % 18)))), placeholder[((((((((blockIdx.y*112) + blockIdx.x) + (rc.outer*3136)) + ((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/180)*9408))*16) + (((((threadIdx.z*3) + threadIdx.x)*3) + 1) % 18)) + (((((((threadIdx.z*15) + threadIdx.x)*3) + 1)/18) % 10)*224)) + -225)], 0.000000f)
pad_temp.shared[(((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/180)*180) + ((((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/18) % 10)*18) + (((((threadIdx.z*3) + threadIdx.x)*3) + 2) % 18)))] = tvm_if_then_else((((((1 - ((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/18) % 10)) <= (blockIdx.y*8)) && ((blockIdx.y*8) < (225 - ((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/18) % 10)))) && ((1 - (((((threadIdx.z*3) + threadIdx.x)*3) + 2) % 18)) <= (blockIdx.x*16))) && ((blockIdx.x*16) < (225 - (((((threadIdx.z*3) + threadIdx.x)*3) + 2) % 18)))), placeholder[((((((((blockIdx.y*112) + blockIdx.x) + (rc.outer*3136)) + ((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/180)*9408))*16) + (((((threadIdx.z*3) + threadIdx.x)*3) + 2) % 18)) + (((((((threadIdx.z*15) + threadIdx.x)*3) + 2)/18) % 10)*224)) + -225)], 0.000000f)
}
}
}
produce placeholder.shared {
// attr [iter_var(threadIdx.z, , threadIdx.z)] thread_extent = 4
// attr [iter_var(threadIdx.y, , threadIdx.y)] thread_extent = 1
// attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 16
if (likely(((threadIdx.z*4) < (16 - (threadIdx.x/3))))) {
if (likely(((threadIdx.z*12) < (48 - threadIdx.x)))) {
if (likely((threadIdx.x < 12))) {
placeholder.shared[(((((threadIdx.z*4) + (threadIdx.x/3))*3) + (threadIdx.x % 3))*3)] = placeholder[(((((rc.outer + (threadIdx.z*12)) + ((threadIdx.x/3)*3))*3) + (threadIdx.x % 3))*3)]
placeholder.shared[((((((threadIdx.z*4) + (threadIdx.x/3))*3) + (threadIdx.x % 3))*3) + 1)] = placeholder[((((((rc.outer + (threadIdx.z*12)) + ((threadIdx.x/3)*3))*3) + (threadIdx.x % 3))*3) + 1)]
placeholder.shared[((((((threadIdx.z*4) + (threadIdx.x/3))*3) + (threadIdx.x % 3))*3) + 2)] = placeholder[((((((rc.outer + (threadIdx.z*12)) + ((threadIdx.x/3)*3))*3) + (threadIdx.x % 3))*3) + 2)]
}
}
}
}
compute[0] = (compute[0] + (pad_temp.shared[threadIdx.x]*placeholder.shared[(threadIdx.z*36)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[(threadIdx.z*36)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[(threadIdx.z*36)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[(threadIdx.z*36)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[(threadIdx.z*36)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[(threadIdx.z*36)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[(threadIdx.z*36)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[(threadIdx.z*36)]))
compute[8] = (compute[8] + (pad_temp.shared[threadIdx.x]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 9)]))
compute[16] = (compute[16] + (pad_temp.shared[threadIdx.x]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 18)]))
compute[24] = (compute[24] + (pad_temp.shared[threadIdx.x]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 27)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 1)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 1)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 1)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 10)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 1)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 19)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 1)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 28)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 2)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 2)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 2)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 11)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 2)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 20)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 2)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 29)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 3)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 12)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 21)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 18)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 30)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 4)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 13)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 22)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 19)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 31)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 146)]*placeholder.shared[((threadIdx.z*36) + 5)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 146)]*placeholder.shared[((threadIdx.z*36) + 14)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 146)]*placeholder.shared[((threadIdx.z*36) + 23)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 20)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 38)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 56)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 74)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 92)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 110)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 128)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 146)]*placeholder.shared[((threadIdx.z*36) + 32)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 162)]*placeholder.shared[((threadIdx.z*36) + 6)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 162)]*placeholder.shared[((threadIdx.z*36) + 15)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 162)]*placeholder.shared[((threadIdx.z*36) + 24)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 36)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 54)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 72)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 90)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 108)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 126)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 144)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 162)]*placeholder.shared[((threadIdx.z*36) + 33)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 163)]*placeholder.shared[((threadIdx.z*36) + 7)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 163)]*placeholder.shared[((threadIdx.z*36) + 16)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 55)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 73)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 91)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 109)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 127)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 145)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 163)]*placeholder.shared[((threadIdx.z*36) + 25)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 37)]*placeholder.shared[((threadIdx.z*36) + 34)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 55)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 73)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 91)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 109)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 127)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 145)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 163)]placeholder.shared[((threadIdx.z36) + 34)]))
compute[0] = (compute[0] + (pad_temp.shared[(threadIdx.x + 38)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[1] = (compute[1] + (pad_temp.shared[(threadIdx.x + 56)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[2] = (compute[2] + (pad_temp.shared[(threadIdx.x + 74)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[3] = (compute[3] + (pad_temp.shared[(threadIdx.x + 92)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[4] = (compute[4] + (pad_temp.shared[(threadIdx.x + 110)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[5] = (compute[5] + (pad_temp.shared[(threadIdx.x + 128)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[6] = (compute[6] + (pad_temp.shared[(threadIdx.x + 146)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[7] = (compute[7] + (pad_temp.shared[(threadIdx.x + 164)]placeholder.shared[((threadIdx.z36) + 8)]))
compute[8] = (compute[8] + (pad_temp.shared[(threadIdx.x + 38)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[9] = (compute[9] + (pad_temp.shared[(threadIdx.x + 56)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[10] = (compute[10] + (pad_temp.shared[(threadIdx.x + 74)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[11] = (compute[11] + (pad_temp.shared[(threadIdx.x + 92)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[12] = (compute[12] + (pad_temp.shared[(threadIdx.x + 110)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[13] = (compute[13] + (pad_temp.shared[(threadIdx.x + 128)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[14] = (compute[14] + (pad_temp.shared[(threadIdx.x + 146)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[15] = (compute[15] + (pad_temp.shared[(threadIdx.x + 164)]placeholder.shared[((threadIdx.z36) + 17)]))
compute[16] = (compute[16] + (pad_temp.shared[(threadIdx.x + 38)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[17] = (compute[17] + (pad_temp.shared[(threadIdx.x + 56)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[18] = (compute[18] + (pad_temp.shared[(threadIdx.x + 74)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[19] = (compute[19] + (pad_temp.shared[(threadIdx.x + 92)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[20] = (compute[20] + (pad_temp.shared[(threadIdx.x + 110)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[21] = (compute[21] + (pad_temp.shared[(threadIdx.x + 128)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[22] = (compute[22] + (pad_temp.shared[(threadIdx.x + 146)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[23] = (compute[23] + (pad_temp.shared[(threadIdx.x + 164)]placeholder.shared[((threadIdx.z36) + 26)]))
compute[24] = (compute[24] + (pad_temp.shared[(threadIdx.x + 38)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[25] = (compute[25] + (pad_temp.shared[(threadIdx.x + 56)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[26] = (compute[26] + (pad_temp.shared[(threadIdx.x + 74)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[27] = (compute[27] + (pad_temp.shared[(threadIdx.x + 92)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[28] = (compute[28] + (pad_temp.shared[(threadIdx.x + 110)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[29] = (compute[29] + (pad_temp.shared[(threadIdx.x + 128)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[30] = (compute[30] + (pad_temp.shared[(threadIdx.x + 146)]placeholder.shared[((threadIdx.z36) + 35)]))
compute[31] = (compute[31] + (pad_temp.shared[(threadIdx.x + 164)]placeholder.shared[((threadIdx.z36) + 35)]))
}
}
tensor[(((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x)] = max(((compute[0]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 224)] = max(((compute[1]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 448)] = max(((compute[2]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 672)] = max(((compute[3]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 896)] = max(((compute[4]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 1120)] = max(((compute[5]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 1344)] = max(((compute[6]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 1568)] = max(((compute[7]*placeholder[(threadIdx.z*4)]) + placeholder[(threadIdx.z*4)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 50176)] = max(((compute[8]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 50400)] = max(((compute[9]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 50624)] = max(((compute[10]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 50848)] = max(((compute[11]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 51072)] = max(((compute[12]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 51296)] = max(((compute[13]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 51520)] = max(((compute[14]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 51744)] = max(((compute[15]*placeholder[((threadIdx.z*4) + 1)]) + placeholder[((threadIdx.z*4) + 1)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 100352)] = max(((compute[16]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 100576)] = max(((compute[17]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 100800)] = max(((compute[18]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 101024)] = max(((compute[19]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 101248)] = max(((compute[20]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 101472)] = max(((compute[21]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 101696)] = max(((compute[22]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 101920)] = max(((compute[23]*placeholder[((threadIdx.z*4) + 2)]) + placeholder[((threadIdx.z*4) + 2)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 150528)] = max(((compute[24]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 150752)] = max(((compute[25]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 150976)] = max(((compute[26]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 151200)] = max(((compute[27]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 151424)] = max(((compute[28]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 151648)] = max(((compute[29]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 151872)] = max(((compute[30]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
tensor[((((((blockIdx.y*112) + blockIdx.x) + (threadIdx.z*12544))*16) + threadIdx.x) + 152096)] = max(((compute[31]*placeholder[((threadIdx.z*4) + 3)]) + placeholder[((threadIdx.z*4) + 3)]), 0.000000f)
}
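One way to read the final block of the pseudo code above: every output element is the convolution sum multiplied by one per-channel value, added to another per-channel value, and then clamped at zero. That is the shape you would expect if inference-time batch normalization were folded into a per-channel scale and shift before fusion. The sketch below illustrates this reading in plain Python. The names fold_batch_norm and fused_epilogue are hypothetical, the eps default is an assumption, and this is not TVM's actual implementation.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into one scale and one shift per channel.

    gamma * (x - mean) / sqrt(var + eps) + beta  ==  x * scale + shift
    eps is an assumed value for illustration.
    """
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def fused_epilogue(conv_value, channel, scale, shift):
    """Multiply-add followed by ReLU, applied to one convolution result."""
    return max(conv_value * scale[channel] + shift[channel], 0.0)


if __name__ == "__main__":
    scale, shift = fold_batch_norm([2.0], [1.0], [3.0], [4.0], eps=0.0)
    print(fused_epilogue(5.0, 0, scale, shift))
```

Because the epilogue needs only the convolution sum and two per-channel constants, it can run inside the same kernel without writing the intermediate result to global memory.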
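Use cuDNN for a convolutional layer
cuDNN can supply the convolution kernel in place of the one TVM generates. To enable it, append the option "-libs=cudnn" to the target string.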
net, params = testing.create_workload(simple_net)
target = "cuda -libs=cudnn"  # use cudnn for convolution
lib = relay.build_module.build(net, target, params=params)

ctx = tvm.context(target, 0)
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
module = runtime.GraphModule(lib["default"](ctx))  # instantiate the compiled graph on the GPU
module.set_input("data", data)
module.run()
out_shape = (batch_size, out_channels, 224, 224)
out = module.get_output(0, tvm.nd.empty(out_shape))
out_cudnn = out.asnumpy()  # copy the result back to a NumPy array for comparison
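Note that when cuDNN is used, Relay cannot fuse the convolution with the layers that follow it. Layer fusion happens at the level of TVM's internal representation (IR), and Relay treats an external library call as a black box that cannot be fused with TVM IR.
The pseudo code below shows that cuDNN convolution + bias add + batch norm + ReLU is split into two stages of computation: one for the cuDNN call and one for the remaining operations.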
// attr [y] storage_scope = "global"
allocate y[float32 * 802816]
produce y {
  // attr [0] extern_scope = 0
  tvm_call_packed("tvm.contrib.cudnn.conv2d.forward", 1, 0, 1, 1, 1, 1, 1, 1, 1, tvm_stack_make_array(placeholder, tvm_stack_make_shape(1, 3, 224, 224), 0, 4, 0.000000f, 0), tvm_stack_make_array(placeholder, tvm_stack_make_shape(16, 3, 3, 3), 0, 4, 0.000000f, 0), tvm_stack_make_array(y, tvm_stack_make_shape(1, 16, 224, 224), 0, 4, 0.000000f, 0))
}
produce tensor {
  // attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 256
  // attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 512
  for (ax0.ax1.fused.ax2.fused.ax3.fused.outer, 0, 7) {
    if (likely(((blockIdx.x*512) < ((802816 - (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072)) - threadIdx.x)))) {
      tensor[(((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/802816)*802816) + (((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/224) % 224)*224) + ((((blockIdx.x*64) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*32)) % 224))) + ((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/50176) % 16)*50176))] = max(((y[(((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/802816)*802816) + (((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/224) % 224)*224) + ((((blockIdx.x*64) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*32)) % 224))) + ((((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/50176) % 16)*50176))]*placeholder[(((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/50176) % 16)]) + placeholder[(((((blockIdx.x*512) + threadIdx.x) + (ax0.ax1.fused.ax2.fused.ax3.fused.outer*131072))/50176) % 16)]), 0.000000f)
    }
  }
}
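The long index expressions in the second stage are easier to follow once you notice that they decode a flat offset into NCHW coordinates: 802816 = 16 * 224 * 224 and 50176 = 224 * 224. Below is a minimal sketch in plain Python, assuming the (1, 16, 224, 224) output shape used in this tutorial. The helpers unravel_nchw and scale_shift_relu are hypothetical names, not TVM functions.

```python
# Output shape of the convolution in NCHW layout, as used in this tutorial.
N, C, H, W = 1, 16, 224, 224


def unravel_nchw(flat_index):
    """Split a flat buffer offset into (n, c, h, w) coordinates."""
    n = flat_index // (C * H * W)    # / 802816
    c = (flat_index // (H * W)) % C  # (/ 50176) % 16
    h = (flat_index // W) % H        # (/ 224) % 224
    # The pseudo code uses coefficients 64 and 32 for this term,
    # which are 512 % 224 and 131072 % 224.
    w = flat_index % W
    return n, c, h, w


def scale_shift_relu(y, scale, shift):
    """Second stage: per-channel multiply-add and ReLU over a flat buffer."""
    out = []
    for i, value in enumerate(y):
        _, c, _, _ = unravel_nchw(i)
        out.append(max(value * scale[c] + shift[c], 0.0))
    return out


if __name__ == "__main__":
    print(unravel_nchw(50176))  # first element of channel 1
    print(scale_shift_relu([1.0, -2.0], [2.0] * C, [0.5] * C))
```

Unlike the fused kernel, this stage has to read y back from global memory, because the cuDNN call has already written the full convolution output before the elementwise work begins.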
Verify the result
We can check that the results of the two runs match.
tvm.testing.assert_allclose(out_cuda, out_cudnn, rtol=1e-5)
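As a rough illustration of what such a check does, the sketch below applies a NumPy-style relative tolerance test, |actual - desired| <= atol + rtol * |desired|, to two flat lists. It assumes that tvm.testing.assert_allclose follows this criterion. The function assert_close is a hypothetical stand-in, and its atol default is chosen only for illustration.

```python
def assert_close(actual, desired, rtol=1e-5, atol=0.0):
    """Raise AssertionError if any pair violates |a - d| <= atol + rtol * |d|."""
    if len(actual) != len(desired):
        raise AssertionError(
            "length mismatch: %d vs %d" % (len(actual), len(desired))
        )
    for i, (a, d) in enumerate(zip(actual, desired)):
        if abs(a - d) > atol + rtol * abs(d):
            raise AssertionError("element %d differs: %r vs %r" % (i, a, d))


if __name__ == "__main__":
    # A difference of about 1e-6 on a value of 2.0 is within rtol=1e-5.
    assert_close([1.0, 2.000001], [1.0, 2.0])
    print("outputs match within tolerance")
```

A relative tolerance is a reasonable choice here because the two backends may accumulate floating-point sums in a different order, so bitwise equality should not be expected.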
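Conclusion
This tutorial covered the use of cuDNN with Relay. cuBLAS is supported as well. If cuBLAS is enabled, it is used for fully connected layers (relay.nn.dense). To use cuBLAS, set the target string to "cuda -libs=cublas". You can use cuDNN and cuBLAS together with "cuda -libs=cudnn,cublas".
For the ROCm backend, MIOpen and rocBLAS are supported. They can be enabled with the target "rocm -libs=miopen,rocblas".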
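Since the library selection is just part of the target string, a small helper can assemble it. This is a minimal sketch; make_target is a hypothetical convenience function, not part of TVM.

```python
def make_target(backend, libs=()):
    """Build a target string such as 'cuda -libs=cudnn,cublas'.

    backend: base target name, e.g. "cuda" or "rocm".
    libs:    external libraries to enable; empty means TVM kernels only.
    """
    if not libs:
        return backend
    return "%s -libs=%s" % (backend, ",".join(libs))


if __name__ == "__main__":
    print(make_target("cuda"))
    print(make_target("cuda", ["cudnn", "cublas"]))
    print(make_target("rocm", ["miopen", "rocblas"]))
```

The resulting string would be passed as the target argument of relay.build_module.build, exactly as in the cuDNN example earlier in this tutorial.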
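Being able to use external libraries is great, but there are some caveats to keep in mind.
First, using an external library may restrict how you can use TVM and Relay. For example, MIOpen currently supports only the NCHW layout and the fp32 data type, so you cannot use other layouts or data types in TVM when it is enabled.
Second, and more importantly, external libraries restrict the possibility of operator fusion during graph compilation, as shown above. TVM and Relay aim to achieve the best performance on a variety of hardware through joint operator-level and graph-level optimization. To reach that goal, we should continue developing better optimizations for TVM and Relay, while using external libraries as a convenient way to fall back to existing implementations when necessary.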
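The first caveat can be expressed as a simple guard that requests the external library only when the workload fits its constraints and otherwise falls back to TVM-generated kernels. This is a sketch under the constraints stated above (NCHW and float32 for MIOpen). The function pick_rocm_target is a hypothetical helper, and the check is illustrative rather than an authoritative list of what MIOpen supports.

```python
def pick_rocm_target(layout, dtype):
    """Return a ROCm target string, enabling MIOpen only when it applies.

    The NCHW/float32 condition mirrors the restriction described in the text;
    anything else falls back to kernels generated by TVM itself.
    """
    if layout == "NCHW" and dtype == "float32":
        return "rocm -libs=miopen"
    return "rocm"


if __name__ == "__main__":
    print(pick_rocm_target("NCHW", "float32"))
    print(pick_rocm_target("NHWC", "float32"))
    print(pick_rocm_target("NCHW", "float16"))
```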
