
AVOD Code Walkthrough Series (Part 4)

Published: 2024/8/1



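I was away for a while preparing my thesis proposal, and now I am back to continue this series. It has been delayed for quite some time, but I would rather not abandon it halfway. I have recently found that there are still some small tricks in the AVOD code that I do not fully understand. If you understand them, feel free to discuss; I will point out the places I cannot yet explain. A colleague has been urging me to go through those tricks carefully with him, and I will dig into them once I have sorted out the ideas and direction for my own research topic.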

RPN->NMS

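The previous post covered how fully connected layers are used to classify object versus background and to produce the 6 regression values for each bbox. Next comes this part of the overall network structure (excluding the final part):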

The code is as follows:

# Not used by the model itself. These are histogram summaries for
# visualization only, and you can choose whether to enable them.
with tf.variable_scope('histograms_feature_extractor'):
    with tf.variable_scope('bev_vgg'):
        for end_point in self.bev_end_points:
            tf.summary.histogram(
                end_point, self.bev_end_points[end_point])
    with tf.variable_scope('img_vgg'):
        for end_point in self.img_end_points:
            tf.summary.histogram(
                end_point, self.img_end_points[end_point])
with tf.variable_scope('histograms_rpn'):
    with tf.variable_scope('anchor_predictor'):
        fc_layers = [cls_fc6, cls_fc7, cls_fc8, objectness,
                     reg_fc6, reg_fc7, reg_fc8, offsets]
        for fc_layer in fc_layers:
            # fix the name to avoid tf warnings
            tf.summary.histogram(fc_layer.name.replace(':', '_'),
                                 fc_layer)
# Return the proposals
with tf.variable_scope('proposals'):
    # Fed in from outside through a placeholder?
    anchors = self.placeholders[self.PL_ANCHORS]
    # Decode anchor regression offsets
    with tf.variable_scope('decoding'):
        # Get the regressed (x, y, z, dx, dy, dz): the input anchors
        # are turned into regressed anchors.
        regressed_anchors = anchor_encoder.offset_to_anchor(
            anchors, offsets)
    with tf.variable_scope('bev_projection'):
        # self._bev_extents is [[-40, 40], [0, 70]].
        # project_to_bev returns bev_box_corners and bev_box_corners_norm.
        _, bev_proposal_boxes_norm = anchor_projector.project_to_bev(
            regressed_anchors, self._bev_extents)
    with tf.variable_scope('softmax'):
        objectness_softmax = tf.nn.softmax(objectness)
    with tf.variable_scope('nms'):
        objectness_scores = objectness_softmax[:, 1]
        # Do NMS on regressed anchors.
        # tf.image.non_max_suppression takes the box coordinates, the
        # predicted score of each box, and max_output_size, the maximum
        # number of boxes to keep. A box is dropped when its IoU with an
        # already selected, higher-scoring box exceeds iou_threshold.
        # Here _nms_size = 1024 and _nms_iou_thresh = 0.8.
        # top_indices holds the indices of the selected boxes.
        top_indices = tf.image.non_max_suppression(
            bev_proposal_boxes_norm, objectness_scores,
            max_output_size=self._nms_size,
            iou_threshold=self._nms_iou_thresh)
        # Pick out the anchors and objectness scores that survived NMS
        top_anchors = tf.gather(regressed_anchors, top_indices)
        top_objectness_softmax = tf.gather(objectness_scores,
                                           top_indices)
        # top_offsets = tf.gather(offsets, top_indices)
        # top_objectness = tf.gather(objectness, top_indices)
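To make the NMS step more concrete, here is a minimal plain-Python sketch of greedy non-maximum suppression. It is not the TensorFlow implementation; the helper names iou and greedy_nms and the three example boxes are made up for illustration. It only assumes the behaviour described above: boxes are visited from highest to lowest score, and a box is dropped when its IoU with an already kept box exceeds iou_threshold.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    """Return indices of the kept boxes, highest score first (hypothetical helper)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        if len(keep) >= max_output_size:
            break
        # Keep the box only if it does not overlap too much with any kept box.
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep


# Made-up normalized BEV boxes, for illustration only.
boxes = [
    [0.10, 0.10, 0.30, 0.30],  # box 0
    [0.11, 0.10, 0.31, 0.30],  # box 1: nearly identical to box 0 (IoU about 0.90)
    [0.60, 0.60, 0.80, 0.80],  # box 2: far away from the other two
]
scores = [0.9, 0.8, 0.7]

kept = greedy_nms(boxes, scores, max_output_size=1024, iou_threshold=0.8)
print(kept)  # box 1 is suppressed by box 0, so this prints [0, 2]
```

With the threshold of 0.8 used in the code above, only near-duplicate proposals are removed; a lower threshold would suppress more aggressively.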

In the proposal code above, the line regressed_anchors = anchor_encoder.offset_to_anchor(anchors, offsets) works as follows:

def offset_to_anchor(anchors, offsets):
    """Decodes the anchor regression predictions with the
    anchor.

    The main job of this function is to compute the parameters of the
    regressed anchors, [x, y, z, dx, dy, dz], from the formulas below.

    Args:
        anchors: A numpy array or a tensor of shape [N, 6]
            representing the generated anchors.
        offsets: A numpy array or a tensor of shape
            [N, 6] containing the predicted offsets in the
            anchor format [x, y, z, dim_x, dim_y, dim_z].

    Returns:
        anchors: A numpy array of shape [N, 6]
            representing the predicted anchor boxes.
    """
    # Make sure anchors and offsets both have shape N x 6
    fc.check_anchor_format(anchors)
    fc.check_anchor_format(offsets)

    # x = dx * dim_x + x_anch
    x_pred = (offsets[:, 0] * anchors[:, 3]) + anchors[:, 0]
    # y = dy * dim_y + y_anch
    y_pred = (offsets[:, 1] * anchors[:, 4]) + anchors[:, 1]
    # z = dz * dim_z + z_anch
    z_pred = (offsets[:, 2] * anchors[:, 5]) + anchors[:, 2]

    tensor_format = isinstance(anchors, tf.Tensor)
    if tensor_format:
        # dim_x = exp(log(dim_x) + dx)
        dx_pred = tf.exp(tf.log(anchors[:, 3]) + offsets[:, 3])
        # dim_y = exp(log(dim_y) + dy)
        dy_pred = tf.exp(tf.log(anchors[:, 4]) + offsets[:, 4])
        # dim_z = exp(log(dim_z) + dz)
        dz_pred = tf.exp(tf.log(anchors[:, 5]) + offsets[:, 5])
        anchors = tf.stack((x_pred,
                            y_pred,
                            z_pred,
                            dx_pred,
                            dy_pred,
                            dz_pred), axis=1)
    else:
        dx_pred = np.exp(np.log(anchors[:, 3]) + offsets[:, 3])
        dy_pred = np.exp(np.log(anchors[:, 4]) + offsets[:, 4])
        dz_pred = np.exp(np.log(anchors[:, 5]) + offsets[:, 5])
        anchors = np.stack((x_pred,
                            y_pred,
                            z_pred,
                            dx_pred,
                            dy_pred,
                            dz_pred), axis=1)

    return anchors
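A small worked example may help with the decoding formulas. The sketch below applies the same equations to a single anchor in plain Python. The names decode_anchor and encode_offset and all the numbers are hypothetical; encode_offset is simply the algebraic inverse of the formulas above, written here to check the round trip, and is not taken from the code shown in this post.

```python
import math


def decode_anchor(anchor, offset):
    """Apply the offset_to_anchor formulas to one anchor (hypothetical helper)."""
    x, y, z, dim_x, dim_y, dim_z = anchor
    dx, dy, dz, d_dim_x, d_dim_y, d_dim_z = offset
    return [
        dx * dim_x + x,                      # centre offsets are scaled by the anchor size
        dy * dim_y + y,
        dz * dim_z + z,
        math.exp(math.log(dim_x) + d_dim_x),  # sizes are regressed in log space
        math.exp(math.log(dim_y) + d_dim_y),
        math.exp(math.log(dim_z) + d_dim_z),
    ]


def encode_offset(anchor, target):
    """Algebraic inverse of decode_anchor (assumed, for the round-trip check only)."""
    x, y, z, dim_x, dim_y, dim_z = anchor
    tx, ty, tz, t_dim_x, t_dim_y, t_dim_z = target
    return [
        (tx - x) / dim_x,
        (ty - y) / dim_y,
        (tz - z) / dim_z,
        math.log(t_dim_x / dim_x),
        math.log(t_dim_y / dim_y),
        math.log(t_dim_z / dim_z),
    ]


# Made-up anchor [x, y, z, dim_x, dim_y, dim_z] and offsets.
anchor = [10.0, 1.5, 20.0, 4.0, 1.6, 2.0]
offset = [0.5, 0.0, -0.25, 0.0, 0.0, math.log(2.0)]

decoded = decode_anchor(anchor, offset)
print(decoded)                         # values close to [12.0, 1.5, 19.5, 4.0, 1.6, 4.0]
print(encode_offset(anchor, decoded))  # values close to the original offset
```

Because the size terms live in log space, an offset of 0 leaves a dimension unchanged and an offset of log(2) doubles it, so the decoded sizes stay positive whatever the network predicts.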

The earlier call _, bev_proposal_boxes_norm = anchor_projector.project_to_bev(regressed_anchors, self._bev_extents) works as follows:

def project_to_bev(anchors, bev_extents):
    """Projects an array of 3D anchors into bird's eye view

    After looking at the KITTI dataset, I found that the coordinate
    systems of its data collection setup are:
        Camera:   x = right,   y = down, z = forward
        Velodyne: x = forward, y = left, z = up
        GPS/IMU:  x = forward, y = left, z = up
    This means the computation is actually done in camera coordinates,
    so the bird's eye view is the xz plane. How the point cloud itself
    is projected into the bird's eye view will be examined later.

    Args:
        anchors: list of anchors in anchor format (N x 6):
            N x [x, y, z, dim_x, dim_y, dim_z],
            can be a numpy array or tensor
        bev_extents: xz extents of the 3d area
            [[min_x, max_x], [min_z, max_z]]

    Returns:
        box_corners_norm: corners as a percentage of the map size, in the
            format N x [x1, y1, x2, y2]. Origin is the top left corner
    """
    # bev_extents is [[-40, 40], [0, 70]]
    tensor_format = isinstance(anchors, tf.Tensor)
    if not tensor_format:
        anchors = np.asarray(anchors)

    # The bird's eye view coordinates here are x and z!
    # x, y, z are the centre of the box; dx, dy, dz are its width, height
    # and length (as seen by a person sitting in the car).
    x = anchors[:, 0]
    z = anchors[:, 2]
    half_dim_x = anchors[:, 3] / 2.0
    half_dim_z = anchors[:, 5] / 2.0

    # Calculate extent ranges
    # With [[-40, 40], [0, 70]], z is the direction ahead of the car, so
    # it only takes positive values. The KITTI data appears to cover only
    # objects in front of the car, whereas current on-vehicle perception
    # systems cover everything around the car apart from some blind zone.
    bev_x_extents_min = bev_extents[0][0]
    bev_z_extents_min = bev_extents[1][0]
    bev_x_extents_max = bev_extents[0][1]
    bev_z_extents_max = bev_extents[1][1]
    # 80
    bev_x_extents_range = bev_x_extents_max - bev_x_extents_min
    # 70
    bev_z_extents_range = bev_z_extents_max - bev_z_extents_min

    # 2D corners (top left, bottom right)
    x1 = x - half_dim_x
    x2 = x + half_dim_x
    # Flip z co-ordinates (origin changes from bottom left to top left)
    # In case this step is hard to follow, I drew a sketch after the code
    # to make it clearer.
    z1 = bev_z_extents_max - (z + half_dim_z)
    z2 = bev_z_extents_max - (z - half_dim_z)

    # Stack into (N x 4)
    if tensor_format:
        bev_box_corners = tf.stack([x1, z1, x2, z2], axis=1)
    else:
        bev_box_corners = np.stack([x1, z1, x2, z2], axis=1)

    # Convert from original xz into bev xz, origin moves to top left
    # bev_extents_min_tiled is [-40, 0, -40, 0]
    bev_extents_min_tiled = [bev_x_extents_min, bev_z_extents_min,
                             bev_x_extents_min, bev_z_extents_min]
    bev_box_corners = bev_box_corners - bev_extents_min_tiled

    # Calculate normalized box corners for ROI pooling
    extents_tiled = [bev_x_extents_range, bev_z_extents_range,
                     bev_x_extents_range, bev_z_extents_range]
    # Normalize
    bev_box_corners_norm = bev_box_corners / extents_tiled

    # Both outputs are in [x1, z1, x2, z2] order
    return bev_box_corners, bev_box_corners_norm

Anchor projection calculation:
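The projection can also be followed with concrete numbers. The sketch below repeats the arithmetic of project_to_bev for a single anchor in plain Python, using the extents [[-40, 40], [0, 70]] mentioned in the comments. The function name project_one_anchor_to_bev and the example anchor are made up for illustration.

```python
def project_one_anchor_to_bev(anchor, bev_extents):
    """Same arithmetic as project_to_bev, for one anchor (hypothetical helper)."""
    x, _, z, dim_x, _, dim_z = anchor
    (x_min, x_max), (z_min, z_max) = bev_extents
    x_range = x_max - x_min
    z_range = z_max - z_min

    # Left and right edges in camera x.
    x1 = x - dim_x / 2.0
    x2 = x + dim_x / 2.0
    # Flip z so that larger z (further ahead) ends up nearer the top of the map.
    z1 = z_max - (z + dim_z / 2.0)
    z2 = z_max - (z - dim_z / 2.0)

    # Shift so the origin sits at the top left corner of the BEV map.
    corners = [x1 - x_min, z1 - z_min, x2 - x_min, z2 - z_min]
    # Normalize by the map size so every value is a fraction of the map.
    corners_norm = [corners[0] / x_range, corners[1] / z_range,
                    corners[2] / x_range, corners[3] / z_range]
    return corners, corners_norm


# Made-up anchor: centred on the car's axis, 35 m ahead, 4 m wide, 10 m long.
anchor = [0.0, 1.5, 35.0, 4.0, 1.6, 10.0]
bev_extents = [[-40.0, 40.0], [0.0, 70.0]]

corners, corners_norm = project_one_anchor_to_bev(anchor, bev_extents)
print(corners)       # [38.0, 30.0, 42.0, 40.0]
print(corners_norm)  # roughly [0.475, 0.4286, 0.525, 0.5714]
```

The far edge of the box (z = 40) maps to 30 m from the top of the map and the near edge (z = 30) maps to 40 m, which is the z flip described in the comments. The y coordinate and dim_y are never used, since height plays no role in the bird's eye view.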
