
Classic object detection papers: Fast R-CNN paper translation (English-Chinese edition). Fast R-CNN (Ross Girshick, Microsoft Research)

Published 2023/12/20 by 豆豆.

Collected translations of classic object detection papers: [翻譯匯總]

Translated PDF download: [下載地址]

This is the English-Chinese edition; for the Chinese-only version, see: [Fast R-CNN純中文版]

Fast R-CNN

Ross Girshick

Microsoft Research(微軟研究院)

rbg@microsoft.com

Abstract

This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9× faster than R-CNN, is 213× faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

1. Introduction

Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve. Due to this complexity, current approaches (e.g., [9, 11, 19, 25]) train models in multi-stage pipelines that are slow and inelegant.

Complexity arises because detection requires the accurate localization of objects, creating two primary challenges. First, numerous candidate object locations (often called “proposals”) must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization. Solutions to these problems often compromise speed, accuracy, or simplicity.

In this paper, we streamline the training process for state-of-the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.

The resulting method can train a very deep detection network (VGG16 [20]) 9× faster than R-CNN [9] and 3× faster than SPPnet [11]. At runtime, the detection network processes images in 0.3s (excluding object proposal time) while achieving top accuracy on PASCAL VOC 2012 [7] with a mAP of 66% (vs. 62% for R-CNN).

(All timings use one Nvidia K40 GPU overclocked to 875 MHz.)

1.1. R-CNN and SPPnet

The Region-based Convolutional Network method (R-CNN) [9] achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals. R-CNN, however, has notable drawbacks:

1. Training is a multi-stage pipeline. R-CNN first fine-tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned.

2. Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.

3. Object detection is slow. At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s / image (on a GPU).

R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) [11] were proposed to speed up R-CNN by sharing computation. The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6×6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling [15]. SPPnet accelerates R-CNN by 10 to 100× at test time. Training time is also reduced by 3× due to faster proposal feature extraction.

SPPnet also has notable drawbacks. Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in [11] cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks.

1.2. Contributions

We propose a new training algorithm that fixes the disadvantages of R-CNN and SPPnet, while improving on their speed and accuracy. We call this method Fast R-CNN because it is comparatively fast to train and test. The Fast R-CNN method has several advantages:

1. Higher detection quality (mAP) than R-CNN, SPPnet

2. Training is single-stage, using a multi-task loss

3. Training can update all network layers

4. No disk storage is required for feature caching

Fast R-CNN is written in Python and C++ (Caffe [13]) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.

2. Fast R-CNN architecture and training

Fig. 1 illustrates the Fast R-CNN architecture. A Fast R-CNN network takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. Then, for each object proposal a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected (fc) layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all “background” class and another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes.

Figure 1. Fast R-CNN architecture. An input image and multiple regions of interest (RoIs) are input into a fully convolutional network. Each RoI is pooled into a fixed-size feature map and then mapped to a feature vector by fully connected layers (FCs). The network has two output vectors per RoI: softmax probabilities and per-class bounding-box regression offsets. The architecture is trained end-to-end with a multi-task loss.

2.1. The RoI pooling layer

The RoI pooling layer uses max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H×W (e.g., 7×7), where H and W are layer hyper-parameters that are independent of any particular RoI. In this paper, an RoI is a rectangular window into a conv feature map. Each RoI is defined by a four-tuple (r, c, h, w) that specifies its top-left corner (r, c) and its height and width (h, w).

RoI max pooling works by dividing the h×w RoI window into an H ×W grid of sub-windows of approximate size h/H×w/W and then max-pooling the values in each sub-window into the corresponding output grid cell. Pooling is applied independently to each feature map channel, as in standard max pooling. The RoI layer is simply the special-case of the spatial pyramid pooling layer used in SPPnets [11] in which there is only one pyramid level. We use the pooling sub-window calculation given in [11].
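As a concrete illustration of the sub-window scheme above, here is a minimal NumPy sketch of the RoI max-pooling forward pass. The floor/ceil rounding of sub-window boundaries is an assumption made for illustration (the paper defers the exact sub-window calculation to [11]), and `roi_max_pool` is a hypothetical helper name.

```python
import numpy as np

def roi_max_pool(feature_map, roi, H=7, W=7):
    """Max-pool one RoI of a conv feature map into a fixed H x W grid.

    feature_map: (C, fh, fw) array. roi: (r, c, h, w) window in feature-map
    coordinates, as defined in the text. Each output cell max-pools a
    sub-window of approximate size h/H x w/W; the floor/ceil rounding of
    sub-window edges here is illustrative, not necessarily that of [11].
    """
    r, c, h, w = roi
    C = feature_map.shape[0]
    out = np.empty((C, H, W))
    for i in range(H):
        y0 = r + int(np.floor(i * h / H))
        y1 = r + int(np.ceil((i + 1) * h / H))
        for j in range(W):
            x0 = c + int(np.floor(j * w / W))
            x1 = c + int(np.ceil((j + 1) * w / W))
            # Pooling is applied independently to each channel.
            out[:, i, j] = feature_map[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out
```

For a 14×14 feature map and a 14×14 RoI with H = W = 7, each output cell is simply the max over a 2×2 sub-window.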

2.2. Initializing from pre-trained networks

We experiment with three pre-trained ImageNet [4] networks, each with five max pooling layers and between five and thirteen conv layers (see Section 4.1 for network details). When a pre-trained network initializes a Fast R-CNN network, it undergoes three transformations.

First, the last max pooling layer is replaced by a RoI pooling layer that is configured by setting H and W to be compatible with the net’s first fully connected layer (e.g., H = W = 7 for VGG16).

Second, the network’s last fully connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully connected layer and softmax over K+1 categories and category-specific bounding-box regressors).

Third, the network is modified to take two data inputs: a list of images and a list of RoIs in those images.

最后,網(wǎng)絡(luò)被修改為采用兩個(gè)數(shù)據(jù)輸入:圖像的列表和這些圖像中的RoI的列表。

2.3. Fine-tuning for detection

Training all network weights with back-propagation is an important capability of Fast R-CNN. First, let us elucidate why SPPnet is unable to update weights below the spatial pyramid pooling layer.

The root cause is that back-propagation through the SPP layer is highly inefficient when each training sample (i.e. RoI) comes from a different image, which is exactly how R-CNN and SPPnet networks are trained. The inefficiency stems from the fact that each RoI may have a very large receptive field, often spanning the entire input image. Since the forward pass must process the entire receptive field, the training inputs are large (often the entire image).

We propose a more efficient training method that takes advantage of feature sharing during training. In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image. Critically, RoIs from the same image share computation and memory in the forward and backward passes. Making N small decreases mini-batch computation. For example, when using N = 2 and R = 128, the proposed training scheme is roughly 64× faster than sampling one RoI from 128 different images (i.e., the R-CNN and SPPnet strategy).

One concern over this strategy is it may cause slow training convergence because RoIs from the same image are correlated. This concern does not appear to be a practical issue and we achieve good results with N = 2 and R = 128 using fewer SGD iterations than R-CNN.

In addition to hierarchical sampling, Fast R-CNN uses a streamlined training process with one fine-tuning stage that jointly optimizes a softmax classifier and bounding-box regressors, rather than training a softmax classifier, SVMs, and regressors in three separate stages [9, 11]. The components of this procedure (the loss, mini-batch sampling strategy, back-propagation through RoI pooling layers, and SGD hyper-parameters) are described below.

Multi-task loss. A Fast R-CNN network has two sibling output layers. The first outputs a discrete probability distribution (per RoI), p = (p_0, …, p_K), over K+1 categories. As usual, p is computed by a softmax over the K+1 outputs of a fully connected layer. The second sibling layer outputs bounding-box regression offsets, t^k = (t^k_x, t^k_y, t^k_w, t^k_h), for each of the K object classes, indexed by k. We use the parameterization for t^k given in [9], in which t^k specifies a scale-invariant translation and log-space height/width shift relative to an object proposal.
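The parameterization of [9] (a scale-invariant translation of the box center plus a log-space shift of its width and height) can be sketched as follows. Boxes here are assumed to be (center x, center y, width, height) tuples, and `bbox_encode`/`bbox_decode` are illustrative names, not from the released code.

```python
import math

def bbox_encode(proposal, gt):
    """Regression target t for a proposal P and ground-truth box G:
    a scale-invariant translation of the center plus log-space w/h shifts."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def bbox_decode(proposal, t):
    """Invert the encoding: apply predicted offsets t to a proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))
```

Encoding a ground-truth box against its proposal and then decoding recovers the ground truth exactly, which is the property the regressor exploits at test time.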

Each training RoI is labeled with a ground-truth class u and a ground-truth bounding-box regression target v. We use a multi-task loss L on each labeled RoI to jointly train for classification and bounding-box regression:

L(p, u, t^u, v) = L_cls(p, u) + λ [u ≥ 1] L_loc(t^u, v),    (1)

in which L_cls(p, u) = -log p_u is log loss for true class u.

The second task loss, L_loc, is defined over a tuple of true bounding-box regression targets for class u, v = (v_x, v_y, v_w, v_h), and a predicted tuple t^u = (t^u_x, t^u_y, t^u_w, t^u_h), again for class u. The Iverson bracket indicator function [u ≥ 1] evaluates to 1 when u ≥ 1 and 0 otherwise. By convention the catch-all background class is labeled u = 0. For background RoIs there is no notion of a ground-truth bounding box and hence L_loc is ignored. For bounding-box regression, we use the loss

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t^u_i - v_i),    (2)

in which

smooth_L1(x) = 0.5 x^2     if |x| < 1,
               |x| - 0.5   otherwise,    (3)

is a robust L1 loss that is less sensitive to outliers than the L2 loss used in R-CNN and SPPnet. When the regression targets are unbounded, training with L2 loss can require careful tuning of learning rates in order to prevent exploding gradients. Eq. 3 eliminates this sensitivity.
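A minimal NumPy sketch of the loss terms above, with the background case u = 0 switching the localization term off; the function names are illustrative, not from the released code.

```python
import numpy as np

def smooth_l1(x):
    """Eq. 3: 0.5 x^2 where |x| < 1, |x| - 0.5 elsewhere (elementwise)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

def multi_task_loss(p, u, t_u, v, lam=1.0):
    """Eq. 1 for one labeled RoI.

    p: softmax probabilities over K+1 classes; u: true class (0 = background);
    t_u, v: predicted and target offsets (tx, ty, tw, th) for class u.
    """
    l_cls = -np.log(p[u])  # log loss for the true class
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()  # Eq. 2
    return l_cls + lam * (u >= 1) * l_loc  # (u >= 1) plays the Iverson bracket
```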

The hyper-parameter λ in Eq. 1 controls the balance between the two task losses. We normalize the ground-truth regression targets v_i to have zero mean and unit variance. All experiments use λ = 1.

We note that [6] uses a related loss to train a class-agnostic object proposal network. Different from our approach, [6] advocates for a two-network system that separates localization and classification. OverFeat [19], R-CNN [9], and SPPnet [11] also train classifiers and bounding-box localizers, however these methods use stage-wise training, which we show is suboptimal for Fast R-CNN (Section 5.1).

Mini-batch sampling. During fine-tuning, each SGD mini-batch is constructed from N = 2 images, chosen uniformly at random (as is common practice, we actually iterate over permutations of the dataset). We use mini-batches of size R = 128, sampling 64 RoIs from each image. As in [9], we take 25% of the RoIs from object proposals that have intersection over union (IoU) overlap with a ground-truth bounding box of at least 0.5. These RoIs comprise the examples labeled with a foreground object class, i.e. u ≥ 1. The remaining RoIs are sampled from object proposals that have a maximum IoU with ground truth in the interval [0.1, 0.5), following [11]. These are the background examples and are labeled with u = 0. The lower threshold of 0.1 appears to act as a heuristic for hard example mining [8]. During training, images are horizontally flipped with probability 0.5. No other data augmentation is used.
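The per-image sampling rule can be sketched as follows, assuming each proposal's maximum IoU with any ground-truth box has already been computed; `sample_rois` is a hypothetical helper that omits details of the released implementation.

```python
import random

def sample_rois(max_ious, fg_fraction=0.25, rois_per_image=64):
    """Split proposals into foreground/background pools by max IoU with
    ground truth (fg: IoU >= 0.5; bg: IoU in [0.1, 0.5)), then sample
    25% foreground RoIs. Returns (fg_indices, bg_indices)."""
    fg_pool = [i for i, iou in enumerate(max_ious) if iou >= 0.5]
    bg_pool = [i for i, iou in enumerate(max_ious) if 0.1 <= iou < 0.5]
    n_fg = min(int(round(fg_fraction * rois_per_image)), len(fg_pool))
    n_bg = min(rois_per_image - n_fg, len(bg_pool))
    return random.sample(fg_pool, n_fg), random.sample(bg_pool, n_bg)
```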

Back-propagation through RoI pooling layers. Backpropagation routes derivatives through the RoI pooling layer. For clarity, we assume only one image per mini-batch (N = 1), though the extension to N > 1 is straightforward because the forward pass treats all images independently.

Let x_i ∈ R be the i-th activation input into the RoI pooling layer and let y_rj be the layer's j-th output from the r-th RoI. The RoI pooling layer computes y_rj = x_{i*(r, j)}, in which i*(r, j) = argmax_{i′ ∈ R(r, j)} x_{i′}. R(r, j) is the index set of inputs in the sub-window over which the output unit y_rj max pools. A single x_i may be assigned to several different outputs y_rj.

The RoI pooling layer's backwards function computes the partial derivative of the loss function with respect to each input variable x_i by following the argmax switches:

∂L/∂x_i = Σ_r Σ_j [i = i*(r, j)] ∂L/∂y_rj    (4)

In words, for each mini-batch RoI r and for each pooling output unit y_rj, the partial derivative ∂L/∂y_rj is accumulated if i is the argmax selected for y_rj by max pooling. In back-propagation, the partial derivatives ∂L/∂y_rj are already computed by the backwards function of the layer on top of the RoI pooling layer.
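The argmax routing can be sketched as follows, assuming the forward pass recorded, for each output unit y_rj, the flat index i*(r, j) of the input it selected (this bookkeeping is an assumption for illustration, not the paper's implementation):

```python
import numpy as np

def roi_pool_backward(num_inputs, argmax, grad_y):
    """Accumulate dL/dx_i by following the argmax switches.

    argmax[r, j]: flat input index i*(r, j) chosen by max pooling for
    output y_rj. grad_y[r, j]: upstream partial derivative dL/dy_rj.
    A single x_i receives the summed gradients of every output unit
    (across all RoIs) that selected it.
    """
    grad_x = np.zeros(num_inputs)
    R, J = argmax.shape
    for r in range(R):
        for j in range(J):
            grad_x[argmax[r, j]] += grad_y[r, j]
    return grad_x
```

In the test below, input 2 is selected by outputs of two different RoIs, so its gradient is the sum of both upstream partials.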

SGD hyper-parameters. The fully connected layers used for softmax classification and bounding-box regression are initialized from zero-mean Gaussian distributions with standard deviations 0.01 and 0.001, respectively. Biases are initialized to 0. All layers use a per-layer learning rate of 1 for weights and 2 for biases and a global learning rate of 0.001. When training on VOC07 or VOC12 trainval we run SGD for 30k mini-batch iterations, and then lower the learning rate to 0.0001 and train for another 10k iterations. When we train on larger datasets, we run SGD for more iterations, as described later. A momentum of 0.9 and parameter decay of 0.0005 (on weights and biases) are used.
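Restating the schedule above as code (the values come directly from this paragraph; the dict layout and names are illustrative):

```python
def global_learning_rate(iteration):
    """VOC07/VOC12 trainval schedule: 0.001 for 30k mini-batch iterations,
    then 0.0001 for another 10k. Per-layer multipliers (1x for weights,
    2x for biases) are applied on top of this global rate."""
    return 0.001 if iteration < 30000 else 0.0001

SOLVER = {
    "momentum": 0.9,
    "weight_decay": 0.0005,  # on weights and biases
    "init_std": {"cls_fc": 0.01, "bbox_fc": 0.001},  # zero-mean Gaussians
}
```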

2.4. Scale invariance

We explore two ways of achieving scale invariant object detection: (1) via “brute force” learning and (2) by using image pyramids. These strategies follow the two approaches in [11]. In the brute-force approach, each image is processed at a pre-defined pixel size during both training and testing. The network must directly learn scale-invariant object detection from the training data.

The multi-scale approach, in contrast, provides approximate scale-invariance to the network through an image pyramid. At test-time, the image pyramid is used to approximately scale-normalize each object proposal. During multi-scale training, we randomly sample a pyramid scale each time an image is sampled, following [11], as a form of data augmentation. We experiment with multi-scale training for smaller networks only, due to GPU memory limits.

3. Fast R-CNN detection

Once a Fast R-CNN network is fine-tuned, detection amounts to little more than running a forward pass (assuming object proposals are pre-computed). The network takes as input an image (or an image pyramid, encoded as a list of images) and a list of R object proposals to score. At test-time, R is typically around 2000, although we will consider cases in which it is larger (≈45k). When using an image pyramid, each RoI is assigned to the scale such that the scaled RoI is closest to 224² pixels in area [11].
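The scale-assignment rule can be sketched as follows; representing each pyramid level by a single multiplicative scale factor is an assumed parameterization for illustration, not the paper's code.

```python
def best_pyramid_scale(roi_w, roi_h, scales, target_area=224 ** 2):
    """Pick the pyramid scale at which the scaled RoI's area is closest
    to 224^2 pixels. roi_w, roi_h give the RoI's size in the original
    image; each entry of scales maps that size to one pyramid level."""
    return min(scales, key=lambda s: abs((roi_w * s) * (roi_h * s) - target_area))
```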

For each test RoI r, the forward pass outputs a class posterior probability distribution p and a set of predicted bounding-box offsets relative to r (each of the K classes gets its own refined bounding-box prediction). We assign a detection confidence to r for each object class k using the estimated probability Pr(class = k | r) ≜ p_k. We then perform non-maximum suppression independently for each class using the algorithm and settings from R-CNN [9].
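Greedy NMS in the style of R-CNN can be sketched as follows; it is run once per class on that class's scored boxes. The 0.3 overlap threshold is illustrative, since the text only says the R-CNN settings are reused.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.3):
    """Keep the highest-scoring box, drop boxes overlapping it by more
    than thresh, and repeat. Returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep
```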

3.1. Truncated SVD for faster detection

For whole-image classification, the time spent computing the fully connected layers is small compared to the conv layers. In contrast, for detection the number of RoIs to process is large and nearly half of the forward pass time is spent computing the fully connected layers (see Fig. 2). Large fully connected layers are easily accelerated by compressing them with truncated SVD [5, 23].

Figure 2. Timing for VGG16 before and after truncated SVD. Before SVD, fully connected layers fc6 and fc7 take 45% of the time.

In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as

W ≈ U Σ_t V^T    (5)

using SVD. In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σ_t is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σ_t V^T (and no biases) and the second uses U (with the original biases associated with W). This simple compression method gives good speedups when the number of RoIs is large.
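A NumPy sketch of the compression (`truncate_fc` is an illustrative name): the u × v matrix W splits into a (t, v) first-layer matrix and a (u, t) second-layer matrix.

```python
import numpy as np

def truncate_fc(W, t):
    """Approximate a u x v fully connected weight matrix W by two layers.

    Returns (first, second): the first layer uses the (t, v) matrix
    Sigma_t V^T (no biases); the second uses the (u, t) matrix U (with
    W's original biases). Parameters drop from u*v to t*(u + v).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    first = s[:t, None] * Vt[:t]   # Sigma_t V^T, shape (t, v)
    second = U[:, :t]              # shape (u, t)
    return first, second
```

With t equal to the full rank, the product `second @ first` reconstructs W exactly; the speedup comes from choosing t much smaller than min(u, v).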

4. Main results

Three main results support this paper’s contributions:

1. State-of-the-art mAP on VOC07, 2010, and 2012

2. Fast training and testing compared to R-CNN, SPPnet

3. Fine-tuning conv layers in VGG16 improves mAP

4.1. Experimental setup

Our experiments use three pre-trained ImageNet models that are available online. The first is the CaffeNet (essentially AlexNet [14]) from R-CNN [9]. We alternatively refer to this CaffeNet as model S, for “small.” The second network is VGG_CNN_M_1024 from [3], which has the same depth as S, but is wider. We call this network model M, for “medium.” The final network is the very deep VGG16 model from [20]. Since this model is the largest, we call it model L. In this section, all experiments use single-scale training and testing (s = 600; see Section 5.2 for details).

(Footnote: the pre-trained models are available from the Caffe Model Zoo at https://github.com/BVLC/caffe/wiki/Model-Zoo.)

4.2. VOC 2010 and 2012 results

On these datasets, we compare Fast R-CNN (FRCN, for short) against the top methods on the comp4 (outside data) track from the public leaderboard (Table 2, Table 3). For the NUS_NIN_c2000 and BabyLearning methods, there are no associated publications at this time and we could not find exact information on the ConvNet architectures used; they are variants of the Network-in-Network design [17]. All other methods are initialized from the same pre-trained VGG16 network.

Table 2. VOC 2010 test detection average precision (%). BabyLearning uses a network based on [17]. All other methods use VGG16. Training set key: 12: VOC12 trainval, Prop.: proprietary dataset, 12+seg: 12 with segmentation annotations, 07++12: union of VOC07 trainval, VOC07 test, and VOC12 trainval.

Table 3. VOC 2012 test detection average precision (%). BabyLearning and NUS NIN c2000 use networks based on [17]. All other methods use VGG16. Training set key: see Table 2, Unk.: unknown.

4.2. VOC 2010和2012數(shù)據(jù)集上的結(jié)果

(如上面表2,表3所示)在這些數(shù)據(jù)集上,我們比較Fast R-CNN(簡(jiǎn)稱(chēng)FRCN)和公共排行榜中comp4(外部數(shù)據(jù))上的主流方法(腳注:http://host.robots.ox.ac.uk:8080/leaderboard)。對(duì)于NUS_NIN_c2000和BabyLearning方法,目前沒(méi)有相關(guān)的出版物,我們無(wú)法找到有關(guān)所使用的ConvNet體系結(jié)構(gòu)的確切信息;它們是Network-in-Network的變體[17]。所有其他方法都通過(guò)相同的預(yù)訓(xùn)練VGG16網(wǎng)絡(luò)進(jìn)行了初始化。

表2. VOC 2010測(cè)試檢測(cè)平均精度(%)。BabyLearning使用基于[17]的網(wǎng)絡(luò)。所有其他方法使用VGG16。訓(xùn)練集關(guān)鍵字:12代表VOC12 trainval,Prop.代表專(zhuān)有數(shù)據(jù)集,12+seg代表具有分割標(biāo)注的VOC12,07++12代表VOC07 trainval、VOC07 test和VOC12 trainval的并集。

表3. VOC 2012測(cè)試檢測(cè)平均精度(%)。BabyLearning和NUS_NIN_c2000使用基于[17]的網(wǎng)絡(luò)。所有其他方法使用VGG16。訓(xùn)練設(shè)置:見(jiàn)表2,Unk.代表未知。

Fast R-CNN achieves the top result on VOC12 with a mAP of 65.7% (and 68.4% with extra data). It is also two orders of magnitude faster than the other methods, which are all based on the “slow” R-CNN pipeline. On VOC10, SegDeepM [25] achieves a higher mAP than Fast R-CNN (67.2% vs. 66.1%). SegDeepM is trained on VOC12 trainval plus segmentation annotations; it is designed to boost R-CNN accuracy by using a Markov random field to reason over R-CNN detections and segmentations from the O2P [1] semantic-segmentation method. Fast R-CNN can be swapped into SegDeepM in place of R-CNN, which may lead to better results. When using the enlarged 07++12 training set (see Table 2 caption), Fast R-CNN’s mAP increases to 68.8%, surpassing SegDeepM.

Fast R-CNN在VOC12上獲得最高結(jié)果,mAP為65.7%(加上額外數(shù)據(jù)為68.4%)。它也比其他方法快兩個(gè)數(shù)量級(jí),這些方法都基于較“慢”的R-CNN流程。在VOC10上,SegDeepM [25]獲得了比Fast R-CNN更高的mAP(67.2%對(duì)比66.1%)。SegDeepM在VOC12 trainval訓(xùn)練集及其分割標(biāo)注上訓(xùn)練,它通過(guò)用馬爾可夫隨機(jī)場(chǎng)對(duì)R-CNN的檢測(cè)結(jié)果和O2P [1]語(yǔ)義分割方法的分割結(jié)果進(jìn)行聯(lián)合推理來(lái)提升R-CNN的精度。可以把SegDeepM中使用的R-CNN換成Fast R-CNN,這可能會(huì)得到更好的結(jié)果。當(dāng)使用擴(kuò)大的07++12訓(xùn)練集(見(jiàn)表2標(biāo)題)時(shí),Fast R-CNN的mAP增加到68.8%,超過(guò)了SegDeepM。

4.3. VOC 2007 results

On VOC07, we compare Fast R-CNN to R-CNN and SPPnet. All methods start from the same pre-trained VGG16 network and use bounding-box regression. The VGG16 SPPnet results were computed by the authors of [11]. SPPnet uses five scales during both training and testing. The improvement of Fast R-CNN over SPPnet illustrates that even though Fast R-CNN uses single-scale training and testing, fine-tuning the conv layers provides a large improvement in mAP (from 63.1% to 66.9%). R-CNN achieves a mAP of 66.0%. As a minor point, SPPnet was trained without examples marked as “difficult” in PASCAL. Removing these examples improves Fast R-CNN mAP to 68.1%. All other experiments use “difficult” examples.

4.3. VOC 2007數(shù)據(jù)集上的結(jié)果

在VOC07數(shù)據(jù)集上,我們比較Fast R-CNN與R-CNN和SPPnet。所有方法從相同的預(yù)訓(xùn)練VGG16網(wǎng)絡(luò)開(kāi)始,并使用bounding-box回歸。VGG16 SPPnet的結(jié)果由論文[11]的作者計(jì)算得到。SPPnet在訓(xùn)練和測(cè)試期間使用五個(gè)尺度。Fast R-CNN相對(duì)SPPnet的提升說(shuō)明,即使Fast R-CNN使用單尺度訓(xùn)練和測(cè)試,fine-tune卷積層也為mAP帶來(lái)了很大的改進(jìn)(從63.1%到66.9%)。R-CNN的mAP為66.0%。另外需要指出的是,SPPnet的訓(xùn)練沒(méi)有使用PASCAL中被標(biāo)記為“困難”的樣本。除去這些樣本,Fast R-CNN的mAP提高到68.1%。所有其他實(shí)驗(yàn)都使用了被標(biāo)記為“困難”的樣本。

4.4. Training and testing time

Fast training and testing times are our second main result. Table 4 compares training time (hours), testing rate (seconds per image), and mAP on VOC07 between Fast RCNN, R-CNN, and SPPnet. For VGG16, Fast R-CNN processes images 146× faster than R-CNN without truncated SVD and 213× faster with it. Training time is reduced by 9×, from 84 hours to 9.5. Compared to SPPnet, Fast RCNN trains VGG16 2.7× faster (in 9.5 vs. 25.5 hours) and tests 7× faster without truncated SVD or 10× faster with it. Fast R-CNN also eliminates hundreds of gigabytes of disk storage, because it does not cache features.

Table 4. Runtime comparison between the same models in Fast R-CNN, R-CNN, and SPPnet. Fast R-CNN uses single-scale mode. SPPnet uses the five scales specified in [11]. ?Timing provided by the authors of [11]. Times were measured on an Nvidia K40 GPU.

4.4. 訓(xùn)練和測(cè)試時(shí)間

快速的訓(xùn)練和測(cè)試是我們的第二個(gè)主要成果。表4比較了Fast R-CNN、R-CNN和SPPnet之間的訓(xùn)練時(shí)間(小時(shí))、測(cè)試速度(每張圖像所需秒數(shù))和在VOC07上的mAP。對(duì)于VGG16,不使用截?cái)郤VD時(shí),Fast R-CNN處理圖像比R-CNN快146倍;使用截?cái)郤VD時(shí)快213倍。訓(xùn)練時(shí)間減少9倍,從84小時(shí)減少到9.5小時(shí)。與SPPnet相比,Fast R-CNN訓(xùn)練VGG16網(wǎng)絡(luò)快2.7倍(9.5小時(shí)對(duì)比25.5小時(shí)),不使用截?cái)郤VD時(shí)測(cè)試快7倍,使用截?cái)郤VD時(shí)快10倍。Fast R-CNN還省去了數(shù)百GB的磁盤(pán)存儲(chǔ),因?yàn)樗痪彺嫣卣鳌?

表4. Fast R-CNN、R-CNN和SPPnet中相同模型之間的運(yùn)行時(shí)間比較。Fast R-CNN使用單尺度模式。SPPnet使用[11]中指定的五個(gè)尺度。?的時(shí)間由[11]的作者提供。時(shí)間在Nvidia K40 GPU上測(cè)量。

Truncated SVD. Truncated SVD can reduce detection time by more than 30% with only a small (0.3 percentage point) drop in mAP and without needing to perform additional fine-tuning after model compression. Fig. 2 illustrates how using the top 1024 singular values from the 25088×4096 matrix in VGG16’s fc6 layer and the top 256 singular values from the 4096×4096 fc7 layer reduces runtime with little loss in mAP. Further speed-ups are possible with smaller drops in mAP if one fine-tunes again after compression.

截?cái)嗟腟VD。截?cái)嗟腟VD可以將檢測(cè)時(shí)間減少30%以上,而mAP只有很小的下降(0.3個(gè)百分點(diǎn)),并且無(wú)需在模型壓縮后執(zhí)行額外的fine-tune。圖2顯示了,使用VGG16的fc6層中25088×4096矩陣的前1024個(gè)奇異值和fc7層中4096×4096矩陣的前256個(gè)奇異值,可以在mAP幾乎沒(méi)有損失的情況下減少運(yùn)行時(shí)間。如果在壓縮之后再次fine-tune,則可以在mAP下降更小的情況下進(jìn)一步提速。
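(譯者注:截?cái)郤VD壓縮全連接層的做法可以用下面的numpy草圖來(lái)示意——將權(quán)重矩陣W按前t個(gè)奇異值分解,把一個(gè)大的全連接層拆成兩個(gè)較小的層。這里的矩陣尺寸是演示用的玩具尺寸,并非論文中fc6/fc7的真實(shí)尺寸。)

```python
import numpy as np

def truncate_fc(W, t):
    """用前 t 個(gè)奇異值近似權(quán)重矩陣 W (u×v):
    W ≈ U_t (Σ_t V_t^T),參數(shù)量從 u*v 降為 t*(u+v)。"""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L1 = np.diag(S[:t]) @ Vt[:t, :]   # 第一個(gè)小層的權(quán)重,形狀 t×v
    L2 = U[:, :t]                     # 第二個(gè)小層的權(quán)重,形狀 u×t
    return L2, L1

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 1024))  # 玩具尺寸;論文中fc6的矩陣為25088×4096
L2, L1 = truncate_fc(W, 64)
x = rng.standard_normal(1024)
y_trunc = L2 @ (L1 @ x)  # 用兩次小矩陣乘代替一次大矩陣乘,近似 W @ x
```

按此拆分,對(duì)fc6取t=1024時(shí),乘加次數(shù)從25088×4096降為1024×(25088+4096),這正是運(yùn)行時(shí)間下降的來(lái)源。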

4.5. Which layers to fine-tune?

For the less deep networks considered in the SPPnet paper [11], fine-tuning only the fully connected layers appeared to be sufficient for good accuracy. We hypothesized that this result would not hold for very deep networks. To validate that fine-tuning the conv layers is important for VGG16, we use Fast R-CNN to fine-tune, but freeze the thirteen conv layers so that only the fully connected layers learn. This ablation emulates single-scale SPPnet training and decreases mAP from 66.9% to 61.4% (Table 5). This experiment verifies our hypothesis: training through the RoI pooling layer is important for very deep nets.

Table 5. Effect of restricting which layers are fine-tuned for VGG16. Fine-tuning ≥ fc6 emulates the SPPnet training algorithm [11], but using a single scale. SPPnet L results were obtained using five scales, at a significant (7×) speed cost.

4.5. fine-tune哪些層?

對(duì)于SPPnet論文[11]中提到的不太深的網(wǎng)絡(luò),僅fine-tuning全連接層似乎足以獲得良好的準(zhǔn)確度。我們假設(shè)這個(gè)結(jié)果不適用于非常深的網(wǎng)絡(luò)。為了驗(yàn)證fine-tune卷積層對(duì)于VGG16的重要性,我們使用Fast R-CNN進(jìn)行fine-tune,但凍結(jié)十三個(gè)卷積層,以便只有全連接層學(xué)習(xí)。這種消融模擬了單尺度SPPnet訓(xùn)練,將mAP從66.9%降低到61.4%(如表5所示)。這個(gè)實(shí)驗(yàn)驗(yàn)證了我們的假設(shè):通過(guò)RoI池化層的訓(xùn)練對(duì)于非常深的網(wǎng)是重要的。

表5. 限制VGG16中哪些層被fine-tune的影響。fine-tune fc6及以上的層模擬了SPPnet訓(xùn)練算法[11],但使用單尺度。SPPnet L的結(jié)果使用五個(gè)尺度獲得,付出了顯著(7倍)的速度代價(jià)。

Does this mean that all conv layers should be fine-tuned? In short, no. In the smaller networks (S and M) we find that conv1 is generic and task independent (a well-known fact [14]). Allowing conv1 to learn, or not, has no meaningful effect on mAP. For VGG16, we found it only necessary to update layers from conv3_1 and up (9 of the 13 conv layers). This observation is pragmatic: (1) updating from conv2_1 slows training by 1.3× (12.5 vs. 9.5 hours) compared to learning from conv3_1; and (2) updating from conv1_1 over-runs GPU memory. The difference in mAP when learning from conv2_1 up was only +0.3 points (Table 5, last column). All Fast R-CNN results in this paper using VGG16 fine-tune layers conv3_1 and up; all experiments with models S and M fine-tune layers conv2 and up.

這是否意味著所有卷積層應(yīng)該進(jìn)行fine-tune?簡(jiǎn)而言之,不是的。在較小的網(wǎng)絡(luò)(S和M)中,我們發(fā)現(xiàn)conv1(譯者注:第一個(gè)卷積層)是通用的、不依賴(lài)于特定任務(wù)的(一個(gè)眾所周知的事實(shí)[14])。允許conv1學(xué)習(xí)或不學(xué)習(xí),對(duì)mAP沒(méi)有很關(guān)鍵的影響。對(duì)于VGG16,我們發(fā)現(xiàn)只需要更新conv3_1及以上(13個(gè)卷積層中的9個(gè))的層。這個(gè)觀察結(jié)果是實(shí)用的:(1)與從conv3_1更新相比,從conv2_1更新使訓(xùn)練變慢1.3倍(12.5小時(shí)對(duì)比9.5小時(shí)),(2)從conv1_1更新時(shí)GPU內(nèi)存不夠用。從conv2_1學(xué)習(xí)時(shí)mAP僅增加0.3個(gè)點(diǎn)(如表5最后一列所示)。本文中所有Fast R-CNN的結(jié)果都fine-tune VGG16 conv3_1及以上的層,所有用模型S和M的實(shí)驗(yàn)fine-tune conv2及以上的層。
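(譯者注:下面用一段純Python示意“fine-tune conv3_1及以上”的凍結(jié)邏輯;其中的VGG16層名列表只是演示用的假設(shè)命名,實(shí)際框架中通常通過(guò)把被凍結(jié)層的學(xué)習(xí)率設(shè)為0或關(guān)閉其梯度來(lái)實(shí)現(xiàn)。)

```python
# 假設(shè)的VGG16層名(僅示意),順序與網(wǎng)絡(luò)前向順序一致
VGG16_LAYERS = [
    "conv1_1", "conv1_2", "conv2_1", "conv2_2",
    "conv3_1", "conv3_2", "conv3_3",
    "conv4_1", "conv4_2", "conv4_3",
    "conv5_1", "conv5_2", "conv5_3",
    "fc6", "fc7",
]

def trainable(layer_name, first_trainable="conv3_1"):
    """first_trainable 之前的層全部?jī)鼋Y(jié),之后的層參與 fine-tune。"""
    start = VGG16_LAYERS.index(first_trainable)
    return VGG16_LAYERS.index(layer_name) >= start

frozen = [name for name in VGG16_LAYERS if not trainable(name)]
updated_convs = [name for name in VGG16_LAYERS
                 if name.startswith("conv") and trainable(name)]
# frozen 為 conv1_1..conv2_2;updated_convs 共9個(gè),對(duì)應(yīng)“13個(gè)卷積層中更新9個(gè)”
```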

5. Design evaluation

We conducted experiments to understand how Fast R-CNN compares to R-CNN and SPPnet, as well as to evaluate design decisions. Following best practices, we performed these experiments on the PASCAL VOC07 dataset.

5. 設(shè)計(jì)評(píng)估

我們通過(guò)實(shí)驗(yàn)來(lái)了解Fast R-CNN與R-CNN和SPPnet的對(duì)比,并評(píng)估設(shè)計(jì)決策。按照最佳實(shí)踐,我們?cè)赑ASCAL VOC07數(shù)據(jù)集上進(jìn)行了這些實(shí)驗(yàn)。

5.1. Does multi-task training help?

Multi-task training is convenient because it avoids managing a pipeline of sequentially-trained tasks. But it also has the potential to improve results because the tasks influence each other through a shared representation (the ConvNet) [2]. Does multi-task training improve object detection accuracy in Fast R-CNN?

5.1. 多任務(wù)訓(xùn)練有用嗎?

多任務(wù)訓(xùn)練是方便的,因?yàn)樗苊夤芾眄樞蛴?xùn)練任務(wù)的pipeline。但它也有可能改善結(jié)果,因?yàn)槿蝿?wù)通過(guò)共享的表示(ConvNet)[2]相互影響。多任務(wù)訓(xùn)練能提高Fast R-CNN中的目標(biāo)檢測(cè)精度嗎?

To test this question, we train baseline networks that use only the classification loss, Lcls, in Eq. 1 (i.e., setting λ= 0). These baselines are printed for models S, M, and L in the first column of each group in Table 6. Note that these models do not have bounding-box regressors. Next (second column per group), we take networks that were trained with the multi-task loss (Eq. 1, λ=1), but we disable bounding-box regression at test time. This isolates the networks’ classification accuracy and allows an apples-to-apples comparison with the baseline networks.

Table 6. Multi-task training (forth column per group) improves mAP over piecewise training (third column per group).

為了測(cè)試這個(gè)問(wèn)題,我們訓(xùn)練僅使用公式(1)中分類(lèi)損失Lcls(即設(shè)置λ=0)的基準(zhǔn)網(wǎng)絡(luò)。模型S、M和L的這些baseline結(jié)果列在表6中每組的第一列。請(qǐng)注意,這些模型沒(méi)有bounding-box回歸器。接下來(lái)(每組的第二列),我們采用多任務(wù)損失(公式(1),λ=1)訓(xùn)練網(wǎng)絡(luò),但在測(cè)試時(shí)禁用bounding-box回歸。這樣可以單獨(dú)考察網(wǎng)絡(luò)的分類(lèi)準(zhǔn)確性,并允許與基準(zhǔn)網(wǎng)絡(luò)進(jìn)行同類(lèi)比較(譯者注:apples-to-apples comparison意思是在相同條件下比較同類(lèi)事物)。

表6. 多任務(wù)訓(xùn)練(每組第四列)相對(duì)于分階段訓(xùn)練(每組第三列)提高了mAP。

Across all three networks we observe that multi-task training improves pure classification accuracy relative to training for classification alone. The improvement ranges from +0.8 to +1.1 mAP points, showing a consistent positive effect from multi-task learning.

在所有三個(gè)網(wǎng)絡(luò)中,我們觀察到多任務(wù)訓(xùn)練相對(duì)于單獨(dú)的分類(lèi)訓(xùn)練提高了純分類(lèi)準(zhǔn)確度。改進(jìn)范圍從+0.8到+1.1個(gè)mAP點(diǎn),顯示了多任務(wù)學(xué)習(xí)的一致的積極效果。

Finally, we take the baseline models (trained with only the classification loss), tack on the bounding-box regression layer, and train them with Lloc while keeping all other network parameters frozen. The third column in each group shows the results of this stage-wise training scheme: mAP improves over column one, but stage-wise training underperforms multi-task training (forth column per group).

最后,我們采用baseline模型(僅使用分類(lèi)損失進(jìn)行訓(xùn)練),加上bounding-box回歸層,并使用Lloc訓(xùn)練它們,同時(shí)保持所有其他網(wǎng)絡(luò)參數(shù)凍結(jié)。每組中的第三列顯示了這種逐級(jí)訓(xùn)練方案的結(jié)果:mAP相對(duì)于第一列有改進(jìn),但逐級(jí)訓(xùn)練表現(xiàn)不如多任務(wù)訓(xùn)練(每組第四列)。
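(譯者注:上面消融實(shí)驗(yàn)中的公式(1)多任務(wù)損失 L = Lcls + λ[u≥1]Lloc 可以按論文的定義用numpy寫(xiě)成如下草圖,其中Lcls = -log p_u,Lloc使用smooth L1;具體數(shù)值僅為演示。設(shè)λ=0即得到只用分類(lèi)損失的baseline。)

```python
import numpy as np

def smooth_l1(x):
    # 論文中定義的 smooth L1:|x|<1 時(shí)取 0.5x^2,否則取 |x|-0.5
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """公式(1):L = Lcls(p,u) + λ[u≥1]Lloc(t^u,v)。
    p: 各類(lèi)別softmax概率;u: 真實(shí)類(lèi)別(0為背景);
    t_u, v: 類(lèi)別u的預(yù)測(cè)/真實(shí)框回歸量(tx, ty, tw, th)。"""
    l_cls = -np.log(p[u])
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()
    return l_cls + lam * (u >= 1) * l_loc  # 背景RoI([u≥1]=0)不計(jì)回歸損失

p = np.array([0.1, 0.7, 0.2])
loss = multitask_loss(p, u=1, t_u=[0.5, 0, 0, 0], v=[0, 0, 0, 0])
```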

5.2. Scale invariance: to brute force or finesse?

We compare two strategies for achieving scale-invariant object detection: brute-force learning (single scale) and image pyramids (multi-scale). In either case, we define the scale s of an image to be the length of its shortest side.

5.2. 尺度不變性:暴力或精細(xì)?

我們比較兩個(gè)策略實(shí)現(xiàn)尺度不變物體檢測(cè):暴力學(xué)習(xí)(單尺度)和圖像金字塔(多尺度)。在任一情況下,我們將尺度s定義為圖像短邊的長(zhǎng)度。

All single-scale experiments use s = 600 pixels; s may be less than 600 for some images as we cap the longest image side at 1000 pixels and maintain the image’s aspect ratio. These values were selected so that VGG16 fits in GPU memory during fine-tuning. The smaller models are not memory bound and can benefit from larger values of s; however, optimizing s for each model is not our main concern. We note that PASCAL images are 384 × 473 pixels on average and thus the single-scale setting typically upsamples images by a factor of 1.6. The average effective stride at the RoI pooling layer is thus ≈ 10 pixels.

所有單尺度實(shí)驗(yàn)使用s=600像素;對(duì)于一些圖像,s可能小于600,因?yàn)槲覀儽3謭D像縱橫比縮放,并限制最長(zhǎng)邊不超過(guò)1000像素。選擇這些值是為了使VGG16在fine-tune期間不超出GPU內(nèi)存。較小的模型不受顯存限制,可以受益于更大的s值;然而,為每個(gè)模型優(yōu)化s不是我們主要的關(guān)注點(diǎn)。我們注意到PASCAL圖像的平均大小為384×473像素,因此單尺度設(shè)置通常以約1.6倍對(duì)圖像進(jìn)行上采樣。因此,RoI池化層的平均有效步長(zhǎng)約為10像素。
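(譯者注:這里“短邊縮放到600、長(zhǎng)邊不超過(guò)1000”的縮放規(guī)則,以及約1.6倍上采樣、約10像素有效步長(zhǎng)的計(jì)算,可以用下面的小函數(shù)示意;其中conv特征圖步長(zhǎng)取16是VGG16在RoI池化層之前的常見(jiàn)取值,作為演示假設(shè)。)

```python
def image_scale(h, w, target=600, cap=1000):
    """返回保持縱橫比的縮放因子:短邊縮放到 target,
    若縮放后長(zhǎng)邊超過(guò) cap,則改為把長(zhǎng)邊縮放到 cap。"""
    short, long_side = min(h, w), max(h, w)
    scale = target / short
    if long_side * scale > cap:
        scale = cap / long_side
    return scale

# PASCAL 平均圖像尺寸約 384×473
s = image_scale(384, 473)       # ≈1.56,即文中所說(shuō)的約1.6倍上采樣
effective_stride = 16 / s       # ≈10像素:RoI池化層的平均有效步長(zhǎng)
```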

In the multi-scale setting, we use the same five scales specified in [11] (s ∈ {480, 576, 688, 864, 1200}) to facilitate comparison with SPPnet. However, we cap the longest side at 2000 pixels to avoid exceeding GPU memory.

在多尺度模型配置中,我們使用[11]中指定的相同的五個(gè)尺度(s∈{480,576,688,864,1200}),以方便與SPPnet進(jìn)行比較。但是,我們限制長(zhǎng)邊最大為2000像素,以避免GPU內(nèi)存不足。

Table 7 shows models S and M when trained and tested with either one or five scales. Perhaps the most surprising result in [11] was that single-scale detection performs almost as well as multi-scale detection. Our findings confirm their result: deep ConvNets are adept at directly learning scale invariance. The multi-scale approach offers only a small increase in mAP at a large cost in compute time (Table 7). In the case of VGG16 (model L), we are limited to using a single scale by implementation details. Yet it achieves a mAP of 66.9%, which is slightly higher than the 66.0% reported for R-CNN [10], even though R-CNN uses “infinite” scales in the sense that each proposal is warped to a canonical size.

Table 7. Multi-scale vs. single scale. SPPnet ZF (similar to model S) results are from [11]. Larger networks with a single-scale offer the best speed / accuracy tradeoff. (L cannot use multi-scale in our implementation due to GPU memory constraints.)

表7顯示了使用一個(gè)或五個(gè)尺度訓(xùn)練和測(cè)試時(shí)模型S和M的結(jié)果。[11]中最令人驚訝的結(jié)果也許是單尺度檢測(cè)幾乎與多尺度檢測(cè)一樣好。我們的結(jié)果證實(shí)了這一點(diǎn):深度卷積網(wǎng)絡(luò)擅長(zhǎng)直接學(xué)習(xí)尺度不變性。多尺度方法以大量計(jì)算時(shí)間為代價(jià),只帶來(lái)了很小的mAP提升(表7)。對(duì)于VGG16(模型L),受實(shí)現(xiàn)細(xì)節(jié)限制,我們只能使用單個(gè)尺度。它仍然得到了66.9%的mAP,略高于R-CNN [10]的66.0%,盡管從每個(gè)候選區(qū)域都被縮放到規(guī)范大小這個(gè)意義上說(shuō),R-CNN使用了“無(wú)限”的尺度。

表7. 多尺度對(duì)比單尺度。SPPnet ZF(類(lèi)似于模型S)的結(jié)果來(lái)自[11]。單尺度的較大網(wǎng)絡(luò)提供了最佳的速度/精度平衡。(由于GPU內(nèi)存限制,L在我們的實(shí)現(xiàn)中不能使用多尺度。)

Since single-scale processing offers the best tradeoff between speed and accuracy, especially for very deep models, all experiments outside of this sub-section use single-scale training and testing with s = 600 pixels.

由于單尺度處理能夠權(quán)衡好速度和精度之間的關(guān)系,特別是對(duì)于非常深的模型,本小節(jié)以外的所有實(shí)驗(yàn)使用單尺度s=600像素的尺度進(jìn)行訓(xùn)練和測(cè)試。

5.3. Do we need more training data?

A good object detector should improve when supplied with more training data. Zhu et al. [24] found that DPM [8] mAP saturates after only a few hundred to thousand training examples. Here we augment the VOC07 trainval set with the VOC12 trainval set, roughly tripling the number of images to 16.5k, to evaluate Fast R-CNN. Enlarging the training set improves mAP on VOC07 test from 66.9% to 70.0% (Table 1). When training on this dataset we use 60k mini-batch iterations instead of 40k.

Table 1. VOC 2007 test detection average precision (%). All methods use VGG16. Training set key: 07: VOC07 trainval, 07\diff: 07 without “difficult” examples, 07+12: union of 07 and VOC12 trainval. ?SPPnet results were prepared by the authors of [11].

5.3. 我們需要更多訓(xùn)練數(shù)據(jù)嗎?

當(dāng)提供更多訓(xùn)練數(shù)據(jù)時(shí),好的目標(biāo)檢測(cè)器的性能應(yīng)該進(jìn)一步提升。Zhu等人[24]發(fā)現(xiàn)DPM [8]的mAP在只有幾百到幾千個(gè)訓(xùn)練樣本時(shí)就達(dá)到飽和了。這里我們用VOC12 trainval訓(xùn)練集擴(kuò)充VOC07 trainval訓(xùn)練集,使圖像數(shù)量大約增至三倍,達(dá)到16.5k,以此評(píng)估Fast R-CNN。擴(kuò)大訓(xùn)練集使VOC07測(cè)試集上的mAP從66.9%提高到70.0%(表1)。在這個(gè)數(shù)據(jù)集上訓(xùn)練時(shí),我們使用60k次小批量迭代而不是40k次。

表1. VOC 2007測(cè)試檢測(cè)平均精度(%)。所有方法都使用VGG16。訓(xùn)練集關(guān)鍵字:07代表VOC07 trainval,07\diff代表去掉“困難”樣本的07,07+12代表07和VOC12 trainval的并集。?SPPnet結(jié)果由[11]的作者提供。

We perform similar experiments for VOC10 and 2012, for which we construct a dataset of 21.5k images from the union of VOC07 trainval, test, and VOC12 trainval. When training on this dataset, we use 100k SGD iterations and lower the learning rate by 0.1× each 40k iterations (instead of each 30k). For VOC10 and 2012, mAP improves from 66.1% to 68.8% and from 65.7% to 68.4%, respectively.

我們對(duì)VOC2010和2012進(jìn)行類(lèi)似的實(shí)驗(yàn),我們用VOC07 trainval、test和VOC12 trainval數(shù)據(jù)集構(gòu)造了21.5k個(gè)圖像的數(shù)據(jù)集。當(dāng)用這個(gè)數(shù)據(jù)集訓(xùn)練時(shí),我們使用100k次SGD迭代,并且每40k次迭代(而不是每30k次)降低學(xué)習(xí)率10倍。對(duì)于VOC2010和2012,mAP分別從66.1%提高到68.8%和從65.7%提高到68.4%。
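(譯者注:上述“每40k次迭代把學(xué)習(xí)率降低到原來(lái)的0.1倍”的階梯式衰減可示意如下;初始學(xué)習(xí)率0.001是假設(shè)值,原文的基礎(chǔ)學(xué)習(xí)率在第3節(jié)的訓(xùn)練細(xì)節(jié)中給出。)

```python
def learning_rate(iteration, base=0.001, step=40000, gamma=0.1):
    """階梯式學(xué)習(xí)率:每經(jīng)過(guò) step 次迭代乘以 gamma。
    總共100k次迭代時(shí),學(xué)習(xí)率會(huì)在40k和80k處各下降一次。"""
    return base * gamma ** (iteration // step)
```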

5.4. Do SVMs outperform softmax?

Fast R-CNN uses the softmax classifier learnt during fine-tuning instead of training one-vs-rest linear SVMs post-hoc, as was done in R-CNN and SPPnet. To understand the impact of this choice, we implemented post-hoc SVM training with hard negative mining in Fast R-CNN. We use the same training algorithm and hyper-parameters as in R-CNN.

5.4. SVM分類(lèi)是否優(yōu)于Softmax?

Fast R-CNN使用在fine-tune期間學(xué)習(xí)的softmax分類(lèi)器,而不是像R-CNN和SPPnet那樣在最后訓(xùn)練一對(duì)多線性SVM。為了理解這種選擇的影響,我們?cè)贔ast R-CNN中實(shí)現(xiàn)了帶難負(fù)樣本挖掘的事后SVM訓(xùn)練。我們使用與R-CNN中相同的訓(xùn)練算法和超參數(shù)。

Table 8 shows softmax slightly outperforming SVM for all three networks, by +0.1 to +0.8 mAP points. This effect is small, but it demonstrates that “one-shot” fine-tuning is sufficient compared to previous multi-stage training approaches. We note that softmax, unlike one-vs-rest SVMs, introduces competition between classes when scoring a RoI.

Table 8. Fast R-CNN with softmax vs. SVM (VOC07 mAP).

如表8所示,對(duì)于所有三個(gè)網(wǎng)絡(luò),softmax都略?xún)?yōu)于SVM,mAP提高了0.1到0.8個(gè)點(diǎn)。這個(gè)提升很小,但它表明與先前的多階段訓(xùn)練方法相比,“一次性”fine-tune是足夠的。我們注意到,與一對(duì)多的SVM不同,softmax在計(jì)算RoI得分時(shí)引入了類(lèi)別之間的競(jìng)爭(zhēng)。

表8. 使用softmax的Fast R-CNN對(duì)比使用SVM的Fast R-CNN(VOC07 mAP)。
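(譯者注:“softmax在計(jì)算RoI得分時(shí)引入類(lèi)別間競(jìng)爭(zhēng)”可以用下面的數(shù)值小例子說(shuō)明:抬高某一類(lèi)的打分會(huì)壓低其他類(lèi)的概率,而一對(duì)多SVM的各類(lèi)打分相互獨(dú)立,不存在這種歸一化帶來(lái)的相互抑制。打分?jǐn)?shù)值純屬示意。)

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # 減去最大值保證數(shù)值穩(wěn)定
    e = np.exp(z)
    return e / e.sum()       # 歸一化:所有類(lèi)別概率之和為1

scores = np.array([2.0, 1.0, 0.5])   # 某個(gè)RoI上三個(gè)類(lèi)別的打分(示意)
p = softmax(scores)

scores2 = scores.copy()
scores2[0] += 1.0                     # 只抬高類(lèi)別0的打分
p2 = softmax(scores2)
# p2中類(lèi)別1和2的概率低于p中的對(duì)應(yīng)值:各類(lèi)別在競(jìng)爭(zhēng)同一份概率質(zhì)量
```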

5.5. Are more proposals always better?

There are (broadly) two types of object detectors: those that use a sparse set of object proposals (e.g., selective search [21]) and those that use a dense set (e.g., DPM [8]). Classifying sparse proposals is a type of cascade [22] in which the proposal mechanism first rejects a vast number of candidates leaving the classifier with a small set to evaluate. This cascade improves detection accuracy when applied to DPM detections [21]. We find evidence that the proposal-classifier cascade also improves Fast R-CNN accuracy.

5.5. 更多的候選區(qū)域更好嗎?

(廣義上)存在兩類(lèi)目標(biāo)檢測(cè)器:一類(lèi)使用稀疏的候選區(qū)域集合(例如selective search [21]),另一類(lèi)使用密集集合(例如DPM [8])。對(duì)稀疏候選區(qū)域進(jìn)行分類(lèi)是一種級(jí)聯(lián)[22]:候選機(jī)制首先舍棄大量候選區(qū)域,只留下一個(gè)較小的集合交給分類(lèi)器評(píng)估。這種級(jí)聯(lián)應(yīng)用于DPM檢測(cè)時(shí)提高了檢測(cè)精度[21]。我們發(fā)現(xiàn)proposal-classifier級(jí)聯(lián)也提高了Fast R-CNN的精度。

Using selective search’s quality mode, we sweep from 1k to 10k proposals per image, each time re-training and re-testing model M. If proposals serve a purely computational role, increasing the number of proposals per image should not harm mAP.

使用selective search的質(zhì)量模式,我們對(duì)每個(gè)圖像掃描1k到10k個(gè)候選框,每次重新訓(xùn)練和重新測(cè)試模型M。如果候選框純粹扮演計(jì)算的角色,增加每個(gè)圖像的候選框數(shù)量不會(huì)影響mAP。

We find that mAP rises and then falls slightly as the proposal count increases (Fig. 3, solid blue line). This experiment shows that swamping the deep classifier with more proposals does not help, and even slightly hurts, accuracy.

我們發(fā)現(xiàn)隨著候選區(qū)域數(shù)量的增加,mAP先上升然后略微下降(如圖3藍(lán)色實(shí)線所示)。這個(gè)實(shí)驗(yàn)表明,深度神經(jīng)網(wǎng)絡(luò)分類(lèi)器使用更多的候選區(qū)域沒(méi)有幫助,甚至稍微有點(diǎn)影響準(zhǔn)確性。

This result is difficult to predict without actually running the experiment. The state-of-the-art for measuring object proposal quality is Average Recall (AR) [12]. AR correlates well with mAP for several proposal methods using R-CNN, when using a fixed number of proposals per image. Fig. 3 shows that AR (solid red line) does not correlate well with mAP as the number of proposals per image is varied. AR must be used with care; higher AR due to more proposals does not imply that mAP will increase. Fortunately, training and testing with model M takes less than 2.5 hours. Fast R-CNN thus enables efficient, direct evaluation of object proposal mAP, which is preferable to proxy metrics.

Figure 3. VOC07 test mAP and AR for various proposal schemes.

如果不實(shí)際進(jìn)行實(shí)驗(yàn),這個(gè)結(jié)果很難預(yù)測(cè)。目前評(píng)估候選區(qū)域質(zhì)量最先進(jìn)的指標(biāo)是平均召回率(Average Recall,AR)[12]。當(dāng)每張圖像使用固定數(shù)量的候選區(qū)域時(shí),AR與使用R-CNN的幾種候選區(qū)域方法的mAP具有良好的相關(guān)性。圖3表明,當(dāng)每張圖像的候選區(qū)域數(shù)量變化時(shí),AR(紅色實(shí)線)與mAP的相關(guān)性并不好。AR必須謹(jǐn)慎使用;更多候選區(qū)域帶來(lái)的更高AR并不意味著mAP會(huì)增加。幸運(yùn)的是,使用模型M的訓(xùn)練和測(cè)試不到2.5小時(shí)。因此,Fast R-CNN能夠高效地、直接地評(píng)估目標(biāo)候選區(qū)域的mAP,這比代理指標(biāo)更可取。

圖3. 各種候選區(qū)域方案下VOC07測(cè)試的mAPAR

We also investigate Fast R-CNN when using densely generated boxes (over scale, position, and aspect ratio), at a rate of about 45k boxes / image. This dense set is rich enough that when each selective search box is replaced by its closest (in IoU) dense box, mAP drops only 1 point (to 57.7%, Fig. 3, blue triangle).

我們還研究了使用密集生成框(在不同尺度、位置和寬高比上)時(shí)的Fast R-CNN,密度約為每張圖像45k個(gè)框。這個(gè)密集集合足夠豐富,當(dāng)每個(gè)selective search框被(IoU)最近的密集框替換時(shí),mAP只下降1個(gè)點(diǎn)(到57.7%,如圖3藍(lán)色三角形所示)。

The statistics of the dense boxes differ from those of selective search boxes. Starting with 2k selective search boxes, we test mAP when adding a random sample of 1000×{2,4,6,8,10,32,45} dense boxes. For each experiment we re-train and re-test model M. When these dense boxes are added, mAP falls more strongly than when adding more selective search boxes, eventually reaching 53.0%.

密集框的統(tǒng)計(jì)特性與selective search框不同。從2k個(gè)selective search框開(kāi)始,我們?cè)俳o每張圖像添加1000×{2,4,6,8,10,32,45}個(gè)隨機(jī)采樣的密集框并測(cè)試mAP。對(duì)于每個(gè)實(shí)驗(yàn),我們重新訓(xùn)練和重新測(cè)試模型M。添加這些密集框時(shí),mAP比添加更多selective search框時(shí)下降得更厲害,最終降到53.0%。

We also train and test Fast R-CNN using only dense boxes (45k / image). This setting yields a mAP of 52.9% (blue diamond). Finally, we check if SVMs with hard negative mining are needed to cope with the dense box distribution. SVMs do even worse: 49.3% (blue circle).

我們還訓(xùn)練和測(cè)試了Fast R-CNN只使用密集框(45k/圖像)。此設(shè)置的mAP為52.9%(藍(lán)色菱形)。最后,我們檢查是否需要使用難樣本重訓(xùn)練的SVM來(lái)處理密集框分布。SVM結(jié)果更糟糕:49.3%(藍(lán)色圓圈)。

5.6. Preliminary MS COCO results

We applied Fast R-CNN (with VGG16) to the MS COCO dataset [18] to establish a preliminary baseline. We trained on the 80k image training set for 240k iterations and evaluated on the “test-dev” set using the evaluation server. The PASCAL-style mAP is 35.9%; the new COCO-style AP, which also averages over IoU thresholds, is 19.7%.

5.6. MS COCO初步結(jié)果

我們將Fast R-CNN(使用VGG16)應(yīng)用于MS COCO數(shù)據(jù)集[18],以建立初步的baseline。我們?cè)?0k圖像的訓(xùn)練集上進(jìn)行了240k次迭代訓(xùn)練,并使用評(píng)估服務(wù)器在“test-dev”集上進(jìn)行評(píng)估。PASCAL風(fēng)格的mAP為35.9%;新的COCO風(fēng)格的AP(它還在多個(gè)IoU閾值上取平均)為19.7%。
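(譯者注:PASCAL風(fēng)格mAP只在IoU=0.5一個(gè)閾值上計(jì)算,而COCO風(fēng)格AP還要在IoU閾值0.50:0.05:0.95上取平均,可示意如下;各閾值上的AP數(shù)值為虛構(gòu)的演示數(shù)據(jù)。)

```python
import numpy as np

def coco_style_ap(ap_per_iou):
    """COCO風(fēng)格AP:對(duì)一組IoU閾值下分別算出的AP取平均。"""
    return float(np.mean(ap_per_iou))

iou_thresholds = np.arange(0.50, 1.00, 0.05)             # 0.50, 0.55, ..., 0.95,共10個(gè)
demo_ap = np.linspace(0.359, 0.05, len(iou_thresholds))  # 演示:IoU要求越高AP越低
overall = coco_style_ap(demo_ap)
# IoU越嚴(yán)格AP越低,所以COCO風(fēng)格AP(19.7%量級(jí))低于PASCAL風(fēng)格的35.9%
```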

6. Conclusion

This paper proposes Fast R-CNN, a clean and fast update to R-CNN and SPPnet. In addition to reporting state-of-the-art detection results, we present detailed experiments that we hope provide new insights. Of particular note, sparse object proposals appear to improve detector quality. This issue was too costly (in time) to probe in the past, but becomes practical with Fast R-CNN. Of course, there may exist yet undiscovered techniques that allow dense boxes to perform as well as sparse proposals. Such methods, if developed, may help further accelerate object detection.

6. 結(jié)論

本文提出Fast R-CNN,一個(gè)對(duì)R-CNN和SPPnet更新的簡(jiǎn)潔、快速版本。除了報(bào)告目前最先進(jìn)的檢測(cè)結(jié)果之外,我們還提供了詳細(xì)的實(shí)驗(yàn),希望提供新的思路。特別值得注意的是,稀疏目標(biāo)候選區(qū)域似乎提高了檢測(cè)器的質(zhì)量。過(guò)去這個(gè)問(wèn)題代價(jià)太大(在時(shí)間上)而一直無(wú)法深入探索,但Fast R-CNN使其變得可能。當(dāng)然,可能存在未發(fā)現(xiàn)的技術(shù),使得密集框能夠達(dá)到與稀疏候選框類(lèi)似的效果。如果這樣的方法被開(kāi)發(fā)出來(lái),則可以幫助進(jìn)一步加速目標(biāo)檢測(cè)。

Acknowledgements. I thank Kaiming He, Larry Zitnick, and Piotr Dollár for helpful discussions and encouragement.

致謝:感謝Kaiming He,Larry Zitnick和Piotr Dollár的有益的討論和鼓勵(lì)。

References

參考文獻(xiàn)

[1] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV, 2012. 5

[2] R. Caruana. Multitask learning. Machine learning, 28(1), 1997. 6

[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014. 5

[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. 2

[5] E. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In NIPS, 2014. 4

[6] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, 2014. 3

[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010. 1

[8] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. TPAMI, 2010. 3, 7, 8

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014. 1, 3, 4, 8

[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Regionbased convolutional networks for accurate object detection and segmentation. TPAMI, 2015. 5, 7, 8

[11] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014. 1, 2, 3, 4, 5, 6, 7

[12] J. H. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? arXiv preprint arXiv:1502.05082, 2015. 8

[13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proc. of the ACM International Conf. on Multimedia, 2014. 2

[14] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 1, 4, 6

[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 1

[16] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comp., 1989. 1

[17] M. Lin, Q. Chen, and S. Yan. Network in network. In ICLR, 2014. 5

[18] T. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: common objects in context. arXiv e-prints, arXiv:1405.0312 [cs.CV], 2014. 8

[19] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. In ICLR, 2014. 1, 3

[20] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 1, 5

[21] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 2013. 8

[22] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001. 8

[23] J. Xue, J. Li, and Y. Gong. Restructuring of deep neural network acoustic models with singular value decomposition. In Interspeech, 2013. 4

[24] X. Zhu, C. Vondrick, D. Ramanan, and C. Fowlkes. Do we need more training data or better models for object detection? In BMVC, 2012. 7

[25] Y. Zhu, R. Urtasun, R. Salakhutdinov, and S. Fidler. segDeepM: Exploiting segmentation and context in deep neural networks for object detection. In CVPR, 2015. 1, 5

總結(jié)

以上是生活随笔為你收集整理的目标检测经典论文——Fast R-CNN论文翻译(中英文对照版):Fast R-CNN(Ross Girshick, Microsoft Research(微软研究院))的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。


| 五月婷婷国产 | 韩日在线一区 | 中文字幕日韩伦理 | 在线观看你懂的网站 | 免费日韩av电影 | 欧美一级乱黄 | 久久er99热精品一区二区三区 | 日韩二区在线播放 | 亚洲天堂网在线播放 | 五月激情站 | 国产一区二区在线免费播放 | 久久视频一区 | 日韩精品电影在线播放 | 狠狠色免费 | 久久黄色小说 | 亚洲精品午夜视频 | 青草视频在线 | 天天操人人干 | 九九九九热精品免费视频点播观看 | 蜜臀av性久久久久av蜜臀三区 | 成人av免费在线看 | 日韩欧美精品一区 | 色国产视频 | 日韩欧美一区二区在线 | 97在线视 | 国产资源在线观看 | 国产成人一区二区三区免费看 | 亚洲电影av在线 | 一色屋精品视频在线观看 | 欧亚日韩精品一区二区在线 | 天天色天天上天天操 | 丁香网五月天 | 四虎在线视频免费观看 | 超碰人人av| 欧美在线视频免费 | 欧美精品久久久久久久免费 | 欧美精品亚州精品 | 丁香六月伊人 | 最新中文字幕 | 免费观看91视频大全 | 黄色软件网站在线观看 | 色婷婷视频在线 | 国产精品正在播放 | 国产色在线| 美女久久久久久久久久久 | 97超碰站| 激情婷婷久久 | 国产v在线观看 | 国产婷婷色 | 国产不卡高清 | 人人干人人做 | 人成在线免费视频 | 日本久久免费视频 | 激情综合网色播五月 | 精品99久久 | 国产专区精品视频 | 久久综合狠狠综合久久狠狠色综合 | 国产一区二区精品久久91 | av丝袜在线 | 国产色在线视频 | 97人人模人人爽人人喊中文字 | 丝袜美腿在线视频 | 国产中文字幕国产 | 日韩特级毛片 | 亚洲精品午夜一区人人爽 | 日韩精品一区二区免费视频 | 99在线观看视频 | 亚洲成人av电影 | 久草干 | 男女免费av | 久久精品一区 | 日韩色中色 | 91九色自拍 | 国产区精品区 | 美女网站视频久久 | 99精品欧美一区二区三区黑人哦 | 欧美在线视频精品 | 日本黄色a级大片 | 免费三级影片 | 成人中心免费视频 | 国产精品永久久久久久久www | 一二三区高清 | 国产精品久久久久久久久软件 | 超碰免费av | 国产午夜一区 | 亚洲v精品| 日本三级香港三级人妇99 | 成人黄色影片在线 | 精品国产午夜 | 在线免费观看成人 | 在线99热| www黄色av| 精品a在线| 日韩午夜在线播放 | 国产精品一区二区三区99 | 五月天婷婷在线视频 | 韩国精品一区二区三区六区色诱 | 精品 一区 在线 | 国产午夜精品免费一区二区三区视频 | 亚洲精品久久久久久久蜜桃 | 97色综合| 成人网中文字幕 | av丝袜在线 | 久久综合狠狠综合 | 亚洲天天摸日日摸天天欢 | 国产高清在线视频 | 亚洲天堂网视频在线观看 | 国产一级视频在线免费观看 | 青春草视频 | 日韩在线观看视频网站 | www.com在线观看 | 成人av高清| 日韩在线视频二区 | 欧美激情在线看 | 欧洲一区二区在线观看 | 免费看av在线 | 在线观看亚洲免费视频 | 91日韩精品 | 99久久精品国产一区二区三区 | 99久热精品 | 欧美激情视频一二区 | 国产婷婷vvvv激情久 | 一区二区三区播放 | www.色五月| 99精品欧美一区二区蜜桃免费 | av在线一级| 国产综合片 | 久久人人干| 亚洲精品一区二区三区在线观看 | 国产99自拍 | 久久人人爽人人爽 | 91亚洲狠狠婷婷综合久久久 | 99草视频 | 久久久久久不卡 | 91成版人在线观看入口 | 国产中文字幕在线播放 | 超碰在线人人97 | 在线精品视频在线观看高清 | 欧美网址在线观看 | 免费 在线 中文 日本 | 久操视频在线 | 69国产精品视频免费观看 | 中文字幕中文 | 中文字幕欧美日韩va免费视频 | 天天插夜夜操 | 精品国产一区二区三区久久久蜜臀 | 在线观看一区二区精品 | 视频国产一区二区三区 | 91成人午夜 | 黄色av播放 | 欧美大荫蒂xxx | 久久久国产精品人人片99精片欧美一 | 日本精品视频在线观看 | 国产成人精品网站 | 免费在线观看不卡av | 国产日本在线观看 | 在线观看视频国产一区 | 久久久.com| 亚洲人精品午夜 | 亚洲午夜av| 免费精品国产va自在自线 | 日韩xxx视频| 中文字幕亚洲精品在线观看 | 日韩在线观看a | 免费网址在线播放 | 亚洲精品国产精品国自产观看浪潮 | 香蕉国产91| 日韩激情视频在线 | 精品91久久久久 | 久久夜色精品亚洲噜噜国4 午夜视频在线观看欧美 | 99国产精品一区 | 久久手机在线视频 | 在线看欧美 | 
成人黄色小说视频 | 日本黄色免费在线观看 | 视频在线91| 国产精品丝袜久久久久久久不卡 | 亚洲美女免费精品视频在线观看 | 91精品国自产在线 | 久久9视频 | 色噜噜在线观看 | 99久久激情 | 国产91精品欧美 | 六月丁香六月婷婷 | 欧美精品国产综合久久 | 成人一级片视频 | 欧美婷婷综合 | 久久五月婷婷丁香社区 | 亚洲精品av中文字幕在线在线 | 最新免费av在线 | 国产亚洲91 | 中文字幕色站 | 丁香久久婷婷 | 日韩综合一区二区 | 色操插| 久久综合久久综合这里只有精品 | 婷婷在线不卡 | 欧美专区日韩专区 | 日韩欧美国产免费播放 | 一本到在线 | 精品你懂的 | www免费视频com| 女人久久久久 | 日本xxxxav| 2023av| 91视频 - 114av| 午夜色性片 | 国产午夜精品理论片在线 | 国产精品久久一区二区三区不卡 | 色视频在线免费 | 九九免费在线看完整版 | 色噜噜在线观看视频 | 在线中文字幕电影 | а天堂中文最新一区二区三区 | 丁香激情综合 | 91九色国产 | 91网站在线视频 | 日韩精品专区在线影院重磅 | 狠狠久久伊人 | 午夜10000| 日韩精品一区二区三区免费观看视频 | 三级免费黄色 | 国产精品国产三级国产aⅴ入口 | 亚洲精品美女视频 | 国产中文字幕视频在线观看 | 激情欧美日韩一区二区 | 中文字幕日韩免费视频 | 国产精品免费一区二区 | 日韩高清av在线 | 中文字幕高清有码 | 国产第一页福利影院 | 国产 视频 久久 | 久久久久免费看 | 99精品视频网站 | 国产精品毛片完整版 | 97免费公开视频 | 亚洲国产精品999 | 日韩欧美在线高清 | 日韩大片在线免费观看 | 伊甸园永久入口www 99热 精品在线 | 久草av在线播放 | 亚洲一级二级 | 丝袜美腿在线视频 | 国产视频久久久 | 欧美成人播放 | 97超碰国产精品女人人人爽 | 中文字幕一区二区三区在线播放 | 麻豆久久一区二区 | 久草www| 亚洲精品人人 | 色婷婷国产精品一区在线观看 | 九9热这里真品2 | 国产美女视频黄a视频免费 久久综合九色欧美综合狠狠 | 激情五月视频 | 夜夜看av| 狠狠干电影| 亚洲精品97 | 91激情 | 精品久久一区二区 | 国产福利91精品张津瑜 | 久久综合九色综合欧美狠狠 | 国产精品视屏 | 中文在线www| 麻豆免费精品视频 | 国产大尺度视频 | 伊香蕉大综综综合久久啪 | 欧美国产精品久久久久久免费 | 国产小视频91 | 国产精品久久久久久久久久妇女 | 欧美一区二区免费在线观看 | 久久精品91久久久久久再现 | 亚洲精品国偷拍自产在线观看蜜桃 | 成年人视频在线观看免费 | 免费h视频 | 久久精品一二三 | 波多野结衣一区二区三区中文字幕 | 五月天久久久久久 | 久久天天躁夜夜躁狠狠85麻豆 | 免费观看一区二区 | 波多野结衣在线视频一区 | 亚洲精品国偷拍自产在线观看蜜桃 | 久久99精品波多结衣一区 | 国产a级精品 | 婷婷丁香激情 | 色多多在线观看 | 丰满少妇高潮在线观看 | 午夜精品一区二区三区免费 | 国产精品久久中文字幕 | 五月综合激情网 | 日韩在线观看一区二区三区 | 天天干夜夜爱 | 韩国av一区二区三区在线观看 | 久久精品美女视频网站 | 久久久综合九色合综国产精品 | 久热超碰 | 国产精品久久电影网 | 精品毛片一区二区免费看 | 特级西西www44高清大胆图片 | 丁香六月久久综合狠狠色 | 国产亚洲精品久久久久久大师 | 91尤物国产尤物福利在线播放 | 国内小视频 | 女人魂免费观看 | 正在播放国产一区 | 99精品久久久久 | 久久久www| 国产二区视频在线观看 | 久久国产精品色av免费看 | www.com久久久 | 国产亚洲综合性久久久影院 | 中文字幕av免费在线观看 | 国产欧美精品在线观看 | 视频在线日韩 | 国产精品三级视频 | 一级黄色在线视频 | 天天干天天干天天 | 亚洲精品乱码久久久久久高潮 | 日韩在线免费电影 | 国产小视频国产精品 | 亚州中文av | 成人小视频在线观看免费 | 国产啊v在线观看 | 视频二区在线视频 | 久久综合色综合88 | 久久国产精品99久久人人澡 | 国产精品尤物 | 韩国一区在线 | 国产精品地址 | 免费的国产精品 | 久久综合狠狠狠色97 | 在线观看一区二区精品 | 伊人狠狠干 | 91在线视频免费观看 | 美女视频网 | 久久xxxx| 中文字幕在线免费观看视频 | 精品一区二区三区电影 | 精品国产一区二区三区久久久 | 午夜精品一区二区三区四区 | 
成人免费在线观看电影 | 久久五月网 | 麻豆视频国产在线观看 | 国产日韩中文字幕在线 | 精品a视频 | 69成人在线| 成人午夜久久 | 欧美一区视频 | 国产精品美女久久久网av | 九九热精| 天天色综合三 | 久久成人国产精品入口 | 国产精品99蜜臀久久不卡二区 | 丁香九月激情综合 | 午夜精品久久久久久久99无限制 | 99热最新地址 | 久久久久激情电影 | 91欧美精品 | 国产精品永久久久久久久www | 日韩专区av | 久久久蜜桃一区二区 | 99热高清 | 天天色播 | 日韩一区精品 | a在线v | 欧美日韩国产精品一区二区 | 亚洲一区精品二人人爽久久 | 亚洲国产精品va在线看黑人动漫 | 免费高清在线观看成人 | 最近更新的中文字幕 | 中文字幕亚洲欧美日韩 | 久av在线|