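Progress of Deep Learning in Computer Vision, and New Techniques That Have Evolved from Deep Learning
The CV field
1. Progress: as the figure above shows, the CV field currently comprises two major directions, "low-level perception" and "high-level cognition".
2. Main application areas: video surveillance, face recognition, medical image analysis, autonomous driving, robotics, AR, VR
3. Main techniques: classification, object detection (recognition), segmentation, object tracking, edge detection, pose estimation, understanding CNNs, super-resolution reconstruction, sequence learning, feature detection and matching, image captioning, video captioning, question answering, image generation (text-to-image), visual attention and saliency (quality assessment), face recognition, 3D reconstruction, recommender systems, fine-grained image analysis, image compression
Classification mainly answers the question "Who am I?"
Object detection mainly answers the question "Who am I? Where am I?"
Segmentation mainly answers the question "Who am I? Where am I? Can you segment me correctly?"
Object tracking mainly answers the question "Can you keep up with me and find me as quickly as possible?"
Edge detection mainly answers the question "How can the edges of an object be detected accurately?"
Human pose estimation mainly answers the question "Can you tell what I am doing from my pose?"
Understanding CNNs mainly answers the question "How can we understand, at a deep theoretical level, how CNNs work?"
Super-resolution reconstruction mainly answers the question "How do you obtain a high-quality image from a low-quality one?"
Sequence learning mainly answers the question "Do you know what my next image or next video frame will be?"
Feature detection and matching mainly answers the question "How can image features be detected and their similarity judged?"
Image captioning mainly answers the question "Can you say what is in the image, and what they are doing?"
Video captioning mainly answers the question "Do you know what these few video frames show?"
Question answering mainly answers the question "Can you correctly answer my question based on the image?"
Image generation mainly answers the question "Can I accurately generate the corresponding image from the information you give me?"
Visual attention and saliency mainly answers the question "How can we build a model that mimics the human visual attention mechanism?"
Face recognition mainly answers the question "How can a machine accurately recognize the same person's face under different conditions?"
3D reconstruction mainly answers the question "Can you generate a high-quality 3D point cloud from the images I give you?"
Recommender systems mainly answer the question "Can you give accurate output based on my input?"
Fine-grained image analysis mainly answers questions such as "Can you tell which breed of dog I am?" and other finer-grained tasks.
Image compression mainly answers the question "How can the original image be represented with fewer bits, either lossily or losslessly?"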
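To make the lossy versus lossless distinction in the image compression question concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the 64x64 gradient "image" is synthetic, `zlib` (DEFLATE) stands in for a lossless codec, and dropping low-order bits stands in for a lossy step. None of this is taken from the papers listed in this post, and learned compression models work quite differently.

```python
import zlib
from typing import Tuple


def lossless_roundtrip(pixels: bytes) -> Tuple[int, bytes]:
    """Compress with DEFLATE, then decompress.

    Returns (compressed size in bytes, restored pixel bytes).
    """
    packed = zlib.compress(pixels, 9)
    return len(packed), zlib.decompress(packed)


def lossy_quantize(pixels: bytes, keep_bits: int = 4) -> bytes:
    """Crude lossy step: zero the low-order bits of every 8-bit pixel."""
    shift = 8 - keep_bits
    return bytes((p >> shift) << shift for p in pixels)


# Hypothetical 64x64 grayscale gradient, row-major, one byte per pixel.
image = bytes((x + y) % 256 for y in range(64) for x in range(64))

# Lossless: fewer bytes, and the original is recovered exactly.
size, restored = lossless_roundtrip(image)

# Lossy: information is discarded, so the original cannot be recovered,
# but the per-pixel error is bounded by the bits that were dropped.
quantized = lossy_quantize(image, keep_bits=4)
max_error = max(abs(a - b) for a, b in zip(image, quantized))

print("raw bytes:", len(image))
print("losslessly compressed bytes:", size)
print("exact reconstruction:", restored == image)
print("max per-pixel error after quantization:", max_error)
```

The lossless path trades nothing for its size reduction, while the lossy path accepts a bounded error in exchange for data that is easier to compress further.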
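Notes:
1. Below I go through the individual subfields of CV and summarize the network models in each. The coverage spans essentially every subfield, and I have tried to collect the classic models as completely as possible. The order is roughly chronological, so the last entries in each subfield are usually the most recently proposed approaches. My main goal is to organize this material for my own use and for others: you no longer need to gather piles of material online, and can instead spend your time analyzing these models carefully and proposing new ones of your own. The papers collected here are of fairly high quality and come mainly from top international venues such as ECCV, ICCV, CVPR, PAMI, arXiv, ICLR, and ACM. I have added a link for every paper, which should save you a lot of time. I have also picked out the most important network model or overall architecture from each paper, so that you can compare them easily and get a better global view. For the details you will need to read the papers carefully. My energy is limited, so this is as far as I can take it; I hope you will understand. Thank you.
2. I will use my spare time to add new models, but because my time and energy are limited the list may be incomplete. I hope everyone can contribute: if you come across a new model, contact me and I will reply promptly. I look forward to your joining in, so that together we can serve everyone!
As shown in the figure below:
Classification: this is a fundamental research topic that has already reached very high accuracy, and in some settings it already exceeds human performance.
Typical network models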
LeNet
http://yann.lecun.com/exdb/lenet/index.html
AlexNet
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
https://arxiv.org/pdf/1502.01852.pdf
Batch Normalization
https://arxiv.org/pdf/1502.03167.pdf
GoogLeNet
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
VGGNet
https://arxiv.org/pdf/1409.1556.pdf
ResNet
https://arxiv.org/pdf/1512.03385.pdf
InceptionV4 (Inception-ResNet)
https://arxiv.org/pdf/1602.07261.pdf
LeNet network 1:
LeNet network 2:
AlexNet network 1:
AlexNet network 2:
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification:
GoogLeNet network 1:
GoogLeNet network 2:
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification:
Batch Normalization:
VGGNet network 1:
VGGNet network 2:
ResNet network:
InceptionV4 network:
Object detection: this line of research builds on image classification, i.e. classification + localization.
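As a rough illustration of what "classification + localization" means when a detection is scored, the following Python sketch checks both the predicted label and the overlap between the predicted and ground-truth boxes using intersection over union (IoU). The corner box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions chosen for illustration; individual papers and benchmarks listed below may use different conventions.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_label: str, pred_box: Box,
                         true_label: str, true_box: Box,
                         threshold: float = 0.5) -> bool:
    """A detection counts only if the class is right (classification)
    AND the box overlaps the ground truth enough (localization)."""
    return pred_label == true_label and iou(pred_box, true_box) >= threshold


# Right class, but the box is shifted too far: IoU is 1/7, below the threshold.
print(is_correct_detection("dog", (0, 0, 2, 2), "dog", (1, 1, 3, 3)))
# Right class and a perfectly aligned box.
print(is_correct_detection("dog", (0, 0, 2, 2), "dog", (0, 0, 2, 2)))
```

A classifier alone would accept both predictions above; requiring sufficient box overlap is what turns the task into detection.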
Typical networks
OverFeat
https://arxiv.org/pdf/1312.6229.pdf
R-CNN
https://arxiv.org/pdf/1311.2524.pdf
SPP-Net
https://arxiv.org/pdf/1406.4729.pdf
DeepID-Net
https://arxiv.org/pdf/1409.3505.pdf
Fast R-CNN
https://arxiv.org/pdf/1504.08083.pdf
R-CNN minus R
https://arxiv.org/pdf/1506.06981.pdf
End-to-end people detection in crowded scenes
https://arxiv.org/pdf/1506.04878.pdf
DeepBox
https://arxiv.org/pdf/1505.02146.pdf
MR-CNN
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf
Faster R-CNN
https://arxiv.org/pdf/1506.01497.pdf
YOLO
https://arxiv.org/pdf/1506.02640.pdf
DenseBox
https://arxiv.org/pdf/1509.04874.pdf
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
https://arxiv.org/pdf/1503.00949.pdf
R-FCN
https://arxiv.org/pdf/1605.06409.pdf
SSD
https://arxiv.org/pdf/1512.02325v2.pdf
Inside-Outside Net
https://arxiv.org/pdf/1512.04143.pdf
G-CNN
https://arxiv.org/pdf/1512.07729.pdf
PVANET
https://arxiv.org/pdf/1608.08021.pdf
Speed/accuracy trade-offs for modern convolutional object detectors
https://arxiv.org/pdf/1611.10012v1.pdf
OverFeat network:
R-CNN network:
SPP-Net network:
DeepID-Net network:
DeepBox network:
MR-CNN network:
Fast R-CNN network:
R-CNN minus R network:
End-to-end people detection in crowded scenes:
Faster R-CNN network:
DenseBox network:
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning:
R-FCN network:
YOLO and SSD networks:
Inside-Outside Net network:
G-CNN network:
PVANET network:
Speed/accuracy trade-offs for modern convolutional object detectors:
Image segmentation
Classic network models:
FCN
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
SegNet
https://arxiv.org/pdf/1511.00561.pdf
DeepLab
https://arxiv.org/pdf/1606.00915.pdf
DeconvNet
https://arxiv.org/pdf/1505.04366.pdf
Conditional Random Fields as Recurrent Neural Networks
http://www.robots.ox.ac.uk/~szheng/papers/CRFasRNN.pdf
Semantic Segmentation using Adversarial Networks
https://arxiv.org/pdf/1611.08408.pdf
SEC: Seed, Expand and Constrain
http://pub.ist.ac.at/~akolesnikov/files/ECCV2016/main.pdf
Efficient piecewise training of deep structured models for semantic segmentation
https://arxiv.org/pdf/1504.01013.pdf
Semantic Image Segmentation via Deep Parsing Network
https://arxiv.org/pdf/1509.02634.pdf
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
https://arxiv.org/pdf/1503.01640.pdf
Learning Deconvolution Network for Semantic Segmentation
https://arxiv.org/pdf/1505.04366.pdf
Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation
https://arxiv.org/pdf/1506.04924.pdf
Pushing the Boundaries of Boundary Detection Using Deep Learning
https://arxiv.org/pdf/1511.07386.pdf
Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network
https://arxiv.org/pdf/1512.07928.pdf
Feedforward Semantic Segmentation With Zoom-Out Features
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf
Joint Calibration for Semantic Segmentation
https://arxiv.org/pdf/1507.01581.pdf
Hypercolumns for Object Segmentation and Fine-Grained Localization
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Hariharan_Hypercolumns_for_Object_2015_CVPR_paper.pdf
Scene Parsing with Multiscale Feature Learning
http://yann.lecun.com/exdb/publis/pdf/farabet-icml-12.pdf
Learning Hierarchical Features for Scene Labeling
http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf
Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Izadinia_Segment-Phrase_Table_for_ICCV_2015_paper.pdf
Multi-Scale Context Aggregation by Dilated Convolutions
https://arxiv.org/pdf/1511.07122v2.pdf
Weakly supervised graph based semantic segmentation by learning communities of image-parts
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Pourian_Weakly_Supervised_Graph_ICCV_2015_paper.pdf
FCN network 1:
FCN network 2:
SegNet network:
DeepLab network:
DeconvNet network:
Conditional Random Fields as Recurrent Neural Networks:
Semantic Segmentation using Adversarial Networks:
SEC: Seed, Expand and Constrain:
Efficient piecewise training of deep structured models for semantic segmentation:
Semantic Image Segmentation via Deep Parsing Network:
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation:
Learning Deconvolution Network for Semantic Segmentation:
Pushing the Boundaries of Boundary Detection Using Deep Learning:
Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation:
Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network:
Feedforward Semantic Segmentation With Zoom-Out Features:
Joint Calibration for Semantic Segmentation:
Hypercolumns for Object Segmentation and Fine-Grained Localization:
Learning Hierarchical Features for Scene Labeling:
Multi-Scale Context Aggregation by Dilated Convolutions:
Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing:
Weakly supervised graph based semantic segmentation by learning communities of image-parts:
Scene Parsing with Multiscale Feature Learning:
Object tracking
Classic networks:
DLT
https://pdfs.semanticscholar.org/b218/0fc4f5cb46b5b5394487842399c501381d67.pdf
Transferring Rich Feature Hierarchies for Robust Visual Tracking
https://arxiv.org/pdf/1501.04587.pdf
FCNT
http://202.118.75.4/lu/Paper/ICCV2015/iccv15_lijun.pdf
Hierarchical Convolutional Features for Visual Tracking
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Ma_Hierarchical_Convolutional_Features_ICCV_2015_paper.pdf
MDNet
https://arxiv.org/pdf/1510.07945.pdf
Recurrently Target-Attending Tracking
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Cui_Recurrently_Target-Attending_Tracking_CVPR_2016_paper.pdf
DeepTracking
http://www.bmva.org/bmvc/2014/files/paper028.pdf
DeepTrack
http://www.bmva.org/bmvc/2014/files/paper028.pdf
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
https://arxiv.org/pdf/1502.06796.pdf
DLT network:
Transferring Rich Feature Hierarchies for Robust Visual Tracking:
FCNT network:
Hierarchical Convolutional Features for Visual Tracking:
MDNet network:
DeepTracking network:
Recurrently Target-Attending Tracking:
DeepTrack network:
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network:
Edge detection
Classic models:
HED
https://arxiv.org/pdf/1504.06375.pdf
DeepEdge
https://arxiv.org/pdf/1412.1123.pdf
DeepContour
http://mc.eistar.net/UpLoadFiles/Papers/DeepContour_cvpr15.pdf
HED network:
DeepEdge network:
DeepContour network:
Human pose estimation
Classic models:
DeepPose
https://arxiv.org/pdf/1312.4659.pdf
JTCN
https://www.robots.ox.ac.uk/~vgg/rg/papers/tompson2014.pdf
Flowing convnets for human pose estimation in videos
https://arxiv.org/pdf/1506.02897.pdf
Stacked hourglass networks for human pose estimation
https://arxiv.org/pdf/1603.06937.pdf
Convolutional pose machines
https://arxiv.org/pdf/1602.00134.pdf
DeepCut
https://arxiv.org/pdf/1605.03170.pdf
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
https://arxiv.org/pdf/1611.08050.pdf
DeepPose network:
JTCN network:
Flowing convnets for human pose estimation in videos:
Stacked hourglass networks for human pose estimation:
Convolutional pose machines:
DeepCut network:
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields:
Understanding CNNs
Classic networks:
Visualizing and Understanding Convolutional Networks
https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf
Inverting Visual Representations with Convolutional Networks
https://arxiv.org/pdf/1506.02753.pdf
Object Detectors Emerge in Deep Scene CNNs
https://arxiv.org/pdf/1412.6856.pdf
Understanding Deep Image Representations by Inverting Them
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf
Understanding image representations by measuring their equivariance and equivalence
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf
Visualizing and Understanding Convolutional Networks:
Inverting Visual Representations with Convolutional Networks:
Object Detectors Emerge in Deep Scene CNNs:
Understanding Deep Image Representations by Inverting Them:
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images:
Understanding image representations by measuring their equivariance and equivalence:
Super-resolution reconstruction
Classic models:
Learning Iterative Image Reconstruction
http://www.ais.uni-bonn.de/behnke/papers/ijcai01.pdf
Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid
http://www.ais.uni-bonn.de/behnke/papers/ijcia01.pdf
Learning a Deep Convolutional Network for Image Super-Resolution
http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepresolution.pdf
Image Super-Resolution Using Deep Convolutional Networks
https://arxiv.org/pdf/1501.00092.pdf
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
https://arxiv.org/pdf/1511.04587.pdf
Deeply-Recursive Convolutional Network for Image Super-Resolution
https://arxiv.org/pdf/1511.04491.pdf
Deep Networks for Image Super-Resolution with Sparse Prior
http://www.ifp.illinois.edu/~dingliu2/iccv15/iccv15.pdf
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
https://arxiv.org/pdf/1603.08155.pdf
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
https://arxiv.org/pdf/1609.04802v3.pdf
Learning Iterative Image Reconstruction:
Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid:
Learning a Deep Convolutional Network for Image Super-Resolution:
Image Super-Resolution Using Deep Convolutional Networks:
Accurate Image Super-Resolution Using Very Deep Convolutional Networks:
Deeply-Recursive Convolutional Network for Image Super-Resolution:
Deep Networks for Image Super-Resolution with Sparse Prior:
Perceptual Losses for Real-Time Style Transfer and Super-Resolution:
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network:
Image captioning
Classic models:
Explain Images with Multimodal Recurrent Neural Networks
https://arxiv.org/pdf/1410.1090.pdf
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
https://arxiv.org/pdf/1411.2539.pdf
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
https://arxiv.org/pdf/1411.4389.pdf
A Neural Image Caption Generator
https://arxiv.org/pdf/1411.4555.pdf
Deep Visual-Semantic Alignments for Generating Image Description
http://cs.stanford.edu/people/karpathy/cvpr2015.pdf
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
https://arxiv.org/pdf/1412.4729.pdf
Learning a Recurrent Visual Representation for Image Caption Generation
https://arxiv.org/pdf/1411.5654.pdf
From Captions to Visual Concepts and Back
https://arxiv.org/pdf/1411.4952.pdf
Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention
http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf
Phrase-based Image Captioning
https://arxiv.org/pdf/1502.03671.pdf
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
https://arxiv.org/pdf/1504.06692.pdf
Exploring Nearest Neighbor Approaches for Image Captioning
https://arxiv.org/pdf/1505.04467.pdf
Image Captioning with an Intermediate Attributes Layer
https://arxiv.org/pdf/1506.01144.pdf
Learning language through pictures
https://arxiv.org/pdf/1506.03694.pdf
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
https://arxiv.org/pdf/1507.01053.pdf
Image Representations and New Domains in Neural Image Captioning
https://arxiv.org/pdf/1508.02091.pdf
Learning Query and Image Similarities with Ranking Canonical Correlation Analysis
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Learning_Query_and_ICCV_2015_paper.pdf
Generative Adversarial Text to Image Synthesis
https://arxiv.org/pdf/1605.05396.pdf
Generating Images from Captions with Attention
https://arxiv.org/pdf/1511.02793.pdf
Explain Images with Multimodal Recurrent Neural Networks:
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models:
Long-term Recurrent Convolutional Networks for Visual Recognition and Description:
A Neural Image Caption Generator:
Deep Visual-Semantic Alignments for Generating Image Description:
Translating Videos to Natural Language Using Deep Recurrent Neural Networks:
Learning a Recurrent Visual Representation for Image Caption Generation:
From Captions to Visual Concepts and Back:
Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention:
Phrase-based Image Captioning:
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images:
Exploring Nearest Neighbor Approaches for Image Captioning:
Image Captioning with an Intermediate Attributes Layer:
Learning language through pictures:
Describing Multimedia Content using Attention-based Encoder-Decoder Networks:
Image Representations and New Domains in Neural Image Captioning:
Learning Query and Image Similarities with Ranking Canonical Correlation Analysis:
Generative Adversarial Text to Image Synthesis:
Generating Images from Captions with Attention:
Video captioning
Classic models:
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
https://arxiv.org/pdf/1411.4389.pdf
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
https://arxiv.org/pdf/1412.4729.pdf
Joint Modeling Embedding and Translation to Bridge Video and Language
https://arxiv.org/pdf/1505.01861.pdf
Sequence to Sequence–Video to Text
https://arxiv.org/pdf/1505.00487.pdf
Describing Videos by Exploiting Temporal Structure
https://arxiv.org/pdf/1502.08029.pdf
The Long-Short Story of Movie Description
https://arxiv.org/pdf/1506.01698.pdf
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
https://arxiv.org/pdf/1506.06724.pdf
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
https://arxiv.org/pdf/1507.01053.pdf
Temporal Tessellation for Video Annotation and Summarization
https://arxiv.org/pdf/1612.06950.pdf
Summarization-based Video Caption via Deep Neural Networks
http://delivery.acm.org/10.1145/2810000/2806314/p1191-li.pdf?ip=123.138.79.12&id=2806314&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492135731_7c7cb5d6bf7455db7f4aa75b341d1a78
Deep Learning for Video Classification and Captioning
https://arxiv.org/pdf/1609.06782.pdf
Long-term Recurrent Convolutional Networks for Visual Recognition and Description:
Translating Videos to Natural Language Using Deep Recurrent Neural Networks:
Joint Modeling Embedding and Translation to Bridge Video and Language:
Sequence to Sequence–Video to Text:
Describing Videos by Exploiting Temporal Structure:
The Long-Short Story of Movie Description:
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books:
Describing Multimedia Content using Attention-based Encoder-Decoder Networks:
Temporal Tessellation for Video Annotation and Summarization:
Summarization-based Video Caption via Deep Neural Networks:
Deep Learning for Video Classification and Captioning:
Question answering systems
Classic models:
VQA: Visual Question Answering
https://arxiv.org/pdf/1505.00468.pdf
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
https://arxiv.org/pdf/1505.01121.pdf
Image Question Answering: A Visual Semantic Embedding Model and a New Dataset
https://arxiv.org/pdf/1505.02074.pdf
Stacked Attention Networks for Image Question Answering
https://arxiv.org/pdf/1511.02274v2.pdf
Dataset and Methods for Multilingual Image Question Answering
https://arxiv.org/pdf/1505.05612.pdf
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Dynamic Memory Networks for Visual and Textual Question Answering
https://arxiv.org/pdf/1603.01417v1.pdf
Multimodal Residual Learning for Visual QA
https://arxiv.org/pdf/1606.01455.pdf
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
https://arxiv.org/pdf/1606.01847.pdf
Training Recurrent Answering Units with Joint Loss Minimization for VQA
https://arxiv.org/pdf/1606.03647.pdf
Hadamard Product for Low-rank Bilinear Pooling
https://arxiv.org/pdf/1610.04325.pdf
Question Answering Using Deep Learning
https://cs224d.stanford.edu/reports/StrohMathur.pdf
VQA: Visual Question Answering:
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images:
Image Question Answering: A Visual Semantic Embedding Model and a New Dataset:
Stacked Attention Networks for Image Question Answering:
Dataset and Methods for Multilingual Image Question Answering:
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction:
Dynamic Memory Networks for Visual and Textual Question Answering:
Multimodal Residual Learning for Visual QA:
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding:
Training Recurrent Answering Units with Joint Loss Minimization for VQA:
Hadamard Product for Low-rank Bilinear Pooling:
Question Answering Using Deep Learning:
Image generation (CNN, RNN, LSTM, GAN)
Classic models:
Conditional Image Generation with PixelCNN Decoders
https://arxiv.org/pdf/1606.05328v2.pdf
Learning to Generate Chairs with Convolutional Neural Networks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf
DRAW: A Recurrent Neural Network For Image Generation
https://arxiv.org/pdf/1502.04623v2.pdf
Generative Adversarial Networks
https://arxiv.org/pdf/1406.2661.pdf
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
https://arxiv.org/pdf/1506.05751.pdf
A note on the evaluation of generative models
https://arxiv.org/pdf/1511.01844.pdf
Variationally Auto-Encoded Deep Gaussian Processes
https://arxiv.org/pdf/1511.06455v2.pdf
Generating Images from Captions with Attention
https://arxiv.org/pdf/1511.02793v2.pdf
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
https://arxiv.org/pdf/1511.06390v1.pdf
Censoring Representations with an Adversary
https://arxiv.org/pdf/1511.05897v3.pdf
Distributional Smoothing with Virtual Adversarial Training
https://arxiv.org/pdf/1507.00677v8.pdf
Generative Visual Manipulation on the Natural Image Manifold
https://arxiv.org/pdf/1609.03552v2.pdf
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
https://arxiv.org/pdf/1511.06434.pdf
Wasserstein GAN
https://arxiv.org/pdf/1701.07875.pdf
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
https://arxiv.org/pdf/1701.06264.pdf
Conditional Generative Adversarial Nets
https://arxiv.org/pdf/1411.1784.pdf
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
https://arxiv.org/pdf/1606.03657.pdf
Conditional Image Synthesis With Auxiliary Classifier GANs
https://arxiv.org/pdf/1610.09585.pdf
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
https://arxiv.org/pdf/1609.05473.pdf
Improved Training of Wasserstein GANs
https://arxiv.org/pdf/1704.00028.pdf
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
https://arxiv.org/pdf/1704.04086.pdf
Conditional Image Generation with PixelCNN Decoders:
Learning to Generate Chairs with Convolutional Neural Networks:
DRAW: A Recurrent Neural Network For Image Generation:
Generative Adversarial Networks:
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks:
A note on the evaluation of generative models:
Variationally Auto-Encoded Deep Gaussian Processes:
Generating Images from Captions with Attention:
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks:
Censoring Representations with an Adversary:
Distributional Smoothing with Virtual Adversarial Training:
Generative Visual Manipulation on the Natural Image Manifold:
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks:
Wasserstein GAN:
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities:
Conditional Generative Adversarial Nets:
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets:
Conditional Image Synthesis With Auxiliary Classifier GANs:
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient:
Improved Training of Wasserstein GANs:
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis:
Visual attention and saliency
Classic models:
Predicting Eye Fixations using Convolutional Neural Networks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Predicting_Eye_Fixations_2015_CVPR_paper.pdf
Learning a Sequential Search for Landmarks
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Singh_Learning_a_Sequential_2015_CVPR_paper.pdf
Multiple Object Recognition with Visual Attention
https://arxiv.org/pdf/1412.7755.pdf
Recurrent Models of Visual Attention
http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
Capacity Visual Attention Networks
http://easychair.org/publications/download/Capacity_Visual_Attention_Networks
Fully Convolutional Attention Networks for Fine-Grained Recognition
https://arxiv.org/pdf/1603.06765.pdf
Predicting Eye Fixations using Convolutional Neural Networks:
Learning a Sequential Search for Landmarks:
Multiple Object Recognition with Visual Attention:
Recurrent Models of Visual Attention:
Capacity Visual Attention Networks:
Fully Convolutional Attention Networks for Fine-Grained Recognition:
Feature detection and matching (patches)
Classic models:
TILDE: A Temporally Invariant Learned DEtector
https://arxiv.org/pdf/1411.4568.pdf
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
https://pdfs.semanticscholar.org/81b9/24da33b9500a2477532fd53f01df00113972.pdf
Discriminative Learning of Deep Convolutional Feature Point Descriptors
http://cvlabwww.epfl.ch/~trulls/pdf/iccv-2015-deepdesc.pdf
Learning to Assign Orientations to Feature Points
https://arxiv.org/pdf/1511.04273.pdf
PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors
https://arxiv.org/pdf/1601.05030.pdf
Multi-scale Pyramid Pooling for Deep Convolutional Representation
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7301274
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
https://arxiv.org/pdf/1406.4729.pdf
Learning to Compare Image Patches via Convolutional Neural Networks
https://arxiv.org/pdf/1504.03641.pdf
PixelNet: Representation of the pixels, by the pixels, and for the pixels
http://www.cs.cmu.edu/~aayushb/pixelNet/pixelnet.pdf
LIFT: Learned Invariant Feature Transform
https://arxiv.org/pdf/1603.09114.pdf
TILDE: A Temporally Invariant Learned DEtector:
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching:
Discriminative Learning of Deep Convolutional Feature Point Descriptors:
Learning to Assign Orientations to Feature Points:
PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors:
Multi-scale Pyramid Pooling for Deep Convolutional Representation:
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition:
Learning to Compare Image Patches via Convolutional Neural Networks:
PixelNet: Representation of the pixels, by the pixels, and for the pixels:
LIFT: Learned Invariant Feature Transform:
Face recognition
Classic models:
Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks
http://vis-www.cs.umass.edu/papers/HuangCVPR12.pdf
Deep Convolutional Network Cascade for Facial Point Detection
http://mmlab.ie.cuhk.edu.hk/archive/CNN/data/CNN_FacePoint.pdf
Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification
http://delivery.acm.org/10.1145/2400000/2396303/p749-cai.pdf?ip=123.138.79.12&id=2396303&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492152722_04e9cce5378080a18ec7e700dfb4cd28
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf
Deep learning face representation by joint identification-verification
https://arxiv.org/pdf/1406.4773.pdf
Deep learning face representation from predicting 10,000 classes
http://mmlab.ie.cuhk.edu.hk/pdf/YiSun_CVPR14.pdf
Deeply learned face representations are sparse, selective, and robust
https://arxiv.org/pdf/1412.1265.pdf
Deepid3: Face recognition with very deep neural networks
https://arxiv.org/pdf/1502.00873.pdf
FaceNet: A Unified Embedding for Face Recognition and Clustering
https://arxiv.org/pdf/1503.03832.pdf
Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness
https://arxiv.org/pdf/1609.07304.pdf
Large-pose Face Alignment via CNN-based Dense 3D Model Fitting
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Jourabloo_Large-Pose_Face_Alignment_CVPR_2016_paper.pdf
Unconstrained 3D face reconstruction
http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR2015.pdf
Adaptive contour fitting for pose-invariant 3D face shape reconstruction
http://akme-a2.iosb.fraunhofer.de/ETGS15p/2015_Adaptive%20contour%20fitting%20for%20pose-invariant%203D%20face%20shape%20reconstruction.pdf
High-fidelity pose and expression normalization for face recognition in the wild
http://www.cbsr.ia.ac.cn/users/xiangyuzhu/papers/CVPR2015_High-Fidelity.pdf
Adaptive 3D face reconstruction from unconstrained photo collections
http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR16.pdf
Dense 3D face alignment from 2d videos in real-time
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7163142
Robust facial landmark detection under significant head poses and occlusion
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Wu_Robust_Facial_Landmark_ICCV_2015_paper.pdf
A convolutional neural network cascade for face detection
http://users.eecs.northwestern.edu/~xsh835/assets/cvpr2015_cascnn.pdf
Deep Face Recognition Using Deep Convolutional Neural Network
http://aiehive.com/deep-face-recognition-using-deep-convolution-neural-network/
Multi-view Face Detection Using Deep Convolutional Neural Networks
http://delivery.acm.org/10.1145/2750000/2749408/p643-farfade.pdf?ip=123.138.79.12&id=2749408&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492157015_8ffa84e6632810ea05ff005794fed8d5
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
https://arxiv.org/pdf/1603.01249.pdf
Wider face: A face detection benchmark
http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/paper.pdf
Joint training of cascaded cnn for face detection
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Qin_Joint_Training_of_CVPR_2016_paper.pdf
Face detection with end-to-end integration of a convnet and a 3d model
https://arxiv.org/pdf/1606.00850.pdf
Face Detection using Deep Learning: An Improved Faster RCNN Approach
https://arxiv.org/pdf/1701.08289.pdf
Comparison of old and new methods:
Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks:
Deep Convolutional Network Cascade for Facial Point Detection:
Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification:
DeepFace: Closing the Gap to Human-Level Performance in Face Verification:
Deep learning face representation by joint identification-verification:
Deep learning face representation from predicting 10,000 classes:
Deeply learned face representations are sparse, selective, and robust:
Deepid3: Face recognition with very deep neural networks:
FaceNet: A Unified Embedding for Face Recognition and Clustering:
Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness:
Large-pose Face Alignment via CNN-based Dense 3D Model Fitting:
Unconstrained 3D face reconstruction:
Adaptive contour fitting for pose-invariant 3D face shape reconstruction:
High-fidelity pose and expression normalization for face recognition in the wild:
Adaptive 3D face reconstruction from unconstrained photo collections:
Regressing a 3D face shape from a single image:
Dense 3D face alignment from 2d videos in real-time:
Robust facial landmark detection under significant head poses and occlusion:
A convolutional neural network cascade for face detection:
Deep Face Recognition Using Deep Convolutional Neural Network:
Multi-view Face Detection Using Deep Convolutional Neural Networks:
HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition:
WIDER FACE: A Face Detection Benchmark:
Joint training of cascaded cnn for face detection:
Face detection with end-to-end integration of a convnet and a 3d model:
Face Detection using Deep Learning: An Improved Faster RCNN Approach:
3D Reconstruction
Classic models:
3D ShapeNets: A Deep Representation for Volumetric Shapes
https://people.csail.mit.edu/khosla/papers/cvpr2015_wu.pdf
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
https://arxiv.org/pdf/1604.00449.pdf
Learning to generate chairs with convolutional neural networks
https://arxiv.org/pdf/1411.5928.pdf
Category-specific object reconstruction from a single image
http://people.eecs.berkeley.edu/~akar/categoryshapes.pdf
Enriching Object Detection with 2D-3D Registration and Continuous Viewpoint Estimation
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7298866
ShapeNet: An Information-Rich 3D Model Repository
https://arxiv.org/pdf/1512.03012.pdf
3D reconstruction of synapses with deep learning based on EM Images
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7558866
Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces
https://arxiv.org/pdf/1605.06240.pdf
Unsupervised Learning of 3D Structure from Images
https://arxiv.org/pdf/1607.00662.pdf
Deep learning 3d shape surfaces using geometry images
http://download.springer.com/static/pdf/605/chp%253A10.1007%252F978-3-319-46466-4_14.pdf?originUrl=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F978-3-319-46466-4_14&token2=exp=1492181498~acl=%2Fstatic%2Fpdf%2F605%2Fchp%25253A10.1007%25252F978-3-319-46466-4_14.pdf%3ForiginUrl%3Dhttp%253A%252F%252Flink.springer.com%252Fchapter%252F10.1007%252F978-3-319-46466-4_14*~hmac=b772943d8cd5f914e7bc84a30ddfdf0ef87991bee1d52717cb4930e3eccb0e63
FPNN: Field Probing Neural Networks for 3D Data
https://arxiv.org/pdf/1605.06240.pdf
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
https://arxiv.org/pdf/1505.05641.pdf
Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling
https://arxiv.org/pdf/1610.07584.pdf
SurfNet: Generating 3D shape surfaces using deep residual networks
https://arxiv.org/pdf/1703.04079.pdf
3D ShapeNets: A Deep Representation for Volumetric Shapes:
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction:
Learning to generate chairs with convolutional neural networks:
Category-specific object reconstruction from a single image:
Enriching Object Detection with 2D-3D Registration and Continuous Viewpoint Estimation:
Completing 3d object shape from one depth image:
ShapeNet: An Information-Rich 3D Model Repository:
3D reconstruction of synapses with deep learning based on EM Images:
Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces:
FPNN: Field Probing Neural Networks for 3D Data:
Unsupervised Learning of 3D Structure from Images:
Deep learning 3d shape surfaces using geometry images:
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views:
Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling:
SurfNet: Generating 3D shape surfaces using deep residual networks:
Recommender Systems
Classic models:
Autorec: Autoencoders meet collaborative filtering
http://users.cecs.anu.edu.au/~akmenon/papers/autorec/autorec-paper.pdf
User modeling with neural network for review rating prediction
https://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwj35dyVo6nTAhWEnpQKHSAwCw4QFggjMAA&url=http%3a%2f%2fwww%2eaaai%2eorg%2focs%2findex%2ephp%2fIJCAI%2fIJCAI15%2fpaper%2fdownload%2f11051%2f10849&usg=AFQjCNHeMJX8AZzoRF0ODcZE_mXazEktUQ
Collaborative Deep Learning for Recommender Systems
https://arxiv.org/pdf/1409.2944.pdf
A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf
A neural probabilistic model for context based citation recommendation
http://www.personal.psu.edu/wzh112/publications/aaai_slides.pdf
Hybrid Recommender System based on Autoencoders
http://delivery.acm.org/10.1145/2990000/2988456/p11-strub.pdf?ip=123.138.79.12&id=2988456&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=751612499&CFTOKEN=37099060&acm=1492356698_958d1b64105cd41b9719c8d285736396
Wide & Deep Learning for Recommender Systems
https://arxiv.org/pdf/1606.07792.pdf
Deep Neural Networks for YouTube Recommendations
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf
Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks
http://www.wanghao.in/paper/NIPS16_CRAE.pdf
Neural Collaborative Filtering
http://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
Recurrent Recommender Networks
http://alexbeutel.com/papers/rrn_wsdm2017.pdf
Autorec: Autoencoders meet collaborative filtering:
User modeling with neural network for review rating prediction:
A neural probabilistic model for context based citation recommendation:
A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems:
Collaborative Deep Learning for Recommender Systems:
Wide & Deep Learning for Recommender Systems:
Deep Neural Networks for YouTube Recommendations:
Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks:
Neural Collaborative Filtering:
Recurrent Recommender Networks:
Fine-Grained Image Analysis
Classic models:
Part-based R-CNNs for Fine-grained Category Detection
https://people.eecs.berkeley.edu/~nzhang/papers/eccv14_part.pdf
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
http://www.bmva.org/bmvc/2014/files/paper071.pdf
Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
https://arxiv.org/pdf/1605.06878.pdf
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xiao_The_Application_of_2015_CVPR_paper.pdf
Bilinear CNN Models for Fine-grained Visual Recognition
http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf
Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval
https://arxiv.org/pdf/1604.04994.pdf
Near Duplicate Image Detection: min-Hash and tf-idf Weighting
https://www.robots.ox.ac.uk/~vgg/publications/papers/chum08a.pdf
Fine-grained image search
https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf
Efficient large-scale structured learning
http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Branson_Efficient_Large-Scale_Structured_2013_CVPR_paper.pdf
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
https://arxiv.org/pdf/1504.08289.pdf
Part-based R-CNNs for Fine-grained Category Detection:
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets:
Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition:
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification:
Bilinear CNN Models for Fine-grained Visual Recognition:
Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval:
Near Duplicate Image Detection: min-Hash and tf-idf Weighting:
Fine-grained image search:
Efficient large-scale structured learning:
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks:
Image Compression
Classic models:
Auto-Encoding Variational Bayes
https://arxiv.org/pdf/1312.6114.pdf
k-Sparse Autoencoders
https://arxiv.org/pdf/1312.5663.pdf
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
http://www.iro.umontreal.ca/~lisa/pointeurs/ICML2011_explicit_invariance.pdf
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf
Tutorial on Variational Autoencoders
https://arxiv.org/pdf/1606.05908.pdf
End-to-end Optimized Image Compression
https://openreview.net/pdf?id=rJxdQ3jeg
Guetzli: Perceptually Guided JPEG Encoder
https://arxiv.org/pdf/1703.04421.pdf
Auto-Encoding Variational Bayes:
k-Sparse Autoencoders:
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction:
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion:
Tutorial on Variational Autoencoders:
End-to-end Optimized Image Compression:
Guetzli: Perceptually Guided JPEG Encoder:
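NLP
Tutorial: http://cs224d.stanford.edu/syllabus.html
Notes:
1) So far I have only scratched the surface of this field; I will update this section gradually.
2) I also hope that people working in this field will contribute, and I look forward to your participation.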
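Speech Recognition
Notes:
1) I have not yet studied speech recognition in detail; I will add and update content over time.
2) I also hope that people working in this field will contribute, and I look forward to your participation.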
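AGI (Artificial General Intelligence)
Notes:
1) I have not yet studied this field in detail; I will add content over time.
2) I also hope that people working in this field will contribute, and I look forward to your participation.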
New technologies arising from deep learning:
Video tutorial:
https://cn.udacity.com/course/reinforcement-learning--ud600
Note: I have not studied this topic yet and only know of it as a new concept; I will add more material later.
Tutorial: http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
Course: http://rll.berkeley.edu/deeprlcourse/
DeepMind:
https://deepmind.com/blog/deep-reinforcement-learning/
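Closing Remarks
Notes:
1. That is about everything. Writing this took a lot of time, but I also learned a great deal from putting the summary together, and I now truly appreciate that deep learning runs through the whole of CV. If you work in CV, I recommend spending some time getting to know deep learning. After all, it is overturning the field!
2. Since my experience is limited, there may be some mistakes; please bear with me. If you have any questions, send me a message and I will reply promptly.
3. This blog post is my own original work; please contact me if you would like to repost it.
Email: 1575262785@qq.com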