

Progress of deep learning in computer vision, and some new techniques that have grown out of deep learning


The computer vision (CV) field

1. Progress: broadly, the CV field today spans two directions, "low-level perception" and "high-level cognition".

2. Main application areas: video surveillance, face recognition, medical image analysis, autonomous driving, robotics, AR, VR.

3. Main techniques: classification, object detection (recognition), segmentation, object tracking, edge detection, pose estimation, understanding CNNs, super-resolution reconstruction, sequence learning, feature detection and matching, image captioning, video captioning, question answering, image generation (text-to-image), visual attention and saliency (quality assessment), face recognition, 3D reconstruction, recommender systems, fine-grained image analysis, image compression.

Classification mainly answers: "What am I?"
Object detection mainly answers: "What am I, and where am I?"
Segmentation mainly answers: "What am I, where am I, and can you separate me out precisely?"
Object tracking mainly answers: "Can you keep up with me and find me quickly?"
Edge detection mainly answers: "How do we accurately detect the edges of the target?"
Human pose estimation mainly answers: "Can you tell what I am doing from my pose?"
Understanding CNNs mainly answers: "Can we understand, at a deeper theoretical level, how CNNs work?"
Super-resolution reconstruction mainly answers: "How do you obtain a high-quality image from a low-quality one?"
Sequence learning mainly answers: "Do you know what my next image or next video frame will be?"
Feature detection and matching mainly answers: "Detect features in the images and judge how similar they are."
Image captioning mainly answers: "Can you say what is in the image and what the objects are doing?"
Video captioning mainly answers: "Do you know what these few video frames show?"
Question answering mainly answers: "Can you correctly answer questions asked about the image?"
Image generation mainly answers: "Can the corresponding image be generated accurately from the information provided?"
Visual attention and saliency mainly answers: "How do we build models that mimic the human visual attention mechanism?"
Face recognition mainly answers: "How can a machine reliably recognize the same person's face under different conditions?"
3D reconstruction mainly answers: "Can you generate a high-quality 3D point cloud from the images I give you?"
Recommender systems mainly answer: "Can you give an accurate output for my input?"
Fine-grained image analysis mainly answers finer questions such as: "Can you tell which breed of dog I am?"
Image compression mainly answers: "How do we represent the original image, lossily or losslessly, with fewer bits?"

Notes:
1. Below I go through each sub-area of CV and summarize the network models in it, aiming to cover each area and collect the classic models as completely as possible. The ordering is roughly chronological, usually ending with the most recently proposed approach. My main goal is to organize this material for my own use and for others: you no longer need to gather piles of material from the web; what remains is to analyze these models carefully and propose new ones of your own. The papers collected here are of high quality, mostly from top venues such as ECCV, ICCV, CVPR, PAMI, arXiv, ICLR, and ACM, and a link is provided for every paper, which should save you a lot of time. I have also picked out the key network model or overall architecture of each paper so that you can compare them and keep a better global view; for the details you still need to read the papers carefully. With limited time and energy this is as far as I could take it; I hope you understand. Thank you.
2. I will use my spare time to add new models, but given limited time and energy the list may be incomplete. I hope everyone will contribute: if you find a new model, contact me and I will reply promptly. I look forward to your joining in, so that together we can serve everyone!


Classification: this is a foundational research topic; very high accuracy has been achieved, and in some settings it already far exceeds human performance. (A minimal usage sketch follows the model list below.)

Typical network models

  • LeNet
    http://yann.lecun.com/exdb/lenet/index.html

  • AlexNet
    http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  • Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
    https://arxiv.org/pdf/1502.01852.pdf

  • Batch Normalization
    https://arxiv.org/pdf/1502.03167.pdf

  • GoogLeNet
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

  • VGGNet
    https://arxiv.org/pdf/1409.1556.pdf

  • ResNet
    https://arxiv.org/pdf/1512.03385.pdf

  • Inception-v4 (Inception-ResNet)
    https://arxiv.org/pdf/1602.07261.pdf


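To make the model list above concrete, here is a minimal usage sketch: running one of the listed architectures (ResNet-50, pretrained on ImageNet) as a classifier with torchvision. The library choice, the weight name, and the image path are assumptions for this example, not something prescribed by the papers above.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# "example.jpg" is a placeholder path for any RGB image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]
with torch.no_grad():
    probs = model(img).softmax(dim=1)   # [1, 1000] ImageNet class probabilities
top5 = probs.topk(5)
print(top5.indices, top5.values)        # most likely class ids and their probabilities
```
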
Object detection: research that builds on image classification, i.e. classification + localization. (A minimal detection sketch follows the list below.)

Typical networks

  • OverFeat
    https://arxiv.org/pdf/1312.6229.pdf

  • R-CNN
    https://arxiv.org/pdf/1311.2524.pdf

  • SPP-Net
    https://arxiv.org/pdf/1406.4729.pdf

  • DeepID-Net
    https://arxiv.org/pdf/1409.3505.pdf

  • Fast R-CNN
    https://arxiv.org/pdf/1504.08083.pdf

  • R-CNN minus R
    https://arxiv.org/pdf/1506.06981.pdf

  • End-to-end people detection in crowded scenes
    https://arxiv.org/pdf/1506.04878.pdf

  • DeepBox
    https://arxiv.org/pdf/1505.02146.pdf

  • MR-CNN
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf

  • Faster R-CNN
    https://arxiv.org/pdf/1506.01497.pdf

  • YOLO
    https://arxiv.org/pdf/1506.02640.pdf

  • DenseBox
    https://arxiv.org/pdf/1509.04874.pdf

  • Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
    https://arxiv.org/pdf/1503.00949.pdf

  • R-FCN
    https://arxiv.org/pdf/1605.06409.pdf

  • SSD
    https://arxiv.org/pdf/1512.02325v2.pdf

  • Inside-Outside Net
    https://arxiv.org/pdf/1512.04143.pdf

  • G-CNN
    https://arxiv.org/pdf/1512.07729.pdf

  • PVANET
    https://arxiv.org/pdf/1608.08021.pdf

  • Speed/accuracy trade-offs for modern convolutional object detectors
    https://arxiv.org/pdf/1611.10012v1.pdf


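As a rough illustration of "classification + localization", here is a minimal sketch using a pretrained Faster R-CNN from torchvision (one of the detectors listed above). The library, weights, score threshold, and image path are assumptions for the example only.

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "street.jpg" is a placeholder path; the model expects [C, H, W] tensors in [0, 1].
img = transforms.ToTensor()(Image.open("street.jpg").convert("RGB"))
with torch.no_grad():
    (pred,) = model([img])              # one dict per input image

keep = pred["scores"] > 0.5             # drop low-confidence detections
print(pred["boxes"][keep])              # [N, 4] boxes as (x1, y1, x2, y2)
print(pred["labels"][keep])             # COCO category ids
```
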
Image segmentation

Classic network models:

  • FCN
    https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf

  • SegNet
    https://arxiv.org/pdf/1511.00561.pdf

  • Deeplab
    https://arxiv.org/pdf/1606.00915.pdf

  • DeconvNet
    https://arxiv.org/pdf/1505.04366.pdf

  • Conditional Random Fields as Recurrent Neural Networks
    http://www.robots.ox.ac.uk/~szheng/papers/CRFasRNN.pdf

  • Semantic Segmentation using Adversarial Networks
    https://arxiv.org/pdf/1611.08408.pdf

  • SEC: Seed, Expand and Constrain:
    http://pub.ist.ac.at/~akolesnikov/files/ECCV2016/main.pdf

  • Efficient piecewise training of deep structured models for semantic segmentation
    https://arxiv.org/pdf/1504.01013.pdf

  • Semantic Image Segmentation via Deep Parsing Network
    https://arxiv.org/pdf/1509.02634.pdf

  • BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
    https://arxiv.org/pdf/1503.01640.pdf

  • Learning Deconvolution Network for Semantic Segmentation
    https://arxiv.org/pdf/1505.04366.pdf

  • Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation
    https://arxiv.org/pdf/1506.04924.pdf

  • PUSHING THE BOUNDARIES OF BOUNDARY DETECTION USING DEEP LEARNING
    https://arxiv.org/pdf/1511.07386.pdf

  • Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network
    https://arxiv.org/pdf/1512.07928.pdf

  • Feedforward Semantic Segmentation With Zoom-Out Features
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf

  • Joint Calibration for Semantic Segmentation
    https://arxiv.org/pdf/1507.01581.pdf

  • Hypercolumns for Object Segmentation and Fine-Grained Localization
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Hariharan_Hypercolumns_for_Object_2015_CVPR_paper.pdf

  • Scene Parsing with Multiscale Feature Learning
    http://yann.lecun.com/exdb/publis/pdf/farabet-icml-12.pdf

  • Learning Hierarchical Features for Scene Labeling
    http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf

  • Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Izadinia_Segment-Phrase_Table_for_ICCV_2015_paper.pdf

  • MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS
    https://arxiv.org/pdf/1511.07122v2.pdf

  • Weakly supervised graph based semantic segmentation by learning communities of image-parts
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Pourian_Weakly_Supervised_Graph_ICCV_2015_paper.pdf


Object tracking

Classic networks:

  • DLT
    https://pdfs.semanticscholar.org/b218/0fc4f5cb46b5b5394487842399c501381d67.pdf

  • Transferring Rich Feature Hierarchies for Robust Visual Tracking
    https://arxiv.org/pdf/1501.04587.pdf

  • FCNT
    http://202.118.75.4/lu/Paper/ICCV2015/iccv15_lijun.pdf

  • Hierarchical Convolutional Features for Visual Tracking
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Ma_Hierarchical_Convolutional_Features_ICCV_2015_paper.pdf

  • MDNet
    https://arxiv.org/pdf/1510.07945.pdf

  • Recurrently Target-Attending Tracking
    http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Cui_Recurrently_Target-Attending_Tracking_CVPR_2016_paper.pdf

  • DeepTracking
    http://www.bmva.org/bmvc/2014/files/paper028.pdf

  • DeepTrack
    http://www.bmva.org/bmvc/2014/files/paper028.pdf

  • Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
    https://arxiv.org/pdf/1502.06796.pdf



Edge detection

Classic models:

  • HED
    https://arxiv.org/pdf/1504.06375.pdf

  • DeepEdge
    https://arxiv.org/pdf/1412.1123.pdf

  • DeepContour
    http://mc.eistar.net/UpLoadFiles/Papers/DeepContour_cvpr15.pdf


Human pose estimation

Classic models:

  • DeepPose
    https://arxiv.org/pdf/1312.4659.pdf

  • JTCN
    https://www.robots.ox.ac.uk/~vgg/rg/papers/tompson2014.pdf

  • Flowing convnets for human pose estimation in videos
    https://arxiv.org/pdf/1506.02897.pdf

  • Stacked hourglass networks for human pose estimation
    https://arxiv.org/pdf/1603.06937.pdf

  • Convolutional pose machines
    https://arxiv.org/pdf/1602.00134.pdf

  • Deepcut
    https://arxiv.org/pdf/1605.03170.pdf

  • Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
    https://arxiv.org/pdf/1611.08050.pdf


Understanding CNNs

Classic networks:

  • Visualizing and Understanding Convolutional Networks
    https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

  • Inverting Visual Representations with Convolutional Networks
    https://arxiv.org/pdf/1506.02753.pdf

  • Object Detectors Emerge in Deep Scene CNNs
    https://arxiv.org/pdf/1412.6856.pdf

  • Understanding Deep Image Representations by Inverting Them
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

  • Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

  • Understanding image representations by measuring their equivariance and equivalence
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf


Super-resolution reconstruction

Classic models:

  • Learning Iterative Image Reconstruction
    http://www.ais.uni-bonn.de/behnke/papers/ijcai01.pdf

  • Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid
    http://www.ais.uni-bonn.de/behnke/papers/ijcia01.pdf

  • Learning a Deep Convolutional Network for Image Super-Resolution
    http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepresolution.pdf

  • Image Super-Resolution Using Deep Convolutional Networks
    https://arxiv.org/pdf/1501.00092.pdf

  • Accurate Image Super-Resolution Using Very Deep Convolutional Networks
    https://arxiv.org/pdf/1511.04587.pdf

  • Deeply-Recursive Convolutional Network for Image Super-Resolution
    https://arxiv.org/pdf/1511.04491.pdf

  • Deep Networks for Image Super-Resolution with Sparse Prior
    http://www.ifp.illinois.edu/~dingliu2/iccv15/iccv15.pdf

  • Perceptual Losses for Real-Time Style Transfer and Super-Resolution
    https://arxiv.org/pdf/1603.08155.pdf

  • Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
    https://arxiv.org/pdf/1609.04802v3.pdf


Image captioning

Classic models:

  • Explain Images with Multimodal Recurrent Neural Networks
    https://arxiv.org/pdf/1410.1090.pdf

  • Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
    https://arxiv.org/pdf/1411.2539.pdf

  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description
    https://arxiv.org/pdf/1411.4389.pdf

  • A Neural Image Caption Generator
    https://arxiv.org/pdf/1411.4555.pdf

  • Deep Visual-Semantic Alignments for Generating Image Description
    http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

  • Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    https://arxiv.org/pdf/1412.4729.pdf

  • Learning a Recurrent Visual Representation for Image Caption Generation
    https://arxiv.org/pdf/1411.5654.pdf

  • From Captions to Visual Concepts and Back
    https://arxiv.org/pdf/1411.4952.pdf

  • Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention
    http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

  • Phrase-based Image Captioning
    https://arxiv.org/pdf/1502.03671.pdf

  • Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
    https://arxiv.org/pdf/1504.06692.pdf

  • Exploring Nearest Neighbor Approaches for Image Captioning
    https://arxiv.org/pdf/1505.04467.pdf

  • Image Captioning with an Intermediate Attributes Layer
    https://arxiv.org/pdf/1506.01144.pdf

  • Learning language through pictures
    https://arxiv.org/pdf/1506.03694.pdf

  • Describing Multimedia Content using Attention-based Encoder-Decoder Networks
    https://arxiv.org/pdf/1507.01053.pdf

  • Image Representations and New Domains in Neural Image Captioning
    https://arxiv.org/pdf/1508.02091.pdf

  • Learning Query and Image Similarities with Ranking Canonical Correlation Analysis
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Learning_Query_and_ICCV_2015_paper.pdf

  • Generative Adversarial Text to Image Synthesis
    https://arxiv.org/pdf/1605.05396.pdf

  • GENERATING IMAGES FROM CAPTIONS WITH ATTENTION
    https://arxiv.org/pdf/1511.02793.pdf


Video captioning

Classic models:

  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description
    https://arxiv.org/pdf/1411.4389.pdf

  • Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    https://arxiv.org/pdf/1412.4729.pdf

  • Joint Modeling Embedding and Translation to Bridge Video and Language
    https://arxiv.org/pdf/1505.01861.pdf

  • Sequence to Sequence–Video to Text
    https://arxiv.org/pdf/1505.00487.pdf

  • Describing Videos by Exploiting Temporal Structure
    https://arxiv.org/pdf/1502.08029.pdf

  • The Long-Short Story of Movie Description
    https://arxiv.org/pdf/1506.01698.pdf

  • Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
    https://arxiv.org/pdf/1506.06724.pdf

  • Describing Multimedia Content using Attention-based Encoder-Decoder Networks
    https://arxiv.org/pdf/1507.01053.pdf

  • Temporal Tessellation for Video Annotation and Summarization
    https://arxiv.org/pdf/1612.06950.pdf

  • Summarization-based Video Caption via Deep Neural Networks
    http://delivery.acm.org/10.1145/2810000/2806314/p1191-li.pdf?ip=123.138.79.12&id=2806314&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492135731_7c7cb5d6bf7455db7f4aa75b341d1a78

  • Deep Learning for Video Classification and Captioning
    https://arxiv.org/pdf/1609.06782.pdf


Question answering (VQA)

Classic models:

  • VQA: Visual Question Answering
    https://arxiv.org/pdf/1505.00468.pdf

  • Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
    https://arxiv.org/pdf/1505.01121.pdf

  • Image Question Answering: A Visual Semantic Embedding Model and a New Dataset
    https://arxiv.org/pdf/1505.02074.pdf

  • Stacked Attention Networks for Image Question Answering
    https://arxiv.org/pdf/1511.02274v2.pdf

  • Dataset and Methods for Multilingual Image Question Answering
    https://arxiv.org/pdf/1505.05612.pdf

  • Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

  • Dynamic Memory Networks for Visual and Textual Question Answering
    https://arxiv.org/pdf/1603.01417v1.pdf

  • Multimodal Residual Learning for Visual QA
    https://arxiv.org/pdf/1606.01455.pdf

  • Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
    https://arxiv.org/pdf/1606.01847.pdf

  • Training Recurrent Answering Units with Joint Loss Minimization for VQA
    https://arxiv.org/pdf/1606.03647.pdf

  • Hadamard Product for Low-rank Bilinear Pooling
    https://arxiv.org/pdf/1610.04325.pdf

  • Question Answering Using Deep Learning
    https://cs224d.stanford.edu/reports/StrohMathur.pdf


Image generation (CNN, RNN, LSTM, GAN)

Classic models:

  • Conditional Image Generation with PixelCNN Decoders
    https://arxiv.org/pdf/1606.05328v2.pdf

  • Learning to Generate Chairs with Convolutional Neural Networks
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

  • DRAW: A Recurrent Neural Network For Image Generation
    https://arxiv.org/pdf/1502.04623v2.pdf

  • Generative Adversarial Networks
    https://arxiv.org/pdf/1406.2661.pdf

  • Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
    https://arxiv.org/pdf/1506.05751.pdf

  • A note on the evaluation of generative models
    https://arxiv.org/pdf/1511.01844.pdf

  • Variationally Auto-Encoded Deep Gaussian Processes
    https://arxiv.org/pdf/1511.06455v2.pdf

  • Generating Images from Captions with Attention
    https://arxiv.org/pdf/1511.02793v2.pdf

  • Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
    https://arxiv.org/pdf/1511.06390v1.pdf

  • Censoring Representations with an Adversary
    https://arxiv.org/pdf/1511.05897v3.pdf

  • Distributional Smoothing with Virtual Adversarial Training
    https://arxiv.org/pdf/1507.00677v8.pdf

  • Generative Visual Manipulation on the Natural Image Manifold
    https://arxiv.org/pdf/1609.03552v2.pdf

  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
    https://arxiv.org/pdf/1511.06434.pdf

  • Wasserstein GAN
    https://arxiv.org/pdf/1701.07875.pdf

  • Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
    https://arxiv.org/pdf/1701.06264.pdf

  • Conditional Generative Adversarial Nets
    https://arxiv.org/pdf/1411.1784.pdf

  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
    https://arxiv.org/pdf/1606.03657.pdf

  • Conditional Image Synthesis With Auxiliary Classifier GANs
    https://arxiv.org/pdf/1610.09585.pdf

  • SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
    https://arxiv.org/pdf/1609.05473.pdf

  • Improved Training of Wasserstein GANs
    https://arxiv.org/pdf/1704.00028.pdf

  • Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
    https://arxiv.org/pdf/1704.04086.pdf


Visual attention and saliency

Classic models:

  • Predicting Eye Fixations using Convolutional Neural Networks
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Predicting_Eye_Fixations_2015_CVPR_paper.pdf

  • Learning a Sequential Search for Landmarks
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Singh_Learning_a_Sequential_2015_CVPR_paper.pdf

  • Multiple Object Recognition with Visual Attention
    https://arxiv.org/pdf/1412.7755.pdf

  • Recurrent Models of Visual Attention
    http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

  • Capacity Visual Attention Networks
    http://easychair.org/publications/download/Capacity_Visual_Attention_Networks

  • Fully Convolutional Attention Networks for Fine-Grained Recognition
    https://arxiv.org/pdf/1603.06765.pdf


Feature detection and matching (patches)

Classic models:

  • TILDE: A Temporally Invariant Learned DEtector
    https://arxiv.org/pdf/1411.4568.pdf

  • MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
    https://pdfs.semanticscholar.org/81b9/24da33b9500a2477532fd53f01df00113972.pdf

  • Discriminative Learning of Deep Convolutional Feature Point Descriptors
    http://cvlabwww.epfl.ch/~trulls/pdf/iccv-2015-deepdesc.pdf

  • Learning to Assign Orientations to Feature Points
    https://arxiv.org/pdf/1511.04273.pdf

  • PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors
    https://arxiv.org/pdf/1601.05030.pdf

  • Multi-scale Pyramid Pooling for Deep Convolutional Representation
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7301274

  • Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    https://arxiv.org/pdf/1406.4729.pdf

  • Learning to Compare Image Patches via Convolutional Neural Networks
    https://arxiv.org/pdf/1504.03641.pdf

  • PixelNet: Representation of the pixels, by the pixels, and for the pixels
    http://www.cs.cmu.edu/~aayushb/pixelNet/pixelnet.pdf

  • LIFT: Learned Invariant Feature Transform
    https://arxiv.org/pdf/1603.09114.pdf



Face recognition

Classic models:

  • Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks
    http://vis-www.cs.umass.edu/papers/HuangCVPR12.pdf

  • Deep Convolutional Network Cascade for Facial Point Detection
    http://mmlab.ie.cuhk.edu.hk/archive/CNN/data/CNN_FacePoint.pdf

  • Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification
    http://delivery.acm.org/10.1145/2400000/2396303/p749-cai.pdf?ip=123.138.79.12&id=2396303&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492152722_04e9cce5378080a18ec7e700dfb4cd28

  • DeepFace: Closing the Gap to Human-Level Performance in Face Verification
    https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf

  • Deep learning face representation by joint identification-verification
    https://arxiv.org/pdf/1406.4773.pdf

  • Deep learning face representation from predicting 10,000 classes
    http://mmlab.ie.cuhk.edu.hk/pdf/YiSun_CVPR14.pdf

  • Deeply learned face representations are sparse, selective, and robust
    https://arxiv.org/pdf/1412.1265.pdf

  • Deepid3: Face recognition with very deep neural networks
    https://arxiv.org/pdf/1502.00873.pdf

  • FaceNet: A Unified Embedding for Face Recognition and Clustering
    https://arxiv.org/pdf/1503.03832.pdf

  • Funnel-Structured Cascade for Multi-View Face Detection with Alignment-Awareness
    https://arxiv.org/pdf/1609.07304.pdf

  • Large-pose Face Alignment via CNN-based Dense 3D Model Fitting
    http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Jourabloo_Large-Pose_Face_Alignment_CVPR_2016_paper.pdf

  • Unconstrained 3D face reconstruction
    http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR2015.pdf

  • Adaptive contour fitting for pose-invariant 3D face shape reconstruction
    http://akme-a2.iosb.fraunhofer.de/ETGS15p/2015_Adaptive%20contour%20fitting%20for%20pose-invariant%203D%20face%20shape%20reconstruction.pdf

  • High-fidelity pose and expression normalization for face recognition in the wild
    http://www.cbsr.ia.ac.cn/users/xiangyuzhu/papers/CVPR2015_High-Fidelity.pdf

  • Adaptive 3D face reconstruction from unconstrained photo collections
    http://cvlab.cse.msu.edu/pdfs/Roth_Tong_Liu_CVPR16.pdf

  • Dense 3D face alignment from 2d videos in real-time
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7163142

  • Robust facial landmark detection under significant head poses and occlusion
    http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Wu_Robust_Facial_Landmark_ICCV_2015_paper.pdf

  • A convolutional neural network cascade for face detection
    http://users.eecs.northwestern.edu/~xsh835/assets/cvpr2015_cascnn.pdf

  • Deep Face Recognition Using Deep Convolutional Neural Network
    http://aiehive.com/deep-face-recognition-using-deep-convolution-neural-network/

  • Multi-view Face Detection Using Deep Convolutional Neural Networks
    http://delivery.acm.org/10.1145/2750000/2749408/p643-farfade.pdf?ip=123.138.79.12&id=2749408&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=923677366&CFTOKEN=37844144&acm=1492157015_8ffa84e6632810ea05ff005794fed8d5

  • HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
    https://arxiv.org/pdf/1603.01249.pdf

  • WIDER FACE: A face detection benchmark
    http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/paper.pdf

  • Joint training of cascaded cnn for face detection
    http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Qin_Joint_Training_of_CVPR_2016_paper.pdf

  • Face detection with end-to-end integration of a convnet and a 3d model
    https://arxiv.org/pdf/1606.00850.pdf

  • Face Detection using Deep Learning: An Improved Faster RCNN Approach
    https://arxiv.org/pdf/1701.08289.pdf


3D reconstruction

Classic models:

  • 3D ShapeNets: A Deep Representation for Volumetric Shapes
    https://people.csail.mit.edu/khosla/papers/cvpr2015_wu.pdf

  • 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
    https://arxiv.org/pdf/1604.00449.pdf

  • Learning to generate chairs with convolutional neural networks
    https://arxiv.org/pdf/1411.5928.pdf

  • Category-specific object reconstruction from a single image
    http://people.eecs.berkeley.edu/~akar/categoryshapes.pdf

  • Enriching Object Detection with 2D-3D Registration and Continuous Viewpoint Estimation
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7298866

  • ShapeNet: An Information-Rich 3D Model Repository
    https://arxiv.org/pdf/1512.03012.pdf

  • 3D reconstruction of synapses with deep learning based on EM Images
    http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7558866

  • Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces
    https://arxiv.org/pdf/1605.06240.pdf

  • Unsupervised Learning of 3D Structure from Images
    https://arxiv.org/pdf/1607.00662.pdf

  • Deep learning 3d shape surfaces using geometry images
    http://download.springer.com/static/pdf/605/chp%253A10.1007%252F978-3-319-46466-4_14.pdf?originUrl=http%3A%2F%2Flink.springer.com%2Fchapter%2F10.1007%2F978-3-319-46466-4_14&token2=exp=1492181498~acl=%2Fstatic%2Fpdf%2F605%2Fchp%25253A10.1007%25252F978-3-319-46466-4_14.pdf%3ForiginUrl%3Dhttp%253A%252F%252Flink.springer.com%252Fchapter%252F10.1007%252F978-3-319-46466-4_14*~hmac=b772943d8cd5f914e7bc84a30ddfdf0ef87991bee1d52717cb4930e3eccb0e63

  • FPNN: Field Probing Neural Networks for 3D Data
    https://arxiv.org/pdf/1605.06240.pdf

  • Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
    https://arxiv.org/pdf/1505.05641.pdf

  • Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling
    https://arxiv.org/pdf/1610.07584.pdf

  • SurfNet: Generating 3D shape surfaces using deep residual networks
    https://arxiv.org/pdf/1703.04079.pdf



Recommender systems

Classic models:

  • Autorec: Autoencoders meet collaborative filtering
    http://users.cecs.anu.edu.au/~akmenon/papers/autorec/autorec-paper.pdf

  • User modeling with neural network for review rating prediction
    https://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwj35dyVo6nTAhWEnpQKHSAwCw4QFggjMAA&url=http%3a%2f%2fwww%2eaaai%2eorg%2focs%2findex%2ephp%2fIJCAI%2fIJCAI15%2fpaper%2fdownload%2f11051%2f10849&usg=AFQjCNHeMJX8AZzoRF0ODcZE_mXazEktUQ

  • Collaborative Deep Learning for Recommender Systems
    https://arxiv.org/pdf/1409.2944.pdf

  • A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf

  • A neural probabilistic model for context based citation recommendation
    http://www.personal.psu.edu/wzh112/publications/aaai_slides.pdf

  • Hybrid Recommender System based on Autoencoders
    http://delivery.acm.org/10.1145/2990000/2988456/p11-strub.pdf?ip=123.138.79.12&id=2988456&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2EB37B3B2DF215A17D%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=751612499&CFTOKEN=37099060&acm=1492356698_958d1b64105cd41b9719c8d285736396

  • Wide & Deep Learning for Recommender Systems
    https://arxiv.org/pdf/1606.07792.pdf

  • Deep Neural Networks for YouTube Recommendations
    https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf

  • Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks
    http://www.wanghao.in/paper/NIPS16_CRAE.pdf

  • Neural Collaborative Filtering
    http://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf

  • Recurrent Recommender Networks
    http://alexbeutel.com/papers/rrn_wsdm2017.pdf


Fine-grained image analysis

Classic models:

  • Part-based R-CNNs for Fine-grained Category Detection
    https://people.eecs.berkeley.edu/~nzhang/papers/eccv14_part.pdf

  • Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
    http://www.bmva.org/bmvc/2014/files/paper071.pdf

  • Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
    https://arxiv.org/pdf/1605.06878.pdf

  • The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xiao_The_Application_of_2015_CVPR_paper.pdf

  • Bilinear CNN Models for Fine-grained Visual Recognition
    http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf

  • Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval
    https://arxiv.org/pdf/1604.04994.pdf

  • Near Duplicate Image Detection: min-Hash and tf-idf Weighting
    https://www.robots.ox.ac.uk/~vgg/publications/papers/chum08a.pdf

  • Fine-grained image search
    https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf

  • Efficient large-scale structured learning
    http://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Branson_Efficient_Large-Scale_Structured_2013_CVPR_paper.pdf

  • Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
    https://arxiv.org/pdf/1504.08289.pdf


Image compression

Classic models:

  • Auto-Encoding Variational Bayes
    https://arxiv.org/pdf/1312.6114.pdf

  • k-Sparse Autoencoders
    https://arxiv.org/pdf/1312.5663.pdf

  • Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
    http://www.iro.umontreal.ca/~lisa/pointeurs/ICML2011_explicit_invariance.pdf

  • Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
    http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf

  • Tutorial on Variational Autoencoders
    https://arxiv.org/pdf/1606.05908.pdf

  • End-to-end Optimized Image Compression
    https://openreview.net/pdf?id=rJxdQ3jeg

  • Guetzli: Perceptually Guided JPEG Encoder
    https://arxiv.org/pdf/1703.04421.pdf




NLP
Tutorial: http://cs224d.stanford.edu/syllabus.html
Notes:
1) I have only scratched the surface of this area so far; I will keep updating it over time.
2) I also hope that people working in this area will contribute; I look forward to your joining in.

Speech recognition
Notes:
1) I have not yet looked into speech recognition in detail; content will be added gradually later.
2) I also hope that people working in this area will contribute; I look forward to your joining in.

AGI (artificial general intelligence)
Notes:
1) I have not yet looked into this area in detail; content will be added gradually later.
2) I also hope that people working in this area will contribute; I look forward to your joining in.

Some new techniques driven by deep learning:

  • Transfer learning: in recent years this has become the mainstream way in AI of handling recognition problems across different scenarios. Compared with the simple methods of the shallow-learning era, deep neural network models transfer far better, and there is a simple, effective recipe: pre-train a base model on a large, complex task (pre-train), then fine-tune it on the specific target task (fine-tune); a minimal sketch of this recipe is given after this list.
  • Joint learning (JL):
  • Reinforcement learning (RL): reinforcement learning (also called reward-based or evaluative learning) is an important machine learning method with many applications in intelligent control, robotics, and analysis and prediction. It does not appear in the traditional machine learning taxonomy; in connectionist learning, algorithms are divided into three types: unsupervised learning, supervised learning, and reinforcement learning.
    Video course:
    https://cn.udacity.com/course/reinforcement-learning–ud600
  • Note: I only know this as a new concept and have not studied this part yet; I will add material gradually.

  • Deep reinforcement learning (DRL):
    Tutorial: http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
    Course: http://rll.berkeley.edu/deeprlcourse/
    DeepMind:
    https://deepmind.com/blog/deep-reinforcement-learning/
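
A minimal sketch of the pre-train / fine-tune recipe described in the transfer-learning item above. It assumes PyTorch and torchvision (an ImageNet-pretrained ResNet-50) and a hypothetical 10-class target task; the random tensors merely stand in for a real data loader, and a real setup would train for many steps and possibly unfreeze deeper layers.

```python
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 10  # assumption: a small hypothetical target task

# Pre-train step: reuse a backbone already trained on ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                                      # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, num_target_classes)   # new task-specific head

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Fine-tune step: one illustrative update on stand-in data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_target_classes, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```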
  • Closing remarks

    Notes:
    1. That is about it. Writing this up took a great deal of time, but in putting it together I also learned a lot, and I have come to see that deep learning now runs through the entire CV field. If you work in CV, I suggest spending some time on deep learning; after all, it is reshaping this field!
    2. Given my limited experience there may be mistakes; please bear with me. If you have any questions, you can message me and I will reply promptly.
    3. This post is my own original work; please contact me if you wish to repost it.
    Email: 1575262785@qq.com
