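Reinforcement Learning Resource List
Artificial intelligence is one of the most exciting technologies of the 21st century. Its goal is to create human-like intelligence, and human intelligence comprises perception, decision-making, and cognition (from intuition to reasoning, planning, consciousness, and so on). Perception addresses the "what", where deep learning has already surpassed human-level performance on some tasks; decision-making addresses the "how", where reinforcement learning has achieved some success in games, robotics, and other fields; cognition addresses the "why", where knowledge graphs, causal inference, and continual learning are under active research. Reinforcement learning solves sequential decision problems by learning from feedback, and is therefore bound to be the ultimate key to artificial general intelligence.
Courses and Videos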
Reinforcement Learning by David Silver (2015) [homepage] [youtube] [bilibili]
CS 188: Introduction to Artificial Intelligence [Fall 2012-Spring 2014] [Fall 2018] [Summer 2019] [Spring 2020]
CS 294: Deep Reinforcement Learning by Sergey Levine [Fall 2015] [Spring 2017] [Fall 2017] [Fall 2018]
CS 285: Deep Reinforcement Learning [Fall 2019] [youtube]
Advanced Deep Learning & Reinforcement Learning by DeepMind & UCL [youtube2018]
Deep Reinforcement Learning and Control [Spring 2017]
CS234: Reinforcement Learning [Winter 2019] [youtube]
Deep RL Bootcamp [August 2017]
Deep Reinforcement Learning by Hung-yi Lee (李宏毅) [Spring 2018] [youtube2018]
Reinforcement Learning by Morvan Zhou (莫烦) [homepage]
Books
Reinforcement Learning: An Introduction (1st Edition, 1998) [homepage]
Reinforcement Learning: An Introduction (2nd Edition, 2018) [homepage] [bookdraft2018jan1] [2018] [Python Code] [Chinese translation]
Hands-On Reinforcement Learning With Python (2018) [homepage]
Reinforcement Learning: With Open AI, TensorFlow and Keras Using Python (2018) [homepage]
Algorithms for Reinforcement Learning (2010) [download]
《神经网络与深度学习》 (Neural Networks and Deep Learning) [download]
Code
ShangtongZhang/Python Implementation of Reinforcement Learning: An Introduction (2nd Edition) [github]
JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl [github]
berkeleydeeprlcourse [github]
tensorlayer/RLzoo [github]
rlcode/reinforcement-learning [github]
MorvanZhou/Reinforcement-learning-with-tensorflow [github]
dennybritz/reinforcement-learning [github]
p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch [github]
Tutorials
OpenAI Spinning Up [English] [Chinese]
Talks
Rich Sutton, 2015, Introduction to Reinforcement Learning with Function Approximation
Andrew Barto, 2018, A history of reinforcement learning
David Silver, Principles of Deep RL
Benjamin Recht, 2018, Optimization Perspectives on Learning to Control
John Schulman, 2017, The Nuts and Bolts of Deep Reinforcement Learning Research
Joelle Pineau, Introduction to Reinforcement Learning
Deep Learning and Reinforcement Learning Summer School, 2018, 2017
Deep Learning Summer School, 2016, 2015
Yisong Yue and Hoang M. Le, Imitation Learning, ICML 2018 Tutorial
Surveys
Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv. [paper]
Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451. [paper]
Kaelbling, L., Littman, M., and Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285. [paper]
Algorithms
(1) Reinforcement Learning
- Q-learning
Learning from Delayed Rewards (Watkins 1989) [paper]
- REINFORCE
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (Williams 1992) [paper] [ML]
- SARSA
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding (Sutton 1996) [paper] [NIPS]
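To make the Q-learning entry above more concrete, here is a minimal sketch of the tabular update rule. It is not taken from any of the listed papers or repositories: the five-state chain environment, the `step` and `q_learning` helpers, and all hyperparameters are hypothetical choices made only for illustration. Because Q-learning is off-policy, the sketch uses a uniformly random behaviour policy instead of epsilon-greedy exploration.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action for each non-terminal state under the learned Q-table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

On this toy chain the greedy policy read off the learned table should prefer moving right in every non-terminal state, since that is the only way to reach the rewarding terminal state.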
(2) Deep Reinforcement Learning
- DQN
Playing Atari with Deep Reinforcement Learning (Mnih et al. 2013) [arxiv]
- DDQN
Deep Reinforcement Learning with Double Q-learning (van Hasselt et al. 2015) [arxiv] [AAAI]
- TRPO
Trust Region Policy Optimization (Schulman et al. 2015) [arxiv] [ICML]
- H-DQN
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation (Kulkarni et al. 2016) [arxiv] [NIPS]
- PER
Prioritized Experience Replay (Schaul et al. 2016) [arxiv] [ICLR]
- Dueling DDQN
Dueling Network Architectures for Deep Reinforcement Learning (Wang et al. 2016) [arxiv] [ICML]
- DDPG
Continuous Control with Deep Reinforcement Learning (Lillicrap et al. 2016) [arxiv] [ICLR]
- A2C/A3C
Asynchronous Methods for Deep Reinforcement Learning (Mnih et al. 2016) [arxiv] [ICML]
- SNN-HRL
Stochastic Neural Networks for Hierarchical Reinforcement Learning (Florensa et al. 2017) [arxiv] [ICLR]
- PPO
Proximal Policy Optimization Algorithms (Schulman et al. 2017) [arxiv]
- HER
Hindsight Experience Replay (Andrychowicz et al. 2017) [arxiv] [NIPS]
- TD3
Addressing Function Approximation Error in Actor-Critic Methods (Fujimoto et al. 2018) [arxiv] [ICML]
- DIAYN
Diversity is All You Need: Learning Skills without a Reward Function (Eysenbach et al. 2018) [arxiv] [ICLR]
- HIRO
Data-Efficient Hierarchical Reinforcement Learning (Nachum et al. 2018) [arxiv] [NIPS]
- SAC
Soft Actor-Critic Algorithms and Applications (Haarnoja et al. 2019) [arxiv]
- SAC-Discrete
Soft Actor-Critic for Discrete Action Settings (Christodoulou 2019) [arxiv]
- TQC
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (Kuznetsov et al. 2020) [arxiv] [ICML]
Environments
Cart Pole
Mountain Car
OpenAI Gym
Google Dopamine 2.0
Emo Todorov MuJoCo
General-purpose grid-world environment class
Frameworks
OpenAI Baselines
Baidu PARL
DeepMind OpenSpiel
Researchers
Richard S. Sutton [homepage]
David Silver [homepage]
Pieter Abbeel [homepage]
Sergey Levine [homepage]
Hung-yi Lee (李宏毅) [homepage]
Conferences/Journals
Conferences: AAAI, NIPS, ICML, ICLR, IJCAI, AAMAS, IROS, etc.
Journals: AI, JMLR, JAIR, Machine Learning, JAAMAS, etc.
Research Institutions
OpenAI
DeepMind
Berkeley Artificial Intelligence Research (BAIR) Lab
Blogs
Keavnn's Blog
Medium: Reinforcement Learning
StackOverflow: Reinforcement Learning
[BEST Reinforcement Learning (RL) Books Update till Jan 2021]
[Introduction to Deep Reinforcement Learning]
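Zhihu
强化学习知识大讲堂 (Reinforcement Learning Knowledge Lecture Hall)
智能单元 (Intelligent Unit)
强化学习 (Reinforcement Learning)
WeChat Official Accounts
深度强化学习实验室 (Deep Reinforcement Learning Lab)
深度学习技术前沿 (Deep Learning Technology Frontier)
AI科技评论 (AI Technology Review)
新智元 (AI Era)
Other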
kmario23/deep-learning-drizzle [github] [webpage]
Mr.Jk.Zhang [CSDN]