CS 594: Modern Reinforcement Learning (Spring 2022)
General Resources
Here are a number of useful references: Sutton and Barto (our main text); Agarwal, Jiang, and Kakade; Powell; Szepesvari; Bertsekas; Shoham and Leyton-Brown (multi-agent); Goodfellow, Bengio, and Courville (deep learning).
RL courses by Nan Jiang (498, 542), DeepMind (13 lectures), Emma Brunskill (234), David Silver, Sergey Levine (285), DeepPavlov (13 lectures), and Fragkiadaki and Mitchell (703).
Code samples: Replication of Sutton and Barto. An online course with a number of code examples and exercises (may work better with a GPU). A “crash course” and accompanying repository with cartpole and linear-quadratic control examples, plus detailed instructions on how to get started working with the material in a Jupyter notebook. OpenAI Spinning Up has a lot of resources, including installation instructions and detailed documentation of a range of algorithms. Baseline implementations of many algorithms (forked from a set started by OpenAI).
Need more resources than you have locally? Try services such as Google Colab or Paperspace Gradient.
Schedule
| Lecture # | Date | Topic | Readings | Further Readings |
|---|---|---|---|---|
| 1 | 01/10/22 | Introduction, MDPs, Dynamic Programming | lec1.pdf | Chapters 3 and 4 of Sutton and Barto |
| 2 | 01/12/22 | Monte Carlo and Temporal Difference Methods | lec2.pdf | Chapters 5 and 6 of Sutton and Barto |
| 3 | 01/17/22 | No Class (Martin Luther King, Jr. Day) | | |
| 4 | 01/19/22 | More on TD Methods | | |
| | 01/23/22 | Homework 1 | hw1.pdf | |
| 5 | 01/24/22 | Policy Gradient Methods | lec3.pdf | Chapter 13 of Sutton and Barto |
| 6 | 01/26/22 | Actor-Critic Methods | lec4.pdf | Chapter 13 of Sutton and Barto |
| 7 | 01/31/22 | MARL and MCTS | lec5.pdf | MARL: Section 7.4 of Shoham and Leyton-Brown; recent blog post. MCTS: Sections 8.10 and 8.11 of Sutton and Barto |
| 8 | 02/02/22 | Function Approximation | lec6.pdf | Deadly Triad: Chapter 11.2-11.3 of Sutton and Barto. If you want more on neural networks, see the relevant parts of the Goodfellow, Bengio, and Courville text. |
| | 02/06/22 | Homework 2 | hw2.pdf | |
| 9 | 02/07/22 | Deep Q-Networks | DQN | An earlier version |
| 10 | 02/09/22 | Maximization Bias | Double DQN (main paper), Averaged DQN (skim) | Chapter 6.7 of Sutton and Barto, TS93, Double Q-Learning |
| 11 | 02/14/22 | Distributional RL | Blog post, Categorical DQN | |
| 12 | 02/16/22 | Combining DQN Extensions | Rainbow. Please also skim one of the 4 extensions we haven't covered, listed under Further Readings. | Prioritized Experience Replay, Dueling DQN, NoisyNet. n-step returns are covered in Chapter 7 of Sutton and Barto |
| 13 | 02/21/22 | Asynchronous Updates | A3C. Also read the A2C and A3C part of this blog post | GA3C, IMPALA, PAAC |
| 14 | 02/23/22 | Deterministic Policy Gradients | DPG | DDPG, OffPAC |
| | 02/25/22 | Project Proposal Due | | |
| 15 | 02/28/22 | Imitation Learning | Invitation to Imitation | SCBW21 |
| 16 | 03/02/22 | Trust Regions | TRPO | |
| 17 | 03/07/22 | Proximal Policy Optimization | PPO | ACER |
| 18 | 03/09/22 | Q-Learning with Continuous Actions | NAF | |
| | 03/13/22 | Homework 3 | hw3.pdf | |
| 19 | 03/14/22 | Exploration | #Exploration | BSOSSM16, Bootstrapped DQN, VIME, SLA16 |
| 20 | 03/16/22 | Exploration | RND | Go-Explore, NPDBLL18, HKSV19, EGIL18 |
| | 03/18/22 | Initial Project Report Due | | |
| | | Spring Break | | |
| 21 | 03/28/22 | Learning from Demonstrations | DQfD | DDPGfD, Q-Filter |
| 22 | 03/30/22 | Generative Adversarial Imitation Learning | GAIL | MGAIL |
| 23 | 04/04/22 | AlphaGo | AlphaGo | |
| 24 | 04/06/22 | AlphaZero | AlphaGo Zero | AlphaZero, MuZero |
| 25 | 04/11/22 | Poker | Libratus | DeepStack, Limit, CFR, MCCFR, Pluribus, NFSP |
| 26 | 04/13/22 | Actor-Critic Methods for Multiagent RL | ACPO | NeuRD |
| 27 | 04/18/22 | Cooperative Actor-Critic | COMA | LWTHAM17, LOLA |
| 28 | 04/20/22 | Cooperative Value-Based | QMIX | VDAC, VDN, QTRAN |
| 29 | 04/25/22 | Project Presentations | | |
| 30 | 04/27/22 | Project Presentations | | |
| | 04/29/22 | Final Project Report Due | | |