CS 594: Modern Reinforcement Learning (Spring 2022)

General Resources

Here are a number of useful references: Sutton and Barto (our main text); Agarwal, Jiang, and Kakade; Powell; Szepesvari; Bertsekas; Shoham and Leyton-Brown (multi-agent); Goodfellow, Bengio, and Courville (deep learning).

RL courses by Nan Jiang (498, 542), DeepMind (13 lectures), Emma Brunskill (234), David Silver, Sergey Levine (285), DeepPavlov (13 lectures), and Fragkiadaki and Mitchell (703).

Code samples: a replication of the examples in Sutton and Barto. An online course with a number of code examples and exercises (may work better with a GPU). A “crash course” and accompanying repository with cartpole and linear-quadratic control examples and detailed instructions on how to get started working with the material in a Jupyter notebook. OpenAI Spinning Up has many resources, including installation instructions and detailed documentation of a range of algorithms. Baseline implementations of many algorithms (forked from a set started by OpenAI).
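If you have not worked with these environments before, the snippet below is a minimal sketch of one CartPole episode under a random policy. It assumes the classic OpenAI Gym interface (gym versions before 0.26) used by most of the repositories above; the newer gymnasium package changes the reset/step signatures slightly.

```python
# Minimal sketch: one CartPole episode with a random policy.
# Assumes the classic Gym API (pip install gym, version < 0.26).
import gym

env = gym.make("CartPole-v1")
obs = env.reset()                       # initial 4-dimensional state
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random action: push cart left or right
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("Episode return:", total_reward)
env.close()
```

A natural next step, and the pattern most of the linked implementations follow, is to replace the random action with one chosen by a learned policy or by an epsilon-greedy rule over estimated action values.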

Need more resources than you have locally? Try services such as Google Colab or Paperspace Gradient.

Schedule

Lecture # | Date | Topic | Readings | Further Readings
1 | 01/10/22 | Introduction, MDPs, Dynamic Programming | lec1.pdf | Chapters 3 and 4 of Sutton and Barto
2 | 01/12/22 | Monte Carlo and Temporal Difference Methods | lec2.pdf | Chapters 5 and 6 of Sutton and Barto
3 | 01/17/22 | No Class, Martin Luther King, Jr., Day | |
4 | 01/19/22 | More on TD Methods | |
  | 01/23/22 | Homework 1 | hw1.pdf |
5 | 01/24/22 | Policy Gradient Methods | lec3.pdf | Chapter 13 of Sutton and Barto
6 | 01/26/22 | Actor-Critic Methods | lec4.pdf | Chapter 13 of Sutton and Barto
7 | 01/31/22 | MARL and MCTS | lec5.pdf | MARL: Section 7.4 of Shoham and Leyton-Brown; recent blog post. MCTS: Sections 8.10 and 8.11 of Sutton and Barto
8 | 02/02/22 | Function Approximation | lec6.pdf | Deadly Triad: Sections 11.2 and 11.3 of Sutton and Barto. For more on neural networks, see the relevant parts of the Goodfellow, Bengio, and Courville text.
  | 02/06/22 | Homework 2 | hw2.pdf |
9 | 02/07/22 | Deep Q-Networks | DQN | An earlier version
10 | 02/09/22 | Maximization Bias | Double DQN (main paper), Averaged DQN (skim) | Section 6.7 of Sutton and Barto; TS93; Double Q-Learning
11 | 02/14/22 | Distributional RL | Blog post, Categorical DQN |
12 | 02/16/22 | Combining DQN Extensions | Rainbow. Please also choose one of the four extensions we haven't covered (listed under Further Readings) to skim. | Prioritized Experience Replay, Dueling DQN, NoisyNet; n-step returns are covered in Chapter 7 of Sutton and Barto
13 | 02/21/22 | Asynchronous Updates | A3C. Also read the A2C and A3C parts of this blog post. | GA3C, IMPALA, PAAC
14 | 02/23/22 | Deterministic Policy Gradients | DPG | DDPG, OffPAC
  | 02/25/22 | Project Proposal Due | |
15 | 02/28/22 | Imitation Learning | Invitation to Imitation | SCBW21
16 | 03/02/22 | Trust Regions | TRPO |
17 | 03/07/22 | Proximal Policy Optimization | PPO | ACER
18 | 03/09/22 | Q-Learning with Continuous Actions | NAF |
  | 03/13/22 | Homework 3 | hw3.pdf |
19 | 03/14/22 | Exploration | #Exploration | BSOSSM16, Bootstrapped DQN, VIME, SLA16
20 | 03/16/22 | Exploration | RND | Go-Explore, NPDBLL18, HKSV19, EGIL18
  | 03/18/22 | Initial Project Report Due | |
SPRING BREAK
21 | 03/28/22 | Learning from Demonstrations | DQfD | DDPGfD, Q-Filter
22 | 03/30/22 | Generative Adversarial Imitation Learning | GAIL | MGAIL
23 | 04/04/22 | AlphaGo | AlphaGo |
24 | 04/06/22 | AlphaZero | AlphaGo Zero | AlphaZero, MuZero
25 | 04/11/22 | Poker | Libratus | DeepStack, Limit, CFR, MCCFR, Pluribus, NFSP
26 | 04/13/22 | Actor-Critic Methods for Multiagent RL | ACPO | NeuRD
27 | 04/18/22 | Cooperative Actor-Critic | COMA | LWTHAM17, LOLA
28 | 04/20/22 | Cooperative Value-Based | QMIX | VDAC, VDN, QTRAN
29 | 04/25/22 | Project Presentations | |
30 | 04/27/22 | Project Presentations | |
  | 04/29/22 | Final Project Report Due | |