CS 594: Modern Reinforcement Learning (Spring 2022)

General Resources

Here are a number of useful references: Sutton and Barto (our main text); Agarwal, Jiang, and Kakade; Powell; Szepesvari; Bertsekas; Shoham and Leyton-Brown (multi-agent); Goodfellow, Bengio, and Courville (deep learning).

RL courses by Nan Jiang (498, 542), DeepMind (13 lectures), Emma Brunskill (234), David Silver, Sergey Levine (285), DeepPavlov (13 lectures), and Fragkiadaki and Mitchell (703).

Code samples: a replication of Sutton and Barto; an online course with a number of code examples and exercises (may work better with a GPU); and a “crash course” with an accompanying repository containing cartpole and linear-quadratic control examples, plus detailed instructions on getting started with the material in a Jupyter notebook. OpenAI Spinning Up has many resources, including installation instructions and detailed documentation of a range of algorithms. There are also baseline implementations of many algorithms (forked from a set started by OpenAI).

Need more computing resources than you have locally? Try services such as Google Colab or Paperspace Gradient.

Schedule

Lecture #	Date	Topic	Readings	Further Readings
1	01/10/22	Introduction, MDPs, Dynamic Programming	lec1.pdf	Chapters 3 and 4 of Sutton and Barto
2	01/12/22	Monte Carlo and Temporal Difference Methods	lec2.pdf	Chapters 5 and 6 of Sutton and Barto
3	01/17/22	No class (Martin Luther King, Jr. Day)
4	01/19/22	More on TD Methods
	01/23/22	Homework 1	hw1.pdf
5	01/24/22	Policy Gradient Methods	lec3.pdf	Chapter 13 of Sutton and Barto
6	01/26/22	Actor Critic Methods	lec4.pdf	Chapter 13 of Sutton and Barto
7	01/31/22	MARL and MCTS	lec5.pdf	MARL: Section 7.4 of Shoham and Leyton-Brown; recent blog post. MCTS: Sections 8.10 and 8.11 of Sutton and Barto.
8	02/02/22	Function Approximation	lec6.pdf	Deadly triad: Sections 11.2-11.3 of Sutton and Barto. For more on neural networks, see the relevant parts of the Goodfellow, Bengio, and Courville text.
	02/06/22	Homework 2	hw2.pdf
9	02/07/22	Deep Q-Networks	DQN	An earlier version
10	02/09/22	Maximization Bias	Double DQN (main paper), Averaged DQN (skim)	Section 6.7 of Sutton and Barto; TS93; Double Q-Learning
11	02/14/22	Distributional RL	Blog post, Categorical DQN
12	02/16/22	Combining DQN Extensions	Rainbow. Please also skim one of the four extensions we haven't covered, listed under Further Readings.	Prioritized Experience Replay, Dueling DQN, NoisyNet; n-step returns are covered in Chapter 7 of Sutton and Barto
13	02/21/22	Asynchronous Updates	A3C. Also read the A2C and A3C section of this blog post.	GA3C, IMPALA, PAAC
14	02/23/22	Deterministic Policy Gradients	DPG	DDPG, OffPAC
	02/25/22	Project Proposal Due
15	02/28/22	Imitation Learning	Invitation to Imitation	SCBW21
16	03/02/22	Trust Regions	TRPO
17	03/07/22	Proximal Policy Optimization	PPO	ACER
18	03/09/22	Q-Learning with Continuous Actions	NAF
	03/13/22	Homework 3	hw3.pdf
19	03/14/22	Exploration	#Exploration	BSOSSM16, Bootstrapped DQN, VIME, SLA16
20	03/16/22	Exploration	RND	Go-Explore, NPDBLL18, HKSV19, EGIL18
	03/18/22	Initial Project Report Due
		SPRING BREAK
21	03/28/22	Learning from Demonstrations	DQfD	DDPGfD, Q-Filter
22	03/30/22	Generative Adversarial Imitation Learning	GAIL	MGAIL
23	04/04/22	AlphaGo	AlphaGo
24	04/06/22	AlphaZero	AlphaGo Zero	AlphaZero, MuZero
25	04/11/22	Poker	Libratus	DeepStack, Limit, CFR, MCCFR, Pluribus, NFSP
26	04/13/22	Actor-Critic Methods for Multiagent RL	ACPO	NeuRD
27	04/18/22	Cooperative Actor-Critic	COMA	LWTHAM17, LOLA
28	04/20/22	Cooperative Value-Based	QMIX	VDAC, VDN, QTRAN
29	04/25/22	Project Presentations
30	04/27/22	Project Presentations
	04/29/22	Final Project Report Due

Ian Kash

UIC Computer Science
