|L1||RL Basics||[SB] Chapters 1-3|| |
|L2-L5||VI, PI, etc.||[SB] Chapter 4; [RN] Chapter 17; convergence properties: [P] Sections 6.2-6.3; see also Review Slides|| |
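As a quick illustration of value iteration from this part of the course, the sketch below runs VI on a small two-state MDP; the transition model, rewards, and discount are invented for the example and are not taken from the readings.

```python
# Value iteration on a small made-up two-state MDP.
# P[s][a] is a list of (prob, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9

def value_iteration(P, gamma, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup at s (values updated in place).
            v_new = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                        for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

V = value_iteration(P, gamma)
# Greedy policy extraction from the converged values.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
```

Policy iteration differs only in alternating full policy evaluation with this greedy improvement step.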
|Assignment 1|| ||MDP Planning||due 9/24|
|L6-L7||TD Methods, Generalization, Learning and Planning||[SB] Chapters 5-9; for a convergence result with aggregation, see Tsitsiklis and Van Roy, Feature-Based Methods for Large Scale Dynamic Programming, Machine Learning 22, 1996, pp. 59-94; see also Review Slides|| |
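A minimal sketch of tabular TD(0) policy evaluation, the basic TD method from this block of lectures; the two-state chain, the policy being evaluated, and the step counts are all invented for the example.

```python
import random

# Tabular TD(0) policy evaluation on a made-up two-state chain.
random.seed(0)
gamma = 0.9
alpha = 0.05          # constant step size, chosen ad hoc
V = [0.0, 0.0]

def step(s):
    # Fixed policy being evaluated: state 0 always moves to state 1
    # with reward 0; state 1 yields reward 1 and resets to state 0
    # with probability 0.1, otherwise stays in state 1.
    if s == 0:
        return 1, 0.0
    return (0, 1.0) if random.random() < 0.1 else (1, 1.0)

s = 0
for _ in range(200000):
    s2, r = step(s)
    V[s] += alpha * (r + gamma * V[s2] - V[s])   # TD(0) update
    s = s2
# The true values solve V0 = 0.9*V1 and V1 = 1 + 0.9*(0.1*V0 + 0.9*V1),
# i.e. V1 = 1/0.109 ≈ 9.17 and V0 ≈ 8.26; V should end up close to these.
```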
|Assignment 2|| ||MDP Learning and Planning||due 10/6|
|L8-L9||Symbolic Dynamic Programming||Symbolic Boolean Manipulation with Ordered Binary-Decision Diagrams, Randal E. Bryant, 1992 (read at least Sections 1-3); SPUDD: Stochastic Planning Using Decision Diagrams, Hoey, St-Aubin, Hu and Boutilier, UAI 1999|| |
| ||Optional Reading||APRICODD: Approximate Policy Construction Using Decision Diagrams, St-Aubin, Hoey, and Boutilier, NIPS 2000|| |
|L10-L11||Introduction to RDDL||RDDL overview and specification; RDDL Tutorial Slides from 2014 (used in class); RDDL video tutorial from 2011; RDDL Code Repository|| |
| ||A challenge problem from industry||Invited talk at ICAPS 2014 on "How to Coordinate a Thousand Robots".|| |
|L11-L12||AI Search and RTDP||[RN] Chapters 3-4; Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, B. Bonet and H. Geffner, ICAPS 2003; see also Lecture Slides|| |
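The core RTDP loop (greedy action selection plus Bellman backups along simulated trials) can be sketched as follows; the tiny goal-oriented MDP, the zero heuristic, and the trial count are invented for the example.

```python
import random

# RTDP trial loop on a tiny made-up goal-oriented MDP (cost minimization).
# P[s][a] = list of (prob, next_state); C[s][a] = immediate action cost.
random.seed(0)
P = {0: {0: [(1.0, 1)], 1: [(0.5, 2), (0.5, 0)]},
     1: {0: [(1.0, 2)]}}
C = {0: {0: 1.0, 1: 2.0}, 1: {0: 1.0}}
GOAL = 2
V = {0: 0.0, 1: 0.0, GOAL: 0.0}   # admissible heuristic: 0 everywhere

def q(s, a):
    return C[s][a] + sum(p * V[s2] for p, s2 in P[s][a])

def rtdp_trial(s0, max_steps=50):
    s = s0
    for _ in range(max_steps):
        if s == GOAL:
            return
        a = min(P[s], key=lambda act: q(s, act))   # greedy action
        V[s] = q(s, a)                             # Bellman backup at s
        r, acc = random.random(), 0.0              # sample the next state
        for p, s2 in P[s][a]:
            acc += p
            if r <= acc:
                s = s2
                break

for _ in range(100):
    rtdp_trial(0)
# On this MDP the values converge to V[0] = 2 (via action 0) and V[1] = 1.
```

Labeled RTDP (the Bonet and Geffner reading) adds a solved-labeling test on top of this loop so that converged states stop being visited.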
|Assignment 3|| ||Working with RDDL and Additional Files for the assignment.||due 10/22|
|L13-L16||Monte Carlo Search for Planning||We will be using Alan Fern's excellent Lecture Slides on Rollout and Policy Improvement and on Approximate Policy Iteration; Kocsis, L. and Szepesvari, Cs., Bandit Based Monte-Carlo Planning, ECML 2006, pp. 282-293; Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvari, Cs., and Teytaud, O., The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions, Communications of the ACM 55(3), pp. 106-113, 2012|| |
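The UCT algorithm from the Kocsis and Szepesvari reading applies the UCB1 bandit rule at every tree node; as a minimal illustration of that rule in isolation, here is UCB1 on a toy Bernoulli bandit (the arm means and horizon are invented for the example).

```python
import math
import random

# UCB1 on a toy Bernoulli bandit; arm means and horizon are made up.
random.seed(1)
means = [0.2, 0.5, 0.8]

def ucb1(means, n_rounds):
    counts = [0] * len(means)      # pulls per arm
    sums = [0.0] * len(means)      # total reward per arm
    for t in range(1, n_rounds + 1):
        if t <= len(means):
            a = t - 1              # play each arm once first
        else:
            # Pick the arm maximizing empirical mean + exploration bonus.
            a = max(range(len(means)),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts

counts = ucb1(means, 5000)   # the best arm (index 2) dominates the play
```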
|L16||Review of Advanced Algorithms from Recent Lectures||See Review Slides|| |
|Assignment 4|| ||Policy Rollout and Additional Files for the assignment.||due 11/10|
|Information for Project|| ||project.pdf||11/6, 11/10, and 12/8|
|(L18)||Exam||The exam covers all the material discussed in class up to this point, excluding the details of RDDL (L10-L11). You are expected to know the material at the level discussed in class: if we discussed formal properties and their proofs, you are expected to be proficient in these; if we only explained an algorithm, how it works, and the intuition behind it, but did not formally analyze it, you are expected to know that much and are not expected to work through the analysis on your own.|| |
|L17||Relational Symbolic Dynamic Programming||First Order Decision Diagrams for Relational MDPs, Wang, Joshi and Khardon, JAIR 2008; Solving Relational MDPs with Exogenous Events and Additive Rewards, Joshi, Khardon, Tadepalli, Raghavan and Fern, 2013|| |
| ||Optional Reading||Symbolic Dynamic Programming for First-Order MDPs, Boutilier, Reiter and Price, IJCAI 2001; Bellman Goes Relational, Kersting, Van Otterlo and De Raedt, ICML 2004; Practical Solution Techniques for First-Order MDPs, Sanner and Boutilier, AIJ 2009|| |
|L19||Policy Based Methods||Approximate Policy Iteration with a Policy Language Bias; Non-Parametric Policy Gradients|| |
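As a minimal illustration of a policy gradient method of the kind covered here, the sketch below runs REINFORCE with a softmax policy on a two-armed bandit; the arm means, learning rate, and fixed baseline are all ad hoc choices for the example.

```python
import math
import random

# REINFORCE with a softmax policy on a made-up two-armed bandit.
random.seed(0)
means = [0.2, 0.8]      # expected reward of each arm (invented)
theta = [0.0, 0.0]      # softmax policy parameters, one per arm
alpha = 0.1             # learning rate, ad hoc
baseline = 0.5          # fixed baseline to reduce variance, ad hoc

def probs(theta):
    z = [math.exp(t) for t in theta]
    total = sum(z)
    return [x / total for x in z]

for _ in range(2000):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1
    r = 1.0 if random.random() < means[a] else 0.0
    # Gradient ascent: theta_i += alpha * (r - baseline) * d/dtheta_i log pi(a),
    # where for a softmax policy d/dtheta_i log pi(a) = 1[i == a] - pi(i).
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - p[i]
        theta[i] += alpha * (r - baseline) * grad
# After training, the policy concentrates on the better arm (index 1).
```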
|L20||Partially Observable MDPs||[RN] Section 17.4|| |
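A basic POMDP ingredient covered in [RN] Section 17.4 is the belief-state update after taking an action and receiving an observation; below is a minimal sketch on an invented two-state model (the transition and observation matrices are made up for the example).

```python
# POMDP belief-state update on an invented two-state model:
#   b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s)
T = [[0.9, 0.1], [0.2, 0.8]]   # T[s][s2] under some fixed action a
O = [[0.7, 0.3], [0.3, 0.7]]   # O[s2][o]: observation likelihoods

def belief_update(b, o):
    # Unnormalized posterior over next states, then normalize.
    unnorm = [O[s2][o] * sum(T[s][s2] * b[s] for s in range(len(b)))
              for s2 in range(len(b))]
    z = sum(unnorm)
    return [x / z for x in unnorm]

b = belief_update([0.5, 0.5], 0)   # observing o=0 shifts belief toward state 0
```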
|L21|| ||Brief Project Presentations|| |
Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors (GLUTTON)
LRTDP vs. UCT for Online Probabilistic Planning (Gourmand)
PROST: Probabilistic Planning Based on UCT
Trial-based Heuristic Tree Search for Finite Horizon MDPs (THTS/PROST)
Concurrent Reinforcement Learning from Customer Interactions
An Empirical Evaluation of Thompson Sampling (application to advertising)
A Contextual-Bandit Approach to Personalized News Article Recommendation
Design, Analysis, and Learning Control of a Fully Actuated Micro Wind Turbine
Optimal Planning and Learning in Uncertain Environments for the Management of Wind Farms
Non-Linear Monte-Carlo Search in Civilization II
Playing Atari with Deep Reinforcement Learning
|L26|| ||Brief Project Presentations|| |