Lecture 
Topic 
Sources 
Comments 

RL Basics 


L1 

[SB] Chapters 1-3 


VI, PI (value iteration, policy iteration), etc. 


L2-L4 

[SB] Chapter 4, [RN] Chapter 17
Convergence properties: [P] Sections 6.2-6.3
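The readings above cover value iteration in detail; as a quick companion, here is a minimal tabular value-iteration sketch. The two-state MDP, its transition model `P`, and the discount factor are illustrative assumptions, not examples from [SB] or [RN].

```python
# Minimal tabular value iteration, assuming a transition model of the form
# P[s][a] = list of (prob, next_state, reward) triples. The two-state toy
# MDP below is an illustrative assumption, not taken from the readings.
P = {
    0: {0: [(1.0, 0, 0.0)],                  # "stay" in state 0
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},  # risky jump toward state 1
    1: {0: [(1.0, 0, 0.0)],
        1: [(1.0, 1, 1.0)]},                 # self-loop paying reward 1
}
gamma = 0.9

def value_iteration(P, gamma, tol=1e-8):
    """Apply the Bellman optimality backup until the largest update < tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            return V

V = value_iteration(P, gamma)  # V[1] converges to 1 / (1 - gamma) = 10
```

Policy iteration differs only in alternating a full policy-evaluation sweep with a greedy policy-improvement step; both converge to the same fixed point here.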


Assignment 1 

MDP Planning


L5-L7 
TD methods 
[SB] Chapters 5 and 6 for MC and TD(0), and Chapter 7 for TD(λ) 
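As a companion to the [SB] chapters, a minimal TD(0) prediction sketch on the classic random-walk example. The chain size, step size, and episode count are illustrative choices, not prescribed by the text.

```python
import random

# TD(0) prediction ([SB] Ch. 6 style) on a 5-state random walk: states 1..5,
# terminals 0 and 6, reward 1 on reaching the right terminal. The true values
# are V(s) = s/6; the step size and episode count are illustrative choices.
random.seed(0)
N, alpha, gamma = 5, 0.1, 1.0
V = [0.0] * (N + 2)  # V[0] and V[N + 1] are terminal and stay 0

for _ in range(5000):
    s = (N + 1) // 2                 # every episode starts in the middle
    while 0 < s < N + 1:
        s2 = s + random.choice((-1, 1))
        r = 1.0 if s2 == N + 1 else 0.0
        target = r + gamma * V[s2]   # terminal values are 0, so this is safe
        V[s] += alpha * (target - V[s])  # the TD(0) update
        s = s2
```

Replacing the one-step target with an n-step or eligibility-trace-weighted return gives the TD(λ) family covered in Chapter 7.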

L8-L9 
State Aggregation and Parametric Representations 
[SB] Chapter 8
Feature-Based Methods for Large Scale Dynamic Programming,
Tsitsiklis and Van Roy,
Machine Learning, Vol. 22, 1996, pp. 59-94.


L8-L9 
Learning and planning 
[SB] Chapter 9 

L10 
Review 
Review Slides


Assignment 2 

MDP Learning and Planning


L10 
AI planning 
[RN] Chapter 3 and Chapter 10 (esp. 10.1-10.3).
AI slides from Blumer:
Search Part I, and
Search Part II.
AI slides from Lenaerts:
Planning.


3/1 
Polynomial guarantees 
E^3: Near-Optimal Reinforcement Learning in Polynomial Time, Kearns and
Singh, ICML 1998.
R-MAX: A General Polynomial Time Algorithm for Near-Optimal
Reinforcement Learning, Brafman and Tennenholtz, JMLR,
3(Oct):213-231, 2002.

Jingjing; Roni 


PAC Model-Free Reinforcement Learning,
Strehl, Li, Wiewiora, Langford, and Littman,
ICML 2006.


3/8 
Search based planning and MDPs 
LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops,
E.A. Hansen and S. Zilberstein, Artificial Intelligence, 129(1-2):35-62, 2001.
Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, B. Bonet and H. Geffner, 13th International Conference on Automated Planning and Scheduling (ICAPS-2003).
mGPT: A Probabilistic Planner Based on Heuristic Search.
Blai Bonet and Héctor Geffner.
J. of Artificial Intelligence Research 24 (2005).

Keith; Bernie 
3/10 + 3/16 
Logic Based Factored Representations: I 
Graph-Based Algorithms for Boolean Function Manipulation, Bryant, IEEE Transactions on Computers, Vol. C-35, No. 8, August 1986, pp. 677-691.
SPUDD: Stochastic Planning using Decision Diagrams,
Hoey, St-Aubin, Hu and Boutilier, UAI 1999.
APRICODD: Approximate Policy Construction Using Decision Diagrams,
St-Aubin, Hoey, and Boutilier, NIPS 2000.

Mengfei; Jisoo; Roni 


Stochastic Dynamic Programming with Factored Representations,
Boutilier, Dearden and Goldszmidt,
Artificial Intelligence 121(1), pp. 49-107 (2000).



Logic Based Factored Representations: II 
Symbolic Heuristic Search for Factored Markov Decision Processes,
Feng, Z. and Hansen, E. A., AAAI 2002.
Symbolic Generalization for Online Planning.
Z. Feng, E.A. Hansen, and S. Zilberstein. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2003.
Factored E^3:
Efficient Reinforcement Learning in Factored MDPs,
Kearns and Koller, IJCAI 1999.


3/17 + 3/29 
Relational MDPs: Dynamic Programming 
Symbolic Dynamic Programming for First-order MDPs,
Boutilier, Reiter and Price, IJCAI 2001.
Bellman Goes Relational,
Kersting, van Otterlo and De Raedt, ICML 2004.
First Order Decision Diagrams for Relational MDPs,
Wang, Joshi and Khardon, JAIR 2008.

Roni 
3/31 
Linear Representations of Factored MDPs 

Roni 


Efficient Solution Algorithms for Factored MDPs,
Guestrin, Koller, Parr and Venkataraman,
JAIR, Volume 19, pp. 399-468, 2003.




Piecewise Linear Value Function Approximation for Factored MDPs,
Poupart, Boutilier, Schuurmans and Patrascu, AAAI 2002.




Least-Squares Policy Iteration,
Michail G. Lagoudakis and Ronald Parr,
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.



Relational MDPs: Linear Functions 




Practical solution techniques for first-order MDPs, S. Sanner and C. Boutilier, Artificial Intelligence Journal (AIJ), Volume 173, pp. 748-788, 2009.



Inverse RL 




Apprenticeship learning via inverse reinforcement learning,
Abbeel and Ng,
ICML 2004.
An Application of Reinforcement Learning to Aerobatic Helicopter Flight,
Abbeel, Coates, Quigley and Ng, NIPS 2006.
Videos of results from these papers available from
Andrew Ng's home page



Hierarchical reinforcement learning 




Hierarchical reinforcement learning with the MAXQ value function
decomposition,
Dietterich, JAIR, 13, 227-303, 2000.




Reinforcement Learning with Hierarchies of Machines,
Parr and Russell, NIPS 1997.
State Abstraction for Programmable Reinforcement Learning Agents,
Andre and Russell, AAAI 2002.
A compact, hierarchically optimal Q-function decomposition,
Marthi, Russell, and Andre, UAI 2006.



Relational MDPs, Supervised Learning and API 




Learning to Take Actions,
Khardon, Machine Learning, Vol. 35, No. 1, 1999, pp. 57-90.
Learning Action Strategies for Planning Domains,
Khardon, Artificial Intelligence, Vol. 113, 1999, pp. 125-148.
Inductive Policy Selection for First-Order Markov Decision Processes,
Yoon, Fern, and Givan, UAI 2002.




Approximate Policy Iteration with a Policy Language Bias: Solving
Relational Markov Decision Processes,
Fern, Yoon, and Givan
JAIR, 25, 85-118, 2006.
Reinforcement Learning as Classification: Leveraging Modern
Classifiers, Lagoudakis and Parr, ICML 2003.



Relational MDPs: RRL 


4/5 

Relational reinforcement learning,
Dzeroski, De Raedt, and Driessens,
Machine Learning 43, pp. 7-52, 2001.
Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains, K. Kersting and K. Driessens, ICML 2008.

Jisoo; Roni 


Gaussian Processes as Regression Technique for
Relational Reinforcement Learning,
Driessens, Ramon and Gaertner,
Machine Learning 64, pp. 91-119, 2006.
Integrating guidance into relational reinforcement learning,
Driessens and Dzeroski,
Machine Learning 57, pp. 271-304, 2004.



Hierarchical Relational RL 
Function Approximation in Hierarchical Relational Reinforcement Learning,
Roncagliolo and Tadepalli, RRL Workshop, 2005.



Reward Shaping 
Policy invariance under reward transformations:
Theory and application to reward shaping,
Andrew Y. Ng, Daishi Harada, Stuart Russell,
ICML 1999.
Potential-Based Shaping and Q-Value Initialization are Equivalent, Eric Wiewiora, Journal of Artificial Intelligence Research, 2003.
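The Ng, Harada and Russell result can be stated in one line: adding a shaping reward of the form F(s, s') = γΦ(s') − Φ(s), for any potential function Φ, leaves the optimal policy unchanged. A tiny sketch of that transformation; the state space and potential values are arbitrary illustrative numbers, not from the paper.

```python
# Potential-based reward shaping (Ng, Harada & Russell, ICML 1999):
# r'(s, a, s') = r + gamma * phi(s') - phi(s) for an arbitrary potential
# phi. The states and potential values below are illustrative only.
gamma = 0.9
phi = {0: 0.0, 1: 5.0}

def shaped_reward(r, s, s2):
    """Return the shaped reward for transition s -> s2 with base reward r."""
    return r + gamma * phi[s2] - phi[s]
```

Along any fixed trajectory the shaping terms telescope, which is the intuition behind the policy-invariance theorem, and is also why Wiewiora's paper can show the scheme is equivalent to Q-value initialization.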



Model Learning 




Learning Probabilistic Planning Rules,
Pasula, Zettlemoyer, and Kaelbling,
ICAPS 2004.
Learning Planning Rules in Noisy Stochastic Worlds.
Zettlemoyer, Pasula, and Kaelbling,
AAAI 2005.




Learning Partially Observable Action Schemas,
Shahaf and Amir, AAAI 2006.
Learning Partially Observable Action Models: Efficient Algorithms,
Shahaf, Chang and Amir, AAAI 2006.



Learning and using control knowledge 
Discriminative Learning of Beam-Search Heuristics for Planning,
Xu, Fern, and Yoon, IJCAI 2007.
Using Learned Policies in Heuristic-Search Planning,
Yoon, Fern, and Givan, IJCAI 2007.


4/7 
Sampling/Search Based Algorithms 
Alan Fern's ICAPS-10 tutorial.
Lower Bounding Klondike Solitaire with Monte-Carlo Planning,
Ronald Bjarnason, Alan Fern, and Prasad Tadepalli,
ICAPS 2009.

Roni; Jingjing 


A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, Kearns, Mansour, and Ng, MLJ 2002.
Bandit Based Monte-Carlo Planning, L. Kocsis and Cs. Szepesvári, ECML 2006.
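The Kocsis and Szepesvári paper builds UCT on the UCB1 bandit rule: at each node, pick the action maximizing average return plus an exploration bonus. A minimal UCB1 sketch on a hypothetical two-armed Bernoulli bandit; the arm means and horizon are illustrative assumptions, not from the paper.

```python
import math
import random

# UCB1 action selection, the bandit rule at the core of UCT
# (Kocsis & Szepesvári, ECML 2006). The two-armed Bernoulli bandit with
# means 0.3 and 0.7 is an illustrative assumption, not from the paper.
random.seed(1)
means = [0.3, 0.7]
counts = [0, 0]    # pulls per arm
sums = [0.0, 0.0]  # total reward per arm

for t in range(1, 2001):
    if 0 in counts:  # pull each arm once before applying UCB1
        a = counts.index(0)
    else:
        a = max(range(len(means)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]))
    sums[a] += 1.0 if random.random() < means[a] else 0.0
    counts[a] += 1
```

UCT applies this rule recursively at every node of a sampled search tree, so the better arm (here, the higher-mean action) absorbs most of the simulation budget.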


4/12 
Planning and Inference 
Expectation-Maximization methods for solving (PO)MDPs and optimal
control problems,
Marc Toussaint, Amos Storkey, and Stefan Harmeling, 2010.
Planning with Noisy Probabilistic Relational Rules,
Tobias Lang and Marc Toussaint, Journal of Artificial Intelligence Research, 39, 1-49, 2010.

Bernie; Keith 
4/14 
Exogenous Events 
Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery, Proper, S. and Tadepalli, P., European Conference on Machine Learning, 2006.
Solving Multiagent Assignment Markov Decision Processes,
Proper, S. and Tadepalli, P., in Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, pp. 681-688, 2009.
Simulation-based Optimization of Resource Placement and Emergency Response, Bjarnason, R., Tadepalli, P., Fern, A. and Niedner, C., in Conference on Innovative Applications of Artificial Intelligence (IAAI), 2009, pp. 47-53.

Roni 


Approximate Dynamic Programming with Affine ADDs, S. Sanner, W. Uther, and K. V. Delgado, in Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2010).
Approximate solution techniques for factored first-order MDPs, S. Sanner and C. Boutilier, in Proceedings of the 17th Conference on Automated Planning and Scheduling (ICAPS-2007).



Parallel Actions, Action Durations and Resources 


4/21 

Planning with Durative Actions in Stochastic Domains, Mausam and Daniel S. Weld, Journal of Artificial Intelligence Research (JAIR), Volume 31 (2008), pp. 33-82.
Solving generalized semi-Markov decision processes using continuous
phase-type distributions, Younes and Simmons, in Proceedings of the
Nineteenth National Conference on Artificial Intelligence, 2004.

Jisoo; Bernie 


A General, Fully Distributed Multi-Agent Planning Algorithm, Raz Nissim, Ronen I. Brafman, and Carmel Domshlak, AAMAS 2010.
A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains, Nicolas Meuleau, Emmanuel Benazera, Ronen Brafman, Eric Hansen, and Mausam, Journal of Artificial Intelligence Research (JAIR), Volume 34 (2009), pp. 27-59.




From One to Many: Planning for Loosely Coupled Multi-Agent Systems, Ronen I. Brafman and Carmel Domshlak, ICAPS 2008, pp. 28-35.
Planning and execution with phase
transitions, H. Younes, in Proceedings of the
Twentieth National Conference on Artificial Intelligence, 2005.


4/26 
Review 
Review Slides

Roni 

Partial observability 
Planning and acting in partially observable stochastic
domains,
Kaelbling, Littman and Cassandra,
Artificial Intelligence, 101(1-2):99-134, 1998.
Dynamic Programming for POMDPs using a Factored State Representation,
Hansen and Feng,
AIPS 2000.



Predictive State Representations 
Learning Predictive State Representations,
Singh, Littman, Jong, Pardoe and Stone.
ICML 2003.
Predictive Representations of State,
Littman, Sutton and Singh,
NIPS 2002.
Predictive State Representations: A New Theory for Modeling Dynamical
Systems, Satinder Singh, Michael R. James and Matthew R. Rudary, in
Uncertainty in Artificial Intelligence: Proceedings of the Twentieth
Conference (UAI), pages 512-519, 2004.




Relational Knowledge with Predictive State Representations by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.



Proto-Value Functions 
Samuel Meets Amarel: Automating Value Function Approximation using
Global State Space Analysis,
Mahadevan,
AAAI 2005.
Representation Policy Iteration,
Mahadevan,
UAI 2005.


4/28 
Project Presentations 

