Comp 150AML: Advanced Topics in Machine Learning
Learning, Planning and Acting in Complex Environments
Department of Computer Science
Tufts University
Spring 2011

Course Web Page (this page):

  • (3/25) complete course schedule posted


In this semester the course will focus on agents that learn, plan and act in non-deterministic and complex environments. We will first cover the main relevant results and approaches from Markov Decision Processes and Reinforcement Learning (RL) and then turn to recent research papers reviewing representations and corresponding algorithms approximation methods and convergence guarantees for complex domains. We will discuss factored, relational and hierarchical representations, and explore models with multiple/parallel actions, action durations, continuous resources, and exogenous events.


Math 22, Comp 15, Comp 160 or consent of instructor. Some exposure to probability theory is necessary. Previous courses in machine learning or artificial intelligence are helpful but not required.

Class Times:

Tuesday, Thursday 12:00-1:15 Halligan 106


Roni Khardon
Office Hours: Wednesday 3-4 (office H-230), or by appointment.


We will use [SB] for the basic unit on RL and try to cover much of this text in a few weeks. Other material is mainly from articles.

Course Work and Marking

The course will be run partly as a lecture course and partly a a reading seminar. Students will present and/or lead the discussion on some of the articles read. The course grade will be based on homework assignments (20%), paper presentations and reports and class discussion (30%), and a final project (50%). The final project can investigate some new (or old) application of complex RL domain, investigate some new (or old) methods, etc. I encourage initiative to create individually-tailored projects, and I am happy to consider group projects.

Tentative List of Topics

Lecture Topic Sources Comments
  RL Basics    
L1   [SB] Chapters 1-3  
  VI, PI etc.    
L2-L4   [SB] Chapter 4, [RN] Chapter 17
Convergence properties [P] Sections 6.2-3
Assignment 1   MDP Planning  
L5-L7 TD methods [SB] Chapters 5,6 for MC and TD(0), and chapter 7 for TD(lambda)  
L8-9 State Agregation and Parameteric Representations [SB] Chapter 8

Feature-Based Methods for Large Scale Dynamic Programming. Tsitsiklis and Van Roy. Machine Learning, Vol. 22, 1996, pp. 59-94.
L8-9 Learning and planning [SB] Chapter 9  
L10 Review Review Slides  
Assignment 2   MDP Learning and Planning  
L10 AI planning [RN] Chapter 3 and Chapter 10 (esp. 10.1-3).
AI slides from Blumer: Search Part I, and Search Part II.
AI slides from Lenaerts: Planning.
3/1 Polynomial guarantees E^3: Near-Optimal Reinforcement Learning in Polynomial Time. Kearns and Singh. ICML 1998.

R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, Brafman and Tennenholtz, JMLR, 3(Oct):213-231, 2002.
Jingjing; Roni
      PAC Model-Free Reinforcement Learning, Strehl, Li, Wiewiora, Langford, and Littman, ICML 2006.  
3/8 Search based planning and MDPs LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops E.A. Hansen and S. Zilberstein. Artificial Intelligence, 129(1-2):35-62, 2001.

Labeled RTDP: Improving the Convergence of Real Time Dynamic Programming. B. Bonet and H. Geffner. 13th International Conference on Automated Planning and Scheduling (ICAPS-2003).
mGPT: A Probabilistic Planner Based on Heuristic Search. Blai Bonet and Héctor Geffner. J. of Artificial Intelligence Research 24 (2005).
Keith; Bernie
3/10 + 3/16 Logic Based Factored Representations: I Graph-Based Algorithms for Boolean Function Manipulation, IEEE Transactions on Computers, Vol. C - 35, No. 8, August, 1986, pp. 677 - 691.

SPUDD: Stochastic Planning using Decision Diagrams, Hoey , St-Aubin, Hu and Boutilier, UAI 1999.
APRICODD: Approximate Policy Construction Using Decision Diagrams, St-Aubin, Hoey , and Boutilier, NIPS 2000.
Mengfei; Jisoo; Roni
      Stochastic Dynamic Programming with Factored Representations, Boutilier, Dearden and Goldszmidt, Artificial Intelligence 121(1), pp.49--107 (2000).  
  Logic Based Factored Representations: II Symbolic Heuristic Search for Factored Markov Decision Processes Feng, Z. Hansen, E. A. , AAAI 2002.

Symbolic Generalization for On-line Planning. Z. Feng, E.A. Hansen, and S. Zilberstein. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2003.

Factored E^3: Efficient Reinforcement Learning in Factored MDPs, Kearns and Koller, IJCAI 1999.
3/17 + 3/29 Relational MDPs: Dynamic Programming Symbolic Dynamic Programming for First-order MDPs, Boutilier, Reiter and Price, IJCAI 2001.

Bellman Goes Relational, Kersting, van Otterlo and De Raedt, ICML 2004.

First Order Decision Diagrams for Relational MDPs, Wang, Joshi and Khardon, JAIR 2008.
3/31 Linear Representations of Factored MDPs   Roni
    Efficient Solution Algorithms for Factored MDPs, Guestrin, Koller, Parr and Venkataraman, JAIR Volume 19, pp. 399-468, 2003.  
    Piecewise Linear Value Function Approximation for Factored MDPs, Poupart, Boutilier, Schuurmans and Patrascu, AAAI 2002.  
    Least-Squares Policy Iteration Michail G. Lagoudakis and Ronald Parr Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.  
  Relational MDPs: Linear Functions    
    Practical solution techniques for first-order MDPs, S. Sanner, and C. Boutilier, Artificial Intelligence Journal (AIJ). Volume 173, pp 748-788, 2009.  
  Inverse RL    
    Apprenticeship learning via inverse reinforcement learning, Abbeel and Ng, ICML 2004.

An Application of Reinforcement Learning to Aerobatic Helicopter Flight, Abbeel, Coates, Quigley and Ng, NIPS 2006.

Videos of results from these papers available from Andrew Ng's home page
  Hierarchical reinforcement learning    
    Hierarchical reinforcement learning with the MAXQ value function decomposition, Dietterich, JAIR, 13, 227-303.  
    Reinforcement Learning with Hierarchies of Machines, Parr and Russell. NIPS 97

State Abstraction for Programmable Reinforcement Learning Agents. Andre, and Russell, AAAI-2002.

A compact, hierarchically optimal Q-function decomposition, Marthi, Russell, and Andre, UAI 2006.
  Relational MDPs, Supervised Learning and API    
    Learning to take Actions, Khardon, Machine Learning, Vol 35, No 1, 1999, pages 57-90.

Learning Action Strategies for Planning Domains, Khardon, Artificial Intelligence Vol 113, 1999, pages 125-148.

Inductive Policy Selection for First-Order Markov Decision Processes, Yoon, Fern, and Givan, UAI-2002
    Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Fern, Yoon, and Givan JAIR, 25, 85-118, 2006.

Reinforcement Learning as Classification: Leveraging Modern Classifiers, Lagoudakis and Parr, ICML 2003.
  Relational MDPs: RRL    
4/5   Relational reinforcement learning, Dzeroski, De Raedt, and Driessens, Machine Learning 43, pp. 7-52, 2001

Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains, K. Kersting, K. Driessens, ICML 2008.
Jisoo; Roni
    Gaussian Processes as Regression technique for Relational Reinforcement Learning, Driessens, Ramon and Gaertner, Machine Learning 64, pp. 91-119, 2006

Integrating guidance into relational reinforcement learning, Driessens and Dzeroski, Machine Learning 57, pp. 271-304, 2004
  Hierarchical Relational RL Function Approximation in Hierarchical Relational Reinforcement Learning, Roncagliolo and Tadepalli, RRL Workshop, 2005.  
  Reward Shaping Policy invariance under reward transformations: Theory and application to reward shaping, Andrew Y. Ng, Daishi Harada, Stuart Russell, ICML 1999.

Potential-Based Shaping and Q-Value Initialization are Equivalent, Eric Wiewiora, Journal of Artificial Intelligence Research, 2003.
  Model Learning    
    Learning Probabilistic Planning Rules, Pasula, Zettlemoyer, and Kaelbling, ICAPS 2004.

Learning Planning Rules in Noisy Stochastic Worlds. Zettlemoyer, Pasula, and Kaelbling, AAAI 2005.
    Learning Partially Observable Action Schemas, Shahaf and Amir, AAAI 2006.

Learning Partially Observable Action Models: Efficient Algorithms, Shahaf, Chang and Amir, AAAI 2006.

  Learning and using control knowledge Discriminative Learning of Beam-Search Heuristics for Planning, Xu, Fern, and Yoon, IJCAI 2007.

Using Learned Policies in Heuristic-Search Planning, Yoon, Fern, and Givan, IJCAI 2007.

4/7 Sampling/Search Based Algorithms Alan Fern's ICAPS10 tutorial

Lower Bounding Klondike Solitaire with Monte-Carlo Planning, Ronald Bjarnason, Alan Fern, and Prasad Tadepalli, ICAPS 2009.
Roni; Jingjing
    A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, Kearns, Mansour, Ng, MLJ 2002.

Bandit Based Monte-Carlo Planning, L. Kocsis, Cs. Szepesvári, ECML 2006.
4/12 Planning and Inference Expectation-Maximization methods for solving (PO)MDPs and optimal control problems. Marc Toussaint, Amos Storkey, Stefan Harmeling (2010).

Planning with Noisy Probabilistic Relational Rules, Tobias Lang, Marc Toussaint, Journal of Artificial Intelligence Research, 39, 1-49, 2010.
Bernie; Keith
4/14 Exogenous Events Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery, Proper, S. and Tadepalli, P., European Conference on Machine Learning , 2006.
Solving Multiagent Assignment Markov Decision Processes, Proper, S., Tadepalli, P., in Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems p 681-688, 2009.

Simulation-based Optimization of Resource Placement and Emergency Response, Bjarnason, R., Tadepalli, P., Fern, A. and Niedner, C., in Conference on Innovative Applications of Artificial Intelligence (IAAI) , 2009, p 47--53.
    Approximate Dynamic Programming with Affine ADDs, S. Sanner, W. Uther, K. V. Delgado, In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2010).

Approximate solution techniques for factored first-order MDPs, S. Sanner, and C. Boutilier, In Proceedings of the 17th Conference on Automated Planning and Scheduling (ICAPS-2007).
  Parallel Actions, Action Durations and Resources    
4/21   Planning with Durative Actions in Stochastic Domains, Mausam, Daniel S. Weld, Journal of Artificial Intelligence Research (JAIR). Volume 31 (2008) 33-82.

Solving generalized semi-Markov decision processes using continuous phase-type distributions, Younes and Simmons, In Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004.
Jisoo; Bernie
    A General, Fully Distributed Multi-Agent Planning Algorithm, Raz Nissim, Ronen I. Brafman, Carmel Domshlak, AAMAS 2010

A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains, Nicolas Meuleau, Emmanuel Benazera, Ronen Brafman, Eric Hansen, Mausam, Journal of Artificial Intelligence Research (JAIR). Volume 34 (2009) 27-59.
    From One to Many: Planning for Loosely Coupled Multi-Agent Systems, Ronen I. Brafman, Carmel Domshlak, ICAPS 2008: 28-35

Planning and execution with phase transitions, H. Younes, In Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2005.
4/26 Review Review Slides Roni
  Partial observability Planning and acting in partially observable stochastic domains. 1998. Kaelbling, Littman and Cassandra. Artificial Intelligence, 101(1--2):99--134.

Dynamic Programming for POMDPs using a Factored State Representation, Hansen and Feng. AIPS 2000.
  Predictive State Representations Learning Predictive State Representations, Singh, Littman, Jong, Pardoe and Stone. ICML 2003.

Predictive Representations of State, Littman, Sutton and Singh, NIPS 2002.

Predictive State Representations: A New Theory for Modeling Dynamical Systems by Satinder Singh, Michael R. James and Matthew R. Rudary. In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI), pages 512-519, 2004.
    Relational Knowledge with Predictive State Representations by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.  
  Proto Value Functions Samuel Meets Amarel: Automating Value Function Approximation using Global State Space Analysis, Mahadevan, AAAI 2005.

Representation Policy Iteration, Mahadevan, UAI 2005.

4/28 Project Presentations