Comp 150AML: Advanced Topics in Machine Learning
Learning, Planning and Acting in Complex Environments
Department of Computer Science
Tufts University
Spring 2007

Course Web Page (this page): http://www.cs.tufts.edu/comp/150AML/

Syllabus:

This semester the course will focus on agents that learn, plan, and act in complex, non-deterministic environments. We will first cover the main relevant results and approaches from Markov Decision Processes (MDPs) and Reinforcement Learning (RL) and then turn to recent research papers. Advanced topics include combining planning and learning, factored representations, relational RL, hierarchical RL, partial programming, predictive state representations, approximation methods, and convergence guarantees.
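
As a small taste of the first unit, here is a minimal sketch of value iteration on a toy MDP; the two-state, two-action model, transition probabilities, rewards, and discount factor below are made up purely for illustration and are not taken from the readings.

    # Value iteration on a toy 2-state, 2-action MDP (illustrative numbers only).
    import numpy as np

    gamma = 0.9
    # P[a][s, s'] = probability of moving from state s to s' under action a (made up).
    P = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    # R[s, a] = expected immediate reward for taking action a in state s (made up).
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    V = np.zeros(2)
    for _ in range(1000):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:   # stop once the values stop changing
            V = V_new
            break
        V = V_new

    policy = Q.argmax(axis=1)   # greedy policy with respect to the converged values
    print("V* =", V, "greedy policy =", policy)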

Prerequisites:

Math 22, Comp 15, Comp 160 or consent of instructor. Some exposure to probability theory is necessary. Previous courses in machine learning or artificial intelligence are helpful but not required.

Class Times:

Tuesday, Thursday 1:30-2:45 Halligan 106

Instructor:

Roni Khardon

Textbooks

We will use [SB] for the basic unit on RL and try to cover much of this text in a few weeks. Other material is mainly from articles.

Course Work and Marking

The course will be run partly as a lecture course and partly as a reading seminar. Students will present and/or lead the discussion on some of the articles read. The course grade will be based on small homework assignments (20%), paper presentations and class discussion (30%), and a final project (50%). The final project can investigate a new (or old) application of RL in a complex domain, investigate new (or old) methods, etc. I encourage initiative to create individually-tailored projects, and I am happy to consider group projects.

Tentative List of Topics

Topic Sources Comments
RL Basics    
  [SB] Chapters 1-3 L1 (1/23)
Value Iteration (VI), Policy Iteration (PI), etc.
  [SB] Chapter 4, [RN] Chapter 17 L2-L5
  Convergence properties [P] Sections 6.2-3 L4 (1/30)
TD methods
  [SB] Chapters 5 and 6 for TD(0), Chapter 7 for TD(lambda) L6-L7 (2/6, 2/8)
State Aggregation and Parametric Representations
  [SB] Chapter 8
  Feature-Based Methods for Large Scale Dynamic Programming, Tsitsiklis and Van Roy, Machine Learning, Vol. 22, 1996, pp. 59-94. L8-L9 (2/13, 2/15)
Learning and planning
  [SB] Chapter 9 L10 (2/20)
RL Algorithms with theoretical guarantees    
  E^3: Near-Optimal Reinforcement Learning in Polynomial Time. Kearns and Singh. ICML 1998. L11 (2/27): Roni
  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, Brafman and Tennenholtz, JMLR, 3(Oct):213-231, 2002. L12 (3/1): Steve [Kyle, Saket]
  PAC Model-Free Reinforcement Learning, Strehl, Li, Wiewiora, Langford, and Littman, ICML 2006. L12 (3/1): Gabe [Noah, Ben, Jonathan]
Logic Based Factored Representations    
  Stochastic Dynamic Programming with Factored Representations, Boutilier, Dearden and Goldszmidt, Artificial Intelligence 121(1), pp.49--107 (2000). L13 (3/6): Jonathan, Saket [Ben]
  SPUDD: Stochastic Planning using Decision Diagrams, Hoey, St-Aubin, Hu and Boutilier, UAI 1999.
  APRICODD: Approximate Policy Construction Using Decision Diagrams, St-Aubin, Hoey, and Boutilier, NIPS 2000. L14 (3/8): Noah, Kyle [Gabe, Steve]
  Factored E^3: Efficient Reinforcement Learning in Factored MDPs, Kearns and Koller, IJCAI 1999. Roni
Linear Representations of Factored MDPs    
  Efficient Solution Algorithms for Factored MDPs, Guestrin, Koller, Parr and Venkataraman, JAIR Volume 19, pp. 399-468, 2003. L15 (3/13): Ben, Steve [Gabe, Jonathan]
  Piecewise Linear Value Function Approximation for Factored MDPs, Poupart, Boutilier, Schuurmans and Patrascu, AAAI 2002.  
Inverse RL    
  Apprenticeship learning via inverse reinforcement learning, Abbeel and Ng, ICML 2004.
  An Application of Reinforcement Learning to Aerobatic Helicopter Flight, Abbeel, Coates, Quigley and Ng, NIPS 2006.
  Videos of results from these papers are available from Andrew Ng's home page. L16 (3/15): Roni
Hierarchical reinforcement learning    
  Hierarchical reinforcement learning with the MAXQ value function decomposition, Dietterich, JAIR, 13, 227-303. L17 (3/27): Saket, Kyle [Steve]
  Reinforcement Learning with Hierarchies of Machines, Parr and Russell, NIPS 1997.
  State Abstraction for Programmable Reinforcement Learning Agents, Andre and Russell, AAAI 2002.
  A compact, hierarchically optimal Q-function decomposition, Marthi, Russell, and Andre, UAI 2006. L18 (3/29): Jonathan, Gabe [Noah, Ben]
Relational MDPs, Supervised Learning and API    
  Learning to take Actions, Khardon, Machine Learning, Vol. 35, No. 1, 1999, pages 57-90.
  Learning Action Strategies for Planning Domains, Khardon, Artificial Intelligence, Vol. 113, 1999, pages 125-148.
  Inductive Policy Selection for First-Order Markov Decision Processes, Yoon, Fern, and Givan, UAI 2002. L19 (4/5): Roni
  Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Fern, Yoon, and Givan, JAIR, 25, 85-118, 2006.
  Reinforcement Learning as Classification: Leveraging Modern Classifiers, Lagoudakis and Parr, ICML 2003. L20 (**4/6 10:30**): Ben, Noah [All]
Relational MDPs: Linear Functions    
  Practical Linear Value-approximation Techniques for First-order MDPs, Sanner and Boutilier, UAI 2006.
  Approximate Linear Programming for First-order MDPs, Sanner and Boutilier, UAI 2005. L21 (4/10): Gabe, Kyle [Noah]
Relational MDPs: RRL    
  Relational reinforcement learning, Dzeroski, De Raedt, and Driessens, Machine Learning 43, pp. 7-52, 2001.
  Gaussian Processes as Regression technique for Relational Reinforcement Learning, Driessens, Ramon and Gaertner, Machine Learning 64, pp. 91-119, 2006.
  Integrating guidance into relational reinforcement learning, Driessens and Dzeroski, Machine Learning 57, pp. 271-304, 2004. L22 (4/12): Ben, Jonathan [Saket, Steve]
Relational MDPs: Dynamic Programming    
  Symbolic Dynamic Programming for First-order MDPs, Boutilier, Reiter and Price, IJCAI 2001.
  Bellman Goes Relational, Kersting, van Otterlo and De Raedt, ICML 2004.
  First Order Decision Diagrams for Relational MDPs, Wang, Joshi and Khardon, IJCAI 2007. L23 (4/17): Saket
Hierarchical Relational RL
  Function Approximation in Hierarchical Relational Reinforcement Learning, Roncagliolo and Tadepalli, RRL Workshop, 2005.
Model Learning    
  Learning Probabilistic Planning Rules, Pasula, Zettlemoyer, and Kaelbling, ICAPS 2004.
  Learning Planning Rules in Noisy Stochastic Worlds, Zettlemoyer, Pasula, and Kaelbling, AAAI 2005. L24 (4/19): Noah, Steve [All]
  Learning Partially Observable Action Schemas, Shahaf and Amir, AAAI 2006.
  Learning Partially Observable Action Models: Efficient Algorithms, Shahaf, Chang and Amir, AAAI 2006.
Learning and using control knowledge
  Discriminative Learning of Beam-Search Heuristics for Planning, Xu, Fern, and Yoon, IJCAI 2007.
  Using Learned Policies in Heuristic-Search Planning, Yoon, Fern, and Givan, IJCAI 2007.
Partial observability
  Planning and Acting in Partially Observable Stochastic Domains, Kaelbling, Littman and Cassandra, Artificial Intelligence, 101(1-2):99-134, 1998.
  Dynamic Programming for POMDPs using a Factored State Representation, Hansen and Feng, AIPS 2000.
Predictive State Representations
  Learning Predictive State Representations, Singh, Littman, Jong, Pardoe and Stone, ICML 2003.
  Predictive Representations of State, Littman, Sutton and Singh, NIPS 2002.
  Predictive State Representations: A New Theory for Modeling Dynamical Systems, Singh, James and Rudary, UAI 2004, pages 512-519. L25 (4/24): Roni
  Relational Knowledge with Predictive State Representations, Wingate, Soni, Wolfe and Singh, IJCAI 2007.
Proto Value Functions
  Samuel Meets Amarel: Automating Value Function Approximation using Global State Space Analysis, Mahadevan, AAAI 2005.
  Representation Policy Iteration, Mahadevan, UAI 2005. L26 (4/26): Roni
Probabilistic Planners ...