COMP 150AML: Advanced Topics in Machine Learning
Learning, Planning and Acting in Complex Environments
Department of Computer Science
Tufts University
Fall 2014

Course Web Page (this page): http://www.cs.tufts.edu/comp/150AML/

Announcement(s):
  • Khardon's office hours on Tuesday 11/25 are canceled. Please email with any questions.
  • Office hours on Thursday and Friday 10/30-31 are canceled. Please email Roni Khardon with any questions.
  • (10/17) Clarification for Assignment 3: you need an updated version of Java to run the RDDL code; if you are having trouble with that, please log in to the departmental homework server via ssh and do your work there.
  • (10/9) Assignment 3 posted.
  • Temporary change in office hours: office hours on Tuesday 10/14, 4:30-5:30, will be held by Mengfei Cao. Office hours on Thursday 10/16, 1:30-3:00, will be held by Roni Khardon.
  • (9/29) Assignment 2 clarification: the algorithms do not have an explicit stopping criterion. Run your simulations for a maximum of 500,000 steps.
  • Course information, readings, assignments, etc. are updated regularly in the table below.

Syllabus:

This semester the course will focus on agents that learn, plan, and act in non-deterministic and complex environments. We will first cover the main relevant results and approaches from Markov Decision Processes (MDPs) and Reinforcement Learning (RL), and then turn to recent research papers, reviewing representations and corresponding algorithms, approximation methods, and convergence guarantees for complex domains. We will discuss factored, relational, and hierarchical representations, and explore models with multiple/parallel actions, action durations, continuous resources, and exogenous events. We will explore applications in online web/user interaction, smart cities, and computational sustainability.

Prerequisites:

Math 22, Comp 15, Comp 160 or consent of instructor. Some exposure to probability theory is necessary. Previous courses in machine learning or artificial intelligence are helpful but not required.

Class Times:

Monday, Wednesday 1:30-2:45 Halligan 111A

Instructor:

Roni Khardon
Office Hours: Tuesday 4:30-5:30 (also Mon/Wed 5:45 after 135 lecture)
Office: Halligan 230
Phone: 1-617-627-5290
Email: roni@cs.tufts.edu

Teaching Assistant:

Mengfei Cao, Email: mengfei.cao@tufts.edu
Office Hours: Thursday and Friday 1:30-2:30, Halligan 121.

Textbooks

We will use [SB] (Sutton and Barto, Reinforcement Learning: An Introduction) for the basic unit on RL and will try to cover much of this text in a few weeks. Other abbreviations used in the reading list below are [RN] (Russell and Norvig, Artificial Intelligence: A Modern Approach) and [P] (Puterman, Markov Decision Processes). The remaining material is mainly from articles.

Course Organization, Work and Marking

Policy on collaboration on homework assignments:

You may discuss the problems and general ideas about their solutions with other students, and you may similarly consult other textbooks or the web. However, you must work out the details on your own and code/write up the solutions on your own. Every such collaboration (either getting or giving help) and every use of text or electronic sources must be clearly cited and acknowledged in the submitted homework. Failure to follow these guidelines may result in disciplinary action for all parties involved. For this and other questions concerning academic integrity, please consult the booklet available from the Office of the Dean of Student Affairs.

Tentative List of Topics

(Each entry lists the lecture number(s), topic, sources/readings, and any comments or due dates.)

L1: RL Basics
    Sources: [SB] Chapters 1-3

L2-L5: VI, PI, etc.
    Sources: [SB] Chapter 4; [RN] Chapter 17
    Convergence properties: [P] Sections 6.2-6.3
    See also Review Slides
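
For concreteness, here is a minimal sketch of value iteration, the kind of dynamic-programming method covered in L2-L5. It is not part of the course materials; the two-state MDP, its transition probabilities, and its rewards are invented purely for illustration.

    # Value iteration on a tiny, made-up discounted MDP (two states, two actions).
    # P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the immediate reward.
    gamma = 0.9
    states, actions = [0, 1], [0, 1]
    P = {0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
         1: {0: [(1.0, 0)], 1: [(1.0, 1)]}}
    R = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

    V = {s: 0.0 for s in states}
    while True:
        # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') ]
        new_V = {s: max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                        for a in actions)
                 for s in states}
        if max(abs(new_V[s] - V[s]) for s in states) < 1e-9:
            V = new_V
            break
        V = new_V
    print(V)   # converged values; the greedy policy with respect to V is optimal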
 
Assignment 1: MDP Planning (due 9/24)

L6-L7: TD methods, Generalization, Learning and Planning
    Sources: [SB] Chapters 5-9
    For the convergence result with aggregation, consult: Feature-Based Methods for Large Scale Dynamic Programming, Tsitsiklis and Van Roy, Machine Learning, Vol. 22, 1996, pp. 59-94.
    See also Review Slides
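
As an informal illustration of the model-free, TD-style learning covered in this unit (not part of the official materials; the two-state environment, learning rate, and step count are made up), a tabular Q-learning loop might look like:

    import random

    # Tiny made-up environment: two states; action 1 taken in state 1 yields reward 1.
    def step(s, a):
        if s == 1 and a == 1:
            return 0, 1.0            # (next_state, reward)
        return (1 if a == 1 else 0), 0.0

    gamma, alpha, epsilon = 0.9, 0.1, 0.1
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for t in range(100000):
        # epsilon-greedy action selection from the current Q estimates
        if random.random() < epsilon:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # TD update toward the one-step bootstrapped target
        target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    print(Q)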
 
Assignment 2: MDP Learning and Planning (due 10/6)

L8-L9: Symbolic Dynamic Programming
    Sources: Symbolic Boolean Manipulation with Ordered Binary-Decision Diagrams, Randal E. Bryant, 1992 (read at least Sections 1-3)
    SPUDD: Stochastic Planning using Decision Diagrams, Hoey, St-Aubin, Hu and Boutilier, UAI 1999.
 
Optional Reading: APRICODD: Approximate Policy Construction Using Decision Diagrams, St-Aubin, Hoey, and Boutilier, NIPS 2000.

L10-L11: Introduction to RDDL
    Sources: RDDL overview and specification
    RDDL Tutorial Slides from 2014 (used in class)
    RDDL video tutorial from 2011
    RDDL Code Repository
 
A challenge problem from industry: Invited talk at ICAPS 2014 on "How to Coordinate a Thousand Robots".

L11-L12: AI Search and RTDP
    Sources: [RN] Chapters 3-4
    Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming. B. Bonet and H. Geffner. 13th International Conference on Automated Planning and Scheduling (ICAPS 2003).
    See also Lecture Slides
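
For orientation, here is a rough sketch of the basic (unlabeled) RTDP trial loop in the spirit of Bonet and Geffner's setting. The tiny goal-directed problem, the zero heuristic, and the trial budget are all invented for the example.

    import random

    # Hypothetical stochastic shortest-path problem: states 0..3, goal is state 3.
    # T[s][a] is a list of (probability, next_state); every action costs 1.
    GOAL = 3
    T = {0: {0: [(0.5, 1), (0.5, 0)], 1: [(1.0, 2)]},
         1: {0: [(1.0, 3)], 1: [(1.0, 0)]},
         2: {0: [(0.9, 3), (0.1, 2)], 1: [(1.0, 2)]}}
    V = {s: 0.0 for s in list(T) + [GOAL]}   # value = cost-to-go; zero admissible heuristic

    def q(s, a):
        return 1.0 + sum(p * V[t] for p, t in T[s][a])   # expected cost of a then acting optimally

    def sample(s, a):
        r, acc = random.random(), 0.0
        for p, t in T[s][a]:
            acc += p
            if r <= acc:
                return t
        return T[s][a][-1][1]

    for trial in range(1000):                      # RTDP: repeated greedy trials with backups
        s = 0
        while s != GOAL:
            a = min(T[s], key=lambda act: q(s, act))   # greedy action w.r.t. current V
            V[s] = q(s, a)                             # Bellman backup at the visited state
            s = sample(s, a)                           # sample a successor and continue
    print(V)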
 
Assignment 3: Working with RDDL and Additional Files for the assignment (due 10/22)

L13-L16: Monte Carlo Search for Planning
    We will be using excellent Lecture Slides by Alan Fern on:
        Bandits
        Rollout and Policy Improvement
        Tree Search
        Approximate Policy Iteration
    Kocsis, L. and Szepesvari, Cs., Bandit based Monte-Carlo Planning, ECML, pp. 282-293, 2006.
    Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvari, Cs., and Teytaud, O., The grand challenge of computer Go: Monte Carlo tree search and extensions, Communications of the ACM, 55(3), pp. 106-113, 2012.
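
To make the bandit building block of Monte Carlo planning concrete, here is a small sketch of UCB1 arm selection. This is the flat bandit rule only, not the full UCT tree search, and the Bernoulli arm probabilities and pull budget are made up for the example.

    import math, random

    # Hypothetical Bernoulli bandit arms with unknown payoff probabilities.
    arm_probs = [0.2, 0.5, 0.7]
    counts = [0] * len(arm_probs)    # number of pulls per arm
    values = [0.0] * len(arm_probs)  # empirical mean reward per arm

    def ucb1(t):
        # Pull each arm once, then pick the arm maximizing mean + exploration bonus.
        for i, n in enumerate(counts):
            if n == 0:
                return i
        return max(range(len(arm_probs)),
                   key=lambda i: values[i] + math.sqrt(2.0 * math.log(t) / counts[i]))

    for t in range(1, 10001):
        i = ucb1(t)
        reward = 1.0 if random.random() < arm_probs[i] else 0.0
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]   # incremental mean update
    print(counts, [round(v, 2) for v in values])        # most pulls should go to the best arm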
 
L16: Review of advanced algorithms from recent lectures
    See Review Slides
 
Assignment 4: Policy Rollout and Additional Files for the assignment (due 11/10)
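
For orientation only (this generic sketch is not the assignment code; the environment, base policy, horizon, and simulation counts are invented), one-step policy rollout estimates each action's value by simulating a fixed base policy from the resulting state and then acting greedily:

    # Made-up environment (deterministic here for simplicity) and a weak base policy.
    def step(s, a):
        if s == 1 and a == 1:
            return 0, 1.0            # (next_state, reward)
        return (1 if a == 1 else 0), 0.0

    def base_policy(s):
        return 0                     # the base policy never takes the rewarding action

    def rollout_value(s, a, horizon=20, num_sims=50, gamma=0.95):
        # Monte Carlo estimate of Q(s, a) under the base policy:
        # take action a once, then follow base_policy for the rest of the horizon.
        total = 0.0
        for _ in range(num_sims):
            cur, act, discount, ret = s, a, 1.0, 0.0
            for _ in range(horizon):
                cur, r = step(cur, act)
                ret += discount * r
                discount *= gamma
                act = base_policy(cur)
            total += ret
        return total / num_sims

    def rollout_policy(s, actions=(0, 1)):
        # The rollout policy acts greedily with respect to the estimated Q values.
        return max(actions, key=lambda a: rollout_value(s, a))

    print(rollout_policy(0), rollout_policy(1))   # improves on the base policy at state 1
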
Information for Project: project.pdf (11/6, 11/10, and 12/8)

(L18) Exam: The exam includes all the material covered in class up to this point, excluding the details of RDDL (L10-L11). You are expected to know the material at the level discussed in class: if we discussed formal properties and their proofs, I expect you to be proficient in these; if we only explained an algorithm, how it works, and the intuition behind it, but did not formally analyze it, you are expected to know that much, but you are not expected to cover the analysis on your own.

L17: Relational Symbolic Dynamic Programming
    Sources: First Order Decision Diagrams for Relational MDPs. Wang, Joshi and Khardon, JAIR 2008.
    Solving Relational MDPs with Exogenous Events and Additive Rewards. S. Joshi, R. Khardon, P. Tadepalli, A. Raghavan, A. Fern, 2013.
 
Optional Reading: Symbolic Dynamic Programming for First-order MDPs. Boutilier, Reiter and Price, IJCAI 2001.
    Bellman Goes Relational. Kersting, Van Otterlo and De Raedt, ICML 2004.
    Practical Solution Techniques for First-order MDPs. Sanner and Boutilier, AIJ 2009.
 
L19: Policy Based Methods
    Sources: Approximate Policy Iteration with a Policy Language Bias
    Non-Parametric Policy Gradients
 
L20: Partially Observable MDPs
    Sources: [RN] Section 17.4

L21: Brief Project Presentations

L22-L25: Paper Presentations
    Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors (GLUTTON)
    LRTDP vs. UCT for Online Probabilistic Planning (Gourmand)
    PROST: Probabilistic Planning Based on UCT
    Trial-based Heuristic Tree Search for Finite Horizon MDPs (THTS/PROST)
    Concurrent Reinforcement Learning from Customer Interactions
    An Empirical Evaluation of Thompson Sampling (application to advertising)
    A Contextual-Bandit Approach to Personalized News Article Recommendation
    Design, Analysis, and Learning Control of a Fully Actuated Micro Wind Turbine
    Optimal Planning and Learning in Uncertain Environments for the Management of Wind Farms
    Non-Linear Monte-Carlo Search in Civilization II
    Playing Atari with Deep Reinforcement Learning
 
L26: Brief Project Presentations