| Topic |
Sources |
Comments |
| RL Basics |
  |
  |
|   |
[SB] Chapters 1-3 |
L1 (1/23) |
| VI, PI etc. |
  |
  |
|   |
[SB] Chapter 4, [RN] Chapter 17 |
L2-L5 |
|   |
Convergence properties [P] Sections 6.2-3 |
L4 (1/30) |
| TD methods |
[SB] Chapters 5,6 for TD(0), and chapter 7 for TD(lambda) |
L6-L7 (2/6, 2/8) |
| State Agregation and Parameteric Representations |
  |
  |
|   |
[SB] Chapter 8
Feature-Based Methods for Large Scale Dynamic Programming.
Tsitsiklis and Van Roy.
Machine Learning, Vol. 22, 1996, pp. 59-94.
|
L8-L9 (2/13, 2/15) |
| Learning and planning |
[SB] Chapter 9 |
L10 (2/20) |
| RL Algorithms with theoretical guarantees |
  |
  |
|   |
E^3: Near-Optimal Reinforcement Learning in Polynomial Time. Kearns and
Singh. ICML 1998.
| L11 (2/27): Roni |
|   |
R-MAX - A General Polynomial Time Algorithm for Near-Optimal
Reinforcement Learning, Brafman and Tennenholtz, JMLR,
3(Oct):213-231, 2002.
|
L12 (3/1): Steve [Kyle, Saket] |
|   |
PAC Model-Free Reinforcement Learning,
Strehl, Li, Wiewiora, Langford, and Littman,
ICML 2006.
|
L12 (3/1): Gabe [Noah, Ben, Jonathan] |
| Logic Based Factored Representations |
  |
  |
|   |
Stochastic Dynamic Programming with Factored Representations,
Boutilier, Dearden and Goldszmidt,
Artificial Intelligence 121(1), pp.49--107 (2000).
|
L13 (3/6): Jonathan, Saket [Ben] |
|   |
SPUDD: Stochastic Planning using Decision Diagrams,
Hoey , St-Aubin, Hu and Boutilier, UAI 1999.
APRICODD: Approximate Policy Construction Using Decision Diagrams,
St-Aubin, Hoey , and Boutilier, NIPS 2000.
|
L14 (3/8): Noah, Kyle [Gabe, Steve] |
|   |
Factored E^3:
Efficient Reinforcement Learning in Factored MDPs,
Kearns and Koller, IJCAI 1999.
|
Roni |
| Linear Representations of Factored MDPs |
  |
  |
|   |
Efficient Solution Algorithms for Factored MDPs,
Guestrin, Koller, Parr and Venkataraman,
JAIR Volume 19, pp. 399-468, 2003.
|
L15 (3/13): Ben, Steve [Gabe, Jonathan] |
|   |
Piecewise Linear Value Function Approximation for Factored MDPs,
Poupart, Boutilier, Schuurmans and Patrascu, AAAI 2002.
|
  |
| Inverse RL |
  |
  |
|   |
Apprenticeship learning via inverse reinforcement learning,
Abbeel and Ng,
ICML 2004.
An Application of Reinforcement Learning to Aerobatic Helicopter Flight,
Abbeel, Coates, Quigley and Ng, NIPS 2006.
Videos of results from these papers available from
Andrew Ng's home page
|
L16 (3/15): Roni |
| Hierarchical reinforcement learning |
  |
  |
|   |
Hierarchical reinforcement learning with the MAXQ value function
decomposition,
Dietterich, JAIR, 13, 227-303.
|
L17 (3/27): Saket, Kyle [Steve] |
|   |
Reinforcement Learning with Hierarchies of Machines,
Parr and Russell. NIPS 97
State Abstraction for Programmable Reinforcement Learning Agents.
Andre, and Russell, AAAI-2002.
A compact, hierarchically optimal Q-function decomposition,
Marthi, Russell, and Andre, UAI 2006.
|
L18 (3/29): Jonathan, Gabe [Noah, Ben] |
| Relational MDPs, Supervised Learning and API |
  |
  |
|   |
Learning to take Actions,
Khardon, Machine Learning, Vol 35, No 1, 1999, pages 57-90.
Learning Action Strategies for Planning Domains,
Khardon, Artificial Intelligence Vol 113, 1999, pages 125-148.
Inductive Policy Selection for First-Order Markov Decision Processes,
Yoon, Fern, and Givan, UAI-2002
|
L19 (4/5): Roni |
|   |
Approximate Policy Iteration with a Policy Language Bias: Solving
Relational Markov Decision Processes,
Fern, Yoon, and Givan
JAIR, 25, 85-118, 2006.
Reinforcement Learning as Classification: Leveraging Modern
Classifiers, Lagoudakis and Parr, ICML 2003.
|
L20 (**4/6 10:30**) Ben, Noah [All] |
| Relational MDPs: Linear Functions |
  |
  |
|   |
Practical Linear Value-approximation Techniques for First-order MDPs,
Sanner and Boutilier, UAI 2006.
Approximate Linear Programming for First-order MDPs
Sanner and Boutilier, UAI 2005.
|
L21 (4/10): Gabe, Kyle [Noah] |
| Relational MDPs: RRL |
  |
  |
|   |
Relational reinforcement learning,
Dzeroski, De Raedt, and Driessens,
Machine Learning 43, pp. 7-52, 2001
Gaussian Processes as
Regression technique for Relational Reinforcement Learning,
Driessens, Ramon and Gaertner,
Machine Learning 64, pp. 91-119, 2006
Integrating guidance into relational reinforcement learning,
Driessens and Dzeroski,
Machine Learning 57, pp. 271-304, 2004
|
L22 (4/12) Ben, Jonathan [Saket, Steve] |
| Relational MDPs: Dynamic Programming |
  |
  |
|   |
Symbolic Dynamic Programming for First-order MDPs,
Boutilier, Reiter and Price, IJCAI 2001.
Bellman Goes Relational,
Kersting, van Otterlo and De Raedt, ICML 2004.
First Order Decision Diagrams for Relational MDPs,
Wang, Joshi and Khardon, IJCAI 2007.
|
L23 (4/17): Saket |
| Hierarchical Relational RL |
Function Approximation in Hierarchical Relational Reinforcement Learning,
Roncagliolo and Tadepalli, RRL Workshop, 2005.
|
  |
| Model Learning |
|   |
  |
|   |
Learning Probabilistic Planning Rules,
Pasula, Zettlemoyer, and Kaelbling,
ICAPS 2004.
Learning Planning Rules in Noisy Stochastic Worlds.
Zettlemoyer, Pasula, and Kaelbling,
AAAI 2005.
|
L24 (4/19): Noah, Steve [All] |
|   |
Learning Partially Observable Action Schemas,
Shahaf and Amir, AAAI 2006.
Learning Partially Observable Action Models: Efficient Algorithms,
Shahaf, Chang and Amir, AAAI 2006.
|
  |
| Learning and using control knowledge |
Discriminative Learning of Beam-Search Heuristics for Planning,
Xu, Fern, and Yoon, IJCAI 2007.
Using Learned Policies in Heuristic-Search Planning,
Yoon, Fern, and Givan, IJCAI 2007.
|
  |
| Partial observability |
Planning and acting in partially observable stochastic
domains. 1998.
Kaelbling, Littman and Cassandra.
Artificial Intelligence, 101(1--2):99--134.
Dynamic Programming for POMDPs using a Factored State Representation,
Hansen and Feng.
AIPS 2000.
|
  |
| Predictive State Representations |
Learning Predictive State Representations,
Singh, Littman, Jong, Pardoe and Stone.
ICML 2003.
Predictive Representations of State,
Littman, Sutton and Singh,
NIPS 2002.
Predictive State Representations: A New Theory for Modeling Dynamical
Systems by Satinder Singh, Michael R. James and Matthew R. Rudary. In
Uncertainty in Artificial Intelligence: Proceedings of the Twentieth
Conference (UAI), pages 512-519, 2004.
|
L25 (4/24): Roni |
|   |
Relational Knowledge with Predictive State Representations by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.
|
  |
| Proto Value Functions |
Samuel Meets Amarel: Automating Value Function Approximation using
Global State Space Analysis,
Mahadevan,
AAAI 2005.
Representation Policy Iteration,
Mahadevan,
UAI 2005.
|
L26 (4/26): Roni |
| Probabilistic Planners |
...
|
  |