Topic 
Sources 
Comments 
RL Basics 



[SB] Chapters 1-3 
L1 (1/23) 
VI, PI etc. 



[SB] Chapter 4, [RN] Chapter 17 
L2-L5 
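As a minimal illustration of the value iteration algorithm covered in these chapters, here is a sketch on a small tabular MDP; the two-state transition matrix, rewards, and discount factor below are invented for illustration only:

```python
import numpy as np

# Invented 2-state, 2-action MDP for illustration only.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeat the Bellman optimality backup until convergence.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)          # Q[s, a] = R(s,a) + gamma * E[V(s')]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Greedy policy with respect to the converged values.
policy = (R + gamma * (P @ V)).argmax(axis=1)
```

Policy iteration alternates the analogous policy-evaluation and greedy-improvement steps instead of backing up the max at every sweep.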

Convergence properties [P] Sections 6.2-6.3 
L4 (1/30) 
TD methods 
[SB] Chapters 5 and 6 for TD(0), and Chapter 7 for TD(lambda) 
L6-L7 (2/6, 2/8) 
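As a minimal illustration of the TD(0) update from these chapters, here is a sketch on an invented deterministic two-state chain (the environment, step size, and discount are chosen arbitrarily for illustration):

```python
# Invented deterministic two-state chain: the agent alternates between
# states 0 and 1 and receives reward 1 on entering state 1.
alpha, gamma = 0.1, 0.9
V = [0.0, 0.0]   # tabular value estimates

def step(s):
    s_next = 1 - s
    r = 1.0 if s_next == 1 else 0.0
    return r, s_next

s = 0
for _ in range(5000):
    r, s_next = step(s)
    # TD(0) update: move V(s) toward the one-step bootstrapped target.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    s = s_next
```

On this deterministic chain the estimates converge to the exact values V(0) = 1/(1 - gamma^2) and V(1) = gamma/(1 - gamma^2); TD(lambda) generalizes the same update with eligibility traces.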
State Aggregation and Parametric Representations 



[SB] Chapter 8
Feature-Based Methods for Large Scale Dynamic Programming,
Tsitsiklis and Van Roy,
Machine Learning, Vol. 22, 1996, pp. 59-94.

L8-L9 (2/13, 2/15) 
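A minimal sketch of the feature-based setting studied in the Tsitsiklis and Van Roy paper: semi-gradient TD(0) with state aggregation expressed as linear features. The toy chain, the aggregation into two groups, and the step size below are invented for illustration and do not come from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 0.05, 0.9
n_states = 10

def phi(s):
    # State aggregation as linear features: states 0-4 share one weight,
    # states 5-9 share another (the grouping is invented for illustration).
    f = np.zeros(2)
    f[0 if s < 5 else 1] = 1.0
    return f

w = np.zeros(2)
s = 0
for _ in range(20000):
    s_next = (s + int(rng.integers(1, 3))) % n_states  # toy random-walk dynamics
    r = 1.0 if s_next == 0 else 0.0                    # reward on entering state 0
    # Semi-gradient TD(0): adjust w by step size * TD error * feature vector.
    delta = r + gamma * w @ phi(s_next) - w @ phi(s)
    w += alpha * delta * phi(s)
    s = s_next
```

With 2 weights standing in for 10 state values, the learned w approximates the value of each aggregate group rather than of each state, which is the approximation error the paper analyzes.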
Learning and planning 
[SB] Chapter 9 
L10 (2/20) 
RL Algorithms with theoretical guarantees 



E^3: Near-Optimal Reinforcement Learning in Polynomial Time,
Kearns and Singh, ICML 1998.
 L11 (2/27): Roni 

R-MAX - A General Polynomial Time Algorithm for Near-Optimal
Reinforcement Learning, Brafman and Tennenholtz, JMLR,
3(Oct):213-231, 2002.

L12 (3/1): Steve [Kyle, Saket] 

PAC Model-Free Reinforcement Learning,
Strehl, Li, Wiewiora, Langford, and Littman,
ICML 2006.

L12 (3/1): Gabe [Noah, Ben, Jonathan] 
Logic Based Factored Representations 



Stochastic Dynamic Programming with Factored Representations,
Boutilier, Dearden and Goldszmidt,
Artificial Intelligence 121(1), pp. 49-107 (2000).

L13 (3/6): Jonathan, Saket [Ben] 

SPUDD: Stochastic Planning using Decision Diagrams,
Hoey, St-Aubin, Hu and Boutilier, UAI 1999.
APRICODD: Approximate Policy Construction Using Decision Diagrams,
St-Aubin, Hoey, and Boutilier, NIPS 2000.

L14 (3/8): Noah, Kyle [Gabe, Steve] 

Factored E^3:
Efficient Reinforcement Learning in Factored MDPs,
Kearns and Koller, IJCAI 1999.

Roni 
Linear Representations of Factored MDPs 



Efficient Solution Algorithms for Factored MDPs,
Guestrin, Koller, Parr and Venkataraman,
JAIR Volume 19, pp. 399-468, 2003.

L15 (3/13): Ben, Steve [Gabe, Jonathan] 

Piecewise Linear Value Function Approximation for Factored MDPs,
Poupart, Boutilier, Schuurmans and Patrascu, AAAI 2002.


Inverse RL 



Apprenticeship learning via inverse reinforcement learning,
Abbeel and Ng,
ICML 2004.
An Application of Reinforcement Learning to Aerobatic Helicopter Flight,
Abbeel, Coates, Quigley and Ng, NIPS 2006.
Videos of results from these papers available from
Andrew Ng's home page

L16 (3/15): Roni 
Hierarchical reinforcement learning 



Hierarchical reinforcement learning with the MAXQ value function
decomposition,
Dietterich, JAIR, 13, 227-303, 2000.

L17 (3/27): Saket, Kyle [Steve] 

Reinforcement Learning with Hierarchies of Machines,
Parr and Russell, NIPS 1997.
State Abstraction for Programmable Reinforcement Learning Agents.
Andre and Russell, AAAI 2002.
A compact, hierarchically optimal Q-function decomposition,
Marthi, Russell, and Andre, UAI 2006.

L18 (3/29): Jonathan, Gabe [Noah, Ben] 
Relational MDPs, Supervised Learning and API 



Learning to take Actions,
Khardon, Machine Learning, Vol 35, No 1, 1999, pages 57-90.
Learning Action Strategies for Planning Domains,
Khardon, Artificial Intelligence Vol 113, 1999, pages 125-148.
Inductive Policy Selection for FirstOrder Markov Decision Processes,
Yoon, Fern, and Givan, UAI 2002.

L19 (4/5): Roni 

Approximate Policy Iteration with a Policy Language Bias: Solving
Relational Markov Decision Processes,
Fern, Yoon, and Givan,
JAIR, 25, 85-118, 2006.
Reinforcement Learning as Classification: Leveraging Modern
Classifiers, Lagoudakis and Parr, ICML 2003.

L20 (**4/6 10:30**) Ben, Noah [All] 
Relational MDPs: Linear Functions 



Practical Linear Value-approximation Techniques for First-order MDPs,
Sanner and Boutilier, UAI 2006.
Approximate Linear Programming for First-order MDPs,
Sanner and Boutilier, UAI 2005.

L21 (4/10): Gabe, Kyle [Noah] 
Relational MDPs: RRL 



Relational reinforcement learning,
Dzeroski, De Raedt, and Driessens,
Machine Learning 43, pp. 7-52, 2001.
Gaussian Processes as
Regression technique for Relational Reinforcement Learning,
Driessens, Ramon and Gaertner,
Machine Learning 64, pp. 91-119, 2006.
Integrating guidance into relational reinforcement learning,
Driessens and Dzeroski,
Machine Learning 57, pp. 271-304, 2004.

L22 (4/12) Ben, Jonathan [Saket, Steve] 
Relational MDPs: Dynamic Programming 



Symbolic Dynamic Programming for First-order MDPs,
Boutilier, Reiter and Price, IJCAI 2001.
Bellman Goes Relational,
Kersting, van Otterlo and De Raedt, ICML 2004.
First Order Decision Diagrams for Relational MDPs,
Wang, Joshi and Khardon, IJCAI 2007.

L23 (4/17): Saket 
Hierarchical Relational RL 
Function Approximation in Hierarchical Relational Reinforcement Learning,
Roncagliolo and Tadepalli, RRL Workshop, 2005.


Model Learning 


Learning Probabilistic Planning Rules,
Pasula, Zettlemoyer, and Kaelbling,
ICAPS 2004.
Learning Planning Rules in Noisy Stochastic Worlds.
Zettlemoyer, Pasula, and Kaelbling,
AAAI 2005.

L24 (4/19): Noah, Steve [All] 

Learning Partially Observable Action Schemas,
Shahaf and Amir, AAAI 2006.
Learning Partially Observable Action Models: Efficient Algorithms,
Shahaf, Chang and Amir, AAAI 2006.


Learning and using control knowledge 
Discriminative Learning of BeamSearch Heuristics for Planning,
Xu, Fern, and Yoon, IJCAI 2007.
Using Learned Policies in HeuristicSearch Planning,
Yoon, Fern, and Givan, IJCAI 2007.


Partial observability 
Planning and acting in partially observable stochastic domains,
Kaelbling, Littman and Cassandra,
Artificial Intelligence, 101(1-2):99-134, 1998.
Dynamic Programming for POMDPs using a Factored State Representation,
Hansen and Feng.
AIPS 2000.


Predictive State Representations 
Learning Predictive State Representations,
Singh, Littman, Jong, Pardoe and Stone.
ICML 2003.
Predictive Representations of State,
Littman, Sutton and Singh,
NIPS 2002.
Predictive State Representations: A New Theory for Modeling Dynamical
Systems,
Singh, James and Rudary,
UAI 2004, pages 512-519.

L25 (4/24): Roni 

Relational Knowledge with Predictive State Representations,
Wingate, Soni, Wolfe and Singh,
IJCAI 2007.


Proto Value Functions 
Samuel Meets Amarel: Automating Value Function Approximation using
Global State Space Analysis,
Mahadevan,
AAAI 2005.
Representation Policy Iteration,
Mahadevan,
UAI 2005.

L26 (4/26): Roni 
Probabilistic Planners 
...

