# Reinforcement Learning by Policy Search

## Abstract

Teaching is hard; criticizing is easy. This adage captures the idea behind reinforcement learning, as opposed to supervised learning. Reinforcement learning means learning a policy (a mapping from observations to actions) based on feedback from the environment. Learning can be viewed as searching a set of policies while evaluating them by trial, through interaction with the environment. In this talk I briefly review the framework of reinforcement learning and present two highlights from my dissertation. First, I describe an algorithm that learns by ascending the gradient of expected cumulative reinforcement. I show under what conditions experience can be reused in learning. Building on statistical learning theory, I address the question of how much experience suffices for uniform convergence of policy evaluation, and I obtain sample complexity bounds. Second, I demonstrate an application of the proposed algorithm to the complex domain of simulated adaptive packet routing in a telecommunication network. I conclude by suggesting how to build an intelligent agent and where to apply reinforcement learning in computer vision and natural language processing.
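
To make the idea of ascending the gradient of expected cumulative reinforcement concrete, here is a minimal sketch using a generic likelihood-ratio (REINFORCE-style) update. It is not the algorithm from the dissertation. The toy two-action environment, the single-parameter sigmoid policy, and the names `run_policy_gradient`, `theta`, and `learning_rate` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def run_policy_gradient(episodes=2000, learning_rate=0.1, seed=0):
    """Learn a one-parameter stochastic policy by trial and feedback.

    Hypothetical toy environment: there are no observations, action 1
    always yields reward 1.0, and action 0 always yields reward 0.0.
    The policy picks action 1 with probability sigmoid(theta).
    """
    rng = random.Random(seed)
    theta = 0.0  # start from a uniform policy: P(action 1) = 0.5
    for _ in range(episodes):
        p_one = sigmoid(theta)
        action = 1 if rng.random() < p_one else 0
        reward = float(action)  # feedback from the toy environment
        # d/d(theta) of log P(action) for this policy equals action - p_one.
        grad_log_prob = action - p_one
        # Sample-based step along the gradient of expected reward.
        theta += learning_rate * reward * grad_log_prob
    return sigmoid(theta)


final_probability = run_policy_gradient()
```

In this sketch the agent is never told which action is correct; it only receives a reward after acting, and the probability of the rewarded action grows over the episodes. Each episode here lasts a single step, so cumulative reinforcement reduces to the immediate reward.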