Understanding Gradient Descent in Neural Networks: Mean Field and Beyond

October 18, 2023
11:30 AM to 12:30 PM EST
JCC 265
Speaker: Margalit Glasgow
Host: Liping Liu

Abstract

Recent years have seen the unprecedented empirical success of deep learning, with gradient descent serving as the central workhorse for training neural networks. However, our theoretical understanding of why gradient descent succeeds still lags: why does gradient descent find good local minima in the highly non-convex loss landscape given by neural networks? In this talk, I'll discuss two recent works on training two-layer neural networks via (stochastic) gradient descent to learn single- and multi-index models with near-optimal sample complexity. In both settings, the main challenge is showing that the gradient descent trajectory escapes saddles in the landscape. We show that even in the presence of unstable saddle regions, the gradient descent trajectory stays close to a certain surrogate trajectory, characterized either by the mean-field limit of the neural network (i.e., when the network has infinite width) or by other approximations. Ultimately, we show that we can learn via gradient descent with far fewer training examples than previously guaranteed by the mean-field or neural tangent kernel approaches.
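To make the setting concrete, here is a minimal NumPy sketch (not code from the works discussed) of training the first layer of a two-layer ReLU network with minibatch SGD to fit a single-index target y = g(⟨w*, x⟩); the width, step size, and link function g below are illustrative assumptions, not parameters from the talk.

```python
# Illustrative sketch: SGD on a two-layer ReLU network learning a single-index model.
# All hyperparameters and the link function g are assumptions chosen for demonstration.
import numpy as np

rng = np.random.default_rng(0)

d, m, n, lr, steps = 20, 128, 4096, 0.05, 2000   # input dim, width, samples, step size, iterations
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)                  # hidden direction of the single-index model
g = lambda z: z ** 2 - 1.0                        # example link function

X = rng.normal(size=(n, d))                       # Gaussian inputs
y = g(X @ w_star)                                 # single-index labels

W = rng.normal(size=(m, d)) / np.sqrt(d)          # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / m           # fixed second layer, mean-field (1/m) scaling

relu = lambda z: np.maximum(z, 0.0)

for t in range(steps):
    idx = rng.integers(0, n, size=64)             # minibatch
    xb, yb = X[idx], y[idx]
    pre = xb @ W.T                                # pre-activations, shape (batch, m)
    pred = relu(pre) @ a
    err = pred - yb                               # squared-loss residual
    # gradient of 0.5 * mean(err^2) with respect to W
    grad_W = ((err[:, None] * (pre > 0)) * a).T @ xb / len(idx)
    W -= lr * grad_W

print("final train MSE:", np.mean((relu(X @ W.T) @ a - y) ** 2))
```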

Bio:

Margalit Glasgow is a sixth-year PhD student at Stanford University, advised by Mary Wootters and Tengyu Ma. Her research focuses on theoretical aspects of machine learning, in particular understanding the capabilities and limitations of gradient-based optimization.