2/25/21 | 4:15pm | Online only
Abstract: In this talk, I will present a framework for performing average-case analysis in the large dimensional regime. Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm but remains largely unexplored in continuous optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, I will show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, I will present the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, I will illustrate how this framework can be applied to stochastic gradient descent (SGD). Under our approach, SGD with a fixed step size has deterministic dynamics and, surprisingly, these dynamics are governed by a Volterra integral equation. By analyzing this Volterra equation, I will show SGD undergoes a phase transition at a critical step size that ultimately affects its convergence rate.
Bio: Courtney Paquette is an assistant professor at McGill University and a CIFAR Canada AI chair. Paquette’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science. She received her PhD from the mathematics department at the University of Washington (2017), held postdoctoral positions at Lehigh University (2017-2018) and University of Waterloo (NSF postdoctoral fellowship, 2018-2019), and was a research scientist at Google Research, Brain Montreal (2019-2020).