Distribution-Free, Risk-Controlling Prediction Sets

9/23/21 | 4:15pm | E51-149

Stephen Bates

Postdoctoral Researcher
University of California, Berkeley

Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying predictive models in consequential settings also requires analyzing and communicating their uncertainty. To give valid inference for prediction tasks, we show how to generate set-valued predictions from any black-box predictive model that control certain statistical error rates on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset and model by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in four large-scale prediction problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.
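
To make the holdout calibration step concrete, here is a minimal sketch for the multi-label case. It is an illustration under stated assumptions, not the speaker's implementation. The prediction sets are nested, {k : score_k >= 1 - lambda}, so they grow with lambda. The loss is the fraction of true labels missed, which lies in [0, 1]. A simple Hoeffding upper confidence bound stands in for whatever bound the talk uses. The function names, the grid over lambda, and the tolerance delta (the allowed probability that the risk guarantee fails) are all hypothetical choices made for this example.

```python
import numpy as np


def empirical_risk(scores, labels, lam):
    """Average fraction of true labels missed by the set {k : score_k >= 1 - lam}."""
    in_set = scores >= 1.0 - lam
    missed = np.logical_and(labels.astype(bool), ~in_set).sum(axis=1)
    return float((missed / labels.sum(axis=1)).mean())


def calibrate_threshold(scores, labels, alpha=0.1, delta=0.1, grid_size=1001):
    """Pick the smallest grid value of lam whose risk upper bound stays below alpha.

    scores: (n, K) array of model scores in [0, 1] on the holdout set.
    labels: (n, K) binary array; each row is assumed to have at least one positive.
    """
    n = scores.shape[0]
    # Hoeffding margin for a loss bounded in [0, 1] (an assumed choice of bound).
    margin = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    chosen = 1.0  # lam = 1 includes every label, so the loss is zero
    # Scan from the largest sets downward and stop at the first failure, so the
    # bound holds for every grid value at or above the returned lam.
    for lam in np.linspace(0.0, 1.0, grid_size)[::-1]:
        if empirical_risk(scores, labels, lam) + margin >= alpha:
            break
        chosen = float(lam)
    return chosen
```

At test time, the prediction set for a new point would be every label whose score is at least 1 - lam, with lam the value returned above. Smaller alpha or delta, or a smaller holdout set, pushes lam up and so yields larger sets.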

Bio: Dr. Stephen Bates is a postdoctoral fellow working with Professor Michael I. Jordan at UC Berkeley and the Simons Institute for the Theory of Computing. His research interests include high-dimensional statistics, causal inference, and uncertainty quantification for predictive models. Previously, he earned his Ph.D. in statistics from Stanford University under the supervision of Professor Emmanuel Candès. There he received the best dissertation award, and his thesis work appeared on the cover of the Proceedings of the National Academy of Sciences (USA).