Sparse Regression at Scale

9/10/20 | 4:15pm | Online only

Hussein Hazimeh

ORC PhD Student
MIT


Abstract: We consider the least squares regression problem penalized with a combination of the L0 and L2 norms (a.k.a. L0L2 regularization). Recent work presents strong evidence that the resulting L0-based estimators can outperform popular sparse learning methods under many important high-dimensional settings. However, exact computation of such estimators remains a major challenge. Indeed, state-of-the-art mixed integer programming (MIP) methods for L0L2-regularized regression face difficulties in solving many statistically interesting instances when the number of features p ~ 10^4.
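
For concreteness, the estimator discussed here is usually written as the following optimization problem (this is the standard L0L2 formulation from the sparse regression literature; the exact scaling and constants used in the talk may differ):

```latex
\min_{\beta \in \mathbb{R}^p} \;
  \frac{1}{2}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda_0\,\lVert \beta \rVert_0
  \;+\; \lambda_2\,\lVert \beta \rVert_2^2
```

Here ||β||_0 counts the nonzero entries of β, so λ0 directly controls sparsity while λ2 adds ridge-style shrinkage. The ||·||_0 term makes the problem combinatorial (NP-hard in general), which is why exact solutions call for MIP techniques.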

In this work, we present a new exact MIP framework for L0L2-regularized regression that can scale to p ~ 10^7, achieving over 3600x speed-ups compared to the fastest exact methods. Unlike recent work, which relies on off-the-shelf MIP solvers, we design a specialized nonlinear branch-and-bound framework that critically exploits the problem structure. Our framework is based on a primal first-order algorithm for solving the node subproblems. In addition, we develop a novel method to obtain dual bounds directly from primal solutions and provide an analysis that demonstrates its effectiveness in high dimensions. We open-source the implementation through our toolkit L0BnB.
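
To make the branch-and-bound idea concrete, here is a deliberately naive sketch for the same L0L2 problem. This is not the talk's algorithm: L0BnB uses a primal first-order method for the node relaxations and a specialized dual-bounding scheme, whereas this sketch solves each node's ridge relaxation with a direct linear solve and runs the textbook bound/prune/branch loop. All function names are illustrative, and the approach is only practical for tiny p.

```python
import numpy as np

def ridge_restricted(X, y, support, lam2):
    """Minimize 0.5*||y - X b||^2 + lam2*||b||^2 with b zero outside `support`."""
    p = X.shape[1]
    beta = np.zeros(p)
    S = sorted(support)
    if S:
        XS = X[:, S]
        beta[S] = np.linalg.solve(XS.T @ XS + 2.0 * lam2 * np.eye(len(S)), XS.T @ y)
    resid = y - X @ beta
    return beta, 0.5 * resid @ resid + lam2 * beta @ beta

def l0l2_bnb(X, y, lam0, lam2, tol=1e-10):
    """Exact minimization of 0.5*||y - X b||^2 + lam0*||b||_0 + lam2*||b||^2
    by branching on whether each coordinate is fixed to zero or fixed nonzero."""
    p = X.shape[1]
    best_obj, best_beta = np.inf, np.zeros(p)
    stack = [(frozenset(), frozenset())]  # (Z: fixed zero, N: fixed nonzero); rest free
    while stack:
        Z, N = stack.pop()
        free = [j for j in range(p) if j not in Z and j not in N]
        # Node relaxation: drop the L0 cost of the free coordinates, keep it for N.
        beta, ridge_obj = ridge_restricted(X, y, set(N) | set(free), lam2)
        lower = ridge_obj + lam0 * len(N)
        if lower >= best_obj:
            continue  # prune: this subtree cannot beat the incumbent
        # The relaxed solution is feasible for the original problem; try it as an incumbent.
        upper = ridge_obj + lam0 * np.count_nonzero(np.abs(beta) > tol)
        if upper < best_obj:
            best_obj, best_beta = upper, beta
        if not free:
            continue  # leaf: every coordinate is fixed
        # Branch on the free coordinate that is largest in the relaxed solution.
        j = max(free, key=lambda k: abs(beta[k]))
        stack.append((Z | {j}, N))  # child 1: beta_j fixed to zero
        stack.append((Z, N | {j}))  # child 2: beta_j fixed nonzero
    return best_beta, best_obj

# Toy usage: recover a 3-sparse signal among p = 8 candidate features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
beta_true = np.zeros(8)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat, obj = l0l2_bnb(X, y, lam0=1.0, lam2=0.01)
print("support:", np.flatnonzero(np.abs(beta_hat) > 1e-10), "objective:", obj)
```

The gap between this sketch and the talk is exactly the contribution described above: cheap first-order node solves and strong dual bounds obtained directly from primal solutions are what let a branch-and-bound of this shape scale to p ~ 10^7 rather than p in the tens.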

Bio: Hussein Hazimeh is a PhD candidate at the MIT Operations Research Center, advised by Rahul Mazumder. His research lies at the intersection of optimization and machine learning. In particular, he is currently developing scalable optimization algorithms for sparse and interpretable learning. His work has recently received honorable mentions from the INFORMS Computing Society and the 2019 Mixed Integer Programming Workshop.

Event Time: September 10, 2020 - 16:15