XSTrees: Extended Sampled Tree Ensembles for Classification and Regression

12/2/21 | 4:15pm | E25-111

Omar Skali Lami

ORC PhD Student

Abstract: Predictive analytics is at the core of many operations management problems, ranging from estimating a customer's propensity to buy a product to predicting a patient's length of stay in the emergency department. This talk introduces the Extended Sampled Trees (XSTrees) method, a novel tree ensemble method for classification and regression. Instead of learning a single decision tree, as CART does, or a collection of trees, as Random Forests and Gradient Boosting do, XSTrees learns a probability distribution over the tree space.

This approach enjoys good theoretical guarantees and has a significant edge over other ensemble methods in predictive performance. Both aspects are critical in operations management applications.

Analytically, we prove that XSTrees converges to the true underlying tree model at rate log(n)/n, where n is the number of training observations, by averaging a sequence of local optima of the CART estimation problem. Experimentally, we show on publicly available datasets and synthetic data that XSTrees is very competitive with state-of-the-art predictive models, with an average accuracy between 2.5% and 50% higher than competitors for classification and an average R² between 2% and 85% higher for regression. Finally, we discuss two critical applications: one in revenue management with Wayfair, the leading online retailer for home goods and décor, on customer choice in the context of ancillary services; and the other with UMass Memorial Hospital, on predicting patients' length of stay in the Emergency Department.

Bio: Omar is a fourth-year Ph.D. student at the Operations Research Center at the Massachusetts Institute of Technology (MIT). Before that, he received a Master of Business Analytics from MIT and a Master of Science in Applied Mathematics from École Centrale Paris.

Alongside academia, Omar has worked in management and data science consulting at McKinsey & Company on their QuantumBlack team, where he helped clients use advanced analytics and optimization to achieve transformative and sustainable performance improvements.

Omar’s primary research interests lie at the intersection of predictive and prescriptive analytics in operations management, focusing mainly on developing novel machine learning and optimization methods for revenue management and healthcare applications.
