UMOTEM: Upper Bounding Method for Optimizing over Tree Ensemble Models

11/2/23 | 4:15pm | E51-335


 

 

 

 

Leann Thayaparan

ORC PhD Candidate
2023 ORC Best Student Paper Award Winner


Abstract: Machine learning has become core to forecasting and planning. However, when decision makers are provided with trained and more complex machine learning models, often these models are difficult to then optimize over. When tree-based ensemble models, such as Random Forest or XGBoost, are used in optimization formulations, they require an exponential number of binary decision variables. Optimization problems of this type do not scale well. We propose UMOTEM (Upper Bounding Method for Optimizing over Tree Ensemble Models), an algorithm for solving a constrained optimization problem where the objective function is determined by a tree ensemble model. The algorithm narrows the region of decision variables to an approximate region of optimality by iteratively optimizing using upper bounds as it moves down the trees in the ensemble, at each step only using information available at that depth of the tree. This significantly improves the problem's complexity, with the number of binary variables scaling only linearly, quickly outpacing the exponential growth of the alternative formulations. We show how this method can be used to jointly predict and optimize to save time building sub-optimal branches of the decision trees. We prove an expected optimality gap bound for Random Forest in terms of the forest's in-sample error and leaf separation and show when it is tight. We demonstrate computationally that our algorithm can capture at least 90% of optimality on a variety of datasets. Finally, we discuss an application of this problem concerning the issue of driver behavior and renewables in the electric grid. As we move towards more renewable resources, the ability to produce electricity in time with demand diminishes. Instead, rises a need for energy storage or the ability to produce electricity when renewables allow and store it for when demand needs it later. Electric Vehicles (EVs) have been discussed as a way of providing a distributed energy storage resource to the electric grid. By forecasting driver behavior using a random forest and optimizing the charging and discharging through UMOTEM, we show that optimal charging and discharging of the car’s battery can result in $1,069 saved and carbon benefit equivalent to 77.3% of a home’s energy consumption annually. 

Bio: Leann Thayaparan is a fifth-year PhD student at the MIT, Operations Research Center where she is being advised by Prof. Georgia Perakis. Her research focuses on optimization problems learned from data-driven machine learning, with a focus on sustainable operations. Her research has been in collaboration with companies such as General Motors, Oracle Retail, the CDC and MIT Quest for Intelligence. Before joining the PhD program, Leann received her Master’s of Business Analytics (2019) at MIT, worked at McKinsey and Co in their Operations Advance Analytics group and graduated with high honors in Operations Research and Financial Engineering (2016) from Princeton University.

Event Time: 

2023 - 16:15