10/10/19 | 4:15pm | E51-335
Reception to follow.
Abstract: We consider a dynamic assortment selection problem in which the goal is to offer a sequence of assortments that maximizes the expected cumulative revenue. Feedback takes the form of the item that the user picks from the offered assortment. This problem arises in many real-world applications, such as online retailing, streaming services, online advertising, and news websites. We assume customer choice is described by a multinomial logit (MNL) choice model in which the utility of each item is a potentially time-varying function of contextual information about both the item and the user. We propose upper confidence bound (UCB) based policies for this problem that simultaneously optimize assortment selections and learn users' preferences, balancing exploration and exploitation. We establish a near-optimal regret bound when the assortment size is fixed. Motivated by the superior empirical performance of Thompson sampling (TS) methods in many other bandit problems, we also propose TS-based policies for this problem and establish both their Bayesian regret and their worst-case regret. In particular, we propose an optimistic sampling technique to address the challenge of ensuring frequent optimism that arises in the worst-case regret analysis. We show empirically that the proposed policies perform better than their theoretical guarantees suggest. We will also discuss extensions to contextual reinforcement learning.
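To make the setting concrete, here is a minimal sketch of the MNL choice model and the resulting assortment-optimization subproblem the abstract refers to. This is an illustration only, not the speaker's algorithm: the function names and the brute-force search are our own, and the item utilities are assumed known here, whereas the talk concerns learning them from choice feedback.

```python
import math
from itertools import combinations

def mnl_choice_probs(utilities):
    """MNL choice probabilities for an offered assortment.

    utilities: list of item utilities v_i; the no-purchase alternative
    has utility 0, contributing weight exp(0) = 1 to the denominator.
    Returns (per-item purchase probabilities, no-purchase probability).
    """
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)  # "+1" accounts for the no-purchase option
    return [w / denom for w in weights], 1.0 / denom

def expected_revenue(utilities, revenues):
    """Expected revenue of offering this assortment under the MNL model."""
    probs, _ = mnl_choice_probs(utilities)
    return sum(p * r for p, r in zip(probs, revenues))

def best_assortment(utilities, revenues, K):
    """Brute-force search for the revenue-maximizing assortment of size <= K.

    A toy stand-in for the per-round assortment optimization; a bandit
    policy would plug in optimistic (UCB) or sampled (TS) utility
    estimates in place of the true utilities.
    """
    n = len(utilities)
    best, best_rev = (), 0.0
    for k in range(1, K + 1):
        for S in combinations(range(n), k):
            rev = expected_revenue([utilities[i] for i in S],
                                   [revenues[i] for i in S])
            if rev > best_rev:
                best, best_rev = S, rev
    return best, best_rev
```

For example, with two items of equal utility 0 offered together, each is chosen with probability 1/3 and no purchase occurs with probability 1/3.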
Bio: Garud Iyengar is a Professor in the Industrial Engineering and Operations Research Department in the School of Engineering and Applied Science at Columbia University. He received his B.Tech. in Electrical Engineering from IIT Kanpur, and an MS and PhD in Electrical Engineering from Stanford University. His research interests are broadly in control and optimization. His published work spans a diverse range of fields, including information theory, applied mathematics, operations research, economics, and financial engineering. His current projects focus on large-scale portfolio selection, reinforcement learning, and the modeling of cellular processes.