Battle of Bandits: Online Learning from Preference Feedback

Abstract

This tutorial covers the development of, and recent progress on, preference-based bandits, where the goal is to sequentially learn the best action of a decision set from preference feedback over an actively chosen subset of items. We start with a brief overview of the motivation and problem formulation, and then examine the breakthrough results for the simplest pairwise preference setting (where the subsets are of size 2), famously studied as the ‘Dueling Bandit’ problem in the literature. We then generalize it to the ‘Battling Bandits’ framework for subsets of arbitrary size and examine the tradeoff between learning rates and increasing subset sizes.

Biographies

Aadirupa Saha
Aadirupa Saha is a postdoctoral researcher at Microsoft Research New York City. Previously, she was a PhD student at the Indian Institute of Science, Bangalore. Her research interests lie broadly in bandits and reinforcement learning, optimization, learning theory, and algorithm analysis. She currently works on developing large-scale, robust algorithms for sequential online decision-making problems under preference information or, more generally, partial monitoring feedback.