Accepted Paper: X-Armed Bandits: Optimizing Quantiles, CVaR and Other Risks

Session 7: Reinforcement Learning -- Day 3 (Nov. 19), poster session: 11:30-14:00, talks: 15:55-17:10 (5th floor, Hall 1)
Poster number: Tue34

Authors

Leonard Torossian (INRA-IMT); Aurélien Garivier (ENS Lyon); Victor Picheny (Prowler)

Abstract

We propose and analyze StoROO, an algorithm derived from StoOO for risk optimization of stochastic black-box functions. Motivated by fields where decisions are risk-averse, such as agriculture, medicine, biology, and finance, we focus not on the mean payoff but on generic functionals of the return distribution. We provide a generic regret analysis of StoROO and illustrate its applicability with two examples: the optimization of quantiles and of CVaR. Inspired by the bandit literature and black-box mean optimizers, StoROO relies on the ability to construct confidence intervals for the targeted functional from random-size samples. We detail their construction in the case of quantiles, providing tight bounds based on the Kullback-Leibler divergence. Finally, we present numerical experiments showing that tight bounds have a dramatic impact on the optimization of quantiles and CVaR.
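
To give a feel for what a Kullback-Leibler confidence interval for a quantile can look like, here is a minimal Python sketch. It is not the paper's construction: it assumes a fixed sample size n of i.i.d. draws (the abstract's random-size samples need extra care that is not shown), and it uses a standard Chernoff-type argument in which the interval endpoints are order statistics whose empirical levels p satisfy n * kl(p, tau) <= log(2 / delta). The function names (bernoulli_kl, kl_levels, quantile_confidence_interval) and the two-sided split of delta are illustrative assumptions.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_levels(tau, n, delta, iters=60):
    """Empirical levels around tau at which n * kl(p, tau) reaches log(2/delta).

    Assumption of this sketch: delta is split evenly between the two sides.
    Bisection keeps the endpoint on the conservative side of each root.
    """
    threshold = math.log(2.0 / delta) / n

    # kl(p, tau) decreases on [0, tau]: search for the crossing from the left.
    lo, hi = 0.0, tau
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mid, tau) > threshold:
            lo = mid
        else:
            hi = mid
    p_low = lo

    # kl(p, tau) increases on [tau, 1]: search for the crossing from the right.
    lo, hi = tau, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mid, tau) > threshold:
            hi = mid
        else:
            lo = mid
    p_up = hi
    return p_low, p_up


def quantile_confidence_interval(samples, tau, delta):
    """Order-statistic interval for the tau-quantile (fixed-n sketch)."""
    xs = sorted(samples)
    n = len(xs)
    p_low, p_up = kl_levels(tau, n, delta)
    k_low = math.floor(n * p_low)      # 1-indexed rank of the lower endpoint
    k_up = math.floor(n * p_up) + 1    # 1-indexed rank of the upper endpoint
    lower = xs[k_low - 1] if k_low >= 1 else -math.inf
    upper = xs[k_up - 1] if k_up <= n else math.inf
    return lower, upper


# Usage on a toy sample: an interval for the median of the values 1..100.
data = list(range(1, 101))
print(quantile_confidence_interval(data, tau=0.5, delta=0.1))
```

When too few samples are available for the requested confidence, the sketch returns an infinite endpoint rather than an order statistic, which keeps the interval valid at the cost of being uninformative on that side.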