ACML 2020 | Constrained Reinforcement Learning via Policy Splitting by Haoxian Chen, Henry Lam, Fengpei Li, and Amirhossein Meisami

Constrained Reinforcement Learning via Policy Splitting

By Haoxian Chen, Henry Lam, Fengpei Li, and Amirhossein Meisami

Abstract

We develop a model-free reinforcement learning approach for solving constrained Markov decision processes in which the objective and the budget constraints are infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure: we first search over deterministic policies, then aggregate them through a search over a mixture parameter, which yields policies with simultaneous guarantees on near-optimality and feasibility. We illustrate the approach numerically on an online advertising problem.
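
To give a feel for what a mixture parameter search can look like, here is a minimal sketch in Python. It is not the procedure from the paper. It assumes only two deterministic policies, A and B, whose discounted reward and cost values have already been estimated, and it assumes the mixed policy picks A with probability p at time zero and then follows that choice forever, so both its reward and its cost are linear in p. The function name `best_mixture` and all numbers are hypothetical.

```python
def best_mixture(reward_a, cost_a, reward_b, cost_b, budget, tol=1e-12):
    """Pick the mixing probability p for policy A (B gets 1 - p).

    Assumption: the mixture is drawn once at time zero, so its value is the
    convex combination p * value_A + (1 - p) * value_B for reward and cost.
    Returns (p, reward, cost) for the best feasible p, or None if no p in
    [0, 1] satisfies the budget.
    """
    # Reward and cost are linear in p, so the optimum over the feasible
    # interval lies at an endpoint of [0, 1] or where the cost hits the budget.
    candidates = [0.0, 1.0]
    if cost_a != cost_b:
        crossing = (budget - cost_b) / (cost_a - cost_b)
        if 0.0 <= crossing <= 1.0:
            candidates.append(crossing)

    best = None
    for p in candidates:
        reward = p * reward_a + (1.0 - p) * reward_b
        cost = p * cost_a + (1.0 - p) * cost_b
        if cost <= budget + tol and (best is None or reward > best[1]):
            best = (p, reward, cost)
    return best


# Hypothetical values: A earns more but overspends, B is cheap but weak.
result = best_mixture(reward_a=10.0, cost_a=8.0,
                      reward_b=4.0, cost_b=2.0, budget=5.0)
print(result)
```

With these illustrative numbers neither deterministic policy is ideal on its own: A violates the budget and B leaves budget unused. The mixture spends the budget exactly and earns more than B, which is the intuition behind searching over a mixture parameter after the deterministic search.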