ACML 2020 🇹🇭

Constrained Reinforcement Learning via Policy Splitting

By Haoxian Chen, Henry Lam, Fengpei Li, and Amirhossein Meisami

Abstract

We develop a model-free reinforcement learning approach to solve constrained Markov decision processes, where the objective and budget constraints are expressed as infinite-horizon discounted expectations, and the rewards and costs are learned sequentially from data. We propose a two-stage procedure: we first search over deterministic policies, and then aggregate the resulting candidates via a search over a mixture parameter, which yields policies with simultaneous guarantees of near-optimality and feasibility. We also illustrate our approach numerically by applying it to an online advertising problem.
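To give a sense of the second stage, the sketch below mixes two deterministic policies on a toy example. All numbers and names are hypothetical, not taken from the paper: we assume the first stage returned two candidate policies with estimated discounted rewards and costs, and the second stage then searches a single mixture parameter for the most rewarding randomization that still satisfies the budget constraint.

```python
# Hypothetical sketch of the mixture-parameter search (stage two).
# We assume stage one returned two deterministic policies with
# estimated infinite-horizon discounted values:
#   policy A: high reward, but violates the budget
#   policy B: lower reward, but feasible
reward = {"A": 10.0, "B": 4.0}   # estimated discounted rewards (assumed)
cost   = {"A": 8.0,  "B": 2.0}   # estimated discounted costs (assumed)
budget = 5.0                     # budget constraint on discounted cost

def mixture_value(lam):
    """Value of the randomized policy that plays A with probability
    lam and B with probability 1 - lam; discounted reward and cost
    are linear in the mixture parameter."""
    r = lam * reward["A"] + (1 - lam) * reward["B"]
    c = lam * cost["A"] + (1 - lam) * cost["B"]
    return r, c

# Grid search over the mixture parameter: take the largest feasible
# lam (and hence the largest reward, since A dominates B in reward).
lams = [i / 1000 for i in range(1001)]
feasible = [l for l in lams if mixture_value(l)[1] <= budget]
lam_star = max(feasible)
r_star, c_star = mixture_value(lam_star)
print(lam_star, r_star, c_star)  # 0.5 7.0 5.0 -- cost exactly meets the budget
```

In this toy instance the constraint binds at lam = 0.5, where the mixture's discounted cost equals the budget; the resulting randomized policy earns reward 7.0, more than the best feasible deterministic policy (B, with reward 4.0) on its own.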