Dynamic Coordination Graph for Cooperative Multi-Agent Reinforcement Learning

Chapman Siu (University of Technology Sydney)*; Jason Traish (University of Technology Sydney); Richard Yi Da Xu (University of Technology Sydney)

Abstract

This paper introduces the Dynamic Q-value Coordination Graph (QCGraph) for cooperative multi-agent reinforcement learning. QCGraph factorizes the joint value function of all agents according to a dynamically created coordination graph over subsets of agents, allowing the representation to generalize across agent configurations. The joint value is maximized by message passing along the graph at both a local and a global level, which allows the value function to be trained end-to-end. The dynamically generated coordination graph determines the payoff functions, which are approximated using graph neural networks with parameter sharing to improve generalization over the state-action space. We show that QCGraph solves a variety of challenging multi-agent tasks, outperforming other value factorization approaches.
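To make the factorization concrete, the following is a minimal, illustrative sketch of coordination-graph value factorization with max-plus message passing. It is not the authors' implementation: the pairwise payoff form, the shared `UtilityNet`/`PayoffNet` modules, the nearest-neighbour heuristic standing in for a learned graph generator, and all shapes and hyperparameters are assumptions for illustration.

```python
# Illustrative sketch only: module names, shapes, and the graph-construction
# heuristic are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

N_AGENTS, N_ACTIONS, OBS_DIM, HID = 4, 3, 8, 32

class UtilityNet(nn.Module):
    """Per-agent utility f_i(a_i | o_i); weights shared across agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HID), nn.ReLU(),
                                 nn.Linear(HID, N_ACTIONS))
    def forward(self, obs):            # obs: (n_agents, OBS_DIM)
        return self.net(obs)           # (n_agents, N_ACTIONS)

class PayoffNet(nn.Module):
    """Pairwise payoff f_ij(a_i, a_j | o_i, o_j); weights shared across edges."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * OBS_DIM, HID), nn.ReLU(),
                                 nn.Linear(HID, N_ACTIONS * N_ACTIONS))
    def forward(self, obs_i, obs_j):   # each: (n_edges, OBS_DIM)
        out = self.net(torch.cat([obs_i, obs_j], dim=-1))
        return out.view(-1, N_ACTIONS, N_ACTIONS)

def dynamic_graph(obs, k=2):
    """Connect each agent to its k nearest neighbours in observation space
    (a simple stand-in for a learned dynamic graph generator)."""
    dist = torch.cdist(obs, obs)
    dist.fill_diagonal_(float('inf'))
    nbrs = dist.topk(k, largest=False).indices
    edges = {(min(i, j.item()), max(i, j.item()))
             for i in range(N_AGENTS) for j in nbrs[i]}
    return sorted(edges)

def max_plus(utils, payoffs, edges, iters=10):
    """Approximately maximize sum_i f_i + sum_ij f_ij by max-plus
    message passing on the coordination graph."""
    msgs = {(i, j): torch.zeros(N_ACTIONS) for (i, j) in edges}
    msgs.update({(j, i): torch.zeros(N_ACTIONS) for (i, j) in edges})
    for _ in range(iters):
        new = {}
        for e, (i, j) in enumerate(edges):
            for s, t, pay in ((i, j, payoffs[e]), (j, i, payoffs[e].t())):
                # Sender s aggregates its utility and incoming messages
                # (excluding the one from receiver t), then maxes over a_s.
                incoming = utils[s] + sum(m for (u, v), m in msgs.items()
                                          if v == s and u != t)
                m = (incoming.unsqueeze(1) + pay).max(dim=0).values
                new[(s, t)] = m - m.mean()   # normalize for stability
        msgs = new
    # Each agent greedily picks the action maximizing utility + messages.
    actions = []
    for i in range(N_AGENTS):
        score = utils[i] + sum(m for (u, v), m in msgs.items() if v == i)
        actions.append(score.argmax().item())
    return actions

obs = torch.randn(N_AGENTS, OBS_DIM)
edges = dynamic_graph(obs)
utils = UtilityNet()(obs).detach()
payoffs = PayoffNet()(obs[[i for i, _ in edges]],
                      obs[[j for _, j in edges]]).detach()
print(edges, max_plus(utils, payoffs, edges))
```

In a full training loop the greedy joint action selected by message passing would feed a TD target, so the utility and payoff networks are trained end-to-end; the sketch above only shows the factorization and the action-selection step.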