ContriQ: Ally-Focused Cooperation and Enemy-Concentrated Confrontation in Multi-Agent Reinforcement Learning

Chenran Zhao (National University of Defense Technology); Dianxi Shi (National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center)*; Yaowen Zhang (National Innovation Institute of Defense Technology (NIIDT)); Huanhuan Yang (National University of Defense Technology); Shaowu Yang (National University of Defense Technology); Yongjun Zhang (National Innovation Institute of Defense Technology)

PMLR Page

Abstract

Centralized training with decentralized execution (CTDE) is an important setting for cooperative multi-agent reinforcement learning (MARL) because communication is constrained during execution and scalability is constrained during training. CTDE methods have shown superior performance, but challenges remain. One line of work seeks to understand the mutual interplay between agents. Because communication constraints in practice prevent agents from exchanging perceptual information, many approaches rely on a centralized attention network, which is subject to scalability constraints. In contrast to these common approaches, we propose learning to cooperate in a decentralized way by applying an attention mechanism to each agent's local observation, so that each agent can focus on allied agents using a decentralized model, which promotes mutual understanding. Another line of work models how agents cooperate and simplifies the learning process. Previous approaches based on value decomposition have achieved innovative results, but they either limit the representational expressiveness of their value function classes or relax Individual-Global-Max (IGM) consistency to achieve scalability, which may lead to poor performance. We combine value decomposition with game abstraction by modeling the relationships between agents as a bi-level graph. Based on this graph, we propose a novel value decomposition network built on a bi-level attention network, whose two levels indicate, at each time step, the contribution of allied agents to attacking enemies and the priority of attacking each enemy, respectively. We show that our method substantially outperforms existing state-of-the-art methods on battle games in StarCraft II, and we discuss the learned attention in detail, with insights.
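As a rough illustration of the ideas named in the abstract, the following toy sketch combines two levels of softmax attention (a priority over enemies, and a contribution of each ally toward each enemy) into non-negative per-agent weights, then checks IGM consistency: the joint greedy action under the mixed value matches each agent's individual greedy action. This is not the network described in the paper. The function names, the weighted-sum mixing form, and all numbers are hypothetical and chosen only for illustration.

```python
import itertools
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bilevel_weights(enemy_scores, ally_scores_per_enemy):
    """Hypothetical bi-level attention.

    One level: priority of attacking each enemy (softmax over enemies).
    Other level: contribution of each ally toward a given enemy
    (softmax over allies, one row per enemy).
    Returns one strictly positive weight per ally; the weights sum to 1.
    """
    enemy_priority = softmax(enemy_scores)
    n_agents = len(ally_scores_per_enemy[0])
    weights = [0.0] * n_agents
    for priority, ally_scores in zip(enemy_priority, ally_scores_per_enemy):
        for i, contribution in enumerate(softmax(ally_scores)):
            weights[i] += priority * contribution
    return weights


def q_tot(agent_qs, actions, weights):
    """Assumed mixing form: positive weighted sum of per-agent Q-values."""
    return sum(w * q[a] for w, q, a in zip(weights, agent_qs, actions))


def greedy_joint(agent_qs, weights):
    """Brute-force argmax of q_tot over all joint actions."""
    n_actions = len(agent_qs[0])
    joint_actions = itertools.product(range(n_actions), repeat=len(agent_qs))
    return max(joint_actions, key=lambda acts: q_tot(agent_qs, acts, weights))


def greedy_individual(agent_qs):
    """Each agent picks its own argmax action (decentralized execution)."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in agent_qs)


# Made-up values: 2 allies with 3 actions each, 2 enemies.
agent_qs = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.7]]
enemy_scores = [1.0, 0.0]
ally_scores_per_enemy = [[2.0, 0.5], [0.0, 1.0]]

weights = bilevel_weights(enemy_scores, ally_scores_per_enemy)
print(greedy_joint(agent_qs, weights) == greedy_individual(agent_qs))
```

Because every weight is positive, the mixed value is monotonic in each agent's Q-value, so the decentralized greedy choice agrees with the centralized one. A purely additive mixer like this one is exactly the kind of restricted value function class the abstract mentions as a limitation, so it should be read only as a minimal example of IGM consistency.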