
Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling

Benjamin Freed, Aditya Kapoor, Ian Abraham, Jeff Schneider, and Howie Choset
Journal Article, IEEE Robotics and Automation Letters, Vol. 7, No. 2, pp. 890-897, April 2022

Abstract

One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower-variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.
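
The sketch below is a minimal, illustrative (not the authors') rendering of the core idea described above: in a policy-gradient update, each agent's advantage is built only from the reward streams of the agents deemed relevant to it, rather than from the full team reward. The relevant sets, per-agent reward streams, linear softmax policies, and constant baselines are all assumptions made purely for illustration; in the paper the decomposition is learned rather than fixed by hand.

    # Illustrative sketch of partial reward decoupling in a policy-gradient
    # update. All structural choices here (fixed relevant sets, linear softmax
    # policies, zero baselines) are assumptions for demonstration only.
    import numpy as np

    n_agents, n_actions, obs_dim = 4, 3, 5
    rng = np.random.default_rng(0)

    # Toy per-agent linear softmax policy parameters.
    theta = rng.normal(scale=0.1, size=(n_agents, obs_dim, n_actions))

    def policy(agent, obs):
        logits = obs @ theta[agent]
        p = np.exp(logits - logits.max())
        return p / p.sum()

    # Hypothetical relevant sets: agent i's gradient only uses rewards from
    # agents in relevant_sets[i]. (In the paper these sets are inferred,
    # e.g. by the critic; here they are fixed for illustration.)
    relevant_sets = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]

    def prd_policy_gradient(obs, actions, per_agent_rewards, baselines):
        """One-step policy-gradient estimate with partial reward decoupling.

        per_agent_rewards[j] is the reward attributable to agent j;
        baselines[i] is a per-agent baseline (stand-in for a learned critic).
        """
        grads = np.zeros_like(theta)
        for i in range(n_agents):
            # The advantage for agent i sums rewards only over its relevant
            # set, which is the variance-reduction mechanism PRD exploits.
            decoupled_return = sum(per_agent_rewards[j] for j in relevant_sets[i])
            advantage = decoupled_return - baselines[i]

            probs = policy(i, obs[i])
            one_hot = np.eye(n_actions)[actions[i]]
            # grad of log pi(a|o) for a linear softmax policy: obs (x) (e_a - p)
            grad_logp = np.outer(obs[i], one_hot - probs)
            grads[i] = advantage * grad_logp
        return grads

    # Toy usage with random data.
    obs = rng.normal(size=(n_agents, obs_dim))
    actions = rng.integers(0, n_actions, size=n_agents)
    per_agent_rewards = rng.normal(size=n_agents)
    baselines = np.zeros(n_agents)
    theta += 1e-2 * prd_policy_gradient(obs, actions, per_agent_rewards, baselines)

Because each agent's advantage ignores rewards generated by agents outside its relevant set, the gradient estimate excludes reward terms that contribute only noise for that agent, which is the intuition behind the lower-variance estimates reported in the abstract.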

BibTeX

@article{Freed-2022-131921,
author = {Benjamin Freed and Aditya Kapoor and Ian Abraham and Jeff Schneider and Howie Choset},
title = {Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling},
journal = {IEEE Robotics and Automation Letters},
year = {2022},
month = {April},
volume = {7},
number = {2},
pages = {890--897},
keywords = {Reinforcement Learning, Multi-Robot Systems, Cooperating Robots},
}