On Local Rewards and Scaling Distributed Reinforcement Learning

Conference Paper, Proceedings of (NeurIPS) Neural Information Processing Systems, pp. 91-98, December 2005

Abstract

We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
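
The intuition behind this gap can be illustrated with a small simulation. The sketch below is not from the paper; its reward model and constants are assumptions chosen only for illustration. Each of n agents receives a local Bernoulli reward that depends on its own action, and the global reward is the sum of the local rewards. When agent 1 estimates how much better one of its actions is from the global reward alone, the other agents' rewards act as noise whose variance grows with n, so the estimate degrades as n grows; using the local reward, the estimate's quality is independent of n.

# Toy illustration (not from the paper): why learning from a global reward
# needs many more samples as the number of agents n grows.
import numpy as np

rng = np.random.default_rng(0)

def estimate_advantage(n_agents, n_samples, use_local):
    """Monte Carlo estimate of agent 1's advantage of action 1 over action 0."""
    means = []
    for a1 in (0, 1):
        # Other agents act uniformly at random; seen only through the global
        # reward, their rewards are pure noise from agent 1's point of view.
        a_others = rng.integers(0, 2, size=(n_samples, n_agents - 1))
        r1 = rng.random(n_samples) < 0.5 + 0.1 * a1                    # agent 1's local reward
        r_others = rng.random((n_samples, n_agents - 1)) < 0.5 + 0.1 * a_others
        signal = r1 if use_local else r1 + r_others.sum(axis=1)        # local vs. global signal
        means.append(signal.mean())
    return means[1] - means[0]    # true advantage is 0.1 in both cases

for n in (2, 20, 200, 2000):
    runs_global = [estimate_advantage(n, 500, use_local=False) for _ in range(200)]
    runs_local = [estimate_advantage(n, 500, use_local=True) for _ in range(200)]
    print(f"n={n:5d}  std of global-reward estimate: {np.std(runs_global):.3f}"
          f"   std of local-reward estimate: {np.std(runs_local):.3f}")

With a fixed sample budget, the standard deviation of the global-reward estimate grows with n while the local-reward estimate stays constant, which mirrors the roughly linear versus O(log n) sample-complexity separation stated above.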

BibTeX

@conference{Bagnell-2005-9452,
author = {J. Andrew (Drew) Bagnell and Andrew Ng},
title = {On Local Rewards and Scaling Distributed Reinforcement Learning},
booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
year = {2005},
month = {December},
pages = {91 - 98},
publisher = {MIT Press},
keywords = {multi-agent, learning theory, reinforcement learning, Markov Decision Processes, local reward, distributed RL},
}