On Local Rewards and Scaling Distributed Reinforcement Learning

Conference Paper, Proceedings of (NeurIPS) Neural Information Processing Systems, pp. 91-98, December 2005

Abstract

We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
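
The intuition behind this gap can be illustrated with a small simulation. The sketch below is not from the paper; its reward model and constants are assumptions chosen only for illustration. Each of n agents receives a local Bernoulli reward that depends on its own action, and the global reward is the sum of the local rewards. When agent 1 estimates how much better one of its actions is from the global reward alone, the other agents' rewards act as noise whose variance grows with n, so the estimate degrades as n grows; using the local reward, the estimate's quality is independent of n.

# Toy illustration (not from the paper): why learning from a global reward
# needs many more samples as the number of agents n grows.
import numpy as np

rng = np.random.default_rng(0)

def estimate_advantage(n_agents, n_samples, use_local):
    """Monte Carlo estimate of agent 1's advantage of action 1 over action 0."""
    means = []
    for a1 in (0, 1):
        # Other agents act uniformly at random; seen only through the global
        # reward, their rewards are pure noise from agent 1's point of view.
        a_others = rng.integers(0, 2, size=(n_samples, n_agents - 1))
        r1 = rng.random(n_samples) < 0.5 + 0.1 * a1                    # agent 1's local reward
        r_others = rng.random((n_samples, n_agents - 1)) < 0.5 + 0.1 * a_others
        signal = r1 if use_local else r1 + r_others.sum(axis=1)        # local vs. global signal
        means.append(signal.mean())
    return means[1] - means[0]    # true advantage is 0.1 in both cases

for n in (2, 20, 200, 2000):
    runs_global = [estimate_advantage(n, 500, use_local=False) for _ in range(200)]
    runs_local = [estimate_advantage(n, 500, use_local=True) for _ in range(200)]
    print(f"n={n:5d}  std of global-reward estimate: {np.std(runs_global):.3f}"
          f"   std of local-reward estimate: {np.std(runs_local):.3f}")

With a fixed sample budget, the standard deviation of the global-reward estimate grows with n while the local-reward estimate stays constant, which mirrors the roughly linear versus O(log n) sample-complexity separation stated above.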

BibTeX

@conference{Bagnell-2005-9452,
author = {J. Andrew (Drew) Bagnell and Andrew Ng},
title = {On Local Rewards and Scaling Distributed Reinforcement Learning},
booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
year = {2005},
month = {December},
pages = {91 - 98},
publisher = {MIT Press},
keywords = {multi-agent, learning theory, reinforcement learning, Markov Decision Processes, local reward, distributed RL},
}