On Local Rewards and Scaling Distributed Reinforcement Learning

J. Andrew (Drew) Bagnell and Andrew Ng
Neural Information Processing Systems, May, 2006.


Download
  • Adobe portable document format (pdf) (728KB)
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
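The intuition behind the abstract's contrast can be illustrated with a toy credit-assignment experiment (this is an illustrative sketch, not the paper's actual construction or proof): when an agent only observes the global reward, the summed contributions of the other n-1 agents act as noise on its own learning signal, and that noise grows with n; a local reward signal removes it entirely. The function name and setup below are hypothetical.

```python
import random

def estimate_advantage(n_agents, n_samples, use_local, seed=0):
    """Toy illustration: each agent independently picks action 0 or 1;
    action 1 yields local reward 1, action 0 yields 0. We estimate agent
    0's advantage of action 1 over action 0 (true value: 1.0) from either
    its local reward or the global reward (the sum over all agents)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        actions = [rng.randrange(2) for _ in range(n_agents)]
        # local reward of agent i is just its own action in this toy setup
        signal = actions[0] if use_local else sum(actions)
        samples.append((signal, actions[0]))
    ones = [s for s, a in samples if a == 1]
    zeros = [s for s, a in samples if a == 0]
    if not ones or not zeros:
        return 0.0
    # difference of mean reward signal when agent 0 played 1 vs. 0
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)
```

With the local signal the estimate is exact regardless of n, while with the global signal the other agents' actions add variance that grows with n, so the number of samples needed for a fixed accuracy grows with the number of agents, in the spirit of the lower bound stated above.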

Keywords
multi-agent, learning theory, reinforcement learning, Markov Decision Processes, local reward, distributed RL

Text Reference
J. Andrew (Drew) Bagnell and Andrew Ng, "On Local Rewards and Scaling Distributed Reinforcement Learning," Neural Information Processing Systems, May, 2006.

BibTeX Reference
@inproceedings{Bagnell_2006_5326,
   author = "J. Andrew (Drew) Bagnell and Andrew Ng",
   title = "On Local Rewards and Scaling Distributed Reinforcement Learning",
   booktitle = "Neural Information Processing Systems",
   publisher = "MIT Press",
   month = "May",
   year = "2006",
}