The Necessity of Average Rewards in Cooperative Multirobot Learning

Poj Tangamchit, John M. Dolan, and Pradeep Khosla
IEEE Conference on Robotics and Automation 2002, May, 2002.



Abstract
Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular single-robot learning algorithms based on discounted rewards, such as Q-learning, do not achieve cooperation (i.e., purposeful division of labor) when applied to task-level multirobot systems. A task-level system is defined as one performing a mission that is decomposed into subtasks shared among robots. In this paper, we demonstrate the superiority of average-reward-based learning, such as the Monte Carlo algorithm, for task-level multirobot systems, and suggest an explanation for this superiority.
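The distinction the abstract draws can be made concrete by contrasting the two kinds of value updates. The sketch below is illustrative only: it shows a standard discounted Q-learning update next to an R-learning-style average-reward update (the paper itself uses a Monte Carlo average-reward method, and the constants and function names here are assumptions, not the authors' implementation). The key difference is that the average-reward update compares each reward to a running mean rather than discounting future rewards.

```python
# Illustrative contrast between discounted-reward and average-reward
# value updates. This is a sketch, not the paper's algorithm: the paper
# uses a Monte Carlo average-reward method; the average-reward update
# shown here is the R-learning-style TD variant.
from collections import defaultdict

ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor (discounted case only)
BETA = 0.05   # step size for the average-reward estimate

def q_learning_update(Q, s, a, r, s_next, actions):
    """Discounted update: rewards arriving sooner are weighted more,
    which can bias each robot toward greedy, uncooperative behavior."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def average_reward_update(Q, rho, s, a, r, s_next, actions):
    """Average-reward update: each reward is compared to the running
    average rho, so rewards count equally regardless of when they
    arrive -- the property relevant to task-level cooperation."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r - rho + best_next - Q[(s, a)])
    rho += BETA * (r - rho)  # refine the estimate of the average reward
    return rho

# Toy usage on a single transition (state 0 -> state 1, reward 1.0)
actions = ["a", "b"]
Q_disc = defaultdict(float)
q_learning_update(Q_disc, 0, "a", 1.0, 1, actions)

Q_avg = defaultdict(float)
rho = 0.0
rho = average_reward_update(Q_avg, rho, 0, "a", 1.0, 1, actions)
```

With all values initialized to zero, a single update moves both Q-estimates by the same amount; the methods diverge over longer horizons, where discounting down-weights delayed subtask rewards while the average-reward criterion does not.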

Keywords
Cooperation, Multirobot, Learning

Text Reference
Poj Tangamchit, John M. Dolan, and Pradeep Khosla, "The Necessity of Average Rewards in Cooperative Multirobot Learning," IEEE Conference on Robotics and Automation 2002, May, 2002.

BibTeX Reference
@inproceedings{Tangamchit_2002_3974,
   author = "Poj Tangamchit and John M. Dolan and Pradeep Khosla",
   title = "The Necessity of Average Rewards in Cooperative Multirobot Learning",
   booktitle = "IEEE Conference on Robotics and Automation 2002",
   month = "May",
   year = "2002",
}