Carnegie Mellon Robotics Institute
Leemon Baird and Andrew Moore
Advances in Neural Information Processing Systems 11, 1999
| Abstract |
| A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. It further allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm. These algorithms also converge for POMDPs without requiring a proper belief state. Simulation results are given, and several areas for future research are discussed. |
| Notes |
Associated Lab(s) / Group(s): Auton Lab
Associated Project(s): Auton Project
Number of pages: 7 |
| Text Reference |
| Leemon Baird and Andrew Moore, "Gradient Descent for General Reinforcement Learning," Advances in Neural Information Processing Systems 11, 1999 |
| BibTeX Reference |
|
@article{Baird_1999_2947,
  author = "Leemon Baird and Andrew Moore",
  title = "Gradient Descent for General Reinforcement Learning",
  journal = "Advances in Neural Information Processing Systems 11",
  year = "1999",
} |
| The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University. |