Carnegie Mellon Robotics Institute
Policy Search by Dynamic Programming
J. Andrew (Drew) Bagnell, Sham Kakade, Andrew Ng, and Jeff Schneider
Neural Information Processing Systems, December, 2003.
Abstract
We consider the policy search approach to reinforcement learning. We show that if a "baseline distribution" is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
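The abstract does not spell out the algorithm. The sketch below illustrates the backward-in-time idea usually associated with policy search by dynamic programming, under stated assumptions. Given baseline distributions mu[0], ..., mu[T-1], it fixes the later policies and, for t = T-1 down to 0, picks pi_t from a policy class to maximize the expected value of acting with pi_t and then following pi_{t+1}, ..., pi_{T-1}, with states drawn from mu[t]. This takes exactly T steps, which matches the finite termination claimed above. The setting is assumed to be a tiny finite MDP with known transitions P and rewards R. The policy class is given as an explicit list of state-to-action maps, and expectations are computed exactly rather than estimated from samples. The function name `psdp`, its parameters, and the toy numbers are hypothetical illustrations, not the authors' code.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical type for this sketch: policy[s] is the action taken in state s.
Policy = Tuple[int, ...]


def psdp(
    P: Sequence[Sequence[Sequence[float]]],  # P[s][a][s2]: transition probability (assumed known)
    R: Sequence[Sequence[float]],  # R[s][a]: immediate reward
    mu: Sequence[Sequence[float]],  # mu[t][s]: baseline state distribution at time t
    policy_class: Sequence[Policy],  # candidate policies searched at each step
    T: int,  # horizon
) -> Tuple[List[Policy], List[float]]:
    """Sketch only: backward policy search with exact (not sampled) expectations.

    Returns the non-stationary policy [pi_0, ..., pi_{T-1}] and V_0[s],
    the value of following it from each state s at time 0.
    """
    n = len(R)
    v_next = [0.0] * n  # value after the horizon is zero
    chosen_reversed: List[Policy] = []
    for t in reversed(range(T)):
        # q[s][a]: reward for action a in state s, then following the
        # already-chosen later policies pi_{t+1}, ..., pi_{T-1}.
        q = [
            [R[s][a] + sum(P[s][a][s2] * v_next[s2] for s2 in range(n))
             for a in range(len(R[s]))]
            for s in range(n)
        ]
        # Choose the candidate maximizing E_{s ~ mu[t]}[q(s, pi(s))];
        # max() returns the first candidate on ties.
        best = max(
            policy_class,
            key=lambda pi: sum(mu[t][s] * q[s][pi[s]] for s in range(n)),
        )
        chosen_reversed.append(best)
        v_next = [q[s][best[s]] for s in range(n)]  # V_t under the chosen policies
    chosen_reversed.reverse()  # order as [pi_0, ..., pi_{T-1}]
    return chosen_reversed, v_next


# Made-up toy example: 2 states, 2 actions; action a moves the agent to state a.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.5], [0.0, 2.0]]
mu = [[1.0, 0.0], [0.0, 1.0]]  # baseline: start in state 0, then be in state 1
all_policies = list(product(range(2), repeat=2))
policy, v0 = psdp(P, R, mu, all_policies, T=2)
print(policy, v0)  # [(1, 0), (0, 1)] [2.5, 1.0]
```

In this toy run the baseline puts no weight on state 1 at t = 0, so pi_0 is not optimized there. That is why V_0 from state 1 is only 1.0. The baseline distribution determines where the search spends its effort.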
Associated Lab(s) / Group(s): Auton Lab
Associated Project(s): Auton Project
Text Reference
J. Andrew (Drew) Bagnell, Sham Kakade, Andrew Ng, and Jeff Schneider, "Policy Search by Dynamic Programming," Neural Information Processing Systems, December, 2003.
BibTeX Reference
@inproceedings{Bagnell_2003_4485,
  author = "J. Andrew (Drew) Bagnell and Sham Kakade and Andrew Ng and Jeff Schneider",
  title = "Policy Search by Dynamic Programming",
  booktitle = "Neural Information Processing Systems",
  publisher = "MIT Press",
  month = "December",
  year = "2003",
  volume = "16",
}
The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.