Generalization in Reinforcement Learning: Safely Approximating the Value Function
Conference Paper, Proceedings of (NeurIPS) Neural Information Processing Systems, pp. 369–376, December 1994
Abstract
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and even in very benign cases may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
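The failure mode the abstract describes can be illustrated with fitted value iteration: each Bellman backup is projected onto a function approximator instead of being stored in a table, so the fixed point (if one exists) need not match the true value function. The sketch below is not the paper's Grow-Support algorithm; it is a minimal, assumed toy example contrasting tabular value iteration with a linear least-squares approximator on a 5-state chain, with all names and the MDP itself being illustrative.

```python
# Toy MDP (an assumption, not from the paper): states 0..4 on a chain,
# each step moves right at cost -1, state 4 is an absorbing goal.
N = 5
GAMMA = 0.9
STEP_REWARD = -1.0


def backup(V):
    """One Bellman backup: V(s) <- r + gamma * V(s+1); goal pinned at 0."""
    return [STEP_REWARD + GAMMA * V[s + 1] for s in range(N - 1)] + [0.0]


def fit_linear(targets):
    """Least-squares fit of V(s) = w0 + w1*s to the backed-up targets,
    standing in for a generalizing function approximator."""
    xs = list(range(N))
    mx = sum(xs) / N
    my = sum(targets) / N
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, targets)) / \
         sum((x - mx) ** 2 for x in xs)
    w0 = my - w1 * mx
    return [w0 + w1 * s for s in range(N)]


# Tabular value iteration: a contraction, guaranteed to converge
# to the true values (-3.439, -2.71, -1.9, -1.0, 0.0).
V_tab = [0.0] * N
for _ in range(100):
    V_tab = backup(V_tab)

# Fitted value iteration: every backup is squeezed through the linear
# fit, so the iterates settle on a distorted value function -- the true
# V is geometric in s, which a line cannot represent.
V_fit = [0.0] * N
for _ in range(100):
    V_fit = fit_linear(backup(V_fit))

print([round(v, 3) for v in V_tab])
print([round(v, 3) for v in V_fit])
```

Here the approximator merely distorts the values; the paper's point is that with less benign approximators or MDPs, the same loop can diverge or yield an entirely wrong policy, which is what Grow-Support is designed to prevent.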
Notes
Selected for oral presentation. Approximately 30 out of 500 submissions received this honor.
BibTeX
@conference{Boyan-1994-16117,
  author    = {Justin Boyan and Andrew Moore},
  title     = {Generalization in Reinforcement Learning: Safely Approximating the Value Function},
  booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
  year      = {1994},
  month     = {December},
  editor    = {G. Tesauro and D. S. Touretzky and T. K. Leen},
  pages     = {369--376},
  publisher = {MIT Press},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.