
Generalization in Reinforcement Learning: Safely Approximating the Value Function

Justin Boyan and Andrew Moore
Conference Paper, Proceedings of Neural Information Processing Systems (NeurIPS), pp. 369–376, December 1994

Abstract

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and even in very benign cases may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
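The idea summarized in the abstract, replacing the value lookup table of dynamic programming with a generalizing function approximator, can be sketched in a few lines of code. The example below is not the paper's code: it runs value iteration on a hypothetical one-dimensional corridor MDP and refits a linear least-squares approximator to the backed-up values after every sweep. The state space, reward structure, feature map, and sweep count are assumptions made purely for illustration; as the paper argues, such combinations are not guaranteed to converge in general.

import numpy as np

# Illustrative sketch (not the paper's algorithm): value iteration where the
# usual lookup table is replaced by a linear function approximator that is
# refit by least squares after each Bellman backup.

n_states = 20                       # states 0..19; state 19 is the goal (assumed layout)
gamma = 0.95
actions = (-1, +1)                  # move left or right

def step(s, a):
    """Deterministic transition: cost -1 per move, goal is absorbing."""
    if s == n_states - 1:
        return s, 0.0
    s2 = min(max(s + a, 0), n_states - 1)
    return s2, -1.0

def features(s):
    """Simple polynomial features of the normalized state (an assumption)."""
    x = s / (n_states - 1)
    return np.array([1.0, x, x * x])

states = np.arange(n_states)
Phi = np.vstack([features(s) for s in states])
w = np.zeros(Phi.shape[1])          # weights of the linear approximator

for sweep in range(200):
    v = Phi @ w                     # values read from the approximator, not a table
    targets = np.array([
        max(r + gamma * v[s2] for s2, r in (step(s, a) for a in actions))
        if s != n_states - 1 else 0.0
        for s in states
    ])
    # Refit the approximator to the backed-up targets (least squares).
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("approximate values:", np.round(Phi @ w, 2))

Because every backup is computed from fitted values rather than stored ones, approximation error can feed back into subsequent sweeps; the paper shows this feedback can drive the estimates arbitrarily far from the true value function, which motivates the Grow-Support algorithm.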

Notes
Selected for oral presentation; approximately 30 out of 500 submissions received this honor.

BibTeX

@conference{Boyan-1994-16117,
author = {Justin Boyan and Andrew Moore},
title = {Generalization in Reinforcement Learning: Safely Approximating the Value Function},
booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
year = {1994},
month = {December},
editor = {G. Tesauro and D. S. Touretzky and T. K. Leen},
pages = {369--376},
publisher = {MIT Press},
}