A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method

Remi Munos
Conference Paper, Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI '97), pp. 826-831, August 1997

Abstract

In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal control problems in which the state space and time are continuous variables. Computing a good approximation of the value function is essential, because it provides the optimal control, yet it is a difficult task in the continuous case. Indeed, as several authors have pointed out, the use of parameterized functions such as neural networks for approximating the value function may produce very poor results and even diverge. In fact, we show that classical algorithms, like Q-learning, used with a simple look-up table built on a regular grid, may fail to converge. The main reason is that the discretization of the state space implies a loss of the Markov property even for deterministic continuous processes. We propose to approximate the value function with a convergent numerical scheme based on a Finite Difference approximation of the Hamilton-Jacobi-Bellman equation. Then we present a model-free reinforcement learning algorithm, called Finite Difference Reinforcement Learning, and prove its convergence to the value function of the continuous problem.
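
For background on the equation the abstract refers to: in a deterministic, discounted continuous-time control problem, the value function and the Hamilton-Jacobi-Bellman (HJB) equation typically take the form below. This is a standard textbook formulation given as a sketch, not an excerpt from the paper; the symbols f (dynamics), r (reward), lambda (discount rate), and the upwind difference are common conventions and may differ from the paper's exact notation and scheme.

\[
V(x) = \sup_{u(\cdot)} \int_0^{\infty} e^{-\lambda t}\, r\big(x(t), u(t)\big)\, dt,
\qquad \dot{x}(t) = f\big(x(t), u(t)\big),
\]
\[
\lambda V(x) = \sup_{u \in U} \Big[ r(x, u) + \nabla V(x) \cdot f(x, u) \Big].
\]
A finite-difference scheme replaces each partial derivative by a one-sided (upwind) difference on a grid of step \(\delta\), for example
\[
\partial_{x_i} V(x) \approx \frac{V\big(x + \delta\, \operatorname{sgn}(f_i(x,u))\, e_i\big) - V(x)}{\delta},
\]
which turns the HJB equation into a discounted dynamic-programming fixed point over the grid nodes. Because the upwind differences can be read as transition probabilities of an auxiliary Markov decision process on the grid, schemes of this kind can be shown to converge to the value function of the continuous problem as the grid step goes to zero, which is the kind of guarantee the abstract highlights.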

BibTeX

@conference{Munos-1997-16480,
author = {Remi Munos},
title = {A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method},
booktitle = {Proceedings of 15th International Joint Conference on Artificial Intelligence (IJCAI '97)},
year = {1997},
month = {August},
pages = {826--831},
}