A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method

Remi Munos
Conference Paper, Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI '97), pp. 826-831, August 1997

Abstract

In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal control problems in which the state space and time are continuous variables. Computing a good approximation of the value function is essential, because it provides the optimal control, yet it is a difficult task in the continuous case. Indeed, as several authors have pointed out, the use of parameterized functions such as neural networks for approximating the value function may produce very poor results and even diverge. In fact, we show that classical algorithms, like Q-learning, used with a simple look-up table built on a regular grid, may fail to converge. The main reason is that the discretization of the state space implies a loss of the Markov property even for deterministic continuous processes. We propose to approximate the value function with a convergent numerical scheme based on a Finite Difference approximation of the Hamilton-Jacobi-Bellman equation. Then we present a model-free reinforcement learning algorithm, called Finite Difference Reinforcement Learning, and prove its convergence to the value function of the continuous problem.
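
For background on the equation the abstract refers to: in a deterministic, discounted continuous-time control problem, the value function and the Hamilton-Jacobi-Bellman (HJB) equation typically take the form below. This is a standard textbook formulation given as a sketch, not an excerpt from the paper; the symbols f (dynamics), r (reward), lambda (discount rate), and the upwind difference are common conventions and may differ from the paper's exact notation and scheme.

\[
V(x) = \sup_{u(\cdot)} \int_0^{\infty} e^{-\lambda t}\, r\big(x(t), u(t)\big)\, dt,
\qquad \dot{x}(t) = f\big(x(t), u(t)\big),
\]
\[
\lambda V(x) = \sup_{u \in U} \Big[ r(x, u) + \nabla V(x) \cdot f(x, u) \Big].
\]
A finite-difference scheme replaces each partial derivative by a one-sided (upwind) difference on a grid of step \(\delta\), for example
\[
\partial_{x_i} V(x) \approx \frac{V\big(x + \delta\, \operatorname{sgn}(f_i(x,u))\, e_i\big) - V(x)}{\delta},
\]
which turns the HJB equation into a discounted dynamic-programming fixed point over the grid nodes. Because the upwind differences can be read as transition probabilities of an auxiliary Markov decision process on the grid, schemes of this kind can be shown to converge to the value function of the continuous problem as the grid step goes to zero, which is the kind of guarantee the abstract highlights.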

BibTeX

@conference{Munos-1997-16480,
author = {Remi Munos},
title = {A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method},
booktitle = {Proceedings of 15th International Joint Conference on Artificial Intelligence (IJCAI '97)},
year = {1997},
month = {August},
pages = {826--831},
}