A Non-Parametric Approach to Dynamic Programming

Conference Paper, Proceedings of Neural Information Processing Systems (NeurIPS), pp. 1719-1727, December 2011


Abstract

In this paper, we consider the problem of policy evaluation for continuous state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin’s method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
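
As a rough illustration of one of the baseline methods named in the abstract, the sketch below shows Least-Squares Temporal Difference (LSTD) policy evaluation on a one-dimensional continuous-state system. It is not the paper's proposed non-parametric method and is not taken from the paper. The toy dynamics, the reward, the Gaussian radial basis features, the small ridge term, and the names `rbf_features`, `lstd`, and `value` are all assumptions made for this example.

```python
import numpy as np


def rbf_features(states, centers, bandwidth):
    """Gaussian RBF features for 1-D states, shape (n_states, n_centers)."""
    diffs = states[:, None] - centers[None, :]
    return np.exp(-0.5 * (diffs / bandwidth) ** 2)


def lstd(phi, phi_next, rewards, gamma, ridge=1e-6):
    """Solve A w = b with A = Phi^T (Phi - gamma * Phi_next) and b = Phi^T r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # The ridge term is an assumption added for numerical stability;
    # it is not part of plain LSTD.
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), b)


# Hypothetical toy system with the fixed policy folded into the dynamics:
# s' = 0.9 * s + noise, reward = -s^2.
rng = np.random.default_rng(0)
n_samples = 500
states = rng.uniform(-1.0, 1.0, size=n_samples)
next_states = 0.9 * states + 0.05 * rng.standard_normal(n_samples)
rewards = -states ** 2

gamma = 0.9
bandwidth = 0.3
centers = np.linspace(-1.0, 1.0, 7)

phi = rbf_features(states, centers, bandwidth)
phi_next = rbf_features(next_states, centers, bandwidth)
weights = lstd(phi, phi_next, rewards, gamma)


def value(s):
    """Estimated value of state(s) s under the fixed policy."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return rbf_features(s, centers, bandwidth) @ weights


print(value([0.0, 0.5, 0.9]))
```

Because the reward penalizes distance from the origin and the dynamics contract toward it, the estimated value should be highest near `s = 0` and fall off toward the edges of the state range.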

BibTeX

@conference{Kroemer-2011-112193,
author = {Oliver Kroemer and Jan Peters},
title = {A Non-Parametric Approach to Dynamic Programming},
booktitle = {Proceedings of Neural Information Processing Systems (NeurIPS)},
year = {2011},
month = {December},
pages = {1719--1727},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.