Policy Iteration with Gaussian Process based Value Function Approximation

Ashwin Khadke and Akshara Rai
Workshop Paper, RSS '20 Robotics Retrospectives Workshop (RobRetro '20), July 2020

Abstract

In this work, we explore the use of Gaussian processes (GPs) as function approximators for reinforcement learning (RL), building estimates of the value function and Q-function with GPs. Such a representation allows us to learn Q-functions, and thereby policies, conditioned on uncertainty in the system dynamics, and can be useful for sample-efficient transfer of policies learned in simulation to hardware.

We use two approaches, GPTD and GPSARSA, to build approximate value functions and Q-functions respectively. For simple problems with continuous value landscapes, we find both to be effective approximators; on discontinuous landscapes, however, GPSARSA's performance deteriorates even for simple problems. As problem complexity increases, for example on an inverted pendulum, we find that both approaches are extremely sensitive to the GP hyperparameters and do not scale well. We also experiment with a sparse variant of GPSARSA, but it still converges to poor solutions. Our experiments show that while GPTD and GPSARSA are elegant theoretical formulations, they are not suitable for complex domains without extensive hyperparameter tuning.
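To make the construction concrete: in GPTD, the latent value function is given a GP prior, V ~ GP(0, k), and observed rewards are tied to values through the temporal-difference relation r_t = V(x_t) - gamma * V(x_{t+1}) + noise. Stacking this relation over a trajectory yields a linear-Gaussian observation model, so the posterior mean and variance of V have closed forms. The sketch below is a minimal NumPy illustration of this posterior, assuming deterministic transitions (the original GPTD model of Engel et al.) and a squared-exponential kernel; the function names, hyperparameter values, and test problem are illustrative assumptions, not code from the paper.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between row-stacked points A (N, d) and B (M, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gptd_posterior(states, rewards, test_states, gamma=0.9, noise=1e-2):
    # GPTD posterior over V at test_states, assuming deterministic transitions.
    # states: (T, d) visited states x_1..x_T; rewards: (T-1,) with
    # r_t = V(x_t) - gamma * V(x_{t+1}) + eps, eps ~ N(0, noise).
    T = len(states)
    H = np.zeros((T - 1, T))                      # encodes r = H v + eps
    H[np.arange(T - 1), np.arange(T - 1)] = 1.0
    H[np.arange(T - 1), np.arange(1, T)] = -gamma
    K = rbf_kernel(states, states)
    G = H @ K @ H.T + noise * np.eye(T - 1)
    alpha = np.linalg.solve(G, rewards)
    k_star = rbf_kernel(test_states, states)      # (M, T)
    mean = k_star @ H.T @ alpha
    v = np.linalg.solve(G, H @ k_star.T)
    var = np.diag(rbf_kernel(test_states, test_states)) \
        - np.einsum('ij,ji->i', k_star @ H.T, v)
    return mean, var

# Illustrative usage: a 1-D random walk with reward r_t = -x_t^2.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(20, 1)), axis=0)
r = -X[:-1, 0] ** 2
mean, var = gptd_posterior(X, r, X)

GPSARSA follows the same construction with the kernel defined over state-action pairs, so the posterior models Q(s, a) rather than V(s), and actions can then be selected greedily (or epsilon-greedily) with respect to the posterior Q-mean. The kernel lengthscale, signal variance, and noise level above are exactly the hyperparameters whose sensitivity the abstract reports.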

BibTeX

@inproceedings{Khadke-2020-126695,
author = {Ashwin Khadke and Akshara Rai},
title = {Policy Iteration with Gaussian Process based Value Function Approximation},
booktitle = {Proceedings of RSS '20 Robotics Retrospectives Workshop (RobRetro '20)},
year = {2020},
month = {July},
}