Hierarchical Relative Entropy Policy Search

Christian Daniel, Gerhard Neumann, Oliver Kroemer, and Jan Peters

Journal Article, Journal of Machine Learning Research, Vol. 17, No. 93, pp. 1-50, June 2016


Abstract

Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple strongly structured sub-tasks. Such task structure can be exploited by hierarchical policies that consist of a gating network and several sub-policies. However, this concept has been explored only partially for real-world settings, and complete methods derived from first principles are needed. Real-world settings are challenging because their large, continuous state-action spaces make exhaustive sampling methods prohibitive. We define the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects which low-level sub-policy the agent executes. To share experience efficiently among all sub-policies, which is also called inter-policy learning, we treat the sub-policies as latent variables; this allows the update information to be distributed between them. We present three variants of our algorithm, designed to suit a wide variety of real-world robot learning tasks, and evaluate them in two real robot learning scenarios as well as in several simulations and comparisons.
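The abstract describes a hierarchical policy in which a high-level gating policy pi(o|s) picks one of several low-level sub-policies pi(a|s,o), with the sub-policy index o treated as a latent variable so that every sample can inform every sub-policy. The Python sketch below illustrates only that mixture structure and the latent-variable bookkeeping (posterior responsibilities p(o|s,a) and a responsibility-weighted refit of each sub-policy). It is not the authors' HiREPS algorithm: it omits the relative-entropy-constrained optimization entirely. The softmax gating, the linear-Gaussian sub-policies, and every name used here (`HierarchicalPolicy`, `update_sub_policy_means`, `sample_weights`) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class HierarchicalPolicy:
    """Hypothetical mixture policy pi(a|s) = sum_o pi(o|s) * pi(a|s,o).

    Assumed parameterization (illustrative only):
      gating_w: (n_options, n_features)              softmax gating pi(o|s)
      means_w:  (n_options, action_dim, n_features)  linear sub-policy means
      covs:     (n_options, action_dim, action_dim)  sub-policy covariances
    """

    def __init__(self, gating_w, means_w, covs):
        self.gating_w = np.asarray(gating_w, dtype=float)
        self.means_w = np.asarray(means_w, dtype=float)
        self.covs = np.asarray(covs, dtype=float)

    def gating(self, s):
        # High-level policy: probability of selecting each sub-policy in state s.
        return softmax(self.gating_w @ s)

    def sub_policy_logpdf(self, a, s, o):
        # Log-density of action a under the Gaussian sub-policy o.
        mu = self.means_w[o] @ s
        cov = self.covs[o]
        diff = a - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)
        return -0.5 * (a.shape[0] * np.log(2.0 * np.pi) + logdet + maha)

    def sample(self, s, rng):
        # Two-stage sampling: pick a sub-policy, then an action from it.
        o = rng.choice(len(self.gating_w), p=self.gating(s))
        a = rng.multivariate_normal(self.means_w[o] @ s, self.covs[o])
        return o, a

    def responsibilities(self, s, a):
        # Posterior over the latent sub-policy: p(o|s,a) ∝ pi(o|s) pi(a|s,o).
        n_options = len(self.gating_w)
        log_joint = np.log(self.gating(s)) + np.array(
            [self.sub_policy_logpdf(a, s, o) for o in range(n_options)]
        )
        return softmax(log_joint)


def update_sub_policy_means(policy, states, actions, sample_weights):
    """Refit each sub-policy mean by responsibility-weighted least squares.

    Every sample contributes to every sub-policy in proportion to its
    responsibility. `sample_weights` stands in for whatever per-sample
    weights an outer policy-search step would supply (hypothetical here).
    """
    R = np.array([policy.responsibilities(s, a) for s, a in zip(states, actions)])
    W = sample_weights[:, None] * R  # (n_samples, n_options)
    for o in range(R.shape[1]):
        if W[:, o].sum() < 1e-12:
            continue  # no effective data for this sub-policy; keep old mean
        sw = np.sqrt(W[:, o])[:, None]
        sol, *_ = np.linalg.lstsq(sw * states, sw * actions, rcond=None)
        policy.means_w[o] = sol.T  # (action_dim, n_features)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = HierarchicalPolicy(
        gating_w=np.zeros((2, 1)),
        means_w=np.array([[[-1.0]], [[1.0]]]),
        covs=np.array([[[0.1]], [[0.1]]]),
    )
    s = np.array([1.0])
    print(policy.sample(s, rng))
    print(policy.responsibilities(s, np.array([1.0])))
```

In this sketch no sample is assigned exclusively to one sub-policy: the responsibilities split each sample's weight across all of them, which is one simple way to realize the kind of experience sharing the abstract calls inter-policy learning.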

BibTeX

@article{Daniel-2016-112241,
author = {Christian Daniel and Gerhard Neumann and Oliver Kroemer and Jan Peters},
title = {Hierarchical Relative Entropy Policy Search},
journal = {Journal of Machine Learning Research},
year = {2016},
month = {June},
volume = {17},
number = {93},
pages = {1--50},
}
