Algorithms for Learning Markov Field Policies

Abdeslam Boularias, Oliver Kroemer and Jan Peters
Conference Paper, Neural Information Processing Systems (NIPS), January, 2012


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are supposed to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e., smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects.
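As a rough illustration of the kind of bias described in the abstract, the Python sketch below samples deterministic policies from a Markov Random Field that assigns higher probability to policies whose actions agree across edges of a state similarity graph. It is not the paper's algorithm: the chain-structured similarity graph, the Potts-style agreement potential, the temperature beta, and the single-site Gibbs sampler are all assumptions made purely for this example.

import numpy as np

# Toy illustration (assumptions, not the paper's implementation): states lie
# on a 1-D chain, the similarity graph links neighbouring states, and policies
# are sampled from a Potts-like MRF that favours neighbouring states sharing
# the same action.

rng = np.random.default_rng(0)

n_states, n_actions = 20, 3
edges = [(s, s + 1) for s in range(n_states - 1)]   # assumed similarity graph
beta = 2.0                                           # assumed smoothness temperature

def log_potential(policy):
    # Unnormalised log-probability: rewards action agreement along graph edges.
    return beta * sum(policy[i] == policy[j] for i, j in edges)

def gibbs_sample_policy(n_sweeps=50):
    # Draw one deterministic policy from the MRF prior by single-site Gibbs sampling.
    policy = rng.integers(n_actions, size=n_states)
    for _ in range(n_sweeps):
        for s in range(n_states):
            logits = np.zeros(n_actions)
            for a in range(n_actions):
                policy[s] = a
                logits[a] = log_potential(policy)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            policy[s] = rng.choice(n_actions, p=probs)
    return policy

print(gibbs_sample_policy())

Policies drawn this way tend to be piecewise constant with respect to the graph, which is the sense in which sampled candidate policies are "smoother"; in a policy search setting such samples would serve only as graph-biased candidates, with returns still estimated from interaction data.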


@conference{Boularias-2012-112191,
author = {Abdeslam Boularias and Oliver Kroemer and Jan Peters},
title = {Algorithms for Learning Markov Field Policies},
booktitle = {Neural Information Processing Systems (NIPS)},
year = {2012},
month = {January},
}