Phase-Parametric Policies for Reinforcement Learning in Cyclic Environments

Arjun Sharma and Kris M. Kitani
Conference Paper, AAAI Conference on Artificial Intelligence, February, 2018


In many reinforcement learning problems, parameters of the model may vary with its phase while the agent attempts to learn through its interaction with the environment. For example, an autonomous car's reward for selecting a path may depend on traffic conditions at a given time of day, or the transition dynamics of a drone may depend on the current wind direction. Many such processes exhibit a cyclic phase structure and could be represented with a control policy parameterized over a circular or cyclic phase space. Attempting to model such phase variations with a standard data-driven approach (e.g., deep networks) without explicitly modeling the phase of the model can be challenging. Ambiguities may arise because the optimal action for a given state can vary depending on the phase. To better model cyclic environments, we propose phase-parameterized policies and value function approximators that explicitly enforce a cyclic structure on the policy or value space. We apply our phase-parameterized reinforcement learning approach to both feed-forward and recurrent deep networks in the context of trajectory optimization and locomotion problems. Our experiments show that our proposed approach achieves better modeling performance than traditional function approximators in cyclic environments.
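The abstract does not say how the cyclic structure is enforced. As a rough illustration only, and not the authors' implementation, one way to picture a phase-parameterized policy is to make its weights a periodic function of the phase. The sketch below assumes a linear policy whose weight matrix is blended from K hypothetical "control" weight matrices spaced evenly around the circle. The names `phase_blend` and `linear_policy` and the use of linear interpolation are all assumptions made for this example.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def phase_blend(control_weights, phase):
    """Cyclically interpolate K control weight arrays at a given phase.

    control_weights: array of shape (K, ...), one weight array per control
                     point, spaced evenly over [0, 2*pi). (Hypothetical layout.)
    phase:           scalar phase in radians; any real value is wrapped.
    """
    k = control_weights.shape[0]
    pos = (phase % TWO_PI) / TWO_PI * k      # position along the K control points
    i0 = int(np.floor(pos)) % k              # control point at or before the phase
    i1 = (i0 + 1) % k                        # next control point, wrapping K-1 -> 0
    t = pos - np.floor(pos)                  # interpolation fraction in [0, 1)
    return (1.0 - t) * control_weights[i0] + t * control_weights[i1]

def linear_policy(control_weights, state, phase):
    """Action of a linear policy whose weights depend on the phase."""
    return phase_blend(control_weights, phase) @ state

# Example: K = 4 control points, 2-dimensional action, 3-dimensional state.
W = np.arange(24, dtype=float).reshape(4, 2, 3)
state = np.ones(3)
action_a = linear_policy(W, state, 0.3)
action_b = linear_policy(W, state, 0.3 + TWO_PI)  # same phase one cycle later
```

Because the index wraps from the last control point back to the first, the same state at phases that differ by a full cycle yields the same action, while the same state at different phases within a cycle can yield different actions. This is the kind of ambiguity a phase-agnostic approximator would struggle to resolve.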

BibTeX Reference
author = {Arjun Sharma and Kris M. Kitani},
title = {Phase-Parametric Policies for Reinforcement Learning in Cyclic Environments},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2018},
month = {February},