
Integrating Structure with Deep Reinforcement and Imitation Learning

Arjun Sharma
Master's Thesis, Tech. Report, CMU-RI-TR-18-60, August, 2018




Most deep reinforcement and imitation learning methods are purely data-driven and do not utilize the underlying problem structure. While these methods have achieved great success on many challenging tasks, several key problems, such as generalization, data efficiency, and compositionality, remain open. Utilizing problem structure in the form of architecture design, priors, structured losses, or domain knowledge may be a viable strategy for solving some of these problems. In this thesis, we present two approaches towards integrating problem structure with deep reinforcement and imitation learning methods.

In the first part of the thesis, we consider reinforcement learning problems in which the parameters of the model vary with its phase while the agent attempts to learn through its interactions with the environment. We propose phase-parameterized policies and value function approximators which explicitly impose a phase structure on the policy or value space to better model such environments. We apply our phase-parameterized reinforcement learning approach to both feed-forward and recurrent deep networks in the context of trajectory optimization and locomotion problems. Our experiments show that our proposed approach has superior modeling performance and improved sample complexity compared with traditional function approximators in cyclic and linear phase environments.
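The core idea of a phase-parameterized policy can be illustrated with a minimal sketch. The class below is a hypothetical, simplified implementation (not the thesis's actual architecture): it keeps a small set of anchor weight matrices and blends between the two nearest anchors according to a cyclic phase variable in [0, 1), so the policy's parameters vary smoothly and periodically with the phase of the environment.

```python
import numpy as np

class PhaseParameterizedPolicy:
    """Minimal sketch of a phase-parameterized policy: K anchor weight
    matrices are cyclically interpolated by a phase variable in [0, 1),
    so the effective parameters change with the environment's phase.
    Names and structure here are illustrative assumptions."""

    def __init__(self, obs_dim, act_dim, num_anchors=4, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per phase anchor point.
        self.anchors = rng.normal(0.0, 0.1, size=(num_anchors, act_dim, obs_dim))
        self.num_anchors = num_anchors

    def weights_at(self, phase):
        """Linearly interpolate between the two nearest anchors (cyclic)."""
        x = (phase % 1.0) * self.num_anchors
        i = int(np.floor(x)) % self.num_anchors
        j = (i + 1) % self.num_anchors
        t = x - np.floor(x)
        return (1.0 - t) * self.anchors[i] + t * self.anchors[j]

    def act(self, obs, phase):
        # The action depends on both the observation and the phase,
        # because the weights themselves are a function of the phase.
        return np.tanh(self.weights_at(phase) @ obs)
```

Because the interpolation wraps around, the parameters at phase 0 and phase 1 coincide, which is what makes this blending suitable for cyclic environments such as locomotion gaits; a linear (non-wrapping) phase variant would simply drop the modulo.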

In the second part of the thesis, we present a framework that incorporates structure into imitation learning by modelling the imitation of complex tasks or activities as a composition of easier sub-tasks. We propose a new algorithm based on the Generative Adversarial Imitation Learning (GAIL) framework which automatically learns sub-task policies from unsegmented demonstrations. Our approach leverages the idea of directed or causal information to segment demonstrations of complex tasks into simpler sub-tasks and to learn sub-task policies that can then be composed together to perform complicated activities. We thus call our approach Directed-Information GAIL. We experiment with both discrete and continuous state-action environments and show that our proposed approach is able to find meaningful sub-tasks from unsegmented trajectories, which can then be combined to perform more complicated tasks.
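The segmentation step can be sketched in miniature. In the actual Directed-Information GAIL approach the posterior over latent sub-tasks is learned; the function below is only an illustrative stand-in that assumes some posterior q(c_t | c_{t-1}, s_t) is given and assigns each step of an unsegmented demonstration to a sub-task by greedy argmax. The posterior, the toy environment, and all names here are hypothetical.

```python
def segment_demonstration(states, posterior, initial_subtask=0):
    """Illustrative sketch of segmenting an unsegmented demonstration:
    a posterior q(c_t | c_{t-1}, s_t) over latent sub-tasks is queried
    at each step, and the greedy argmax gives the sub-task label.
    (In Directed-Information GAIL this posterior is learned; here it
    is supplied by the caller.)"""
    labels = []
    prev = initial_subtask
    for s in states:
        probs = posterior(prev, s)  # distribution over sub-task labels
        prev = max(range(len(probs)), key=lambda c: probs[c])
        labels.append(prev)
    return labels

# Toy posterior for a 1-D state: sub-task 0 on negative states, 1 otherwise.
def toy_posterior(prev_subtask, state):
    return [1.0, 0.0] if state < 0 else [0.0, 1.0]
```

Once a demonstration is segmented this way, a separate sub-task policy can be trained on each contiguous segment, and the labels themselves indicate how the sub-task policies should be sequenced to reproduce the full activity.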

BibTeX Reference

@mastersthesis{sharma2018structure,
  author  = {Arjun Sharma},
  title   = {Integrating Structure with Deep Reinforcement and Imitation Learning},
  school  = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  year    = {2018},
  month   = {August},
  number  = {CMU-RI-TR-18-60}
}