Environment Generalization in Deep Reinforcement Learning

Master's Thesis, Tech. Report, CMU-RI-TR-19-59, Robotics Institute, Carnegie Mellon University, July, 2019

Abstract

A key challenge in deep reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. In this work, we propose the "Environment-Probing" Interaction (EPI) policy, which lets the agent probe a new environment to extract an implicit understanding of that environment's behavior. Once this environment-specific information is obtained, it is used as an additional input to a task-specific policy, which can then take environment-conditioned actions to solve the task. To learn EPI policies, we present a reward function based on transition predictability: a higher reward is given if the trajectory generated by the EPI policy can be used to better predict transitions. We experimentally show that EPI-conditioned task-specific policies significantly outperform commonly used environment generalization methods on novel test environments.
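
The transition-predictability reward described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: the 1-D dynamics, the hidden `mass` parameter, the least-squares `embed` function (a stand-in for a learned trajectory embedding), and the names `rollout`, `mse`, and `epi_reward` are all assumptions made for illustration. The reward here is taken to be the prediction error of a baseline predictor minus the error of a predictor conditioned on the probing trajectory, which is one plausible reading of "better predict transitions".

```python
import random

def step(state, action, mass):
    """Toy 1-D dynamics (assumed): the hidden mass scales the action's effect."""
    return state + action / mass

def rollout(actions, mass, state=0.0):
    """Collect (state, action, next_state) transitions for a list of actions."""
    transitions = []
    for a in actions:
        nxt = step(state, a, mass)
        transitions.append((state, a, nxt))
        state = nxt
    return transitions

def embed(probe, prior=1.0):
    """Stand-in for a learned embedding: least-squares estimate of 1/mass."""
    denom = sum(a * a for _, a, _ in probe)
    if denom == 0.0:
        return prior  # a passive probe reveals nothing about the mass
    return sum(a * (s2 - s) for s, a, s2 in probe) / denom

def mse(transitions, gain):
    """Mean squared error of the predictor next_state = state + gain * action."""
    errors = [(s + gain * a - s2) ** 2 for s, a, s2 in transitions]
    return sum(errors) / len(errors)

def epi_reward(probe, held_out, prior=1.0):
    """Hypothetical reward: baseline error minus embedding-conditioned error."""
    return mse(held_out, prior) - mse(held_out, embed(probe, prior))

rng = random.Random(0)
mass = 2.5  # hidden environment parameter, unknown to the agent
held_out = rollout([rng.uniform(-1, 1) for _ in range(50)], mass)

active = rollout([1.0, -1.0, 0.5], mass)   # probe that excites the dynamics
passive = rollout([0.0, 0.0, 0.0], mass)   # probe that never acts

reward_active = epi_reward(active, held_out)
reward_passive = epi_reward(passive, held_out)
print(reward_active > reward_passive)
```

In this sketch the probe that applies nonzero actions recovers the hidden parameter and earns a positive reward, while the passive probe earns zero, so a reward of this shape would push a probing policy toward informative interactions.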

BibTeX

@mastersthesis{Zhou-2019-116805,
author = {Wenxuan Zhou},
title = {Environment Generalization in Deep Reinforcement Learning},
year = {2019},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-59},
keywords = {Reinforcement Learning; Robot Learning; System Identification; Domain Adaptation},
}
