Environment Generalization in Deep Reinforcement Learning

Wenxuan Zhou
Master's Thesis, Tech. Report CMU-RI-TR-19-59, July 2019


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


A key challenge in deep reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. In this work, we propose the “Environment-Probing” Interaction (EPI) policy, which allows the agent to probe a new environment to extract an implicit understanding of that environment’s behavior. Once this environment-specific information is obtained, it is used as an additional input to a task-specific policy that can now perform environment-conditioned actions to solve a task. To learn these EPI-policies, we present a reward function based on transition predictability. Specifically, a higher reward is given if the trajectory generated by the EPI-policy can be used to better predict transitions. We experimentally show that EPI-conditioned task-specific policies significantly outperform commonly used environment generalization methods on novel testing environments.
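The EPI reward described above can be made concrete with a small sketch. The idea is to score an EPI rollout by how much an environment embedding extracted from it improves transition prediction over an embedding-free baseline. The function names, linear transition models, and shapes below are illustrative assumptions, not the thesis implementation (which uses learned neural predictors).

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and true next states."""
    return float(np.mean((pred - target) ** 2))

def epi_reward(s, a, s_next, W_plain, W_epi, epi_embedding):
    """Reward an EPI rollout by the prediction improvement its
    environment embedding provides over a baseline predictor.

    Both predictors here are linear models (an assumption for
    illustration): the baseline sees only (s, a); the EPI-conditioned
    predictor additionally sees the environment embedding.
    """
    # Baseline transition prediction from (s, a) alone.
    loss_plain = mse(np.concatenate([s, a], axis=1) @ W_plain, s_next)
    # EPI-conditioned prediction: append the embedding to every input row.
    emb = np.tile(epi_embedding, (len(s), 1))
    loss_epi = mse(np.concatenate([s, a, emb], axis=1) @ W_epi, s_next)
    # Higher reward when the embedding makes transitions more predictable.
    return loss_plain - loss_epi
```

With transitions pooled from environments whose hidden parameters differ, a predictor given a per-environment embedding fits the dynamics better than the pooled baseline, so the reward comes out positive; a rollout whose embedding carries no environment information would earn a reward near zero.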

@mastersthesis{zhou2019environment,
  author = {Wenxuan Zhou},
  title = {Environment Generalization in Deep Reinforcement Learning},
  year = {2019},
  month = {July},
  school = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  number = {CMU-RI-TR-19-59},
  keywords = {Reinforcement Learning, Robot Learning, System Identification, Domain Adaptation},
}