Deep Reinforcement Learning with Prior Knowledge

Tao Chen
Master's Thesis, Tech. Report, CMU-RI-TR-19-09, May, 2019


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Deep reinforcement learning has been applied to many domains, from computer games, natural language processing, and recommendation systems to robotics. While there are many scenarios, such as games, where huge amounts of data are easily available, the application of deep reinforcement learning to robotics is often limited by the bottleneck of acquiring data. Hence, generalization becomes essential in making the learning algorithm practical in robotics. We find that using prior knowledge of the tasks can significantly boost learning performance and generalization capabilities.

Deep reinforcement learning can be used to learn dexterous robotic policies, but it is challenging to transfer them to new robots with vastly different hardware properties. It is also prohibitively expensive to learn a new policy from scratch for each robot hardware due to the high sample complexity of modern state-of-the-art algorithms. We propose a novel approach called Hardware Conditioned Policies, in which we train a universal policy conditioned on a vector representation of robot hardware. We consider robots in simulation with varied dynamics, kinematic structure, kinematic lengths, and degrees of freedom, and show better generalization with our method.
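The idea of conditioning one policy on a hardware vector can be illustrated with a minimal sketch. It is not the thesis implementation: the linear policy, the function name `act`, and the choice of hardware vector (for example, a list of link lengths) are all assumptions made here for illustration. The only point it shows is that a single set of weights receives the state concatenated with a hardware descriptor, so the same policy can produce different actions for different robots.

```python
def act(weights, state, hardware_vec):
    """Linear policy over the state concatenated with a hardware vector.

    `weights` is a list of rows, one row per action dimension.
    `hardware_vec` is a hypothetical descriptor of the robot
    (e.g. link lengths); its encoding is an assumption of this sketch.
    """
    x = list(state) + list(hardware_vec)
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match len(state) + len(hardware_vec)")
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# One shared set of weights, two hypothetical robots.
weights = [[1.0, 2.0, 3.0]]
state = [1.0]
action_a = act(weights, state, [1.0, 1.0])
action_b = act(weights, state, [0.0, 1.0])
```

Because the hardware vector is part of the input, the two calls return different actions for the same state even though the weights are shared. A learned policy would replace the fixed linear map with a trained network.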

In this thesis, we also explore the generalization problem in navigation. Even though numerous past works have tackled the problem of task-driven navigation, how to effectively explore a new environment to enable a variety of downstream tasks has received much less attention. We study how agents can autonomously explore realistic and complex 3D environments without the context of task rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that policies with spatial memory that are bootstrapped with imitation learning and then fine-tuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We also show how such task-agnostic exploration can be used for downstream tasks.
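One simple way to picture a coverage reward is as the number of map cells the agent observes for the first time at each step. The sketch below assumes a discretized map whose cells are identified by `(row, col)` tuples and a hypothetical helper named `coverage_reward`. The abstract does not specify how coverage is computed from the on-board sensors, so this is an illustration rather than the thesis's reward.

```python
def coverage_reward(seen, newly_observed):
    """Return how many cells are observed for the first time this step.

    `seen` is a set of cell identifiers observed so far and is updated
    in place. `newly_observed` is an iterable of cells visible at the
    current step (assumed to come from the agent's own sensors).
    """
    new_cells = set(newly_observed) - seen
    seen |= new_cells
    return len(new_cells)


seen = set()
r1 = coverage_reward(seen, [(0, 0), (0, 1)])
r2 = coverage_reward(seen, [(0, 1), (1, 1)])
r3 = coverage_reward(seen, [(0, 0), (1, 1)])
```

Under this definition the reward shrinks as the agent revisits known areas, so the agent is rewarded for moving toward unexplored space without needing any task-specific signal.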

author = {Tao Chen},
title = {Deep Reinforcement Learning with Prior Knowledge},
year = {2019},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-09},
}