Improving Imitation Learning through Efficient Expert Querying

Master's Thesis, Tech. Report, CMU-RI-TR-18-56, Robotics Institute, Carnegie Mellon University, August, 2018


Abstract

Learning from demonstration is an intuitive approach to encoding complex behaviors in autonomous agents. Learners have succeeded at challenging tasks such as autonomous driving, aerial obstacle avoidance, and information gathering through observation and mimicry alone. State-of-the-art algorithms such as Dataset Aggregation (DAgger) have made significant advances over traditional behavior cloning, demonstrating strong theoretical and empirical results. However, these methods typically impose large sampling burdens on experts, which may restrict the types of demonstrators or problems that can be addressed.

In this work, we propose a modified version of the DAgger algorithm aimed at reducing expert queries while maintaining learner performance. Randomly initialized policies typically induce state distributions unlike those of the final policies, leading to wasted expert labeling, especially early in training. By increasing the rate of policy updates, we aim to collect more relevant labeled data relative to the total number of queries. In addition, we implement several supervised active learning approaches as part of our query selection, allowing policy uncertainty to inform expert label queries. We demonstrate our algorithm on a variety of simulated robot manipulator and control tasks.
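The two ideas in the abstract, frequent policy updates and uncertainty-gated expert queries, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the 1-D toy task, the bootstrap ensemble of threshold policies, and the use of ensemble disagreement as the uncertainty signal are all assumptions made for illustration, and every function and parameter name below (such as `run_query_efficient_dagger` and `update_every`) is hypothetical.

```python
import random


def expert_action(state):
    """Toy expert (assumed): always push the state toward zero."""
    return -1 if state > 0 else 1


def fit_threshold(samples):
    """Fit one threshold policy (-1 above the threshold, +1 at or below it)."""
    pushed_up = [s for s, a in samples if a == 1]
    pushed_down = [s for s, a in samples if a == -1]
    if not pushed_up or not pushed_down:
        return None  # both labels are needed to place a threshold
    return (max(pushed_up) + min(pushed_down)) / 2.0


def fit_ensemble(dataset, n_members, rng):
    """Train each member on a bootstrap resample of the aggregated dataset."""
    return [fit_threshold([rng.choice(dataset) for _ in dataset])
            for _ in range(n_members)]


def member_votes(members, state):
    """Collect the action of every member that has a fitted threshold."""
    return [-1 if state > t else 1 for t in members if t is not None]


def run_query_efficient_dagger(n_episodes=20, horizon=30, update_every=5,
                               n_members=5, seed=0):
    rng = random.Random(seed)
    dataset, members = [], []
    queries = steps = pending = 0
    for _ in range(n_episodes):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            votes = member_votes(members, state)
            # Query the expert only if the ensemble is incomplete or disagrees.
            if len(votes) < n_members or len(set(votes)) > 1:
                dataset.append((state, expert_action(state)))
                queries += 1
                pending += 1
            # Frequent updates: retrain after a few new labels.
            if pending >= update_every:
                members = fit_ensemble(dataset, n_members, rng)
                pending = 0
            # The learner, not the expert, drives the rollout.
            if not votes:
                action = rng.choice([-1, 1])
            else:
                action = 1 if sum(votes) > 0 else -1
            state += 0.1 * action + rng.gauss(0.0, 0.02)
            state = max(-1.0, min(1.0, state))
            steps += 1
    return queries, steps, members


if __name__ == "__main__":
    queries, steps, _ = run_query_efficient_dagger()
    print(f"expert queries: {queries} of {steps} visited states")
```

In this sketch, a smaller `update_every` retrains the ensemble more often, so later queries are drawn from states the current policy actually visits, while the disagreement test skips labels the ensemble already agrees on. Plain DAgger would instead label every visited state and retrain once per iteration.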

BibTeX

@mastersthesis{Hanczor-2018-107302,
author = {Matthew Hanczor},
title = {Improving Imitation Learning through Efficient Expert Querying},
year = {2018},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-18-56},
keywords = {Imitation Learning, DAgger, QE-DAgger, Learning from Demonstration},
}
