Efficient Reductions for Imitation Learning

Stephane Ross and J. Andrew (Drew) Bagnell
Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), May, 2010.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Imitation learning, while applied successfully to many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed that the training and testing data are i.i.d. This assumption does not hold in imitation learning, as the learned policy influences the future test inputs (states) on which it will be tested. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning in which training occurs over several episodes of interaction. Both approaches slowly modify the learner’s policy from executing the expert’s policy to executing the learned policy. We show that this leads to stronger performance guarantees and demonstrate the improved performance on two challenging problems: training a learner to play 1) a 3D racing game (Super Tux Kart) and 2) Mario Bros., given input images from the games and the corresponding actions taken by a human expert and a near-optimal planner, respectively.
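
The quadratic growth mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's proof or algorithm; it assumes a deliberately pessimistic toy model in which the learner errs independently with probability `eps` at each step and, once it has made its first mistake, pays a cost of 1 at every step from then on. The function names and the cost model are hypothetical choices made for this illustration.

```python
def expected_cost(eps: float, horizon: int) -> float:
    """Expected total cost over `horizon` steps in the toy model.

    Assumption (illustrative only): each step errs independently with
    probability eps, and every step from the first mistake onward costs 1.
    The chance that a mistake has occurred by step t is 1 - (1 - eps)**t.
    """
    return sum(1.0 - (1.0 - eps) ** t for t in range(1, horizon + 1))


def quadratic_bound(eps: float, horizon: int) -> float:
    """Upper bound eps * T * (T + 1) / 2, using 1 - (1 - eps)**t <= t * eps."""
    return eps * horizon * (horizon + 1) / 2.0


if __name__ == "__main__":
    eps = 1e-4
    for horizon in (10, 20, 40, 80):
        cost = expected_cost(eps, horizon)
        bound = quadratic_bound(eps, horizon)
        print(f"T={horizon:3d}  expected cost={cost:.6f}  bound={bound:.6f}")
```

In this toy model, doubling the horizon roughly quadruples the expected cost when `eps * horizon` is small, which is the compounding-error behaviour the abstract describes for the purely supervised approach.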

Keywords
Imitation Learning, Reduction

Notes
Sponsor: ONR MURI
Associated Center(s) / Consortia: National Robotics Engineering Center
Number of pages: 8

Text Reference
Stephane Ross and J. Andrew (Drew) Bagnell, "Efficient Reductions for Imitation Learning," Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), May, 2010.

BibTeX Reference
@inproceedings{Ross_2010_6569,
   author = "Stephane Ross and J. Andrew (Drew) Bagnell",
   title = "Efficient Reductions for Imitation Learning",
   booktitle = "Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS)",
   month = "May",
   year = "2010",
}