Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
We develop our technique in the context of modeling real-world navigation and driving behaviors, where the collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
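As a rough illustration (not code from the paper), the sketch below fits reward weights theta so that the globally normalized maximum-entropy distribution P(zeta) ∝ exp(theta · f_zeta) over candidate trajectories matches the demonstrations' expected feature counts. It assumes a small, enumerable set of candidate paths with precomputed feature counts; the paper instead computes these expectations efficiently over the full MDP. The names here (maxent_irl, feature_counts, demo_idx) are hypothetical.

```python
import numpy as np

def maxent_irl(feature_counts, demo_idx, lr=0.1, iters=200):
    """Hypothetical sketch: gradient ascent on the maximum-entropy
    log-likelihood over an enumerated set of candidate trajectories.

    feature_counts : (n_trajectories, n_features) array of f_zeta per path
    demo_idx       : indices of the demonstrated trajectories
    """
    n_traj, n_feat = feature_counts.shape
    theta = np.zeros(n_feat)
    # Empirical feature expectation from the demonstrations.
    f_demo = feature_counts[demo_idx].mean(axis=0)
    for _ in range(iters):
        # Globally normalized distribution: P(zeta) ∝ exp(theta · f_zeta).
        scores = feature_counts @ theta
        p = np.exp(scores - scores.max())  # shift for numerical stability
        p /= p.sum()
        # Expected feature counts under the current model.
        f_model = p @ feature_counts
        # Gradient of the log-likelihood is the feature mismatch.
        theta += lr * (f_demo - f_model)
    return theta

# Toy usage: three candidate routes described by two features
# (say, distance and number of turns); route 0 is the demonstration.
f = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
theta = maxent_irl(f, demo_idx=[0])
```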
Keywords: maximum entropy, inverse reinforcement learning, learning preferences, planning, reinforcement learning
Sponsor: National Science Foundation
Associated Center(s) / Consortia: Vision and Autonomous Systems Center and Quality of Life Technology Center
Associated Lab(s) / Group(s): Human-Robot Interaction Group
Associated Project(s): Quality of Life Technology and PeepPredict
Brian D. Ziebart, Andrew Maas, J. Andrew (Drew) Bagnell, and Anind Dey, "Maximum Entropy Inverse Reinforcement Learning," Proceedings of AAAI 2008, July, 2008.
@inproceedings{ziebart2008maximum,
  author = "Brian D. Ziebart and Andrew Maas and J. Andrew (Drew) Bagnell and Anind Dey",
  title = "Maximum Entropy Inverse Reinforcement Learning",
  booktitle = "Proceedings of AAAI 2008",
  month = "July",
  year = "2008",
}