
Maximum Entropy Inverse Reinforcement Learning

Brian D. Ziebart, Andrew Maas, J. Andrew (Drew) Bagnell, and Anind Dey
Conference Paper, Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI '08), pp. 1433-1438, July 2008

Abstract

Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
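The abstract refers to a globally normalized distribution over decision sequences whose parameters are fit by matching feature counts. The sketch below illustrates one common way such a gradient is computed for a small tabular MDP, assuming a soft (log-sum-exp) value recursion as the backward pass and an expected state-visitation recursion as the forward pass. The function name maxent_irl_gradient, the tabular inputs, and the fixed horizon are illustrative assumptions, not the paper's exact Algorithm 1.

import numpy as np

def maxent_irl_gradient(P, features, demo_paths, theta, horizon):
    """One gradient of the MaxEnt IRL log-likelihood on a small tabular MDP.

    P          : (S, A, S) transition probabilities
    features   : (S, F) per-state feature matrix
    demo_paths : list of demonstrated state sequences
    theta      : (F,) reward weights; reward(s) = features[s] @ theta
    horizon    : number of steps used in both recursions (illustrative assumption)
    """
    S, A, _ = P.shape
    reward = features @ theta

    # Backward pass: soft (log-sum-exp) value recursion over the horizon.
    V = np.zeros(S)
    for _ in range(horizon):
        Q = reward[:, None] + np.einsum('sat,t->sa', P, V)  # soft action values
        V = np.logaddexp.reduce(Q, axis=1)
    policy = np.exp(Q - V[:, None])  # stochastic policy implied by the model

    # Forward pass: expected state visitation frequencies D under that policy.
    d = np.zeros(S)
    for path in demo_paths:
        d[path[0]] += 1.0 / len(demo_paths)  # empirical start-state distribution
    D = np.zeros(S)
    for _ in range(horizon):
        D += d
        d = np.einsum('s,sa,sat->t', d, policy, P)

    # Gradient = empirical feature counts minus expected feature counts.
    f_demo = np.mean([features[np.asarray(p)].sum(axis=0) for p in demo_paths], axis=0)
    return f_demo - D @ features

Repeatedly applying theta += learning_rate * maxent_irl_gradient(...) performs gradient ascent on the likelihood of the demonstrations, driving the model's expected feature counts toward the empirical ones.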

BibTeX

@conference{Ziebart-2008-10030,
author = {Brian D. Ziebart and Andrew Maas and J. Andrew (Drew) Bagnell and Anind Dey},
title = {Maximum Entropy Inverse Reinforcement Learning},
booktitle = {Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI '08)},
year = {2008},
month = {July},
pages = {1433 - 1438},
keywords = {maximum entropy, inverse reinforcement learning, learning preferences, planning, reinforcement learning},
}