Off-Policy Reinforcement Learning for Autonomous Driving

Master's Thesis, Tech. Report, CMU-RI-TR-20-34, Robotics Institute, Carnegie Mellon University, August, 2020


Abstract

Modern autonomous driving systems still struggle to handle complex and variable multi-agent real-world scenarios. Some subsystems, such as perception, use deep learning-based approaches to leverage large amounts of data and generalize to novel scenes. Other subsystems, such as planning and control, still rely on classic cost-based trajectory optimization and require substantial effort to handle the long tail of rare events. Deep Reinforcement Learning (RL) has shown encouraging results in learning complex decision-making tasks, ranging from strategy games to challenging robotics tasks. Further, its dense reward structure and modest time horizons make autonomous driving a favorable candidate for applying RL.

Running RL online on vehicles poses practical challenges, and most self-driving companies have millions of miles of collected data. Together, these factors motivate the use of off-policy RL algorithms to learn policies that can eventually work in the real world. We explore the use of an off-policy RL algorithm, Deep Q-Learning, to learn goal-directed navigation in a simulated urban driving environment. Since Deep Q-Learning methods are susceptible to instability and sub-optimal convergence, we investigate different strategies for sampling experiences from the replay buffer to mitigate these issues. We also explore combining an expert agent's demonstration data with the RL agent's experiences to speed up the learning process. We demonstrate promising results on the CoRL2017 and NoCrash benchmarks on CARLA.
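
To make the idea of mixing expert demonstrations with the agent's own experiences concrete, here is a minimal Python sketch. It is not the thesis's implementation: the abstract does not say which sampling strategies were studied, so the fixed per-batch expert fraction is an assumption chosen for illustration. The names `MixedReplayBuffer`, `expert_fraction`, and `td_target` are hypothetical, and the transitions use placeholder integer states. Only the one-step target, reward plus discounted maximum next-state Q-value, is the standard Q-learning form.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done); placeholder types for illustration.
Transition = Tuple[int, int, float, int, bool]


class MixedReplayBuffer:
    """Hypothetical buffer drawing a fixed share of each batch from expert data."""

    def __init__(self, capacity: int, expert_fraction: float, seed: int = 0) -> None:
        if not 0.0 <= expert_fraction <= 1.0:
            raise ValueError("expert_fraction must be in [0, 1]")
        # Agent experiences are evicted oldest-first once capacity is reached.
        self.agent: Deque[Transition] = deque(maxlen=capacity)
        # Assumption: expert demonstrations are kept for the whole run.
        self.expert: List[Transition] = []
        self.expert_fraction = expert_fraction
        self.rng = random.Random(seed)

    def add_agent(self, transition: Transition) -> None:
        self.agent.append(transition)

    def add_expert(self, transition: Transition) -> None:
        self.expert.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Take the requested share from expert data, capped by what is stored.
        n_expert = min(round(batch_size * self.expert_fraction), len(self.expert))
        # Fill the remainder from agent data; the batch is smaller if data is short.
        n_agent = min(batch_size - n_expert, len(self.agent))
        batch = self.rng.sample(self.expert, n_expert)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch


def td_target(reward: float, next_q_values: Sequence[float], done: bool, gamma: float) -> float:
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q_values)
```

In a training loop, each sampled transition would be paired with `td_target` to form the regression target for the Q-network. Other schemes, such as prioritizing transitions by TD error or annealing the expert share over time, would change only the body of `sample`.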

BibTeX

@mastersthesis{Arora-2020-123619,
author = {Hitesh Arora},
title = {Off-Policy Reinforcement Learning for Autonomous Driving},
year = {2020},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-20-34},
keywords = {Reinforcement Learning, Autonomous Driving, Q-learning, Reinforcement Learning with Expert Demonstrations},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.