Behavior Planning at Roundabouts

Aman Khurana
Master's Thesis, Tech. Report, CMU-RI-TR-19-54, August, 2019


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Roundabouts, or traffic circles, make up a significant portion of the unsignalized intersections found on urban and rural roads because of their ability to manage significant traffic flow safely. Such circular intersections pose a particular challenge for autonomous vehicles because of variations in their geometric layout, difficulty in perception, increased interaction among traffic participants, and the range of driving strategies available to drivers. This work investigates behavior planning approaches for a self-driving vehicle as part of a hierarchical planning structure for such scenarios. We present the benefits of a partially observable Markov decision process (POMDP) formulation combined with dividing the task into distinct stages (merging, traversal, and exit). Using recent advances in deep reinforcement learning, we find that adding recurrent elements to this framework allows an autonomous vehicle to interact with other participants, make long-term decisions, account for perception errors, and safely navigate the roundabout. We compare these approaches to traditional rule-based methods and to simple neural-network architectures such as DQN. The model-free POMDP learning framework is further extended to include the agent's previous actions in the network architecture.
Additionally, we present multiple techniques for generalizing policies across different traffic densities. The novel architecture presented here explicitly encodes a continuous variable describing the non-stationary environment as an input to the network. We compare this to the hidden-mode method, which divides the problem into distinct traffic-density modes and learns a separate policy for each mode. These methods can also be extended to other intersection scenarios and to other deep reinforcement learning formulations.
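
The two ways of handling traffic density described above can be contrasted in a small sketch. This is only an illustration under stated assumptions, not the thesis implementation: the function names, the density thresholds, and the idea that density is directly available as a number are all hypothetical, and the policies are stand-in callables rather than trained networks.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Observation = List[float]
# A policy maps an observation to a discrete action index (hypothetical interface).
Policy = Callable[[Observation], int]


def augment_with_density(obs: Sequence[float], density: float) -> Observation:
    """Explicit encoding: append the continuous density value to the network input."""
    return list(obs) + [float(density)]


def select_mode(density: float, thresholds: Sequence[float]) -> int:
    """Hidden-mode style: map a density value to a discrete mode index.

    `thresholds` must be sorted in ascending order. A density equal to a
    threshold is assigned to the higher mode.
    """
    return bisect_right(list(thresholds), density)


def act_hidden_mode(
    obs: Sequence[float],
    density: float,
    thresholds: Sequence[float],
    policies: Sequence[Policy],
) -> int:
    """Pick the policy trained for the current mode and query it."""
    if len(policies) != len(thresholds) + 1:
        raise ValueError("need exactly one policy per mode")
    return policies[select_mode(density, thresholds)](list(obs))


def act_explicit(obs: Sequence[float], density: float, policy: Policy) -> int:
    """Query a single shared policy that sees the density as an extra input."""
    return policy(augment_with_density(obs, density))
```

In the first variant a single policy must learn how its behavior should vary with density; in the second, each policy only has to cover one density range, at the cost of training and storing several of them.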

author = {Aman Khurana},
title = {Behavior Planning at Roundabouts},
year = {2019},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-54},
}