Towards Off-road Autonomous Driving
Abstract
Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain and handling uncertain, diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Many existing methods employ either Model Predictive Control (MPC) or Reinforcement Learning (RL). MPC methods rely on dense sampling and accurate dynamics models, making them computationally expensive and unsuitable for real-time long-horizon planning. In contrast, RL methods are computationally efficient at deployment but struggle with exploration in obstacle-dense and unpredictable terrains. To overcome these limitations, we propose a hierarchical autonomy pipeline consisting of a low-frequency global planner and a high-frequency local RL controller. To address the exploration challenges in RL, we introduce a teacher-student learning paradigm that enables end-to-end training of an RL policy capable of real-time control in complex environments. We present a novel policy gradient method that extends Proximal Policy Optimization (PPO), incorporating off-policy trajectories for teacher supervision and on-policy trajectories for student exploration. Our method is evaluated in a realistic off-road simulation environment and demonstrates superior performance compared to baseline RL and imitation learning approaches. It is further deployed on a high-performance real-world vehicle, showcasing its practical applicability.
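To make the combined on-/off-policy objective concrete, below is a minimal sketch of one plausible instantiation: a PPO clipped surrogate on on-policy student rollouts augmented with a behavior-cloning style likelihood term on off-policy teacher trajectories. All function and variable names are illustrative assumptions, not the report's actual implementation.

# Hedged sketch (PyTorch): PPO clipped surrogate on student (on-policy) data
# plus an off-policy teacher-supervision term. Names are illustrative only.
import torch

def combined_policy_loss(
    new_log_probs,      # log pi_theta(a|s) on the on-policy (student) batch
    old_log_probs,      # log pi_old(a|s) recorded at rollout time
    advantages,         # advantage estimates for the on-policy batch
    teacher_log_probs,  # log pi_theta(a_teacher|s_teacher) on the teacher batch
    clip_eps=0.2,
    bc_weight=0.5,
):
    """Return a scalar loss combining exploration and teacher supervision."""
    # Standard PPO clipped surrogate (on-policy exploration term).
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    ppo_loss = -torch.min(unclipped, clipped).mean()

    # Off-policy teacher supervision: negative log-likelihood of teacher actions.
    teacher_loss = -teacher_log_probs.mean()

    return ppo_loss + bc_weight * teacher_loss

The weighting between the two terms (bc_weight here) is an assumed hyperparameter; in practice such a weight is often annealed so the policy relies on teacher supervision early in training and on its own exploration later.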
BibTeX
@techreport{Wu-2025-147283,
  author = {Zhouchonghao Wu},
  title = {Towards Off-road Autonomous Driving},
  year = {2025},
  month = {June},
  institution = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  number = {CMU-RI-TR-25-55},
  keywords = {Reinforcement learning, Learning from Demonstrations, Autonomous driving, Off-road driving},
}