Abstract:
Off-road autonomous driving poses significant challenges, such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Model Predictive Control (MPC) methods rely on dense sampling and accurate dynamics models, making them computationally expensive and unsuitable for real-time long-horizon planning. In contrast, Reinforcement Learning (RL) methods are computationally efficient at deployment but struggle with exploration in obstacle-dense, unpredictable terrain. To overcome these limitations, this thesis proposes a hierarchical autonomy pipeline consisting of a low-frequency global planner and a high-frequency local RL controller. To address RL's exploration challenges, it introduces a teacher-student learning paradigm that enables end-to-end training of an RL policy capable of real-time control in complex environments, and it presents a novel policy gradient method that extends Proximal Policy Optimization (PPO) by incorporating off-policy trajectories for teacher supervision and on-policy trajectories for student exploration. The proposed system is validated in a high-fidelity off-road simulation environment, where it outperforms standard RL and imitation learning baselines, and is further deployed on a high-performance real-world vehicle, demonstrating its practical applicability.
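The combined on-/off-policy objective described above could be sketched as below. This is a minimal illustration only: the abstract does not give the exact formulation, so the use of a behavior-cloning (negative log-likelihood) term for teacher supervision, its weighting coefficient `bc_weight`, and all function names are assumptions, not the thesis's actual method.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective on on-policy (student) data.

    ratio: pi_new(a|s) / pi_old(a|s) for sampled actions.
    advantage: estimated advantages for those actions.
    Returns a loss to minimize (negative of the surrogate to maximize).
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def teacher_bc_loss(student_logp_teacher_actions):
    """Assumed supervision term: negative log-likelihood of off-policy
    teacher actions under the student policy (behavior cloning)."""
    return -student_logp_teacher_actions.mean()

def combined_loss(ratio, advantage, student_logp_teacher_actions, bc_weight=0.5):
    """Hypothetical combined objective: on-policy PPO term for student
    exploration plus a weighted off-policy teacher-supervision term."""
    return (ppo_clip_loss(ratio, advantage)
            + bc_weight * teacher_bc_loss(student_logp_teacher_actions))
```

In such a scheme, `bc_weight` would typically be annealed as the student policy improves, shifting the objective from imitating the teacher toward pure on-policy RL.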
Committee:
Jeff Schneider (advisor)
Guanya Shi
Samuel Triest
