Abstract:
Off-road autonomous driving presents a complex set of challenges, including navigation through unmapped environments, variable terrain geometries, and uncertain, non-stationary dynamics. These conditions demand planning and control strategies that are both long-horizon and adaptable. Traditional Model Predictive Control (MPC) methods rely on dense sampling and precise dynamics modeling, which limits their feasibility for real-time planning in unstructured terrains. In contrast, Reinforcement Learning (RL) approaches offer fast execution but suffer from poor exploration efficiency, particularly in obstacle-dense and dynamically diverse settings.
This thesis proposes a hierarchical autonomy framework that integrates a low-frequency, long-horizon planner with a high-frequency, reactive RL-based controller. To overcome the exploration limitations of RL, it introduces a teacher-student training paradigm in which a teacher policy, trained off-policy from expert trajectories or heuristics, guides a student policy trained on-policy. The Proximal Policy Optimization (PPO) algorithm is further extended with a hybrid policy gradient formulation that leverages off-policy guidance alongside stable on-policy updates.
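The abstract does not spell out the hybrid formulation, but as a rough illustration of combining stable on-policy PPO updates with off-policy teacher guidance, the sketch below mixes PPO's clipped surrogate loss with a behavior-cloning-style term on teacher data. The function name, the `beta` weight, and the specific form of the guidance term are assumptions for illustration, not the thesis's actual objective.

```python
# Illustrative sketch only: combines a standard PPO clipped surrogate
# (on-policy) with an assumed off-policy guidance term that rewards
# likelihood of teacher actions under the student policy.
import torch


def hybrid_policy_loss(
    new_log_probs,      # log pi_student(a|s) for on-policy batch actions
    old_log_probs,      # log probs recorded at rollout time
    advantages,         # advantage estimates for the on-policy batch
    teacher_log_probs,  # log pi_student(a_teacher|s) on off-policy teacher data
    clip_eps=0.2,
    beta=0.5,           # weight on the off-policy guidance term (assumed)
):
    # On-policy term: PPO clipped surrogate objective.
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    # Off-policy term: push the student toward teacher/heuristic actions
    # (behavior-cloning style guidance on teacher trajectories).
    guidance_loss = -teacher_log_probs.mean()

    return ppo_loss + beta * guidance_loss
```

In this hypothetical form, `beta` trades off how strongly teacher demonstrations shape the student against the standard PPO update on its own rollouts.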
The proposed approach is validated in a realistic off-road simulation environment and benchmarked against standard RL and imitation learning baselines, showing improved terrain traversal and obstacle avoidance. Additionally, the trained policy is deployed on Sabrecat, a full-scale autonomous off-road ground vehicle. Experimental results demonstrate successful real-time execution, robust obstacle avoidance, and generalization to novel, complex terrains. This thesis contributes a practical and scalable solution to long-horizon off-road autonomy by combining hierarchical planning and guided reinforcement learning.
Committee:
Prof. Jeff Schneider (advisor)
Prof. Wennie Tabib
Brian Yang
Zoom Meeting ID: 928 0552 1791 | Passcode: 15232
