In this talk, we present Value Function-Guided MPPI, a hierarchical framework that integrates reinforcement learning (RL) with MPPI. Rather than relying on an explicit cost map, the planner uses a learned RL value function as an implicit, execution-aware objective, allowing trajectories to be evaluated based on the expected performance of the controller. MPPI performs trajectory exploration, while a pretrained RL policy executes the selected plans, creating a bidirectional feedback loop between planning and control. This removes the need for manual cost design and better aligns planning with execution dynamics. We evaluate the method in simulation and real-world field tests on a full-size Yamaha ATV, showing improved safety and goal-reaching performance in challenging terrain, and discuss sim-to-real challenges encountered.
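As a rough illustration of planning against a value function rather than a cost map, the sketch below runs an MPPI update in which each sampled rollout is scored only by the value of its terminal state. Everything in it is an assumption made for illustration, not the system described in the talk: the 2D single-integrator dynamics, the hand-written `value_fn` standing in for a learned RL critic, the hypothetical helper names (`rollout`, `mppi_step`, `plan`), and all parameter values. The pretrained RL policy that would execute the selected plan is omitted.

```python
import numpy as np


def value_fn(states, goal):
    """Stand-in for a learned RL value function V(s); higher is better.

    Assumption: negative distance to the goal replaces a trained critic.
    """
    return -np.linalg.norm(states - goal, axis=-1)


def rollout(x0, controls, dt=0.1):
    """Toy single-integrator dynamics: x_{t+1} = x_t + u_t * dt.

    controls has shape (K, T, 2); the returned states have shape (K, T, 2).
    """
    return x0 + np.cumsum(controls * dt, axis=1)


def mppi_step(x0, nominal, goal, rng, num_samples=256, sigma=0.5, temperature=0.1):
    """One MPPI update of the nominal control sequence (shape (T, 2))."""
    horizon, dim = nominal.shape
    # Sample Gaussian perturbations around the nominal controls.
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, dim))
    states = rollout(x0, nominal[None] + noise)
    # Score each rollout by the value of its terminal state (cost = -V).
    costs = -value_fn(states[:, -1], goal)
    # Exponential weighting; subtracting the minimum keeps exp() well scaled.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Shift the nominal controls by the weighted average perturbation.
    return nominal + np.einsum("k,ktd->td", weights, noise)


def plan(x0, goal, horizon=20, iterations=10, seed=0):
    """Iterate MPPI updates and return (controls, planned states)."""
    rng = np.random.default_rng(seed)
    nominal = np.zeros((horizon, 2))
    for _ in range(iterations):
        nominal = mppi_step(x0, nominal, goal, rng)
    return nominal, rollout(x0, nominal[None])[0]
```

Calling `plan(np.zeros(2), np.array([1.0, 0.5]))` returns a control sequence and the state trajectory it produces under the toy dynamics. In a hierarchical setup like the one described above, that trajectory would be handed to the low-level policy as a reference, and `value_fn` would be replaced by the critic trained alongside that policy.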
Finally, we outline ongoing and future work that draws on a growing body of literature reframing MPC itself as a learning problem, in which RL optimizes the parameters of the MPC formulation. Across the approaches discussed in this talk, we highlight key tradeoffs and design choices in integrating RL and MPC, including sim-to-real transfer, robustness, and adaptability, and what they imply for how real-world autonomous systems should divide responsibility between learning and control.
Jeff Schneider (co-chair)
Guanya Shi (co-chair)
Wenshan Wang
Sam Triest
