Scalable Imitation Learning for Lifelong Multi-Agent Path Finding

Master's Thesis, Tech. Report, CMU-RI-TR-25-67, August 2025


Abstract

With the advancement of artificial intelligence (AI) and robotics, autonomous agents are becoming an essential part of our daily lives. Future large-scale multi-agent systems—such as those envisioned in smart cities—may involve tens of thousands of ground vehicles, aerial drones, and diverse robots, all requiring highly efficient coordination.

This thesis focuses on a fundamental coordination problem underlying many such systems: Lifelong Multi-Agent Path Finding (LMAPF). LMAPF is the task of continually planning efficient, collision-free paths for multiple agents, each of which receives a new goal every time it reaches its current one. Despite decades of research in related areas, the rapidly growing number of agents in modern systems demands ever more scalable algorithms, capable, for example, of coordinating thousands of mobile robots in automated warehouses operated by companies such as Amazon and Ocado.
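To make the problem definition concrete, the sketch below models one LMAPF timestep on a 4-connected grid. A proposed joint move is accepted only if every agent waits or moves to an adjacent cell and there are no vertex or swap conflicts. Any agent that reaches its goal is immediately given a new one. The names `is_valid_joint_move` and `lifelong_step` and the uniform-random goal assignment are illustrative assumptions, not the thesis's code or the LoRR task model.

```python
import random
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def is_valid_joint_move(curr: Sequence[Cell], nxt: Sequence[Cell]) -> bool:
    """True if each agent waits or steps to a 4-neighbour, with no vertex
    or swap (edge) conflicts between agents."""
    # Each agent either stays put or moves one cell horizontally/vertically.
    for (r0, c0), (r1, c1) in zip(curr, nxt):
        if abs(r0 - r1) + abs(c0 - c1) > 1:
            return False
    # Vertex conflict: two agents would occupy the same cell.
    if len(set(nxt)) != len(nxt):
        return False
    # Swap conflict: agents i and j exchange cells in the same timestep.
    where = {cell: i for i, cell in enumerate(curr)}
    for i, (a, b) in enumerate(zip(curr, nxt)):
        j = where.get(b)
        if j is not None and j != i and nxt[j] == a:
            return False
    return True


def lifelong_step(curr: List[Cell], nxt: Sequence[Cell], goals: List[Cell],
                  free_cells: Sequence[Cell], rng: random.Random) -> int:
    """Execute one timestep in place and return how many goals were reached.

    Hypothetical task model: a finished agent immediately receives a new goal
    drawn uniformly from the free cells (real benchmarks may use task queues).
    """
    if not is_valid_joint_move(curr, nxt):
        raise ValueError("joint move is not collision-free")
    reached = 0
    for i, cell in enumerate(nxt):
        curr[i] = cell
        if cell == goals[i]:
            reached += 1
            goals[i] = rng.choice([c for c in free_cells if c != cell])
    return reached


# Usage: two agents on a 2x3 open grid both reach their goals in one step.
rng = random.Random(0)
free = [(r, c) for r in range(2) for c in range(3)]
positions = [(0, 0), (0, 2)]
targets = [(0, 1), (1, 2)]
print(lifelong_step(positions, [(0, 1), (1, 2)], targets, free, rng))  # → 2
```

Summing the value returned by `lifelong_step` over a run gives the number of completed goals, the throughput-style quantity on which LMAPF solvers are typically compared.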

The main contribution of this thesis is Scalable Imitation Learning for LMAPF (SILLM), which can effectively learn a neural policy, shared by up to ten thousand agents, that makes high-quality single-step decisions from each agent's local observation. A key ingredient of our learning recipe is a scalable expert: Windowed PIBT-LNS (WPL), a windowed, anytime, search-based solver originally proposed as part of our winning solution to the 2023 League of Robot Runners (LoRR) competition. To emulate WPL's decision-making logic, we carefully design a neural policy representation that combines a novel Spatially Sensitive Communication (SSC) module, which enables precise spatial reasoning, with recent advances from heuristic search in single-step collision resolution and global heuristic guidance.
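Single-step collision resolution turns each agent's preferred next cell into a joint move that is collision-free. A well-known method of this kind is PIBT (Priority Inheritance with Backtracking), the "PIBT" in WPL's name. The sketch below is a generic PIBT step on a 4-connected grid, not the thesis's implementation. It assumes a hypothetical `score(i, v)` function that ranks candidate cells for agent `i`. Here that score is Manhattan distance to the goal, standing in for a learned policy's action preferences. A higher-priority agent that wants an occupied cell "lends" its priority to the occupant, which must move out first. If the occupant cannot, the requester backtracks and tries its next candidate.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]


def candidates(c: Cell, free: Set[Cell]) -> List[Cell]:
    """Cells reachable in one step on a 4-connected grid (including waiting)."""
    r, q = c
    moves = [c, (r + 1, q), (r - 1, q), (r, q + 1), (r, q - 1)]
    return [v for v in moves if v in free]


def pibt_step(curr: Sequence[Cell], free: Set[Cell],
              score: Callable[[int, Cell], float],
              priority_order: Sequence[int]) -> List[Cell]:
    """One PIBT planning step: returns a collision-free next cell per agent.

    `score(i, v)` is lower for cells agent i prefers (hypothetical interface).
    `priority_order` must list every agent, highest priority first.
    """
    nxt: List[Optional[Cell]] = [None] * len(curr)
    occupant: Dict[Cell, int] = {c: i for i, c in enumerate(curr)}

    def reserved(v: Cell) -> bool:
        # A cell is taken if any agent is currently assigned to it.
        return any(p == v for p in nxt)

    def solve(i: int, parent: Optional[int]) -> bool:
        for v in sorted(candidates(curr[i], free), key=lambda u: score(i, u)):
            if reserved(v):
                continue
            if parent is not None and v == curr[parent]:
                continue  # moving onto the parent's cell would be a swap
            nxt[i] = v
            k = occupant.get(v)
            # Priority inheritance: the agent sitting on v must move out first.
            if k is not None and k != i and nxt[k] is None:
                if not solve(k, i):
                    continue  # backtrack: k could not vacate v, try next cell
            return True
        nxt[i] = curr[i]  # no feasible move: wait and report failure
        return False

    for i in priority_order:
        if nxt[i] is None:
            solve(i, None)
    result: List[Cell] = []
    for cell in nxt:
        assert cell is not None, "priority_order must list every agent"
        result.append(cell)
    return result


# Usage: agent 0 wants to pass agent 1 in a 1x3 corridor; agent 1 is pushed.
goals = [(0, 2), (0, 1)]
corridor = {(0, 0), (0, 1), (0, 2)}


def manhattan(i: int, v: Cell) -> int:
    return abs(v[0] - goals[i][0]) + abs(v[1] - goals[i][1])


print(pibt_step([(0, 0), (0, 1)], corridor, manhattan, [0, 1]))
# → [(0, 1), (0, 2)]
```

In the usage example, agent 1 is already at its goal but is pushed forward because the higher-priority agent 0 needs its cell. This illustrates why PIBT-style resolution stays collision-free even when individual preferences conflict.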

Our algorithm substantially outperforms state-of-the-art learning-based solvers in both inference speed and solution quality. When planning time is limited, it also surpasses existing search-based methods, including our competition-winning solution.

Notes
https://github.com/DiligentPanda/Scalable-Imitation-Learning-for-LMAPF

BibTeX

@mastersthesis{Jiang-2025-148146,
author = {He Jiang},
title = {Scalable Imitation Learning for Lifelong Multi-Agent Path Finding},
year = {2025},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-67},
keywords = {Multi-Agent Path Finding, Imitation Learning},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.