A research group at Carnegie Mellon University’s Robotics Institute (RI) will host the latest phase of the Vision-Language-Navigation (VLN) Challenge, bringing researchers together to enable robots to understand and act on human instructions in the real world — one of robotics’ most famously difficult pursuits.
This year’s challenge pushes beyond earlier versions by removing “ground truth,” the predefined information about object identities and locations. Instead, teams must design systems that interpret raw sensor data and make sense of complex 3D environments independently. The goal is to move away from relying on perfectly structured datasets and toward robots that understand the world in a more human-like way.
“A major focus in this next phase is on semantic and spatial reasoning,” said Jingfan Tang, an RI exchange student from Shanghai Jiao Tong University helping to coordinate the challenge. “In simple terms, robots need to understand both what something is and where it is. For example, a hallway isn’t just a narrow space. It connects different rooms and shapes how people move through a building. By learning these relationships, robots can make more context-aware decisions.”
Teams will develop systems that allow robots to interpret natural-language instructions and navigate unfamiliar environments. Rather than relying on predefined maps or labeled locations, robots must explore these environments, identify meaningful objects and understand how spaces relate to one another. The RI provides a robotic platform equipped with 3D light detection and ranging technology and a 360-degree camera, while participating teams focus on building the reasoning systems that guide robot decision-making. For 2026, the competition will begin in a custom simulation environment before transitioning to real-robot testing later in the challenge.
“The Robotics Institute is particularly well positioned to host the challenge, given its long-standing emphasis on building systems that operate beyond controlled lab settings and its expertise in integrating vision, language and navigation into real-world robotics platforms,” said Ji Zhang, RI systems scientist and advisor of the research group.
The challenge also aims to help set a new global standard for physical AI, so that intelligence translates into reliable, autonomous action in unstructured environments.
“The impact of this work goes beyond research,” Zhang said. “Advances in vision-language navigation could lead to more capable home assistants, improved search-and-rescue robots and smarter tools for industry. Ultimately, the challenge is about creating robots that don’t just follow commands, but truly understand the environments they operate in.”
The VLN Challenge will conclude with a workshop at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), held in Pittsburgh this year, where teams will present their results.
Researchers interested in participating can visit the VLN Challenge website to register and access additional details.
For More Information: Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu