Towards Scaling Embodied Data for Robot Learning

Master's Thesis, Tech. Report, CMU-RI-TR-25-102, December 2025

Abstract

As artificial intelligence advances quickly in the digital domain, the next frontier lies in physical intelligence: systems that learn through acting and sensing in the real world. In this thesis, we explore practical ways of scaling such embodied data across three directions. AnyCar scales synthetic data through large-scale simulation, training a universal dynamics transformer that generalizes across vehicles and environments. FACTR improves the efficiency of real robot data with a low-cost bilateral teleoperation system and a curriculum that teaches policies to integrate force and vision. DexWild scales human data through in-the-wild data collection and co-training with robot demonstrations, enabling generalization to unseen objects and environments. Together, these projects explore how a data-centric approach can enable more adaptive and capable robots.

BibTeX

@mastersthesis{Tao-2025-149669,
author = {Tony (Long) Tao},
title = {Towards Scaling Embodied Data for Robot Learning},
year = {2025},
month = {December},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-102},
}