CMU, Meta Develop New Framework That Allows Humanoids To Learn Independently
The Breakdown
- The BFM-Zero system lets humanoid robots switch seamlessly between many skills.
- It relies on unsupervised reinforcement learning, allowing robots to explore, generalize and self-improve.
- The system moves robotics closer to general-purpose humanoids that can adapt safely in everyday environments.
***
Humanoid robots in a lab at Carnegie Mellon University don’t fall over when pushed. Instead, they run a few steps, regain their balance and return to a standing position as if nothing happened. These same robots can practice boxing, dribble a ball and even dance — all using the same underlying control system.
This new system, called BFM-Zero, marks one of the first efforts toward creating general-purpose controls for general-purpose robots, with one neural policy allowing for smooth transitions between different motions and objectives. The collaborative work propels the field closer to humanoid robots that can safely and reliably adapt to a variety of tasks.
“If we want humanoids to eventually work safely around people, whether in homes, offices or public spaces, they need to be able to recover from unexpected events gracefully and safely,” said Guanya Shi, an assistant professor in CMU’s Robotics Institute (RI). “BFM-Zero is the starting point for the adaptability humanoids need before they can accomplish real-world tasks.”
Developed by researchers at the RI and Meta, BFM-Zero is a behavioral foundation model (BFM) that lets humanoid robots learn and adapt far more flexibly than previous methods and traditional control systems. A single control model guides a robot through many different movements and tasks without needing to be retrained for each one.
BFM-Zero functions through a shared latent space, an internal workspace where the robot organizes information about motions, goals and rewards in the same format. Keeping everything in this unified space allows the model to smoothly switch between different behaviors. It enables the humanoids to perform a wide range of whole-body skills, from fall recovery and natural shock absorption to running, walking and more.
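The idea of a shared latent space can be illustrated with a toy example. The sketch below is not BFM-Zero's code: the encoder is a fixed random matrix standing in for a learned network, and every name and dimension (`ENCODER`, `embed_goal`, `embed_motion`, `policy`, `STATE_DIM`, `LATENT_DIM`) is hypothetical. It only shows the structural point made above: different kinds of prompts become vectors of the same shape, so one policy can consume any of them.

```python
import numpy as np

STATE_DIM = 6    # hypothetical size of a robot state vector
LATENT_DIM = 4   # hypothetical size of the shared latent space

rng = np.random.default_rng(0)
# Stand-in for a learned state encoder: a fixed random linear map (assumption).
ENCODER = rng.normal(size=(LATENT_DIM, STATE_DIM))


def normalize(z):
    """Scale a latent vector to unit length so all prompts are comparable."""
    return z / np.linalg.norm(z)


def embed_goal(target_state):
    """Goal prompt: encode a single target pose."""
    return normalize(ENCODER @ target_state)


def embed_motion(reference_states):
    """Motion prompt: encode a short window of reference poses and average them."""
    return normalize((reference_states @ ENCODER.T).mean(axis=0))


def policy(observation, z):
    """Toy latent-conditioned policy: one function, behavior selected by z."""
    return np.tanh(ENCODER.T @ z - 0.1 * observation)


goal_z = embed_goal(rng.normal(size=STATE_DIM))
motion_z = embed_motion(rng.normal(size=(10, STATE_DIM)))

obs = np.zeros(STATE_DIM)
for z in (goal_z, motion_z):
    # Switching behavior means swapping the prompt vector, not the policy.
    action = policy(obs, z)
```

In this toy setup the policy never needs to know whether `z` came from a goal or a motion clip, which is why switching between behaviors needs no retraining.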
These capabilities mean humanoid robots could one day perform complex tasks safely and autonomously while reducing the need for constant human supervision. Unlike traditional reinforcement learning systems that must be trained directly on every individual behavior, BFM-Zero uses unsupervised reinforcement learning to help the robot interpret patterns and connections in its actions. The robot experiments and determines what works best to achieve its goals in the latent space, allowing it to generalize and adapt to new situations more accurately.
At CMU, Shi worked with Associate Research Professor Kris Kitani and with Yitang Li, Zhengyi Luo, Tonghe Zhang, Cunxi (Jimmy) Dai and Haoyang Weng, students and interns in the Learning and Control for Agile Robotics (LeCAR) Lab. The CMU research team was initially inspired by Meta Motivo, a behavioral foundation model that controls virtual humanoids.
To develop BFM-Zero, the CMU team worked with Motivo researchers to translate the ideas behind the virtual humanoids to real-world humanoid robots. They used their combined expertise to overcome unique robotics challenges that appear when moving from virtual environments to real robots, such as hardware constraints and the complexity of real-world physics.
First authors Li and Luo focused on developing a neural policy capable of handling thousands of motions while preventing conflicts between them. They helped design the system to maintain promptability on real-world humanoids, allowing the robot to respond to high-level instructions and interpret outcomes through a natural reward function.
“The promptability really is a key feature of the system,” Li said. “Similar to how we use prompts for large language models, users can prompt the robot with high-level goals rather than having to show it every step of every motion. BFM-Zero interprets these prompts through a natural reward function and allows the robot to plan the movement on its own.”
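The article does not say how a reward function is turned into a prompt. One approach from the unsupervised reinforcement learning literature is to take a reward-weighted average of state embeddings; the sketch below assumes that approach purely for illustration. The embedding matrix `EMBED`, the three-number state layout, and the function names are all hypothetical stand-ins, not part of BFM-Zero.

```python
import numpy as np

STATE_DIM = 3    # hypothetical state: [head_height, forward_speed, tilt]
LATENT_DIM = 4   # hypothetical latent size

rng = np.random.default_rng(1)
# Stand-in for a learned state embedding network (assumption).
EMBED = rng.normal(size=(LATENT_DIM, STATE_DIM))


def stand_upright_reward(state):
    """Hypothetical user-written reward: stay tall, level and still."""
    head_height, forward_speed, tilt = state
    return head_height - abs(forward_speed) - abs(tilt)


def reward_to_prompt(reward_fn, sampled_states):
    """Reward-weighted average of state embeddings, scaled to unit length."""
    rewards = np.array([reward_fn(s) for s in sampled_states])
    embeddings = sampled_states @ EMBED.T            # shape (N, LATENT_DIM)
    z = (rewards[:, None] * embeddings).mean(axis=0)
    return z / np.linalg.norm(z)


# Stand-in for a buffer of previously recorded robot states.
sampled_states = rng.normal(size=(256, STATE_DIM))
prompt = reward_to_prompt(stand_upright_reward, sampled_states)
```

Under this assumption, the user describes only what a good outcome looks like. The prompt vector is computed from that description and handed to the policy, which is left to work out the movement.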
The team demonstrated this feature through a wide range of whole-body motions. In each scenario, the robot could optimize for rewards, track motions, reach a target position and recover naturally when something unexpected happened.
Testing these recovery abilities posed a unique challenge, but it did not deter the researchers. Zhang and Dai made up the real-world deployment team and focused on safety and on any retraining needed for BFM-Zero to work on the physical humanoids. They performed hands-on experiments with the robots, including pushing them to test their fall recovery abilities.
“We have seen other humanoids panic and flail violently when they fall or lose their balance, which is unsafe for humans in their proximity,” Zhang said. “We wanted to change these behaviors to make humanoids safe and reliable for future household or industry purposes.”
The real-world deployment team saw a significant improvement in how humanoids equipped with BFM-Zero recovered from failures. For example, when Dai pushed the humanoid, instead of falling, it ran a few steps and recovered its own balance, an action the team did not have to train it to do. When the robot did fall, it smoothly and naturally stood back up without putting any team members in danger. These real-world tests highlighted just how well the system can support humanoid stability and adaptability.
To learn more about BFM-Zero, visit the project website.
For More Information: Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu