Seminar
From Video Generation to Video World Models
Abstract: Video diffusion models have achieved remarkable success in content creation, yet they still fall short of simulating interactive worlds that respond to users in real time. This talk examines the fundamental challenges preventing these models from evolving into true world simulators. I will present a series of works — CausVid, Self-Forcing, MotionStream, and State-Space [...]
Just Asking Questions
Abstract: In the age of deep networks, "learning" almost invariably means "learning from examples". We train language models with human-generated text and labeled preference pairs, image classifiers with large datasets of images, and robot policies with rollouts or demonstrations. When humans acquire new concepts and skills, we often do so with richer supervision, especially [...]
How to Coordinate Thousands of Robots Efficiently and Robustly
Abstract: Large-scale robot fleets are increasingly deployed in warehouses, factories, transportation systems, and emerging robotics applications. Coordinating hundreds or thousands of robots in shared, cluttered spaces creates fundamental challenges in maintaining safety, preventing deadlocks, and minimizing congestion. In this talk, I will present our recent work on scalable imitation learning methods for coordinating 10k robots, automatic environment [...]
What Can We Learn from a Million Models?
Abstract: Machine learning has transformed many fields by learning from large collections of data. Yet, it is rarely applied to its own outputs: the models themselves. Today, with millions of publicly available models, a natural question arises: what can we do with so many models? In this talk, I will motivate two core applications that [...]
Should we skip attention?
Abstract: Transformers are ubiquitous. They influence nearly every aspect of modern AI. However, the mechanics of their training remain poorly understood. This poses a problem for the field due to the immense amounts of data, computational power, and energy being invested in the training of these networks. I highlight a recent intriguing empirical result from [...]