Watch, Predict, Act: Robot Learning meets Web Videos
Abstract: To enable robots to assist in everyday tasks in diverse natural environments such as homes, offices, and kitchens, it is critical to develop policies that generalize to novel tasks in unseen scenarios. Practical utility demands that these policies do not require task-specific adaptation at test time but can instead execute directly given a natural [...]
Towards Robust Informative Path Planning for Spatiotemporal Environment Prediction
Abstract: Informative Path Planning (IPP) is an important planning paradigm for various real-world robotic applications such as wildfire monitoring and predicting infection spread in crops. IPP involves planning a path that can learn an accurate belief of the quantity of interest, while adhering to planning constraints. Traditional IPP methods are effective only in static, time-invariant [...]
Semantics-Driven Perception and Manipulation for Agricultural Robotics
Abstract: With growing expectations for autonomous robot deployment in unstructured, real-world environments, these systems must operate efficiently while perceiving and interpreting complex scenes to navigate dynamic, cluttered conditions. Robust performance in these settings requires handling occlusions, clutter, and ambiguous visual cues, challenges exacerbated by the limited semantic understanding in standard visuomotor policy frameworks. This thesis [...]
Towards Dexterous Robotic Manipulation by Imitating Experts
Abstract: Imitation learning enables scalable transfer of complex manipulation skills to robots, but its effectiveness depends on high-quality demonstrations and robust policy learning, especially in dynamic, contact-rich environments. This thesis investigates how combining imitation learning with teleoperation and classical planners can teach dexterous manipulation across diverse real-world settings. We develop a teleoperation system for collecting [...]
Unified Predictive Representations for Generalized Robotic Perception
Abstract: Building robots that can perceive, reason, and act across a wide range of objects and environments remains a central goal in robotics. Predicting future outcomes in response to actions is a core capability for achieving such generalization without relying on large amounts of task-specific data. In this thesis, we investigate how to [...]
Structured and Adaptive Real2Sim2Real RL for Humanoid Whole-Body Control and Loco-Manipulation
Abstract: Humanoid robots offer two unparalleled advantages in general-purpose embodied intelligence. First, humanoids are built as generalist robots that can potentially do all the tasks humans can do in complex environments. Second, the embodiment alignment between humans and humanoids allows for the seamless integration of human cognitive skills with versatile humanoid capabilities. To fully unleash the [...]
Enhancing Concept-Based Decision Making in AI Models with Disentanglement
Abstract: Deploying AI in high-stakes settings requires models that are not only accurate but also interpretable and amenable to human oversight. Concept Bottleneck Models (CBMs) support these goals by structuring predictions around human-understandable concepts, enabling interpretability and post-hoc human intervenability. However, CBMs rely on a ‘complete’ concept set, requiring practitioners to define and label enough concepts [...]
Learning to Generalize via Human Manipulation Priors
Abstract: Generalization is a core challenge in robotics, where the goal is to enable robots to handle novel objects, environments, and embodiments with minimal additional data. This thesis explores how human prior knowledge, captured through both passive observation and active demonstration, can be leveraged to improve generalization in manipulation tasks. We propose two complementary approaches that scale robot learning by leveraging large-scale human-derived data. First, we introduce HRP (Human Affordances for Robotic Pre-Training), where we learn actionable visual representations by extracting hand trajectories, contact points, and object labels from internet-scale human videos. These representations, when used to initialize control policies, lead to significant performance gains in downstream robot manipulation tasks and transfer effectively across viewpoints and robot morphologies. Second, we present DexWild (Dexterous Human Interactions for In-the-Wild Robot Policies), a system that collects high-fidelity in-the-wild demonstrations using a human motion-capture device. A human-robot co-training algorithm combines this diverse human data with limited robot data, enabling robust policy transfer to unseen scenes, robot arms, and hands. [...]
Accelerating Video Understanding and Generation at Scale
Abstract: While image understanding, generation, and manipulation have matured rapidly in recent years, video remains challenging due to the significantly larger input size. As a result, tasks such as generating long videos or understanding extended video sequences remain out of reach for current models due to their computational cost. This talk presents a series of [...]
Parameter-Efficient Neuro-Symbolic Action Anticipation via Iterative Context Refinement
Abstract: As robots and intelligent systems increasingly interact with humans, the ability to understand users by anticipating their actions becomes significantly more important. Current approaches to action anticipation leverage the inference capabilities of large foundational models but are limited in their application by their complexity and resource requirements, as well as the difficulty of training. [...]
Incrementally Learned Shared Autonomy and Intelligent Mode Switching for Assistive Teleoperation
Abstract: Assistive robotic systems have the potential to significantly enhance autonomy and independence for individuals with physical disabilities. However, existing shared autonomy frameworks typically rely on static policies trained offline, which fail to adapt effectively when encountering unforeseen environmental variations or evolving user behaviors. Additionally, teleoperating high degrees-of-freedom (DoF) robotic manipulators through low-dimensional (low-DoF) user [...]