MSR Thesis Presentation
Structured Policies for Efficient Knowledge-Guided Learning from Humans
Abstract: Imitation learning has achieved strong performance in sequential decision-making tasks, but it typically requires large numbers of expert demonstrations, generalizes poorly to unseen scenarios, and is difficult for laypeople without technical backgrounds to use. This thesis introduces structured policies, a framework that integrates human domain knowledge into imitation learning by using large language models (LLMs) to generate semantically meaningful policy structures from [...]
GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance
Abstract: Robot learning approaches such as behavior cloning and reinforcement learning have shown great promise in synthesizing robot skills from human demonstrations in specific environments. However, these approaches often struggle to generalize to unseen real-world settings because they rely on task-specific demonstrations or complex simulators. While foundation models (e.g., LLMs, VLMs) offer rich semantic understanding [...]
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Abstract: Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this talk, we develop a Unified Flow & Matching model (UFM), which [...]
Carnegie Mellon University
Regression-based Multi-view Face Synthesis
Abstract: Synthesizing photorealistic human faces from novel viewpoints using only a single frontal image remains a challenging problem in computer vision. Large viewpoint changes introduce geometric distortions, self-occlusions, and missing visual information, making identity preservation and high-frequency detail reconstruction particularly difficult. While recent generative approaches such as diffusion models and 3D-aware neural representations produce visually [...]
Learning Dynamic Rope Manipulation with Task-Level Iterative Learning Control
Abstract: Dynamic manipulation of deformable objects is challenging for humans and robots because such objects have infinite degrees of freedom and exhibit underactuated dynamics. This thesis introduces a Task-Level Iterative Learning Control method for dynamic manipulation of deformable objects and demonstrates this method on a non-planar rope manipulation task called the flying knot. Using a single human [...]
Doppler Velocity Imaging Sonar
Abstract: Underwater robotics applications require accurate velocity sensing to enable long-term dead reckoning navigation in the absence of GPS or visual features. Velocity is typically measured with a Doppler Velocity Log, which measures the Doppler frequency shift induced by the motion of the sensor along four beams, from which the sensor velocity can be recovered. We expand upon [...]
Learned Metrics-Aware Covariance for Visual-Inertial Fusion
Abstract: Visual-inertial state estimation integrates cameras and inertial measurement units (IMUs) to achieve accurate, metric-scale state estimation for autonomous systems. The covariance matrices associated with visual and inertial measurements determine how the estimator weights each sensing modality, making correct covariance modeling critical for fusion accuracy and consistency. However, most existing visual-inertial (VI) state estimators rely on [...]
Observational Study to Inform Wound Care Robotics Design
Abstract: The integration of robotics and assistive technology into wound care offers a promising solution to growing patient demand amid a global nursing shortage. While assistive technologies, including robotics for dressing removal and AI for wound measurement, have shown promise for isolated tasks, research has yet to evaluate nurse wound care practices from a technology [...]
Pairwise 3D Human Object Contact Estimation
Abstract: Understanding real-world human-object interactions in images is an inherently many-to-many problem, where disentangling fine-grained and concurrent physical contacts is particularly challenging. Existing semantic contact estimation methods are either limited to single-human settings or require object geometry (e.g., meshes) in addition to the input image. The current state-of-the-art method leverages a powerful VLM for category-level [...]
Learning Generalizable Robot Skills from Diverse Data Sources and Modalities
Abstract: Robust robot behavior in real-world environments requires generalization across diverse objects, scenes, and embodiments despite limited training data. This thesis studies how different sources and modalities of data can improve different forms of robot generalization. It explores three complementary directions: force information for object-level generalization in contact-rich manipulation, human demonstration data for environment- and [...]