RI Ph.D. Thesis Defense – Mrinal Verghese
Date: 7th April 2026
Time: 10:00 a.m. (ET)
Location: NSH 3305
Zoom: Link
Type: Ph.D. Thesis Defense
Who: Mrinal Verghese
Title: Strategies for Robot Learning from Human Data
Abstract:
Robot learning is fundamentally data-constrained. Internet-scale human data is a promising additional source of information about human environments, tasks, and common skills. This data comes in diverse representations, such as human videos and large language models (LLMs), and can provide supervision and priors at all levels of robot reasoning and control. However, leveraging this data for robot learning is not trivial. Regardless of representation, human data often lacks physical details, and its embodiments and environments differ drastically from those of our target robot deployments.
In this thesis, I argue that one of the keys to effectively leveraging human data for robot learning lies in identifying appropriate features and modalities in that data for each level of robot reasoning. My work on robot learning from human data follows a three-step process: analyze deployed robot learning systems augmented with human data on common tasks, identify appropriate features and failure modes, and integrate models trained on these features into existing robot learning and reasoning paradigms. This thesis demonstrates these strategies in both task planning and skill learning. In task planning, we identify high-level language representations of visual details as good features and design an approach that uses Bayesian reasoning about information gain to better ground LLM-based planners in their environment. In skill learning, we identify visual motion representations as good features and present an approach that uses dense reward signals learned from human video to rapidly improve robot performance in real-world experiments. Across our experiments, we observe that the most appropriate features and representations from human data for a given robotics subproblem often share that subproblem's level of abstraction. For example, we found high-level language details to be the most effective history representations for high-level task planning with LLMs, and low-level visual motion features to most effectively capture salient information from human demonstrations when training robot visuomotor policies for skills. Beyond the specific methods presented here, the work in this thesis offers a general strategy for robot learning from diverse human data.
Thesis Committee Members:
Christopher Atkeson (Chair)
Oliver Kroemer
Dave Held
Ruta Desai (ex-FAIR)
