RI PhD Thesis Proposal - Yehonathan Litman - Robotics Institute Carnegie Mellon University

Yehonathan Litman, PhD Student, Robotics Institute,
Carnegie Mellon University
Tuesday, April 21
12:30 pm to 2:00 pm
Newell-Simon Hall 4305
RI PhD Thesis Proposal – Yehonathan Litman
Date: April 21, 2026
Time: 12:30 PM to 2:00 PM
Location: NSH 4305
Type: RI PhD Thesis Proposal
Who: Yehonathan Litman
Title: Reconstructing, Relighting, and Editing from Casual Captures
Abstract:
Virtual and augmented reality are transforming how we interact with digital content, enabling immersive experiences that blend the physical and virtual worlds. A key enabler of these technologies is the ability to virtualize real-world objects and scenes, capturing their geometry, appearance, and motion from casual photographs or videos, and rendering them under novel viewpoints and lighting conditions. This capability unlocks applications ranging from telepresence and virtual try-on to film production and interactive gaming. However, achieving high-fidelity virtualization from casual captures remains challenging: real-world lighting is baked into captured imagery, dynamic content requires temporal coherence, and editing must be both efficient and semantically meaningful.

This thesis addresses these challenges through three complementary research directions, each leveraging large-scale pretrained priors for 3D and 4D reconstruction. The first direction tackles relighting of static content. MaterialFusion introduces a 2D material diffusion prior trained on high-quality PBR assets to guide inverse rendering, enabling accurate disentanglement of geometry, materials, and lighting from multi-view images under unknown illumination. Building on this foundation, LightSwitch presents a multi-view consistent relighting diffusion framework that leverages inferred material cues to relight objects in as little as 2 minutes, matching or exceeding the quality of optimization-based methods that take hours.

The second direction addresses 4D reconstruction in the wild. Lift4D presents a test-time optimization framework for complete 4D reconstruction from monocular video, using causally conditioned image-to-3D priors and occlusion-aware supervision to handle large deformations and severe occlusions that challenge existing methods. We propose to extend this work by training a feedforward 4D reconstruction model on Lift4D outputs, enabling real-time 4D capture without test-time optimization.

The third direction focuses on video editing. EditCtrl introduces a disentangled editing framework with local and global control that achieves a 10x speedup over state-of-the-art methods while improving editing quality, enabling real-time video editing for applications such as augmented reality. We propose to extend this with an action-conditioned autoregressive framework that treats edited content as an embodied agent, enabling spatially aware generation that responds to scene context and user actions.

Thesis Committee:
Shubham Tulsiani (Co-chair),
Fernando De la Torre (Co-chair),
Kris Kitani,
Christian Richardt, Meta Reality Labs