Correspondence-Preserving Transformers for Scalable 3D Lifting

PhD Thesis Defense

Mosamkumar Dabhi
PhD Student, Robotics Institute, Carnegie Mellon University
Thursday, February 5
5:00 pm to 6:30 pm
Newell-Simon Hall 4305
Correspondence-Preserving Transformers for Scalable 3D Lifting
Abstract:

Takeo Kanade’s famous quip — that to infer geometry or motion from images, you must first know what in one image corresponds to what in another — has guided geometric vision for three decades.

Deep learning seemed to bypass this: methods from 2017–2019 lifted 2D to 3D using only a reprojection loss, exploiting an implicit bias toward smooth solutions. But these methods did not scale: each category needed its own architecture. Transformers promised scalability, yet with 2D-only supervision, naively scaling transformers often fails. The field concluded that scalable 3D learning requires massive 3D supervision (VGGT, Depth Anything).

This thesis asks: what went wrong, and can we recover 2D-only learning in the transformer era?

The answer: preserving correspondence, not adding supervision, is what unlocks scale. Transformers scale through selective attention, but 3D lifting requires preserving every correspondence, and these goals can conflict under standard architectures. We resolve this with an architectural principle that preserves correspondence throughout the network, achieving a 12x improvement and matching full supervision with zero 3D labels. The result is 2D-LFM (2D Lifting Foundation Model): a single model that lifts 45+ categories to 3D using only 2D observations. The framework extends to template-free dense reconstruction (RAT4D).


This thesis shows that Kanade’s classical insights remain crucial in the modern foundation model era, and that understanding why correspondence matters unlocks a different path: 3D foundation models trained on the widely available 2D observations the world already provides.

Thesis Committee Members:
Simon Lucey and László A. Jeni, Co-chairs
Katerina Fragkiadaki
Jason Saragih, Meta

Link to thesis draft