Advancing 3D Semantic and Geometric Reasoning - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Pujith Kachana, MSR Student, Robotics Institute, Carnegie Mellon University
Friday, April 25
9:00 am to 10:00 am
GHC 6115
Advancing 3D Semantic and Geometric Reasoning
Abstract:
Recent advances in foundation models have dramatically improved reasoning over language, vision, and decision-making for autonomous systems. However, extending this intelligence to embodied agents requires bridging the gap between abstract 2D understanding and grounded 3D interaction—a challenge driven by limited 3D data and the inherent complexity of spatial reasoning. This work addresses the problem by decomposing it into two parts: 3D semantic reasoning, or understanding and reasoning about meaning over native 3D data like point clouds, and 3D geometric reasoning, or inferring 3D structure from 2D observations.

To support semantic understanding, we introduce VLA-3D and IRef-VLA—a benchmark and dataset for vision-language alignment and referential grounding in 3D scenes. We also propose SORT3D, a method that leverages the reasoning abilities of pretrained vision-language models for 3D tasks. Additionally, we explore how foundational 2D features can bootstrap semantic understanding in 3D environments. For geometric reasoning, we highlight the role of camera models in 3D understanding and present VOLNet, a learning-based visual-LiDAR odometry model that demonstrates how multimodal grounding between 2D and 3D can enhance geometric reasoning. Finally, we explore emerging 3D foundation models and their potential to unify and advance diverse 3D reasoning capabilities. Through comprehensive evaluations, we show that our datasets and methods advance 3D reasoning and help bridge the gap between abstract understanding and real-world physical environments.

Committee:
Prof. Ji Zhang (advisor)
Prof. Wenshan Wang (advisor)
Prof. Shubham Tulsiani
Brian Yang