Abstract:
As robots are increasingly deployed in real-world environments, their perception systems face growing demands. Tasks such as tracking and manipulation require maps with both high spatial fidelity and detailed object-level organization, and these maps must be delivered quickly to support timely decision-making and control. Concurrently, advances in vision foundation models allow us to build powerful prediction tools that process raw sensor data into useful cues for scene understanding. There is an opportunity to design a new generation of perception systems that better capture fine geometric detail while providing richer descriptions of scene content.
In current 3D reconstruction pipelines, instance-level information is often included in a decoupled manner. In some systems, geometric regions are associated with object instances, but segmentation labels are typically noisy and thus are not used to refine the reconstruction itself. Another common approach is to perform instance labeling as a post-processing step. I posit that 3D reconstruction and instance segmentation are complementary tasks. Knowledge of object boundaries can improve surface estimation, and accurate geometry can guide instance prediction. Can we produce richer maps by solving these tasks concurrently?
In this thesis, I explore 3D surface reconstruction and instance segmentation, along with the integration of auxiliary inputs from prediction models. While these tasks are ultimately complementary, I begin by investigating them separately to better understand their individual challenges and to develop solutions well-suited for later integration.
First, I address the segmentation of scene data and introduce methods for propagating information across different sensing modalities. Next, I present a neural surface reconstruction method centered on a hybrid geometry representation that captures high-fidelity surface details effectively, while converging faster and requiring significantly less runtime than prior approaches.
Finally, I demonstrate how this reconstruction system can be enriched using predictions from monocular instance segmentation and 3D perception models. This unified approach to segmentation and reconstruction balances perceptual coherence and metric precision, and enables new techniques for refining surface geometry.
Thesis Committee Members:
Michael Kaess (Chair)
Deva Ramanan
Shubham Tulsiani
Joshua Mangelson (Brigham Young University)
