
PhD Thesis Defense

Shun Iwase
PhD Student, Robotics Institute, Carnegie Mellon University
Friday, October 24
9:00 am to 11:00 am
Gates Hillman Center 6115
Title: Leveraging Geometric Priors for Robust Robotic Manipulation

Abstract:

This thesis explores how explicit 3D geometric representations, trained at scale on synthetic data, can serve as priors to enhance robotic manipulation. Despite recent progress in geometric understanding, generalization to unseen objects and environments remains constrained by the scale and diversity of existing 3D training data. Although more large-scale 3D datasets have been released, they remain considerably smaller than their image and language counterparts. Moreover, collecting diverse real-world 3D data is time-consuming and labor-intensive, limiting the coverage of objects and scenes. To address this challenge, this thesis investigates how geometric understanding learned from large-scale synthetic 3D model datasets can improve generalization in robotic manipulation without further expanding real-world 3D training data.

As a first step, Chapter 2 introduces RePOSE, a fast and accurate 6D object pose refinement method that establishes a foundation for scalable geometric perception. Chapters 3 and 4 frame the acquisition of a universal geometric prior as a supervised learning problem on 3D geometry tasks. We propose two frameworks, OctMAE and ZeroGrasp, which learn a geometric prior through shape reconstruction and grasp pose prediction. We also introduce ZeroGrasp-11B, a large-scale synthetic dataset containing 1M RGB-D images, 12K 3D models, and 11B grasps, designed specifically for training such models. These methods achieve state-of-the-art performance on public benchmarks for both shape reconstruction and grasp pose prediction on unseen objects, demonstrating the strength of the learned geometric prior. Real-world pick-and-place experiments further validate its generalization to practical robotic scenarios.

While the learned geometric prior shows strong performance in pick-and-place tasks, robotic manipulation involves a broader range of behaviors and longer temporal horizons. Chapter 5 therefore focuses on integrating this prior into imitation learning to address more complex, long-horizon tasks. To this end, we propose GeoFlow, a framework for flow-based 3D visuomotor policy learning that leverages geometry-aware pre-trained models as strong priors. GeoFlow achieves state-of-the-art performance across diverse benchmarks and demonstrates improved data efficiency and robustness under clutter and distractors. These results highlight that large-scale geometry pre-training and sparse voxel representations are key to scalable and generalizable robotic learning.

Thesis Committee:

Kris Kitani, Chair

David Held

Shubham Tulsiani

Sergey Zakharov, Toyota Research Institute
