Unsupervised Learning for 3D Reconstruction and Blocks World Representation

Master's Thesis, Tech. Report, CMU-RI-TR-19-29, Robotics Institute, Carnegie Mellon University, June 2019

Abstract

Recovering the dense 3D structure of a scene from its images has been a long-standing goal in computer vision. Recent years have seen attempts to encode richer priors into geometry-based pipelines through the introduction of learning-based methods. We argue that the 3D supervision required by such methods is too onerous and is not naturally available, and that it is therefore of both practical and scientific interest to pursue solutions that do not rely on it.

In this thesis, we attempt to bridge the worlds of geometric modeling and deep learning by asking how geometric constraints can provide a supervisory signal for reconstructing and representing the 3D world efficiently. We first present an unsupervised learning-based approach to 3D reconstruction, built on a novel robust photometric consistency objective, whose output is a 3D point cloud. When trained with our proposed learning objective, deep multi-view stereo models produce significantly better 3D reconstructions. The proposed objective implicitly handles lighting changes and occlusions across multiple views.
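
To make the idea of a robust photometric consistency objective concrete, here is a minimal NumPy sketch. It is not the thesis implementation. It assumes the source views have already been warped into the reference frame using the predicted depth, and it assumes robustness comes from keeping only the k lowest per-pixel errors across views. The function name, argument names, and the choice of an L1 error are all hypothetical.

```python
import numpy as np

def robust_photometric_loss(ref, warped, valid, k):
    """Average the k smallest per-pixel photometric errors across views.

    ref:    (H, W) reference image.
    warped: (V, H, W) source views, assumed already warped into the reference frame.
    valid:  (V, H, W) boolean mask, False where the warp fell outside the source image.
    k:      number of lowest-error views kept per pixel (assumed hyperparameter).
    """
    # Per-view, per-pixel L1 photometric error against the reference.
    err = np.abs(warped - ref[None, ...])
    # Invalid pixels get infinite error so they are sorted last.
    err = np.where(valid, err, np.inf)
    # Keep the k smallest errors at each pixel; views that disagree strongly
    # (for example because of occlusion or lighting change) are dropped.
    smallest = np.sort(err, axis=0)[:k]
    finite = np.isfinite(smallest)
    total = np.where(finite, smallest, 0.0).sum()
    return total / max(int(finite.sum()), 1)

# Toy example: two consistent views and one view that disagrees everywhere.
ref = np.zeros((2, 2))
warped = np.stack([np.full((2, 2), 0.1), np.full((2, 2), 0.2), np.full((2, 2), 5.0)])
valid = np.ones_like(warped, dtype=bool)
robust = robust_photometric_loss(ref, warped, valid, k=2)
naive = robust_photometric_loss(ref, warped, valid, k=3)
```

In this toy case the disagreeing view dominates the average over all three views, while the k = 2 variant ignores it, which is the kind of behavior one would want under occlusion.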

To represent the reconstructions efficiently, we draw inspiration from Larry Roberts' famous Blocks World of 1965. We introduce a deep learning framework that represents 3D point clouds as an assembly of blocks, yielding a lightweight representation that reduces memory use by several orders of magnitude. We describe how geometric relationships between points and surfaces, along with physical priors, can be used to provide a supervisory signal for training deep models. We also present a synthetic-to-real transfer learning setup with a differentiable matching loss that facilitates supervised learning of such blocks world representations.
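
As a rough illustration of where a memory reduction of this scale can come from, the following back-of-the-envelope sketch compares the storage of a raw point cloud with that of a small set of blocks. The parameterization (center, size, and a rotation quaternion per block, ten float32 values in total) and the counts of points and blocks are assumptions chosen for illustration; they are not figures from the thesis.

```python
BYTES_PER_FLOAT32 = 4

def point_cloud_bytes(num_points):
    # Each point stores x, y, z as float32.
    return num_points * 3 * BYTES_PER_FLOAT32

def blocks_bytes(num_blocks):
    # Assumed block parameterization: 3 (center) + 3 (size) + 4 (quaternion).
    params_per_block = 3 + 3 + 4
    return num_blocks * params_per_block * BYTES_PER_FLOAT32

# Hypothetical scene: one million points summarized by twenty blocks.
cloud = point_cloud_bytes(1_000_000)
blocks = blocks_bytes(20)
ratio = cloud / blocks
```

With these assumed numbers the point cloud takes 12,000,000 bytes and the blocks take 800 bytes, a ratio of 15,000, which is about four orders of magnitude.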

BibTeX

@mastersthesis{Khot-2019-116142,
author = {Tejas Khot},
title = {Unsupervised Learning for 3D Reconstruction and Blocks World Representation},
year = {2019},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-29},
keywords = {3D Point Cloud, Deep Learning, 3D Reconstruction, Multi-View Geometry, Minimum Description Length, Volumetric Primitives},
}