Unsupervised Learning for 3D Reconstruction and Blocks World Representation

Tejas Khot
Master's Thesis, Tech. Report, CMU-RI-TR-19-29, June, 2019

Recovering the dense 3D structure of a scene from its images has been a long-standing goal in computer vision. In recent years, learning-based methods have been introduced to encode richer priors into geometry-based pipelines. We argue that the 3D supervision such methods require is too onerous and not naturally available, and that it is therefore of both practical and scientific interest to pursue solutions that do not rely on it.

In this thesis, we attempt to bridge the worlds of geometric modeling and deep learning: we ask how geometric constraints can provide a supervisory signal for reconstructing and representing the 3D world efficiently. We first present an unsupervised learning-based approach for 3D reconstruction, built on a novel robust photometric consistency objective, whose output is a 3D point cloud. When trained with our proposed learning objective, deep multi-view stereo models produce significantly better 3D reconstructions. The proposed objective implicitly handles lighting changes and occlusions across multiple views.

To represent the reconstructions efficiently, we draw inspiration from Larry Roberts’ famous Blocks World of 1965. We introduce a deep learning framework that represents 3D point clouds as an assembly of blocks, yielding a lightweight representation that reduces memory by several orders of magnitude. We describe how geometric relationships between points and surfaces, along with physical priors, can be used to provide a supervisory signal for training deep models. We also present a synthetic-to-real transfer learning setup with a differentiable matching loss that facilitates supervised learning of such blocks world representations.

author = {Tejas Khot},
title = {Unsupervised Learning for 3D Reconstruction and Blocks World Representation},
year = {2019},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-29},
keywords = {3D Point Cloud, Deep Learning, 3D Reconstruction, Multi-View Geometry, Minimum Description Length, Volumetric Primitives},
}