Coherent Scene Understanding with 3D Geometric Reasoning

PhD Thesis, Tech. Report CMU-RI-TR-14-06, Robotics Institute, Carnegie Mellon University, April 2014

Abstract

When looking at a single 2D image of a scene, humans can effortlessly understand the 3D world behind it even though stereo and motion cues are not available. Motivated by this remarkable human capability, one of the ultimate goals of computer vision is to enable machines to automatically infer the 3D structure of a scene from a single 2D image. This dissertation proposes methods that produce a geometrically and semantically coherent 3D interpretation of urban scenes from a single image, and shows the benefits of reasoning in 3D when analyzing 2D images.

In this dissertation, we model an urban scene using three types of elements. The first type is global geometry, such as the ground plane and the gravity direction. The second type is objects, such as cars and pedestrians, that have definite shapes and extents. The third type is vertical surfaces, such as building facades, that do not have definite shapes and extents. This model allows for a richer characterization of an urban scene than existing work.

To tackle the inherent ambiguity involved in recovering 3D structure from a single 2D image, we systematically identify geometric constraints among the three types of elements in our model, and encode these constraints in a Conditional Random Field (CRF). For objects, we consider both their global geometric compatibility with the ground plane and the gravity direction, and the local geometric compatibility between adjacent objects. For building facades, we decompose them into a set of continuously oriented planes that are mutually related by 3D geometric relationships and constrained by nearby objects in 3D. We also propose a generalized RANSAC algorithm to make inference in the model tractable. We show that performing 3D geometric reasoning using our model benefits individual tasks such as object detection, viewpoint estimation, and facade layout recovery. In addition, it yields a more informative interpretation of the 3D scene behind the image.
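For readers unfamiliar with RANSAC, the following is a minimal sketch of the standard, textbook algorithm applied to fitting a ground plane to noisy 3D points. It is not the generalized RANSAC proposed in the thesis, whose details the abstract does not give. The function names, the inlier threshold, the iteration count, and the synthetic data are all assumptions chosen for illustration.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit_normal, d) of the plane n.x + d = 0 through three points.

    Returns None when the points are (nearly) collinear.
    """
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    # Cross product of the two edge vectors gives the plane normal.
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = tuple(c / norm for c in n)
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Standard RANSAC: keep the plane hypothesis with the most inliers.

    threshold, iterations and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesize a plane from a minimal random sample of three points.
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        # Score the hypothesis by counting points within the distance threshold.
        inliers = [
            pt for pt in points
            if abs(sum(n[i] * pt[i] for i in range(3)) + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def make_demo_points(seed=1):
    """Synthetic data (assumption): a noisy ground plane y = 0 plus clutter above it."""
    rng = random.Random(seed)
    ground = [(rng.uniform(-5, 5), rng.gauss(0, 0.01), rng.uniform(0, 10))
              for _ in range(60)]
    clutter = [(rng.uniform(-5, 5), rng.uniform(0.5, 3), rng.uniform(0, 10))
               for _ in range(20)]
    return ground + clutter


if __name__ == "__main__":
    model, inliers = ransac_plane(make_demo_points())
    print("normal and offset:", model)
    print("inlier count:", len(inliers))
```

The sketch shows only the hypothesize-and-verify loop: sample a minimal set, fit a model, count the points that agree, and keep the best hypothesis. The same loop structure is what makes RANSAC attractive when exhaustive search over hypotheses is intractable.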

BibTeX

@phdthesis{Pan-2014-7856,
author = {Jiyan Pan},
title = {Coherent Scene Understanding with 3D Geometric Reasoning},
year = {2014},
month = {April},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-14-06},
}