Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding

Derek Hoiem
doctoral dissertation, tech. report CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August, 2007


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
When humans look at an image, they see not just a pattern of color and texture, but the world behind the image. In the same way, computer vision algorithms must go beyond the pixels and reason about the underlying scene. In this dissertation, we propose methods to recover the basic spatial layout from a single image and begin to investigate its use as a foundation for scene understanding.

Our spatial layout is a description of the 3D scene in terms of surfaces, occlusions, camera viewpoint, and objects. We propose a geometric class representation, a coarse categorization of surfaces according to their 3D orientations, and learn appearance-based models of geometry to identify surfaces in an image. These surface estimates serve as a basis for recovering the boundaries and occlusion relationships of prominent objects. We further show that simple reasoning about camera viewpoint and object size in the image allows accurate inference of the viewpoint and greatly improves object detection. Finally, we demonstrate the potential usefulness of our methods in applications to 3D reconstruction, scene synthesis, and robot navigation.

Scene understanding from a single image requires strong assumptions about the world. We show that the necessary assumptions can be modeled statistically and learned from training data. Our work demonstrates the importance of achieving robustness through a wide variety of image cues, multiple segmentations, and a general strategy of soft decisions and gradual inference of image structure. Above all, our work shows the tremendous amount of 3D information that can be gleaned from a single image. Our hope is that this dissertation will inspire others to further explore how computer vision can go beyond pattern recognition and produce an understanding of the environment.


Notes
Number of pages: 153

Text Reference
Derek Hoiem, "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding," doctoral dissertation, tech. report CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August, 2007

BibTeX Reference
@phdthesis{Hoiem_2007_5825,
   author = "Derek Hoiem",
   title = "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "August",
   year = "2007",
   number = "CMU-RI-TR-07-28",
   address = "Pittsburgh, PA",
}