Carnegie Mellon Robotics Institute
Derek Hoiem
doctoral dissertation, tech. report CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August, 2007
| Abstract |
When humans look at an image, they see not just a pattern of color and texture, but the world behind the image. In the same way, computer vision algorithms must go beyond the pixels and reason about the underlying scene. In this dissertation, we propose methods to recover the basic spatial layout from a single image and begin to investigate its use as a foundation for scene understanding.
Our spatial layout is a description of the 3D scene in terms of surfaces, occlusions, camera viewpoint, and objects. We propose a geometric class representation, a coarse categorization of surfaces according to their 3D orientations, and learn appearance-based models of geometry to identify surfaces in an image. These surface estimates serve as a basis for recovering the boundaries and occlusion relationships of prominent objects. We further show that simple reasoning about camera viewpoint and object size in the image allows accurate inference of the viewpoint and greatly improves object detection. Finally, we demonstrate the potential usefulness of our methods in applications to 3D reconstruction, scene synthesis, and robot navigation.
Scene understanding from a single image requires strong assumptions about the world. We show that the necessary assumptions can be modeled statistically and learned from training data. Our work demonstrates the importance of robustness through a wide variety of image cues, multiple segmentations, and a general strategy of soft decisions and gradual inference of image structure. Above all, our work manifests the tremendous amount of 3D information that can be gleaned from a single image. Our hope is that this dissertation will inspire others to further explore how computer vision can go beyond pattern recognition and produce an understanding of the environment.
| Notes |
Number of pages: 153
| Text Reference |
Derek Hoiem, "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding," doctoral dissertation, tech. report CMU-RI-TR-07-28, Robotics Institute, Carnegie Mellon University, August, 2007
| BibTeX Reference |
@phdthesis{Hoiem_2007_5825,
   author = "Derek Hoiem",
   title = "Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding",
   booktitle = "",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "August",
   year = "2007",
   number = "CMU-RI-TR-07-28",
   address = "Pittsburgh, PA",
}
The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.