Carnegie Mellon Robotics Institute
Santosh Kumar Divvala
doctoral dissertation, tech. report CMU-RI-TR-12-17, Robotics Institute, Carnegie Mellon University, August, 2012
Object recognition is one of the fundamental challenges in computer vision, where the goal is to identify and localize the extent of object instances within an image. The current de facto standard for building high-performance object category detectors is the sliding window approach. This approach involves scanning an image with a fixed-size rectangular window and applying a classifier to the features extracted within the sub-image defined by the window. In this thesis, we study two important factors influencing the performance of the approach.
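The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the detector studied in the thesis: the names `sliding_window_detect`, `score_window`, and `mean_intensity` are hypothetical, the image is a plain nested list of grayscale values, and the mean-intensity scorer stands in for a trained classifier applied to extracted features.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values
Detection = Tuple[int, int, int, int, float]  # (x, y, width, height, score)


def sliding_window_detect(
    image: Image,
    window_size: Tuple[int, int],
    stride: int,
    score_window: Callable[[List[List[float]]], float],
    threshold: float,
) -> List[Detection]:
    """Scan `image` with a fixed-size window; keep windows scoring above `threshold`."""
    win_w, win_h = window_size
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    detections: List[Detection] = []
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            # Sub-image defined by the window; a real detector would
            # extract features here before calling the classifier.
            patch = [list(row[x:x + win_w]) for row in image[y:y + win_h]]
            score = score_window(patch)
            if score > threshold:
                detections.append((x, y, win_w, win_h, score))
    return detections


def mean_intensity(patch: List[List[float]]) -> float:
    """Hypothetical stand-in classifier: mean pixel value of the patch."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


# Toy 4x4 image with a bright 2x2 block in the lower-right corner.
toy_image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
detections = sliding_window_detect(toy_image, (2, 2), 1, mean_intensity, 0.9)
```

On the toy image, only the window that exactly covers the bright block exceeds the threshold. A practical detector would also scan multiple image scales and merge overlapping detections, which this sketch omits.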
The first is the role played by context, where information outside the sliding window is used to rescore the detections output by the local window classifier. Context helps to suppress detections in regions that are less likely to contain an object and to encourage those in more plausible regions. In the first part of this thesis, we enumerate different sources and uses of context, and comprehensively evaluate their role in a benchmark detection challenge. Our analysis demonstrates that carefully used contextual cues serve not only to improve the performance of local classifiers, but also to make their error patterns more meaningful and reasonable. Our analysis also provides a basis for assessing the inherent limitations of the existing approaches as well as the specific problems that remain unsolved.
The second factor is the role played by subcategories, where information within the sliding window is used to split the training data into smaller groups, for learning multiple classifiers to model the appearance of an object category. The smaller groups have reduced appearance diversity and thus lead to simpler classification problems. In the second part of this thesis, we analyze different schemes to generate subcategories and find that unsupervised feature-space clustering produces well-performing subcategory classifiers. Beyond performance gains, subcategories are attractive for their conceptual simplicity and computational tractability. For example, we find that careful use of subcategories can potentially replace the need for deformable parts within the state-of-the-art deformable parts model detector for many object categories. Data fragmentation is an important problem associated with subcategory-based methods. We present a novel approach that circumvents this problem by allowing different subcategories to share each other’s training instances.
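The idea of generating subcategories by unsupervised feature-space clustering can be illustrated with a small k-means sketch. The abstract does not say which clustering algorithm or features the thesis uses, so k-means, the function names `kmeans_subcategories` and `split_by_subcategory`, the deterministic seeding, and the 2-D toy features are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_subcategories(
    features: List[Vector], k: int, iterations: int = 10
) -> List[int]:
    """Assign each training example to one of `k` subcategories."""
    # Assumption: the first k examples seed the centroids, which keeps
    # the sketch deterministic; real systems usually seed randomly.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every example.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def split_by_subcategory(
    features: List[Vector], labels: List[int], k: int
) -> List[List[Vector]]:
    """Group examples by label; each group would train its own classifier."""
    return [
        [f for f, label in zip(features, labels) if label == c]
        for c in range(k)
    ]


# Toy 2-D features forming two well-separated appearance groups.
toy_features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels = kmeans_subcategories(toy_features, k=2)
groups = split_by_subcategory(toy_features, labels, k=2)
```

Each resulting group has lower appearance diversity than the full set, which is the property the abstract credits for simpler classification problems. Note that every group also receives fewer training examples, which is the data fragmentation issue the abstract mentions; the sketch does not address it.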
Object Category Detection, Sliding Window, Context, Subcategories
Number of pages: 145
Santosh Kumar Divvala, "Context and Subcategories for Sliding Window Object Recognition," doctoral dissertation, tech. report CMU-RI-TR-12-17, Robotics Institute, Carnegie Mellon University, August, 2012
author = "Santosh Kumar Divvala",
title = "Context and Subcategories for Sliding Window Object Recognition",
booktitle = "",
school = "Robotics Institute, Carnegie Mellon University",
month = "August",
year = "2012",
address = "Pittsburgh, PA",
The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.