Exemplar-based Representations for Object Detection, Association and Beyond

Tomasz Malisiewicz
doctoral dissertation, tech. report CMU-RI-TR-11-32, Robotics Institute, Carnegie Mellon University, August, 2011


Download
  • Adobe portable document format (pdf) (35MB)
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Recognizing and reasoning about the objects found in an image is one of the key problems in computer vision. This thesis is based on the idea that in order to understand a novel object, it is often not enough to recognize the object category it belongs to (i.e., answering “What is this?”). We argue that a more meaningful interpretation can be obtained by linking the input object with a similar representation in memory (i.e., asking “What is this like?”). In this thesis, we present a memory-based system for recognizing and interpreting objects in images by establishing visual associations between an input image and a large database of object exemplars. These visual associations can then be used to predict properties of the novel object which cannot be deduced solely from category membership (e.g., which way is it facing? what is its segmentation? is there a person sitting on it?).

Part I of this thesis is dedicated to exemplar representations and algorithms for creating visual associations. We propose Local Distance Functions and Exemplar-SVMs, which are trained separately for each exemplar and allow an instance-specific notion of visual similarity. We show that an ensemble of Exemplar-SVMs performs competitively with the state of the art on the PASCAL VOC object detection task. In Part II, we focus on the advantages of using exemplars over a purely category-based approach. Because Exemplar-SVMs show good alignment between detection windows and their associated exemplars, we show that it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding. Finally, we construct a Visual Memex, a vast graph over exemplars encoding both visual and spatial relationships, and apply it to an object prediction task. Our results show that exemplars provide a better notion of object context than category-based approaches.
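The core idea behind an Exemplar-SVM, as summarized above, is to train one linear SVM per exemplar: a single positive instance against a large set of negatives, yielding an instance-specific similarity function. The following is a minimal, self-contained sketch of that idea using plain subgradient descent on the regularized hinge loss; the function name, loss weights, and optimizer settings are illustrative assumptions, not the thesis's actual implementation (which trains on HOG features with hard-negative mining).

```python
import numpy as np

def train_exemplar_svm(x_pos, X_neg, lam=0.01, c_pos=0.5, c_neg=0.01,
                       lr=0.1, n_iters=500):
    """Sketch of an Exemplar-SVM: a linear SVM trained with a SINGLE
    positive (the exemplar's feature vector) and many negatives.

    Minimizes  lam*||w||^2 + c_pos*h(w.x_pos + b) + c_neg*sum_i h(-(w.x_i + b))
    where h is the hinge loss, via subgradient descent.
    All hyperparameters here are illustrative, not from the thesis.
    """
    d = x_pos.shape[0]
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        gw = 2.0 * lam * w          # gradient of the L2 regularizer
        gb = 0.0
        # hinge subgradient for the single positive exemplar
        if 1.0 - (w @ x_pos + b) > 0:
            gw -= c_pos * x_pos
            gb -= c_pos
        # hinge subgradients for all violating negatives
        viol = (X_neg @ w + b) > -1.0
        gw += c_neg * X_neg[viol].sum(axis=0)
        gb += c_neg * viol.sum()
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy usage: the exemplar lives along the all-ones direction; negatives
# are random background features. The learned scorer should rank the
# exemplar above (nearly all of) the negatives.
rng = np.random.default_rng(0)
x_pos = np.ones(5)
X_neg = rng.normal(size=(100, 5))
w, b = train_exemplar_svm(x_pos, X_neg)
score_pos = w @ x_pos + b
scores_neg = X_neg @ w + b
```

An ensemble of such per-exemplar classifiers can then be run over an image; whichever exemplar's SVM fires on a window supplies the association (and hence any meta-data attached to that exemplar).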

Notes
Number of pages: 130

Text Reference
Tomasz Malisiewicz, "Exemplar-based Representations for Object Detection, Association and Beyond," doctoral dissertation, tech. report CMU-RI-TR-11-32, Robotics Institute, Carnegie Mellon University, August, 2011

BibTeX Reference
@phdthesis{Malisiewicz_2011_6911,
   author = "Tomasz Malisiewicz",
   title = "Exemplar-based Representations for Object Detection, Association and Beyond",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "August",
   year = "2011",
   number = "CMU-RI-TR-11-32",
   address = "Pittsburgh, PA",
}