Carnegie Mellon University
Discovering Objects and Their Location in Images

Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, and Bill Freeman
International Conference on Computer Vision (ICCV 2005), October, 2005.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic Latent Semantic Analysis (pLSA). In text analysis this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics.

The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. [8] on a set of unseen images containing only one object per image.

We also extend the bag-of-words vocabulary to include "doublets", which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
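To make the topic model concrete: pLSA writes the probability of visual word w in image d as P(w|d) = Σ_z P(w|z) P(z|d), where z ranges over the topics (object categories). The sketch below is a minimal NumPy implementation of the standard pLSA EM updates, assuming each image has already been reduced to a vector of visual-word counts n(w, d). The function name `plsa`, the random initialization and the fixed iteration count are illustrative assumptions, not the authors' code, and the sketch omits descriptor extraction, vector quantization and doublets. Reading off the most probable topic P(z|d) per image is shown as one simple way to assign a category.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM (illustrative sketch, hypothetical helper).

    counts: array (n_words, n_docs); counts[w, d] = n(w, d), the number of
            times visual word w occurs in image d.
    Returns p_w_z (n_words, n_topics) = P(w|z), each column sums to 1,
    and p_z_d (n_topics, n_docs) = P(z|d), each column sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization, normalized into valid conditional distributions.
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
        # Shape (n_words, n_topics, n_docs), normalized over topics.
        post = p_w_z[:, :, None] * p_z_d[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps

        # Weight each posterior by the observed count n(w, d).
        weighted = counts[:, None, :] * post

        # M-step: P(w|z) is proportional to sum_d n(w,d) P(z|d,w).
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps

        # M-step: P(z|d) is proportional to sum_w n(w,d) P(z|d,w).
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps

    return p_w_z, p_z_d


# Toy usage with made-up counts: 10 visual words, 6 images.
# Images 0-2 use only words 0-4; images 3-5 use only words 5-9.
counts = np.zeros((10, 6))
counts[:5, :3] = 3
counts[5:, 3:] = 3
p_w_z, p_z_d = plsa(counts, n_topics=2)
labels = p_z_d.argmax(axis=0)  # most probable topic for each image
```

EM only finds a local maximum of the likelihood, so in practice one might run several random restarts and keep the fit with the highest log-likelihood.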

Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Number of pages: 8

Text Reference
Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, and Bill Freeman, "Discovering Objects and Their Location in Images," International Conference on Computer Vision (ICCV 2005), October, 2005.

BibTeX Reference
   author = "Josef Sivic and Bryan Russell and Alexei A. Efros and Andrew Zisserman and Bill Freeman",
   title = "Discovering Objects and Their Location in Images",
   booktitle = "International Conference on Computer Vision (ICCV 2005)",
   month = "October",
   year = "2005",