Discovering and Leveraging Visual Structure for Large-scale Recognition

Abhinav Shrivastava
PhD Thesis, Tech. Report, CMU-RI-TR-17-63, Robotics Institute, Carnegie Mellon University, August, 2017

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.


Our visual world is extraordinarily varied and complex, but despite its richness, the space of visual data may not be that astronomically large. We live in a well-structured, predictable world, where cars almost always drive on roads, sky is always above the ground, and so on. As humans, the ability to learn this structure from prior experiences is essential to our visual perception. In fact, we effortlessly (and often unconsciously) employ this structure for perceiving and responding to our surroundings, a feat that still eludes our computational systems. In this dissertation, we propose to discover and harness this structure to improve large-scale visual recognition systems.

In Part I, we present supervised recognition algorithms that can leverage these underlying regularities in our visual world. We propose effective models for object recognition that incorporate top-down contextual feedback and models that can leverage the geometric structure of objects. We also develop supervised learning and inference methods that exploit the structure offered by visual data and by a wide range of recognition tasks.

These supervised systems, limited by our ability to collect annotations, are confined to curated datasets. Therefore, in Part II, we propose to overcome this limitation by automatically discovering structure in large amounts of visual data and incorporating it as constraints in large-scale semi-supervised learning algorithms to improve visual recognition systems.

BibTeX Reference
author = {Abhinav Shrivastava},
title = {Discovering and Leveraging Visual Structure for Large-scale Recognition},
year = {2017},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-63},