PhD Thesis Defense
Carnegie Mellon University
11:00 am - 12:00 pm
Machine learning models have led to remarkable progress in visual recognition. A key factor driving this progress is the abundance of labeled data. Unfortunately, this reliance on large amounts of labeled data also limits the rapid development and deployment of vision systems, which perform poorly on concepts with limited data. Moreover, because these models are passive and are simply “fed” supervision, they cannot actively seek supervision to improve their own performance. This hurts their adaptability and generalization to new environments.
To tackle these challenges, this thesis explores methods that enable visual learning with minimal supervision. The core idea is to build the natural regularity and repetition of the visual world into our learning algorithms as an inductive bias. This regularity can be exploited directly, through similarities in the visual data, or indirectly, through the structure in the semantic tasks and models that operate on this visual data. We use this abundant natural structure, or “supervision”, in three forms: temporal structure in videos, relationships between tasks and labels, and similarities in the space of classifiers. We show the effectiveness of these methods on both static images and videos across tasks such as image classification, object detection, action recognition, and human pose estimation. However, all these methods are still passively fed supervision and thus cannot decide what information they need or how to get it. To this end, we propose interactive learners that ask for supervision when needed and can decide which samples they want to learn from.
Thesis Committee Members:
Martial Hebert, Co-chair
Abhinav Gupta, Co-chair
Alexei A. Efros, University of California, Berkeley
Andrew Zisserman, University of Oxford