VASC Seminar: Devi Parikh
Advancing Computer Vision by Leveraging Humans
Research Assistant Professor, TTIC
May 14, 2012
Historically, humans have played a limited role in advancing the challenging problem of computer vision: either by designing algorithms in their capacity as researchers or by acting as ground-truth-generating minions. This seems rather counterproductive, since we often aim to replicate human performance (e.g. in semantic image understanding) and face applications where humans communicate with vision systems (e.g. in image search, or when training the systems). In this talk, I will describe my recent efforts in expanding the roles humans play in advancing computer vision.
In the first part of my talk, I will describe our recently introduced "human-debugging" paradigm, which allows us to identify weak links in machine vision approaches that require further research. It involves replacing subcomponents of machine vision pipelines with human subjects and examining the resulting effect on overall recognition performance. Using this paradigm, we have studied image classification, scene and object recognition, contextual reasoning, and person detection. I will present some of our recent efforts at analyzing semantic as well as unsupervised segmentation within this framework.
In the second part of my talk, I will present our work on allowing humans and machines to better communicate with each other by exploiting visual attributes. Visual attributes are mid-level concepts such as "furry" and "metallic" that bridge the gap between low-level image features (e.g. texture) and high-level concepts (e.g. rabbit or car). They are shareable across different but related concepts. Most importantly, visual attributes are both machine-detectable and human-understandable, making them ideal as a mode of communication between the two. I will present our work on enhancing the communicative power of attributes by using them relatively. I will present a variety of applications that exploit relative attributes: improved image search, effective active learning of image classifiers, zero-shot learning, and automatic generation of image descriptions.
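To make the idea of "using attributes relatively" concrete, one common way to model a relative attribute is to learn a linear ranking function that scores images so that an image with more of the attribute (say, "furrier") receives a higher score than one with less. The sketch below is an illustrative pairwise hinge-loss learner in NumPy, not the formulation used in the speaker's work; all function and variable names are hypothetical.

```python
import numpy as np

def learn_relative_attribute(X, pairs, lr=0.1, reg=0.01, epochs=200):
    """Learn a linear ranking function w for one attribute.

    X     : (n, d) array of image feature vectors.
    pairs : list of (i, j) tuples meaning "image i shows MORE of the
            attribute than image j" (human-supplied comparisons).

    A minimal SGD sketch on the pairwise hinge loss: we want
    w . x_i - w . x_j >= 1 for every ordered pair (i, j).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i, j in pairs:
            diff = X[i] - X[j]
            if w @ diff < 1.0:          # margin violated: push scores apart
                w += lr * (diff - reg * w)
            else:                       # margin satisfied: only regularize
                w -= lr * reg * w
    return w

# Toy example: feature 0 acts as "furriness"; pairs order images 2 > 1 > 0.
X = np.array([[0.1, 0.0], [0.5, 0.0], [0.9, 0.0]])
pairs = [(2, 1), (1, 0), (2, 0)]
w = learn_relative_attribute(X, pairs)
scores = X @ w   # higher score = "furrier" under the learned ranking
```

Once such a ranker is learned from a handful of human comparisons, an unseen category can be described purely relatively (e.g. "furrier than a dog but less furry than a rabbit") and placed along the attribute axis, which is the intuition behind zero-shot learning with relative attributes.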
Time permitting, I will also present some of our latest work on modeling predictable spatial structures in images and exploiting them for contextual reasoning and scene recognition.
Host: Abhinav Gupta