
From Red Wine to Red Tomato: Composition with Context

Ishan Misra, Abhinav Gupta and Martial Hebert
Conference Paper, Computer Vision and Pattern Recognition, July, 2017


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.


Compositionality and contextuality are key building blocks of intelligence. They allow us to compose known concepts to generate new and complex ones. However, traditional learning methods do not model both of these properties and require copious amounts of labeled data to learn new concepts. A large fraction of existing techniques, e.g., those using late fusion, compose concepts but fail to model contextuality. For example, the "red" in "red wine" is different from the "red" in "red tomatoes". In this paper, we present a simple method that respects contextuality in order to compose classifiers of known visual concepts. Our method builds upon the intuition that classifiers lie in a smooth space where compositional transforms can be modeled. We show how it can generalize to unseen combinations of concepts. Our results on composing attributes with objects, as well as on composing subjects, predicates, and objects, demonstrate its strong generalization performance compared to baselines. Finally, we present a detailed analysis of our method and highlight its properties.
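The idea that classifiers lie in a smooth space where compositional transforms can be modeled can be illustrated with a small sketch. It assumes each primitive concept (an attribute such as "red", an object such as "wine") already has a linear classifier, and it models composition as a hypothetical two-layer MLP (`ComposeNet`) that maps the concatenated primitive weights to a new classifier. This is not the authors' released implementation: the names `late_fusion_score`, `composed_score`, and `attribute_shift`, the dimensions, and the random, untrained weights are all illustrative assumptions. The sketch shows the contrast drawn above: late fusion treats "red" identically for every object, whereas the composed classifier lets the effect of "red" depend on the object it is paired with.

```python
import numpy as np

# Hypothetical sizes: D-dimensional image features, H hidden units.
D, H = 8, 32
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def late_fusion_score(x, w_attr, w_obj):
    """Context-free baseline: score each primitive independently, then multiply.
    The attribute term is identical whatever object it is paired with."""
    return sigmoid(w_attr @ x) * sigmoid(w_obj @ x)


class ComposeNet:
    """Hypothetical two-layer MLP mapping concatenated primitive classifier
    weights [w_attr; w_obj] to the weights of one composite classifier.
    Weights here are random; in practice they would be trained on seen pairs."""

    def __init__(self, dim, hidden, rng):
        self.W1 = rng.normal(scale=0.1, size=(hidden, 2 * dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(dim, hidden))
        self.b2 = np.zeros(dim)

    def __call__(self, w_attr, w_obj):
        # ReLU hidden layer over the concatenated primitive weights.
        h = np.maximum(0.0, self.W1 @ np.concatenate([w_attr, w_obj]) + self.b1)
        return self.W2 @ h + self.b2


def composed_score(x, net, w_attr, w_obj):
    """Score x with a classifier generated jointly from both primitives."""
    return sigmoid(net(w_attr, w_obj) @ x)


def attribute_shift(net, w, obj):
    """How swapping 'blue' for 'red' moves the composite classifier for obj."""
    return net(w["red"], w[obj]) - net(w["blue"], w[obj])


# Hypothetical pretrained linear classifiers for each primitive concept.
w = {name: rng.normal(size=D) for name in ["red", "blue", "wine", "tomato"]}
x = rng.normal(size=D)  # a stand-in image feature
net = ComposeNet(D, H, rng)

# Late fusion: the red/blue score ratio is the same for wine and tomato.
ratio_wine = late_fusion_score(x, w["red"], w["wine"]) / late_fusion_score(x, w["blue"], w["wine"])
ratio_tomato = late_fusion_score(x, w["red"], w["tomato"]) / late_fusion_score(x, w["blue"], w["tomato"])
print(np.isclose(ratio_wine, ratio_tomato))  # True: "red" ignores its context

# Composition: the effect of "red" on the classifier depends on the object.
print(np.allclose(attribute_shift(net, w, "wine"), attribute_shift(net, w, "tomato")))
```

In practice such a composition network would be trained on seen attribute-object pairs and then applied to unseen pairs; training is omitted here to keep the sketch short, so only the structural difference from late fusion is demonstrated.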

Associated Lab - Vision and Mobile Robotics Lab, Associated Center - Vision and Autonomous Systems Center (VASC)

BibTeX Reference
author = {Ishan Misra and Abhinav Gupta and Martial Hebert},
title = {From Red Wine to Red Tomato: Composition with Context},
booktitle = {Computer Vision and Pattern Recognition},
year = {2017},
month = {July},
keywords = {computer vision, image classification, deep learning, composing classifiers},