Sense Discovery via Co-Clustering on Images and Text

Xinlei Chen, Alan Ritter, Abhinav Gupta, and Tom Mitchell
Conference Paper, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5298-5306, June 2015

Abstract

We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP). Unlike traditional clustering approaches, which assume a one-to-one mapping between clusters in the text-based feature space and the visual space, we adopt a one-to-many mapping between the two spaces. This is primarily because each semantic sense (concept) can correspond to several visual senses due to viewpoint and appearance variations. Our structure-EM-style optimization not only extracts the multiple senses in both the semantic and visual feature spaces, but also discovers the mapping between the senses. We introduce a challenging dataset (CMU Polysemy-30) for this problem consisting of 30 NPs (~5600 labeled instances out of ~22K total instances). We have also conducted a large-scale experiment that performs sense disambiguation for ~2000 NPs.
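To make the one-to-many idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm or features): semantic senses are clustered in a text-feature space, visual senses are clustered separately in an image-feature space with more clusters, and a one-to-many mapping is then read off from co-occurrence counts, i.e. each visual sense is assigned to the semantic sense whose instances dominate it. All function names, the toy k-means, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization
    (for stability on toy data). Returns a label per row of X."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels


def cocluster(text_feats, img_feats, k_sem, k_vis):
    """Toy co-clustering sketch: k_sem semantic senses, k_vis >= k_sem
    visual senses, plus a one-to-many mapping from semantic senses to
    visual senses learned from co-occurrence counts."""
    sem = kmeans(text_feats, k_sem)
    vis = kmeans(img_feats, k_vis)
    # Each instance votes: count how often semantic sense s and
    # visual sense v occur on the same instance.
    counts = np.zeros((k_sem, k_vis), dtype=int)
    for s, v in zip(sem, vis):
        counts[s, v] += 1
    # mapping[v] = the semantic sense that dominates visual sense v,
    # so one semantic sense can own several visual senses.
    mapping = counts.argmax(0)
    return sem, vis, mapping


# Toy demo: one NP with 2 semantic senses, each split into 2 visual
# senses (e.g. two viewpoints of the same concept).
rng = np.random.default_rng(1)
n = 40
text = np.vstack([rng.normal(0.0, 0.1, (2 * n, 2)),
                  rng.normal(5.0, 0.1, (2 * n, 2))])
img = np.vstack([rng.normal(m, 0.1, (n, 2))
                 for m in ([0, 0], [0, 8], [8, 0], [8, 8])])
sem, vis, mapping = cocluster(text, img, k_sem=2, k_vis=4)
```

On this well-separated toy data, each of the two semantic senses ends up mapped to exactly two of the four visual senses, which is the one-to-many structure the abstract describes; the paper's actual method learns senses and mapping jointly via structure-EM rather than in this fixed two-step fashion.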

BibTeX

@conference{Chen-2015-113350,
author = {Xinlei Chen and Alan Ritter and Abhinav Gupta and Tom Mitchell},
title = {Sense Discovery via Co-Clustering on Images and Text},
booktitle = {Proceedings of (CVPR) Computer Vision and Pattern Recognition},
year = {2015},
month = {June},
pages = {5298--5306},
}