Visual Motif Discovery via First-Person Vision

Ryo Yonetani, Kris M. Kitani, and Yoichi Sato

Conference Paper, Proceedings of (ECCV) European Conference on Computer Vision, pp. 187 - 203, October, 2016

Abstract

Visual motifs are images of visual experiences that are significant and shared across many people, such as an image of an informative sign viewed by many people and that of a familiar social situation such as when interacting with a clerk at a store. The goal of this study is to discover visual motifs from a collection of first-person videos recorded by a wearable camera. To achieve this goal, we develop a commonality clustering method that leverages three important aspects: inter-video similarity, intra-video sparseness, and people’s visual attention. The problem is posed as normalized spectral clustering, and is solved efficiently using a weighted covariance matrix. Experimental results suggest the effectiveness of our method over several state-of-the-art methods in terms of both accuracy and efficiency of visual motif discovery.

BibTeX

@conference{Yonetani-2016-109800,
author = {Ryo Yonetani and Kris M. Kitani and Yoichi Sato},
title = {Visual Motif Discovery via First-Person Vision},
booktitle = {Proceedings of (ECCV) European Conference on Computer Vision},
year = {2016},
month = {October},
pages = {187 - 203},
}
