Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent

Yuqian Fu, Chengrong Wang, Yanwei Fu, Yuxiong Wang, Cong Bai, Xiangyang Xue, and Yu-Gang Jiang
Conference Paper, Proceedings of the 27th ACM International Conference on Multimedia (MM '19), pp. 411-419, October 2019

Abstract

One-shot learning aims to recognize novel target classes from few examples by transferring knowledge from source classes, under the general assumption that the source and target classes are semantically related but not identical. Building on this assumption, recent work has focused on image-based one-shot learning, while little work has addressed video-based one-shot learning. One challenge is that the disjoint-class assumption is difficult to maintain for videos, since clips of target classes may appear in the videos of source classes. To address this issue, we introduce a novel setting, termed embodied agents based one-shot learning, which leverages synthetic videos produced in a virtual environment to understand realistic videos of target classes. In this setting, we further propose two types of learning tasks: embodied one-shot video domain adaptation and embodied one-shot video transfer recognition. These tasks serve as a testbed for evaluating video-related one-shot learning tasks. In addition, we propose a general video segment augmentation method, which significantly facilitates a variety of one-shot learning tasks. Experimental results validate the soundness of our setting and learning tasks, and also show the effectiveness of our augmentation approach for video recognition in the small-sample-size regime.

BibTeX

@conference{Fu-2019-122551,
author = {Yuqian Fu and Chengrong Wang and Yanwei Fu and Yuxiong Wang and Cong Bai and Xiangyang Xue and Yu-Gang Jiang},
title = {Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent},
booktitle = {Proceedings of 27th ACM International Conference on Multimedia (MM '19)},
year = {2019},
month = {October},
pages = {411--419},
}