Robotics Institute, Carnegie Mellon University

Discovering Primitive Action Categories by Leveraging Relevant Visual Context

Kris M. Kitani, Takahiro Okabe, Yoichi Sato, and Akihiro Sugimoto
Workshop Paper, ECCV '08 8th International Workshop on Visual Surveillance (VS '08), October 2008

Abstract

Under the bag-of-features framework, we aim to learn primitive action categories from video without supervision by leveraging relevant visual context in addition to motion features. We define visual context as the appearance of the entire scene, including the actor, related objects, and relevant background features. To leverage visual context along with motion features, we learn a bi-modal latent variable model to discover action categories without supervision. Our experiments show that the combination of relevant visual context and motion features improves the performance of action discovery. Furthermore, we show that our method is able to leverage relevant visual features for action discovery despite the presence of irrelevant background objects.
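The paper does not include code, but the bi-modal bag-of-features idea can be illustrated with a small sketch: each video clip is represented by a concatenation of a motion-word histogram and a visual-context-word histogram, and a latent variable model (here a minimal pLSA-style EM loop, one common choice for this kind of unsupervised discovery) clusters clips into action topics. The toy data, dimensions, and function names below are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Minimal pLSA via EM on a (clips x words) count matrix.

    Returns p(z|d) (clip-topic mixtures) and p(w|z) (topic-word dists).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization of the two conditional distributions.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities p(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate from expected counts.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy bi-modal data: words 0-3 are motion words, words 4-7 are
# visual-context words. Action A clips use motion {0,1} + context {4,5};
# action B clips use motion {2,3} + context {6,7}.
counts = np.array([
    [5, 5, 0, 0, 5, 5, 0, 0],
    [4, 6, 0, 0, 6, 4, 0, 0],
    [6, 4, 0, 0, 5, 5, 0, 0],
    [0, 0, 5, 5, 0, 0, 5, 5],
    [0, 0, 6, 4, 0, 0, 4, 6],
    [0, 0, 4, 6, 0, 0, 6, 4],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
labels = p_z_d.argmax(axis=1)  # discovered action category per clip
```

On this cleanly separated toy set, the two discovered topics align with the two generating actions; in the paper's setting the motion and context modalities are instead learned jointly from real video features.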

BibTeX

@inproceedings{Kitani-2008-109831,
  author    = {Kris M. Kitani and Takahiro Okabe and Yoichi Sato and Akihiro Sugimoto},
  title     = {Discovering Primitive Action Categories by Leveraging Relevant Visual Context},
  booktitle = {Proceedings of ECCV '08 8th International Workshop on Visual Surveillance (VS '08)},
  year      = {2008},
  month     = {October},
}