Pixel-level Hand Detection in Ego-Centric Videos

Kris M. Kitani and Cheng Li
Conference Paper, Proceedings of (CVPR) Computer Vision and Pattern Recognition, pp. 3570-3577, June 2013

Abstract

We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illumination, significant camera motion and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, which contains hand images taken under various illumination conditions. Using both our dataset and a publicly available ego-centric indoor dataset, we give extensive analysis of detection performance using a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.
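
The abstract describes a per-pixel formulation: each pixel is classified as hand or background from local appearance features, with the full method additionally conditioning on a model of global illumination. As a rough illustration of the per-pixel formulation only, the Python sketch below classifies each pixel from its LAB color using a single random forest. The feature choice (LAB color alone), the classifier settings, and the helper names (pixel_features, train_hand_detector, detect_hands) are illustrative assumptions, not the authors' implementation, which pools a much wider range of local features and selects among illumination-specific models.

# Minimal sketch of per-pixel hand detection from local color features.
# This approximates only the per-pixel classification step described in
# the abstract; it omits the paper's sparse features and global
# illumination modeling.
import numpy as np
import cv2
from sklearn.ensemble import RandomForestClassifier

def pixel_features(bgr_image):
    """Return an (H*W, 3) array of per-pixel LAB color features."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    return lab.reshape(-1, 3).astype(np.float32)

def train_hand_detector(images, masks, n_trees=10):
    """Fit a random forest on labeled frames.

    images: list of BGR frames; masks: matching (H, W) binary masks,
    nonzero where a pixel belongs to a hand.
    """
    X = np.vstack([pixel_features(img) for img in images])
    y = np.concatenate([m.reshape(-1) > 0 for m in masks])
    clf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
    clf.fit(X, y)
    return clf

def detect_hands(clf, bgr_image, threshold=0.5):
    """Return a per-pixel hand probability map and a binary mask."""
    h, w = bgr_image.shape[:2]
    proba = clf.predict_proba(pixel_features(bgr_image))[:, 1]
    prob_map = proba.reshape(h, w)
    return prob_map, prob_map > threshold

Because such a color-only classifier is sensitive to lighting changes, the abstract's emphasis on modeling global illumination corresponds, in this sketch, to training several such forests for different scene illumination conditions and choosing among them at test time.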

BibTeX

@conference{Kitani-2013-7731,
author = {Kris M. Kitani and Cheng Li},
title = {Pixel-level Hand Detection in Ego-Centric Videos},
booktitle = {Proceedings of (CVPR) Computer Vision and Pattern Recognition},
year = {2013},
month = {June},
pages = {3570--3577},
keywords = {First-person vision, hand detection},
}