Estimating Focus of Attention based on Gaze and Sound

Rainer Stiefelhagen, Jie Yang, and Alex Waibel
Workshop Paper, Workshop on Perceptive User Interfaces (PUI '01), November, 2001

Abstract

Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In the work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track the faces of participants seated around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately, and then combines the outputs of the audio- and video-based predictors. We have evaluated the system on data from three recorded meetings. Adding the acoustic information reduced the error by 8% on average compared to using a single modality.
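The abstract does not specify how the audio- and video-based predictions are combined. A minimal sketch of one plausible scheme is shown below: each predictor outputs a probability distribution over possible focus targets (the other participants), and the two distributions are fused as a weighted mixture. The function name `combine_focus_estimates` and the mixing weight `alpha` are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

def combine_focus_estimates(p_video: np.ndarray,
                            p_audio: np.ndarray,
                            alpha: float = 0.5) -> np.ndarray:
    """Fuse audio- and video-based focus-of-attention estimates.

    p_video: distribution over focus targets from the head-pose
             neural network (video cue).
    p_audio: distribution over focus targets derived from detected
             speaker activity (acoustic cue).
    alpha:   mixing weight for the video estimate (assumed value;
             the paper's combination rule is not given in the abstract).
    """
    combined = alpha * p_video + (1.0 - alpha) * p_audio
    return combined / combined.sum()  # renormalize to a distribution

# Example: one participant, three possible focus targets.
p_video = np.array([0.6, 0.3, 0.1])   # head pose suggests target 0
p_audio = np.array([0.2, 0.7, 0.1])   # current speaker is target 1
p = combine_focus_estimates(p_video, p_audio)
print(p, "-> predicted focus:", int(np.argmax(p)))
```

Under this kind of fusion, the acoustic cue can correct the visual estimate when head pose is ambiguous, which is consistent with the reported error reduction from combining modalities.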

BibTeX

@workshop{Stiefelhagen-2001-8350,
  author    = {Rainer Stiefelhagen and Jie Yang and Alex Waibel},
  title     = {Estimating Focus of Attention based on Gaze and Sound},
  booktitle = {Proceedings of the Workshop on Perceptive User Interfaces (PUI '01)},
  year      = {2001},
  month     = {November},
}