Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier

S. Lucey, S. Sridharan, and V. Chandran

Conference Paper, Proceedings of International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP '01), pp. 551-554, May 2001

Abstract

The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). This paper investigates the use of a high-dimensional secondary classifier, applied to the word likelihood scores from both the audio and video modalities, for the purpose of adaptive fusion. Results are presented that lie on or above the boundary of catastrophic fusion across a range of audio noise levels.
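The abstract does not spell out the secondary classifier, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each modality's recognizer returns one log-likelihood per vocabulary word, concatenates the two vectors into a single 2N-dimensional feature, and models each word with a diagonal-covariance Gaussian as one possible stochastic secondary classifier. The names `fuse_features`, `GaussianSecondaryClassifier`, `weighted_sum_fusion`, and `above_catastrophic_boundary` are hypothetical. A fixed-weight linear fusion is included for comparison, along with a check against catastrophic fusion, taken here to mean the fused recognizer performing worse than the better of the two single-modality recognizers.

```python
import math
from collections import defaultdict

# Hypothetical sketch: the Gaussian secondary classifier and score
# normalization below are assumptions, not the method from the paper.


def fuse_features(audio_ll, video_ll):
    """Concatenate per-word log-likelihoods from both modalities.

    Each modality is shifted so its best word scores 0, removing
    utterance-level offsets (an assumed normalization).
    """
    a_max, v_max = max(audio_ll), max(video_ll)
    return [a - a_max for a in audio_ll] + [v - v_max for v in video_ll]


def weighted_sum_fusion(audio_ll, video_ll, alpha):
    """Baseline: fixed-weight linear fusion; returns the index of the best word."""
    scores = [alpha * a + (1.0 - alpha) * v for a, v in zip(audio_ll, video_ll)]
    return max(range(len(scores)), key=scores.__getitem__)


class GaussianSecondaryClassifier:
    """One diagonal-covariance Gaussian per word over the fused score vector."""

    def __init__(self, var_floor=1e-3):
        self.var_floor = var_floor  # keeps variances from collapsing to zero
        self.means = {}
        self.variances = {}

    def fit(self, features, labels):
        # Group training vectors by word label and estimate mean and variance.
        groups = defaultdict(list)
        for x, y in zip(features, labels):
            groups[y].append(x)
        for y, xs in groups.items():
            n, dim = len(xs), len(xs[0])
            mean = [sum(x[d] for x in xs) / n for d in range(dim)]
            var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / n, self.var_floor)
                   for d in range(dim)]
            self.means[y], self.variances[y] = mean, var
        return self

    def log_density(self, x, y):
        mean, var = self.means[y], self.variances[y]
        return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mean, var))

    def predict(self, x):
        # Equal word priors assumed, so the maximum-likelihood word is returned.
        return max(self.means, key=lambda y: self.log_density(x, y))


def above_catastrophic_boundary(fused_acc, audio_acc, video_acc):
    """True if fusion is at least as accurate as the better single modality."""
    return fused_acc >= max(audio_acc, video_acc)
```

In this sketch the secondary classifier would be trained on fused vectors from utterances at the noise levels of interest, so it can learn from the score patterns themselves how far to trust each modality, whereas `weighted_sum_fusion` needs `alpha` re-tuned whenever the audio noise level changes.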

BibTeX

@conference{Lucey-2001-121094,
author = {S. Lucey and S. Sridharan and V. Chandran},
title = {Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier},
booktitle = {Proceedings of International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP '01)},
year = {2001},
month = {May},
pages = {551--554},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.