Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier

S. Lucey, S. Sridharan, and V. Chandran
Conference Paper, Proceedings of International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP '01), pp. 551-554, May 2001

Abstract

The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). This paper investigates the use of a high-dimensional secondary classifier, applied to the word likelihood scores from both the audio and video modalities, for the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion across a number of audio noise levels.
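
The idea of a secondary classifier operating on word likelihood scores can be illustrated with a small sketch. The code below is not the authors' implementation: the per-word diagonal Gaussian model, the synthetic score generator, the vocabulary size, and the noise levels are all assumptions chosen for illustration. It stacks the audio and video score vectors into one feature vector per utterance and lets a second-stage classifier pick the word, then reports the fused accuracy next to the single-modality accuracies. Fusion is usually called catastrophic when the fused result falls below the better single modality.

```python
import numpy as np

N_WORDS = 5  # hypothetical vocabulary size


def synthetic_scores(rng, labels, noise):
    """Hypothetical stand-in for one modality's per-word log-likelihood scores."""
    scores = rng.normal(0.0, noise, size=(labels.size, N_WORDS))
    scores[np.arange(labels.size), labels] += 2.0  # true word scores higher on average
    return scores


def make_set(rng, n, audio_noise, video_noise):
    """Draw labels plus audio and video score matrices (assumed noise levels)."""
    labels = rng.integers(0, N_WORDS, size=n)
    audio = synthetic_scores(rng, labels, audio_noise)
    video = synthetic_scores(rng, labels, video_noise)
    return audio, video, labels


def fit_gaussians(features, labels):
    """Fit one diagonal Gaussian per word to the stacked score vectors."""
    means = np.stack([features[labels == w].mean(axis=0) for w in range(N_WORDS)])
    variances = np.stack(
        [features[labels == w].var(axis=0) + 1e-3 for w in range(N_WORDS)]
    )
    return means, variances


def classify(features, means, variances):
    """Pick the word whose Gaussian gives the highest log density."""
    diff = features[:, None, :] - means[None, :, :]
    log_density = -0.5 * np.sum(diff**2 / variances + np.log(variances), axis=2)
    return log_density.argmax(axis=1)


rng = np.random.default_rng(0)
n_train, n_test = 1000, 1000
# Assumed condition: audio scores are noisier than video scores.
a_tr, v_tr, y_tr = make_set(rng, n_train, audio_noise=3.0, video_noise=1.5)
a_te, v_te, y_te = make_set(rng, n_test, audio_noise=3.0, video_noise=1.5)

# Secondary-classifier input: audio and video scores stacked side by side.
train_x = np.hstack([a_tr, v_tr])
test_x = np.hstack([a_te, v_te])

means, variances = fit_gaussians(train_x, y_tr)
fused_pred = classify(test_x, means, variances)

audio_acc = np.mean(a_te.argmax(axis=1) == y_te)
video_acc = np.mean(v_te.argmax(axis=1) == y_te)
fused_acc = np.mean(fused_pred == y_te)
print(f"audio only: {audio_acc:.3f}  video only: {video_acc:.3f}  fused: {fused_acc:.3f}")
```

Because the secondary classifier is trained on scores from the same noise condition it is tested on, it can learn how much weight each modality deserves. Repeating the run with different `audio_noise` values is one way to see how a fused result behaves across noise levels in this toy setting.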

BibTeX

@conference{Lucey-2001-121094,
author = {S. Lucey and S. Sridharan and V. Chandran},
title = {Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier},
booktitle = {Proceedings of International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP '01)},
year = {2001},
month = {May},
pages = {551--554},
}