Person identification using multi-modal features: Speech, lip, and face

N. Fox, Ralph Gross, P. de Chazal, Jeffrey Cohn, and R. Reilly
Workshop Paper, ACM Multimedia Workshop on Biometrics Methods and Applications (WBMA '03), pp. 25–32, November 2003

Abstract

This paper presents a multi-expert person identification system based on the integration of three separate systems employing audio features, static face images, and lip motion features respectively. Audio person identification was carried out using a text-dependent Hidden Markov Model methodology. Modeling of the lip motion was carried out using Gaussian probability density functions, and the static-image-based identification was carried out using the FaceIt system. Experiments were conducted with 251 subjects from the XM2VTS audio-visual database. Late integration with automatically determined weights was employed to combine the three experts; the integration strategy adapts automatically to the audio noise conditions. Integration of the three experts improved person identification accuracy under both clean and noisy audio conditions compared with the audio-only case. Maximum accuracies of 98%, 93.22%, 86.37%, and 100% were achieved for audio, FaceIt, lip motion, and tri-expert identification respectively. Bi-expert integration of the two visual experts achieved a maximum identification accuracy of 96.8%, comparable to the best audio accuracy of 98%.
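
The late-integration step can be illustrated with a small Python sketch: each expert produces a matching score for every enrolled subject, the scores are normalised to a common range, and a weighted sum (with the audio weight lowered as noise increases) selects the identity. This is a minimal illustration under assumed conventions, not the authors' implementation; the function name fuse_scores, the min-max normalisation, and the example weights are assumptions.

import numpy as np

def fuse_scores(expert_scores, weights):
    # Weighted late fusion of per-expert identification scores.
    # expert_scores: one score array per expert (audio, face, lip),
    #                each holding a matching score for every enrolled subject.
    # weights:       non-negative reliability weights, one per expert
    #                (assumed here; the audio weight would drop as noise rises).
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalise weights to sum to 1
    fused = np.zeros(len(expert_scores[0]), dtype=float)
    for w, scores in zip(weights, expert_scores):
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        # Min-max normalise each expert's scores so no expert dominates by scale alone.
        s = (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
        fused += w * s
    return int(np.argmax(fused))                   # subject with the highest fused score

# Hypothetical example: three experts scoring five enrolled subjects.
audio = [0.2, 0.9, 0.1, 0.3, 0.4]
face = [0.1, 0.7, 0.2, 0.6, 0.3]
lip = [0.3, 0.6, 0.1, 0.2, 0.2]
print(fuse_scores([audio, face, lip], weights=[0.5, 0.3, 0.2]))  # identifies subject 1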

BibTeX

@inproceedings{Fox-2003-16889,
  author    = {N. Fox and Ralph Gross and P. de Chazal and Jeffrey Cohn and R. Reilly},
  title     = {Person identification using multi-modal features: Speech, lip, and face},
  booktitle = {Proceedings of the ACM Multimedia Workshop on Biometrics Methods and Applications (WBMA '03)},
  year      = {2003},
  month     = {November},
  pages     = {25--32},
}