Person identification using multi-modal features: Speech, lip, and face

N. Fox, Ralph Gross, P. de Chazal, Jeffrey Cohn, and R. Reilly
ACM Multimedia Workshop in Biometrics Methods and Applications (WBMA 2003), 2003, pp. 25-32.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
This paper presents a multi-expert person identification system that integrates three separate experts, based on audio features, static face images, and lip motion features respectively. Audio-based identification used a text-dependent Hidden Markov Model methodology. Lip motion was modeled using Gaussian probability density functions. Static-image-based identification used the FaceIt system. Experiments were conducted with 251 subjects from the XM2VTS audio-visual database. Late integration using automatic weights was employed to combine the three experts. The integration strategy adapts automatically to the audio noise conditions. Integrating the three experts improved person identification accuracy under both clean and noisy audio conditions compared with the audio-only case. The maximum accuracies achieved for audio, FaceIt, lip motion, and tri-expert identification were 98%, 93.22%, 86.37%, and 100%, respectively. Bi-expert integration of the two visual experts achieved a maximum identification accuracy of 96.8%, which is comparable to the best audio accuracy of 98%.
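
As a rough illustration of the late integration idea described in the abstract, the sketch below combines per-subject scores from three experts with a weighted sum and picks the highest-scoring subject. It is not the authors' method: the abstract does not say how the weights are derived or how scores are normalized, so the min-max normalization, the function names `normalize` and `fuse`, and all scores and weights here are hypothetical.

```python
def normalize(scores):
    """Min-max scale one expert's per-subject scores to [0, 1] (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(expert_scores, weights):
    """Weighted sum of normalized expert scores.

    expert_scores: one list of per-subject scores per expert.
    weights: one non-negative weight per expert (rescaled to sum to 1).
    Returns (index of the best-scoring subject, fused scores).
    """
    total = sum(weights)
    normed = [normalize(s) for s in expert_scores]
    n_subjects = len(normed[0])
    fused = [
        sum(w / total * scores[i] for w, scores in zip(weights, normed))
        for i in range(n_subjects)
    ]
    best = max(range(n_subjects), key=fused.__getitem__)
    return best, fused


# Hypothetical scores for three enrolled subjects (higher is a better match).
audio = [0.2, 0.9, 0.4]
face = [0.8, 0.3, 0.1]
lip = [0.6, 0.5, 0.2]

# Hypothetical weights: trust audio when it is clean, the visual experts when it is noisy.
clean_best, _ = fuse([audio, face, lip], [0.60, 0.25, 0.15])
noisy_best, _ = fuse([audio, face, lip], [0.10, 0.55, 0.35])
```

With these made-up numbers the decision follows the audio expert under the clean-audio weights and the visual experts under the noisy-audio weights, which is the kind of behavior a noise-adaptive weighting aims for.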

Notes
Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Associated Lab(s) / Group(s): Face Group
Number of pages: 8

Text Reference
N. Fox, Ralph Gross, P. de Chazal, Jeffrey Cohn, and R. Reilly, "Person identification using multi-modal features: Speech, lip, and face," ACM Multimedia Workshop in Biometrics Methods and Applications (WBMA 2003), 2003, pp. 25-32.

BibTeX Reference
@inproceedings{Gross_2003_4813,
   author = "N. Fox and Ralph Gross and P. de Chazal and Jeffrey Cohn and R. Reilly",
   title = "Person identification using multi-modal features: Speech, lip, and face",
   booktitle = "ACM Multimedia Workshop in Biometrics Methods and Applications (WBMA 2003)",
   pages = "25--32",
   year = "2003",
}