Extraction of visual features for lipreading

Iain Matthews, T.F. Cootes, J.A. Bangham, S. Cox, and R. Harvey
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, February 2002, pp. 198-213.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as those of the head, convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape or shape and appearance, respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multitalker visual speech recognition task of isolated letters.

Notes
Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Associated Lab(s) / Group(s): Face Group
Number of pages: 16

Text Reference
Iain Matthews, T.F. Cootes, J.A. Bangham, S. Cox, and R. Harvey, "Extraction of visual features for lipreading," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, February 2002, pp. 198-213.

BibTeX Reference
@article{Matthews_2002_5392,
   author = "Iain Matthews and T.F. Cootes and J.A. Bangham and S. Cox and R. Harvey",
   title = "Extraction of visual features for lipreading",
   journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
   pages = "198--213",
   month = "February",
   year = "2002",
   volume = "24",
   number = "2",
}