Text, Speech, and Vision for Video Segmentation: The Informedia Project

Alex Hauptmann and Michael Smith
AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
We describe three technologies involved in creating a digital video library suitable for full-content search and retrieval. Image processing analyzes scenes, speech processing transcribes the audio signal, and natural language processing determines word relevance. The integration of these technologies enables us to include vast amounts of video data in the library.
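As a rough illustration of the image-processing component described above, scene segmentation is commonly done by comparing color histograms of consecutive frames and declaring a cut where the difference spikes. The sketch below is a toy version of that general technique, not the paper's exact algorithm; the frame representation, bin count, and threshold are assumptions chosen for this example.

```python
# Illustrative sketch of histogram-based scene-cut detection
# (a standard technique; not necessarily the Informedia implementation).
# Frames are 2-D lists of grayscale pixel intensities in [0, 255].

def histogram(frame, bins=8, max_val=256):
    """Count a frame's pixel intensities into `bins` buckets."""
    counts = [0] * bins
    width = max_val // bins
    for row in frame:
        for px in row:
            counts[min(px // width, bins - 1)] += 1
    return counts

def hist_diff(h1, h2):
    """Sum of absolute bin-count differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def find_cuts(frames, threshold):
    """Return indices i where frame i starts a new scene vs. frame i-1."""
    cuts = []
    for i in range(1, len(frames)):
        d = hist_diff(histogram(frames[i - 1]), histogram(frames[i]))
        if d > threshold:
            cuts.append(i)
    return cuts

# Two "dark" frames followed by two "bright" frames: one cut at index 2.
dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(find_cuts([dark, dark, bright, bright], threshold=16))  # [2]
```

In a real system the threshold would be tuned (or adapted per video), and the resulting segments would then be aligned with the speech transcript and keyword relevance scores that the other two components produce.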

Notes
Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Associated Project(s): Informedia Digital Video Library

Text Reference
Alex Hauptmann and Michael Smith, "Text, Speech, and Vision for Video Segmentation: The Informedia Project," AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995.

BibTeX Reference
@inproceedings{Hauptmann_1995_2710,
   author = "Alex Hauptmann and Michael Smith",
   title = "Text, Speech, and Vision for Video Segmentation: The Informedia Project",
   booktitle = "AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision",
   year = "1995",
}