Automating the Creation of a Digital Video Library - The Robotics Institute Carnegie Mellon University

Automating the Creation of a Digital Video Library

Michael Smith and Michael Christel
Conference Paper, Proceedings of 3rd ACM International Conference on Multimedia (MULTIMEDIA '95), pp. 357–358, November 1995


The Informedia™ Project has established a large on-line digital video library, incorporating video assets from WQED/Pittsburgh. The project is creating intelligent, automatic mechanisms for populating the library and allowing full-content, knowledge-based search and segment retrieval. An example of the display environment for the system is shown in Figure 1. The library retrieval system can effectively process natural queries and deliver relevant video data in a compact, subject-specific format, based on information embedded with the video during library creation. Through the combined efforts of Carnegie Mellon's speech, image, and natural language processing groups, this system provides a robust tool for utilizing all modes of video data [Christel95].

The Informedia Project uses the Sphinx-II speech recognition system to transcribe narratives and dialogues automatically [Hwang94]. The resulting transcript is then processed through methods of natural language understanding to extract subjective descriptions and mark potential segment boundaries where significant semantic changes occur [Mauldin91]. Comparative difference measures are used in processing the video to mark potential segment boundaries. Images with small histogram disparity are considered to be relatively equivalent. By detecting significant changes in the weighted histogram of each successive frame, a sequence of images can be grouped into a segment. This simple and robust method for segmentation is fast and can detect 90% of the scene changes in video.
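The histogram-comparison step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the bin count, the absolute-difference metric, the threshold, and the toy frame data are all assumptions chosen for clarity.

```python
def histogram(frame, bins=16, max_val=255):
    # Build an intensity histogram, normalized so frame size does not matter.
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // (max_val + 1), bins - 1)] += 1
    n = len(frame)
    return [c / n for c in counts]

def hist_difference(h1, h2):
    # Sum of absolute bin-wise differences; 0 means identical distributions.
    return sum(abs(a - b) for a, b in zip(h1, h2))

def segment_boundaries(frames, threshold=0.5):
    # Mark frame indices where the histogram changes sharply (candidate cuts).
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if hist_difference(prev, cur) > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Two synthetic "shots": dark frames, then bright frames -> one cut at index 3.
dark = [[10, 20, 30, 15]] * 3
bright = [[200, 210, 220, 205]] * 3
print(segment_boundaries(dark + bright))  # [3]
```

Frames within a shot produce near-identical histograms (difference ≈ 0), while a cut between shots shifts mass to different bins, pushing the difference over the threshold.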

Segment breaks produced by image processing are examined along with the boundaries identified by the natural language processing of the transcript, and an improved set of segment boundaries is heuristically derived to partition the video library into sets of segments, or "video paragraphs" [Hauptmann95]. The technology for this process is shown in Figure 2.
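One simple way to combine the two boundary sets can be sketched as below. The paper does not specify its heuristic; this hypothetical version keeps the image-based cuts but snaps each one to a nearby transcript-derived boundary when one falls within a tolerance window.

```python
def merge_boundaries(image_cuts, language_cuts, tolerance=30):
    # Illustrative heuristic (not the paper's): prefer a language boundary
    # within `tolerance` frames of an image cut; otherwise keep the image cut.
    merged = []
    for cut in image_cuts:
        near = [b for b in language_cuts if abs(b - cut) <= tolerance]
        merged.append(min(near, key=lambda b: abs(b - cut)) if near else cut)
    # De-duplicate while preserving order.
    seen, result = set(), []
    for b in merged:
        if b not in seen:
            seen.add(b)
            result.append(b)
    return result

# Image cuts at frames 100, 450, 900; transcript boundaries at 110 and 905.
print(merge_boundaries([100, 450, 900], [110, 905], tolerance=30))
# [110, 450, 905]
```

The intuition is that transcript boundaries mark semantic shifts more reliably, while image cuts are precise in time, so a nearby pair is merged in favor of the semantic location.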


@inproceedings{Smith95,
  author    = {Michael Smith and Michael Christel},
  title     = {Automating the Creation of a Digital Video Library},
  booktitle = {Proceedings of 3rd ACM International Conference on Multimedia (MULTIMEDIA '95)},
  year      = {1995},
  month     = {November},
  pages     = {357--358},
}