Informedia at TRECVID 2003: Analyzing and Searching Broadcast News Video - Robotics Institute Carnegie Mellon University

Alex Hauptmann, Dorbin Ng, Robert Baron, M-Y. Chen, Michael Christel, Pinar Duygulu, C. Huang, W-H. Lin, Howard Wactlar, N. Moraveji, Norman Papernick, C.G.M. Snoek, G. Tzanetakis, Jie Yang, R. Yan, and R. Jin
Conference Paper, Proceedings of 12th Text Retrieval Conference (TREC '03), November, 2003

Abstract

The authors made a concentrated effort to develop an interface that allows a human to succeed with video topics as defined in TRECVID 2001. This interface was part of the TRECVID 2002 interactive query task, in which a person could issue multiple queries and refinements against the video corpus while formulating the shot answer set for the topic at hand. The interface was designed to present a visually rich set of thumbnail images, tailored for expert control over the number, scale, and attributes of the images. Armed with this interface, an expert user who was completely familiar with the retrieval system and its features, but had no a priori knowledge of the TRECVID 2002 search test corpus, performed well on the search tasks. The exact system used in the TRECVID 2002 interactive query task was used again for the TRECVID 2003 evaluation. To facilitate better visual browsing, we extended the storyboard idea to show keyframes across multiple video documents, where a document is derived automatically by segmenting a video production into story units using speech, silence, black frames, and other heuristics. The hierarchy of information units is frame, shot, document, and full production. A query returns a set of documents. The shots for these documents are presented in a single storyboard, i.e., an ordered set of keyframes presented simultaneously on the computer screen, one keyframe per shot. Without further filtering, most queries would overwhelm the user with too many images. Using the query context greatly reduces the cardinality of the image set: the search engine for text queries uses the Okapi method, and the multiple-document storyboard can be set to show only the shots containing matching words. This strategy of selecting thumbnail images to represent a video document based on query context resulted in more efficient information retrieval with greater user satisfaction.
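
The query-context filtering described in the abstract can be illustrated with a small sketch. This is not the Informedia implementation: the `Shot` and `Document` classes, the `storyboard` function, and the keyframe names are hypothetical, the scoring is one common form of Okapi BM25 with conventional default constants (`k1 = 1.2`, `b = 0.75`), and it assumes transcript words are already aligned to shots.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shot:
    keyframe: str        # hypothetical keyframe identifier
    words: List[str]     # transcript words aligned to this shot (assumed)


@dataclass
class Document:
    doc_id: str
    shots: List[Shot] = field(default_factory=list)

    def words(self) -> List[str]:
        # A document's text is the concatenation of its shots' words.
        return [w for shot in self.shots for w in shot.words]


def bm25_scores(docs: List[Document], query: List[str],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score each document with a common Okapi BM25 variant (assumed form)."""
    n = len(docs)
    lengths = [len(d.words()) for d in docs]
    avgdl = sum(lengths) / n if n else 0.0
    df = Counter()
    for d in docs:
        df.update(set(d.words()))  # document frequency per term
    scores = {}
    for d, dl in zip(docs, lengths):
        tf = Counter(d.words())
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d.doc_id] = score
    return scores


def storyboard(docs: List[Document], query: List[str],
               matching_only: bool = True) -> List[str]:
    """Return keyframes, one per shot, for documents matching the query.

    With matching_only=True, only shots whose words contain a query
    term are kept, which shrinks the image set shown to the user.
    """
    terms = set(query)
    scores = bm25_scores(docs, query)
    ranked = sorted((d for d in docs if scores[d.doc_id] > 0),
                    key=lambda d: scores[d.doc_id], reverse=True)
    frames = []
    for d in ranked:
        for shot in d.shots:
            if not matching_only or terms & set(shot.words):
                frames.append(shot.keyframe)
    return frames
```

With `matching_only=False` the storyboard holds every shot of every returned document; switching it on keeps only the shots whose words match the query, which is the reduction in image-set cardinality the abstract refers to.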

BibTeX

@conference{Hauptmann-2003-8818,
author = {Alex Hauptmann and Dorbin Ng and Robert Baron and M-Y. Chen and Michael Christel and Pinar Duygulu and C. Huang and W-H. Lin and Howard Wactlar and N. Moraveji and Norman Papernick and C.G.M. Snoek and G. Tzanetakis and Jie Yang and R. Yan and R. Jin},
title = {Informedia at TRECVID 2003: Analyzing and Searching Broadcast News Video},
booktitle = {Proceedings of 12th Text Retrieval Conference (TREC '03)},
year = {2003},
month = {November},
}