Automatic Summarization of English Broadcast News Speech - Robotics Institute Carnegie Mellon University

Chiori Hori, Sadaoki Furui, Rob Malkin, Hua Yu, and Alex Waibel
Conference Paper, Proceedings of 2nd International Conference on Human Language Technology Research (HLT '02), pp. 241-246, March 2002

Abstract

Various applications of large-vocabulary continuous speech recognition (LVCSR) systems, such as automatic closed captioning [1], meeting/conference summarization [2][3], and indexing for information retrieval [4], are currently under active investigation. Transcribed speech usually includes not only redundant information such as disfluencies, filled pauses, repetitions, repairs, and word fragments, but also irrelevant information caused by recognition errors. Therefore, especially for spontaneous speech, practical applications that use a speech recognizer require a summarization process that removes redundant and irrelevant information and extracts relatively important information according to users' requirements. Speech summarization that produces understandable sentences from the original utterances can be considered a kind of speech understanding. We previously proposed an automatic speech summarization technique [5][6][7] and investigated its performance on Japanese broadcast news speech. Since our method is based on a statistical approach, it can be applied not only to Japanese but also to other languages. In this paper, English broadcast news speech transcribed using a speech recognizer [8] is automatically summarized and evaluated. To apply our method to English, we extend the model that estimates dependency structures in the original sentences, which is based on Stochastic Dependency Context Free Grammar (SDCFG).

BibTeX

@conference{Hori-2002-8399,
author = {Chiori Hori and Sadaoki Furui and Rob Malkin and Hua Yu and Alex Waibel},
title = {Automatic Summarization of English Broadcast News Speech},
booktitle = {Proceedings of 2nd International Conference on Human Language Technology Research (HLT '02)},
year = {2002},
month = {March},
pages = {241--246},
}