Large-scale Topic Detection and Language Model Adaptation

Kristie Seymore and Ronald Rosenfeld

Tech. Report, CMU-CS-97-152, Computer Science Department, Carnegie Mellon University, June, 1997

View Publication

Abstract

The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic specific language models built from the topic clusters to rescore N-best lists. We are able to achieve a 15% reduction in perplexity and a small improvement in WER by using this adaptation. We also investigate the use of a topic tree, where the amount of training data for a specific topic can be judiciously increased in cases where the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. Our system is able to fine-tune topic adaptation by interpolating models chosen from thousands of topics, allowing for adaptation to unique, previously unseen combinations of subjects.

BibTeX

@techreport{Seymore-1997-14416,
author = {Kristie Seymore and Ronald Rosenfeld},
title = {Large-scale Topic Detection and Language Model Adaptation},
year = {1997},
month = {June},
institute = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-CS-97-152},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.