Nonlinear Interpolation of Topic Models for Language Model Adaptation

Kristie Seymore, Stanley Chen, and Ronald Rosenfeld
Conference Paper, Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP '98), December 1998

Abstract

Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. To adapt this model to a new document, the topic (or topics) of the document is first identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic, and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve a slight decrease in perplexity and in speech recognition word error rate on a Broadcast News test set using these techniques. Our results are compared to results obtained through linear interpolation of topic models.
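To make the class-based combination concrete, the sketch below shows one way such a three-way nonlinear interpolation might look for unigram models: general words keep their general-model probabilities, on-topic words take their relative probabilities from the topic model while absorbing the mass freed up by suppressing off-topic words. The partitioning, the renormalization scheme, and all names here are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch (not the paper's exact method): nonlinear
# interpolation of a general and a topic unigram model over a
# three-way vocabulary partition (general / on-topic / off-topic).

def adapt_unigram(p_general, p_topic, general_words, on_topic, off_topic):
    """Return an adapted unigram distribution over the full vocabulary.

    p_general, p_topic : dict mapping word -> probability
    general_words, on_topic, off_topic : disjoint sets covering the vocab
    """
    # Probability mass the general model assigns to each word class.
    mass_on = sum(p_general[w] for w in on_topic)
    mass_off = sum(p_general[w] for w in off_topic)

    adapted = {}
    # General words: keep the general model's predictions unchanged.
    for w in general_words:
        adapted[w] = p_general[w]
    # On-topic words: take relative probabilities from the topic model,
    # but give the class the combined mass of the on- and off-topic
    # classes, boosting topical words at the expense of off-topic ones.
    topic_mass = sum(p_topic.get(w, 0.0) for w in on_topic) or 1.0
    for w in on_topic:
        adapted[w] = (mass_on + mass_off) * p_topic.get(w, 0.0) / topic_mass
    # Off-topic words: suppressed to zero in this simplified sketch; a
    # real system would reserve a small floor probability instead.
    for w in off_topic:
        adapted[w] = 0.0
    return adapted

if __name__ == "__main__":
    # Hypothetical toy vocabulary: adapting toward a "legal" topic.
    p_gen = {"the": 0.5, "court": 0.2, "ruling": 0.2, "touchdown": 0.1}
    p_top = {"court": 0.6, "ruling": 0.4}  # topic-specific model
    adapted = adapt_unigram(p_gen, p_top,
                            general_words={"the"},
                            on_topic={"court", "ruling"},
                            off_topic={"touchdown"})
    print(adapted)  # mass shifts from "touchdown" onto "court"/"ruling"

Unlike linear interpolation, which mixes the two models with fixed weights for every word, this scheme applies a different rule to each word class, which is what makes the combination nonlinear.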

BibTeX

@conference{Seymore-1998-14813,
  author    = {Kristie Seymore and Stanley Chen and Ronald Rosenfeld},
  title     = {Nonlinear Interpolation of Topic Models for Language Model Adaptation},
  booktitle = {Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP '98)},
  year      = {1998},
  month     = {December},
}