Nonlinear Interpolation of Topic Models for Language Model Adaptation

Kristie Seymore, Stanley Chen, and Ronald Rosenfeld
Conference Paper, Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP '98), December 1998

Abstract

Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. To adapt this model to a new document, the topic (or topics) of the document is first identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic, and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve a slight decrease in perplexity and in speech recognition word error rate on a Broadcast News test set using these techniques. Our results are compared to results obtained through linear interpolation of topic models.
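To make the class-based combination concrete, the sketch below shows one way such a three-way nonlinear interpolation might look for unigram models: general words keep their general-model probabilities, on-topic words take their relative probabilities from the topic model while absorbing the mass freed up by suppressing off-topic words. The partitioning, the renormalization scheme, and all names here are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch (not the paper's exact method): nonlinear
# interpolation of a general and a topic unigram model over a
# three-way vocabulary partition (general / on-topic / off-topic).

def adapt_unigram(p_general, p_topic, general_words, on_topic, off_topic):
    """Return an adapted unigram distribution over the full vocabulary.

    p_general, p_topic : dict mapping word -> probability
    general_words, on_topic, off_topic : disjoint sets covering the vocab
    """
    # Probability mass the general model assigns to each word class.
    mass_on = sum(p_general[w] for w in on_topic)
    mass_off = sum(p_general[w] for w in off_topic)

    adapted = {}
    # General words: keep the general model's predictions unchanged.
    for w in general_words:
        adapted[w] = p_general[w]
    # On-topic words: take relative probabilities from the topic model,
    # but give the class the combined mass of the on- and off-topic
    # classes, boosting topical words at the expense of off-topic ones.
    topic_mass = sum(p_topic.get(w, 0.0) for w in on_topic) or 1.0
    for w in on_topic:
        adapted[w] = (mass_on + mass_off) * p_topic.get(w, 0.0) / topic_mass
    # Off-topic words: suppressed to zero in this simplified sketch; a
    # real system would reserve a small floor probability instead.
    for w in off_topic:
        adapted[w] = 0.0
    return adapted

if __name__ == "__main__":
    # Hypothetical toy vocabulary: adapting toward a "legal" topic.
    p_gen = {"the": 0.5, "court": 0.2, "ruling": 0.2, "touchdown": 0.1}
    p_top = {"court": 0.6, "ruling": 0.4}  # topic-specific model
    adapted = adapt_unigram(p_gen, p_top,
                            general_words={"the"},
                            on_topic={"court", "ruling"},
                            off_topic={"touchdown"})
    print(adapted)  # mass shifts from "touchdown" onto "court"/"ruling"

Unlike linear interpolation, which mixes the two models with fixed weights for every word, this scheme applies a different rule to each word class, which is what makes the combination nonlinear.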

BibTeX

@conference{Seymore-1998-14813,
  author    = {Kristie Seymore and Stanley Chen and Ronald Rosenfeld},
  title     = {Nonlinear Interpolation of Topic Models for Language Model Adaptation},
  booktitle = {Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP '98)},
  year      = {1998},
  month     = {December},
}