Speaker Compensation with Sine-Log All-Pass Transforms - Robotics Institute Carnegie Mellon University

John McDonough, Florian Metze, Hagen Soltau, and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), pp. 369-372, May 2001

Abstract

In previous work, we proposed the rational all-pass transform (RAPT) as the basis of a speaker adaptation scheme intended for use with a large vocabulary speech recognition system. It was shown that RAPT-based adaptation reduces to a linear transformation of cepstral means, much like the better-known maximum likelihood linear regression (MLLR). In a set of speech recognition experiments conducted on the Switchboard Corpus, we obtained a word error rate (WER) of 37.9% using RAPT adaptation, a significant improvement over the 39.5% WER achieved with MLLR. In the present work, we propose the sine-log all-pass transform (SLAPT) as a replacement for the RAPT. Our findings indicate that the SLAPT is just as effective as the RAPT at reducing WER when used as the basis for a variety of speaker compensation schemes, and that it also makes both the computation of transformed cepstral sequences and the estimation of optimal transform parameters far more tractable.

BibTeX

@conference{McDonough-2001-8230,
author = {John McDonough and Florian Metze and Hagen Soltau and Alex Waibel},
title = {Speaker Compensation with Sine-Log All-Pass Transforms},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01)},
year = {2001},
month = {May},
pages = {369--372},
}