Speaker Identification Using Multilingual Phone Strings - Robotics Institute Carnegie Mellon University

Speaker Identification Using Multilingual Phone Strings

Qin Jin, T. Schultz, and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), pp. 145-148, May 2002

Abstract

Far-field speaker identification is very challenging because varying recording conditions often lead to mismatched training and test situations. Although the widely used Gaussian Mixture Model (GMM) approach achieves reasonably good results when training and testing conditions match, its performance degrades dramatically under mismatched conditions. In this paper we propose a new approach to far-field speaker identification: the use of multilingual phone strings derived from recognizers in eight different languages. The experiments are carried out on a database of 30 speakers recorded at eight different microphone distances. The results show that the multilingual phone string approach is robust against mismatched conditions and significantly outperforms GMMs. On 10-second test chunks, the average closed-set identification accuracy reaches 96.7% on variable-distance data.
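
The abstract does not say how the phone strings are modeled or scored, so the following is only a minimal sketch of one way closed-set identification over multilingual phone strings could be set up. It assumes the phone recognizers have already decoded each utterance into one phone sequence per language, and it assumes a smoothed phone-bigram model per speaker and language with the scores summed across languages. The bigram model, the add-one smoothing, and the names train_bigram, score, and identify are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def train_bigram(strings, vocab_size, alpha=1.0):
    """Build an add-alpha smoothed phone-bigram model from phone sequences.

    Hypothetical helper: returns a function logprob(prev, cur).
    """
    pair_counts = Counter()
    ctx_counts = Counter()
    for phones in strings:
        for prev, cur in zip(phones, phones[1:]):
            pair_counts[(prev, cur)] += 1
            ctx_counts[prev] += 1

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = ctx_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(logprob, phones):
    """Average per-bigram log-likelihood of one phone sequence."""
    pairs = list(zip(phones, phones[1:]))
    return sum(logprob(p, c) for p, c in pairs) / max(len(pairs), 1)


def identify(models, test_strings):
    """Pick the enrolled speaker with the highest score summed over languages.

    models:       {speaker: {language: logprob function}}
    test_strings: {language: phone sequence decoded from the test chunk}
    """
    totals = {
        spk: sum(score(lms[lang], phones) for lang, phones in test_strings.items())
        for spk, lms in models.items()
    }
    return max(totals, key=totals.get)


# Toy enrollment data (made up for illustration): two speakers, two "languages".
enroll = {
    "spk_A": {"lang1": [list("ababab")], "lang2": [list("xxxy")]},
    "spk_B": {"lang1": [list("aabbaabb")], "lang2": [list("xyxy")]},
}
models = {
    spk: {lang: train_bigram(seqs, vocab_size=2) for lang, seqs in langs.items()}
    for spk, langs in enroll.items()
}
guess = identify(models, {"lang1": list("abab"), "lang2": list("xxy")})
```

Summing over languages is what makes the sketch multilingual: each recognizer gives a different phonetic view of the same voice, and a speaker must score well across all of them to be selected.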

BibTeX

@conference{Jin-2002-8453,
author = {Qin Jin and T. Schultz and Alex Waibel},
title = {Speaker Identification Using Multilingual Phone Strings},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02)},
year = {2002},
month = {May},
pages = {145--148},
}