Speaker, Accent, and Language Identification Using Multilingual Phone Strings

T. Schultz, Qin Jin, Kornel Laskowski, Alicia Tribble, and Alex Waibel
Conference Paper, Proceedings of 2nd International Conference on Human Language Technology Research (HLT '02), pp. 125-131, March 2002

Abstract

The identification of an utterance's non-verbal cues, such as speaker, accent, and language, can provide useful information for speech analysis. In this paper we investigate far-field speaker identification, as well as accent and language identification, using multilingual phone strings produced by phone recognizers trained on data from different languages. Currently, approaches based on Gaussian Mixture Models (GMMs) [4] are the most widely and successfully used methods for speaker identification. Although GMMs have been applied successfully to close-speaking microphone scenarios under matched training and testing conditions, their performance degrades dramatically under mismatched conditions. The term "mismatched condition" describes a situation in which the testing conditions, e.g. microphone distance, differ substantially from those seen during training. For language and accent identification, phone recognition combined with phone N-gram modeling has been the most successful approach to date [6]. More recently, Kohler introduced an approach to speaker recognition that uses a phonotactic N-gram model. In this paper, we extend this idea to far-field speaker identification, as well as to accent and language identification. We introduce two different methods based on multilingual phone strings to tackle mismatched distance and channel conditions and compare them to the GMM approach.
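
The phone N-gram idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single stream of already-decoded phone strings (the paper uses several phone recognizers trained on different languages), a bigram model with add-one smoothing, and a decision by highest average log-likelihood. The function names (`train_bigram`, `log_likelihood`, `identify`) and the toy phone strings are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram(phone_strings):
    """Count phone bigrams and their left contexts over the training strings."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for phones in phone_strings:
        padded = ["<s>"] + list(phones) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_likelihood(model, phones):
    """Average per-bigram log-probability of a phone string (add-one smoothing)."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen phones
    padded = ["<s>"] + list(phones) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
    return total / (len(padded) - 1)


def identify(models, phones):
    """Return the class label whose phone bigram model scores the string highest."""
    return max(models, key=lambda label: log_likelihood(models[label], phones))


# Toy data (assumed): one bigram model per class (speaker, accent, or language).
models = {
    "A": train_bigram([["k", "a", "t", "a"], ["t", "a", "k", "a"]]),
    "B": train_bigram([["s", "i", "m", "i"], ["m", "i", "s", "i"]]),
}
predicted = identify(models, ["k", "a", "t"])
```

In a multilingual setup, one such model could be trained per class and per phone recognizer, with the per-recognizer scores combined before the decision; how the paper combines them is not stated in the abstract.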

BibTeX

@conference{-2002-8402,
author = {T. Schultz and Qin Jin and Kornel Laskowski and Alicia Tribble and Alex Waibel},
title = {Speaker, Accent, and Language Identification Using Multilingual Phone Strings},
booktitle = {Proceedings of 2nd International Conference on Human Language Technology Research (HLT '02)},
year = {2002},
month = {March},
pages = {125--131},
}