Data-Driven Determination of Appropriate Dictionary Units for Korean LVCSR - Robotics Institute Carnegie Mellon University

Daniel Kiecza, Tanja Schultz, and Alex Waibel
Conference Paper, Proceedings of International Conference on Speech Processing (ICSP '99), pp. 323-327, August 1999

Abstract

This paper describes the design of our Korean large vocabulary speech recognition system using the multilingual dictation database GlobalPhone. Defining appropriate dictionary units for this purpose is not a trivial task: using word phrases (eojeols) gives very high out-of-vocabulary (OOV) rates, above 30%, whereas using syllable units results in high confusability and severely limits the scope of standard language models. We investigate a data-driven approach that overcomes these limitations. The results show that the data-driven approach reduces the OOV rate to below 1% and significantly outperforms the syllable-based approach, giving 79.4% phone accuracy and 69.3% syllable accuracy. For our best system we present lattice-based accuracies of 95.0% syllable accuracy and 82.7% eojeol accuracy.
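The trade-off between dictionary units can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it assumes the OOV rate is the fraction of running test tokens missing from the training vocabulary, and that each Hangul syllable is a single precomposed (NFC) code point. The helper names `oov_rate` and `to_syllables` and the toy word lists are hypothetical and are not taken from the paper or from GlobalPhone.

```python
def oov_rate(vocabulary, test_tokens):
    """Return the fraction of running test tokens missing from the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


def to_syllables(eojeols):
    """Split eojeols into syllable blocks.

    Assumes NFC-normalized text, so each Hangul syllable is one code point.
    """
    return [syllable for eojeol in eojeols for syllable in eojeol]


# Toy data invented for illustration only (not from the paper).
train_eojeols = ["학교에", "갔다", "집에", "왔다"]
test_eojeols = ["학교에", "갔다", "학교다", "집에", "왔다"]

# Eojeol units: the unseen form "학교다" is out of vocabulary.
eojeol_oov = oov_rate(set(train_eojeols), test_eojeols)

# Syllable units: every syllable of "학교다" already occurs in training.
syllable_oov = oov_rate(
    set(to_syllables(train_eojeols)), to_syllables(test_eojeols)
)

print(f"eojeol OOV rate:   {eojeol_oov:.3f}")
print(f"syllable OOV rate: {syllable_oov:.3f}")
```

In this toy case the eojeol dictionary misses one of five test tokens while the syllable dictionary misses none, which mirrors the tension the abstract describes: smaller units cover more text but are shorter and therefore easier to confuse. The numbers are illustrative only and say nothing about the rates reported in the paper.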

BibTeX

@conference{Kiecza-1999-14974,
author = {Daniel Kiecza and Tanja Schultz and Alex Waibel},
title = {Data-Driven Determination of Appropriate Dictionary Units for Korean LVCSR},
booktitle = {Proceedings of International Conference on Speech Processing (ICSP '99)},
year = {1999},
month = {August},
pages = {323--327},
}