Publication
INTERSPEECH - Eurospeech 2003
Conference paper
Using place name data to train language identification models
Abstract
The language of origin of a name affects its pronunciation, so language identification is an important technology for speech synthesis and recognition. Previous work on this task has typically used training sets that are proprietary or limited in coverage. In this work, we investigate the use of a publicly available geographic database for training language ID models. We automatically cluster place names by language and show that models trained from place name data are effective for language ID on person names. In addition, we compare several source-channel and direct models for language ID, achieving a 24% reduction in error rate over a source-channel letter trigram model on a 26-way language ID task.
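For readers unfamiliar with the baseline, a source-channel letter trigram classifier picks the language L that maximizes P(L) · P(name | L), where P(name | L) comes from a letter trigram model trained on names from language L. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Add-one (Laplace) smoothing, the "^" and "$" padding symbols, the class name LetterTrigramLID, and the tiny German/Italian training list are all hypothetical choices made for the example.

```python
import math
from collections import Counter


class LetterTrigramLID:
    """Hypothetical source-channel language ID sketch:
    choose argmax_L  log P(L) + log P(name | L),
    with P(name | L) given by a per-language letter trigram model."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha       # add-alpha smoothing constant (assumed, not from the paper)
        self.tri = {}            # language -> Counter of 3-letter strings
        self.hist = {}           # language -> Counter of 2-letter histories
        self.prior = Counter()   # language -> number of training names
        self.alphabet = set()    # every symbol ever predicted, including "$"

    @staticmethod
    def _pad(name):
        # Two start symbols give the first letter a full history; "$" marks the end.
        return "^^" + name.lower() + "$"

    def train(self, examples):
        """examples: iterable of (name, language) pairs."""
        for name, lang in examples:
            s = self._pad(name)
            self.prior[lang] += 1
            tri = self.tri.setdefault(lang, Counter())
            hist = self.hist.setdefault(lang, Counter())
            for i in range(2, len(s)):
                tri[s[i - 2:i + 1]] += 1
                hist[s[i - 2:i]] += 1
                self.alphabet.add(s[i])

    def log_score(self, name, lang):
        s = self._pad(name)
        v = len(self.alphabet) + 1  # +1 reserves probability mass for unseen symbols
        total = sum(self.prior.values())
        score = math.log(self.prior[lang] / total)  # source model: log P(L)
        tri, hist = self.tri[lang], self.hist[lang]
        for i in range(2, len(s)):
            # Channel model: smoothed P(letter | two previous letters, L)
            num = tri[s[i - 2:i + 1]] + self.alpha
            den = hist[s[i - 2:i]] + self.alpha * v
            score += math.log(num / den)
        return score

    def classify(self, name):
        return max(self.tri, key=lambda lang: self.log_score(name, lang))


# Toy, hypothetical training data; real data would be much larger.
model = LetterTrigramLID()
model.train([("schmidt", "de"), ("schneider", "de"), ("muller", "de"),
             ("rossi", "it"), ("bianchi", "it"), ("ricci", "it")])
print(model.classify("schulz"))  # prints "de"
print(model.classify("russo"))   # prints "it"
```

With real data, the (name, language) pairs would come from clustered place name lists like those the abstract describes. A direct model would instead estimate P(L | name) directly rather than combining a prior with per-language letter models; this sketch covers only the source-channel baseline.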