Building language detectors using small amounts of training data
conference paper
In this paper we present language detectors built using relatively small amounts of training data. This is achieved by exploiting the modelling power of a Linear Discriminant Analysis (LDA) back-end for the languages for which only a small amount of training data is available. We present experiments on NIST 2005 Language Recognition Evaluation data, where we use a jackknifing technique to remove well-trained language knowledge from the LDA back-end, so that only sparse trials are used for training the LDA. We investigate three systems, which show different levels of loss in language detection capability. We validate the technique on an independent collection of 21 languages, where we show that with less than one hour of training data we obtain an error rate for ‘new’ languages that is only slightly more than twice the error rate for languages for which the full 60 hours of CallFriend data are available. © Odyssey 2008: Speaker and Language Recognition Workshop. All rights reserved.
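The abstract describes an LDA back-end trained on per-trial score vectors, with a jackknife-style holdout so that one language is represented only by sparse trials. The following is a minimal sketch of that general idea, not the authors' system: it assumes synthetic score vectors, scikit-learn's LinearDiscriminantAnalysis as a stand-in back-end, and hypothetical trial counts.

    # Minimal sketch of an LDA back-end for language detection (illustrative only).
    # Assumptions not taken from the paper: synthetic per-trial score vectors,
    # scikit-learn's LinearDiscriminantAnalysis as the back-end classifier,
    # and arbitrary trial counts for the "well-trained" vs. sparse language.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)

    n_langs, score_dim = 6, 12            # hypothetical: 6 target languages, 12-dim scores
    full_trials, sparse_trials = 200, 10  # plentiful vs. sparse back-end training data

    def make_trials(lang, n):
        """Synthetic score vectors: each language shifts a different pair of scores."""
        x = rng.normal(size=(n, score_dim))
        x[:, lang * 2:(lang * 2) + 2] += 2.0  # crude class separation
        return x

    # Jackknife-style setup: one 'new' language gets only sparse back-end training
    # trials, while the remaining languages keep their full amount.
    new_lang = 3
    X_train, y_train = [], []
    for lang in range(n_langs):
        n = sparse_trials if lang == new_lang else full_trials
        X_train.append(make_trials(lang, n))
        y_train.append(np.full(n, lang))
    X_train, y_train = np.vstack(X_train), np.concatenate(y_train)

    backend = LinearDiscriminantAnalysis()
    backend.fit(X_train, y_train)

    # Compare detection error for the sparse 'new' language against the rest.
    X_test = np.vstack([make_trials(l, 100) for l in range(n_langs)])
    y_test = np.concatenate([np.full(100, l) for l in range(n_langs)])
    pred = backend.predict(X_test)
    for label, mask in [("new language", y_test == new_lang),
                        ("well-trained", y_test != new_lang)]:
        err = np.mean(pred[mask] != y_test[mask])
        print(f"{label:>13s} error: {err:.3f}")

The sketch only illustrates the evaluation pattern (error on the sparsely trained language versus the fully trained ones); the paper's back-end operates on scores from its own upstream recognizers and uses NIST LRE 2005 and CallFriend data.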
TNO Identifier
954020
Publisher
International Speech Communication Association
Article nr.
15157
Source title
Odyssey 2008: Speaker and Language Recognition Workshop, 21 January 2008 through 24 January 2008
Files
To receive the publication files, please send an e-mail request to TNO Repository.