Optimal selection of speech data for automatic speech recognition systems
conference paper
This paper presents a method for selecting a limited set of maximally information-rich speech data from a database for optimal training and diagnostic testing of Automatic Speech Recognition (ASR) systems. The method uses Principal Component Analysis (PCA) to map the variance of the speech material in a database into a low-dimensional space, followed by clustering and a selection technique. A very straightforward implementation of this procedure appears to detect automatically at least two criteria for classifying speakers of standard Dutch, viz. gender and the way in which the /r/ is produced. To verify the power of the technique to improve ASR, data sets of equal size, either selected with this method or drawn randomly, were used to train a recognition system on Dutch connected digits. The results show improved recognition performance when the optimal data sets were used, especially where the sub-corpora used for training were relatively small.
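The PCA, clustering and selection pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the clustering algorithm or the selection rule, so the sketch assumes one synthetic feature vector per speaker, k-means clustering, and a round-robin choice of the speakers nearest each cluster centroid. All function names and data are hypothetical.

```python
import math
import random

def center(rows):
    """Subtract the per-dimension mean from every feature vector."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(cov, k, iters=500):
    """Leading k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Deflate: remove the component just found from the matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def project(rows, comps):
    """Map each row into the low-dimensional component space."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps]
            for r in rows]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
              for p in points]
    return labels, centroids

def select(points, labels, centroids, n_select):
    """Assumed rule: take speakers nearest each centroid, round-robin."""
    ranked = []
    for c in range(len(centroids)):
        idx = [i for i, l in enumerate(labels) if l == c]
        idx.sort(key=lambda i: dist2(points[i], centroids[c]))
        ranked.append(idx)
    chosen, rank = [], 0
    while len(chosen) < n_select and any(rank < len(r) for r in ranked):
        for r in ranked:
            if rank < len(r) and len(chosen) < n_select:
                chosen.append(r[rank])
        rank += 1
    return chosen

# Synthetic data: two well-separated speaker groups, 10 speakers each.
rng = random.Random(1)
features = [[rng.gauss(0, 0.3) for _ in range(4)] for _ in range(10)]
features += [[5 + rng.gauss(0, 0.3), 5 + rng.gauss(0, 0.3),
              rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(10)]

centered = center(features)
components = top_components(covariance(centered), k=2)
low_dim = project(centered, components)
labels, centroids = kmeans(low_dim, k=2)
chosen = select(low_dim, labels, centroids, n_select=4)
print("selected speaker indices:", chosen)
```

Drawing the same number of speakers from every cluster keeps each detected speaker group represented in a small training set, which is one plausible reading of why such a selection could help most when the sub-corpora are small.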
TNO Identifier
745341
Publisher
International Speech Communication Association
Source title
7th International Conference on Spoken Language Processing, ICSLP 2002, 16 to 20 September 2002
Pages
2473-2476