Speech intelligibility and speech quality of a MELP speech coder in combination with a noise pre-processing system [Spraakverstaanbaarheid en spraakkwaliteit van een MELP-spraakcodeersysteem in combinatie met een ruisonderdrukkend systeem]
report
The speech intelligibility and speech quality of a MELP low-bitrate speech coder (1200 bps) were investigated in an experiment in which the speech coder was tested both with and without a noise pre-processing system. The central question was to what extent the performance of the speech coder could be improved by adding a noise pre-processing system.
In the speech intelligibility test, participants were presented with CVC words (consonant-vowel-consonant words) under fifteen different transmission conditions: three systems (speech coder, noise pre-processor, and the combination of speech coder and noise pre-processor) were each tested under five noise conditions, namely a 'clean' condition (no additive noise), aircraft noise at 12 and 6 dB SNR, and vehicle noise at 12 and 6 dB SNR. For each transmission condition, word scores and individual phoneme scores for the initial consonant, the vowel and the final consonant were obtained. The phoneme scores can be used for diagnostic purposes. In the speech quality test, mean opinion scores (MOS) were obtained: participants rated the quality of short sentences on a five-point scale ranging from bad to excellent. The sentences were presented under transmission conditions similar to those in the speech intelligibility test.
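The test design above can be illustrated with a minimal sketch. It is not the analysis code used in the report: the function names, the condition labels, the representation of a CVC word as a tuple of three phonemes, and the mapping of the rating scale to the numbers 1 (bad) to 5 (excellent) are all assumptions made for illustration.

```python
from itertools import product

# Labels are illustrative; the report names the systems and noise types
# but does not prescribe identifiers.
SYSTEMS = ["coder", "pre-processor", "pre-processor + coder"]
NOISE_CONDITIONS = [
    "clean",
    "aircraft 12 dB SNR",
    "aircraft 6 dB SNR",
    "vehicle 12 dB SNR",
    "vehicle 6 dB SNR",
]


def transmission_conditions():
    """Return every (system, noise) pair: 3 systems x 5 noise conditions."""
    return list(product(SYSTEMS, NOISE_CONDITIONS))


def cvc_scores(trials):
    """Compute percentage-correct scores for one transmission condition.

    `trials` is a list of (target, response) pairs; each item is assumed
    to be a 3-tuple (initial consonant, vowel, final consonant).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    phoneme_hits = [0, 0, 0]
    word_hits = 0
    for target, response in trials:
        matches = [t == r for t, r in zip(target, response)]
        for position, matched in enumerate(matches):
            phoneme_hits[position] += matched
        # A word counts as correct only if all three phonemes are correct.
        word_hits += all(matches)
    n = len(trials)
    return {
        "word": 100.0 * word_hits / n,
        "initial_consonant": 100.0 * phoneme_hits[0] / n,
        "vowel": 100.0 * phoneme_hits[1] / n,
        "final_consonant": 100.0 * phoneme_hits[2] / n,
    }


def mean_opinion_score(ratings):
    """Average of ratings on an assumed 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)
```

Keeping the three phoneme positions separate is what makes the scores diagnostic: a condition with a high vowel score but a low initial-consonant score points to a different kind of degradation than a uniformly low word score.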
The results of both tests show that the performance of the MELP speech coder is affected by noise, as could be expected. Furthermore, its performance seems to depend on the fundamental frequency of the speaker's voice: word scores (intelligibility) and mean opinion scores (quality) are generally higher for male speech than for female speech. When the MELP speech coder is combined with the noise pre-processor used in this study, speech intelligibility and speech quality improve significantly for low-frequency noise (e.g. vehicle noise); for high-frequency noise (e.g. aircraft noise), improvements, if any, are found only when the noise level is high (6 dB SNR).
TNO Identifier
9641
Publisher
TNO
Place of publication
Soesterberg