On the assessment of high-quality voice recordings including voice postprocessing

article
When we assess the quality of a voice recording, two different aspects play a role: the voice characteristics (voice quality) and the audio chain characteristics (audio quality). Subjective experiments in which no clear ideal reference is provided, so-called absolute category rating experiments, assess the speech quality, i.e., the combined effect of voice and audio quality. This paper investigates whether voice postprocessing such as timbre optimization, loudness optimization, de-essing, room reverberation optimization, and (background) noise suppression can improve the quality of a high-quality voice recording. None of the processing types provided a significant improvement in perceived quality. The best postprocessing was noise reduction to absolute silence, which delivered only a non-significant improvement when the voice recording is of high quality. The subjective quality evaluations show a significant preference for male over female voices and a significant effect of speaker/sentence dependency on the perceived quality of certain types of degradation. The subjective results are compared with predictions made with POLQA, the ITU-T standard for the objective assessment of speech quality (ITU-T Recommendation P.863, versions 1.1 and 2.4); the comparison shows that many speech quality effects are predicted correctly, at the condition level as well as at the individual sentence level.
TNO Identifier
954639
ISSN
1549-4950
Source
AES: Journal of the Audio Engineering Society, 63(3), pp. 174-183.
Pages
174-183