Emotion recognition from speech by combining databases and fusion of classifiers

conference paper
We explore possibilities for enhancing the generality, portability and robustness of emotion recognition systems by combining databases and by fusion of classifiers. In a first experiment, we investigate the performance of an emotion detection system tested on a certain database, given that it is trained on speech from either the same database, a different database, or a mix of both. We observe that there is generally a drop in performance when the test database does not match the training material, with a few exceptions. Furthermore, the performance drops when a mixed corpus of acted databases is used for training and testing is carried out on real-life recordings. In a second experiment, we investigate the effect of training multiple emotion detectors and fusing these into a single detection system. We observe a drop in the Equal Error Rate (EER) from 19.0 % on average for 4 individual detectors to 4.2 % when fused using FoCal [1]. © 2010 Springer-Verlag Berlin Heidelberg.
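As a rough illustration of the two quantities the abstract relies on, the sketch below computes an approximate Equal Error Rate (the operating point where the miss rate and the false-alarm rate are equal) and fuses two detectors by a weighted sum of their scores. Everything in it is an assumption made for illustration: the function names `equal_error_rate` and `fuse`, the toy scores, and the equal weights are hypothetical, and the weighted sum is a simple stand-in that does not reproduce FoCal or the paper's detectors.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: scan candidate thresholds and return the mean of
    miss rate and false-alarm rate where the two are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted as "emotion present" when its score >= t.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def fuse(score_lists, weights):
    """Weighted sum of the per-detector scores for each trial."""
    return [sum(w * s for w, s in zip(weights, trial)) for trial in zip(*score_lists)]


# Toy scores (invented): each detector errs on a different trial.
a_target, a_nontarget = [2.0, 1.5, -0.5, 1.0], [-2.0, -1.0, 0.5, -1.5]
b_target, b_nontarget = [1.0, -0.5, 2.0, 1.5], [-1.0, 0.5, -2.0, -1.5]

weights = [0.5, 0.5]  # assumed equal weights; a real system would train these
fused_target = fuse([a_target, b_target], weights)
fused_nontarget = fuse([a_nontarget, b_nontarget], weights)

eer_a = equal_error_rate(a_target, a_nontarget)
eer_b = equal_error_rate(b_target, b_nontarget)
eer_fused = equal_error_rate(fused_target, fused_nontarget)
print(eer_a, eer_b, eer_fused)
```

In this toy case the fused EER is lower than either individual EER because the two detectors make their errors on different trials, which is the general intuition behind fusing complementary detectors.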
TNO Identifier
425140
ISSN
0302-9743
ISBN
3642157599 9783642157592
Source title
13th International Conference on Text, Speech and Dialogue, TSD 2010, 6 September 2010 through 10 September 2010, Brno. Conference code: 82076
Pages
353-360
Files
To receive the publication files, please send an e-mail request to TNO Repository.