Recognizing Stress Using Semantics and Modulation of Speech and Gestures
article
This paper investigates how speech and gestures convey stress, and how they can be used for automatic stress recognition. As a first step, we look into how humans use speech and gestures to convey stress. In particular, for both speech and gestures, we distinguish between stress conveyed by the intended semantic message (e.g. spoken words for speech, symbolic meaning for gestures) and stress conveyed by the modulation of either speech or gestures (e.g. intonation for speech, speed and rhythm for gestures). As a second step, we use this decomposition of stress as an approach for automatic stress prediction. The considered components provide an intermediate representation with intrinsic meaning, which helps bridge the semantic gap between the low-level sensor representation and the high-level, context-sensitive interpretation of behavior. Our experiments are run on an audiovisual dataset of service-desk interactions. The final goal is a surveillance system that issues a notification when the stress level is high and extra assistance is needed. We find that speech modulation is the best-performing intermediate-level variable for automatic stress prediction. Adding gestures improves performance and is most beneficial when speech is lacking. The two-stage approach with intermediate variables performs better than baseline feature-level or decision-level fusion. © 2015 IEEE.
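The two-stage approach described above can be sketched roughly as follows: modality-specific features are first mapped to intermediate stress estimates (speech semantics, speech modulation, gesture semantics, gesture modulation), which a second-stage classifier then fuses into the final prediction. The feature dimensions, classifier choices, and synthetic data below are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a two-stage stress predictor with intermediate variables.
# All feature names, dimensions, and classifiers are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical low-level features per modality/aspect (dimensions are made up).
features = {
    "speech_semantics":   rng.normal(size=(n, 20)),  # e.g. word/topic scores
    "speech_modulation":  rng.normal(size=(n, 12)),  # e.g. pitch/energy statistics
    "gesture_semantics":  rng.normal(size=(n, 8)),   # e.g. recognized gesture classes
    "gesture_modulation": rng.normal(size=(n, 6)),   # e.g. speed/rhythm descriptors
}
y = rng.integers(0, 2, size=n)  # binary stress label (synthetic)

# Stage 1: one classifier per intermediate variable, each producing a stress probability.
stage1 = {}
intermediate = np.zeros((n, len(features)))
for i, (name, X) in enumerate(features.items()):
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    stage1[name] = clf
    intermediate[:, i] = clf.predict_proba(X)[:, 1]

# Stage 2: fuse the intermediate stress estimates into the final prediction.
stage2 = LogisticRegression(max_iter=1000).fit(intermediate, y)
print("training accuracy of fused model:", stage2.score(intermediate, y))
```

In practice the two stages would be trained on separate folds (or with cross-validated stage-1 outputs) to avoid the second stage overfitting to stage-1 training predictions; the single-split training here is only to keep the sketch short.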
TNO Identifier
537432
ISSN
1949-3045
Source
IEEE Transactions on Affective Computing, 7(2), pp. 162-175.
Publisher
Institute of Electrical and Electronics Engineers Inc.
Article nr.
7145400
Pages
162-175
Files
To receive the publication files, please send an e-mail request to TNO Repository.