Title
An audio-visual dataset of human-human interactions in stressful situations
Author
Lefter, I.
Burghouts, G.J.
Rothkrantz, L.J.M.
Publication year
2014
Abstract
Stressful situations are likely to occur at human-operated service desks, as well as at human-computer interfaces used in the public domain. Automatic surveillance can help by notifying when extra assistance is needed. Human communication is inherently multimodal, e.g. speech, gestures, and facial expressions, and it is expected that automatic surveillance systems can benefit from exploiting multimodal information. This requires automatic fusion of modalities, which is still an unsolved problem. To support the development of such systems, we present and analyze audio-visual recordings of human-human interactions at a service desk. The corpus has a high degree of realism: all interactions are freely improvised by actors based on short scenarios in which only the sources of conflict were provided. The recordings can be considered a prototype for general stressful human-human interaction. They were annotated on a 5-point scale for degree of stress from the perspective of surveillance operators. The recordings are very rich in hand gestures, and we find that the more stressful the situation, the higher the proportion of speech that is accompanied by gestures. Understanding the function of gestures and their relation to speech is essential for good fusion strategies. Taking speech as the basic modality, one of our research questions was what role gestures play in addition to speech. Both speech and gestures can express emotion, in which case we say that they have an emotional function. They can also express non-emotional information, in which case we say that they have a semantic function. We learn that when speech and gestures have the same function, they are usually congruent, but their intensities and clarity can vary. Most gestures in this dataset convey emotion. We identify classes of gestures in our recordings and argue that some classes are clear indications of stressful situations. © 2014 OpenInterface Association.
Subject
Physics & Electronics
II - Intelligent Imaging
TS - Technical Sciences
Safety and Security
Safety
Defence, Safety and Security
Gestures
Multimodal communication
Multimodal fusion
Speech
Video recordings
Stressful situations dataset
Human computer interaction
Semantics
Automatic surveillance systems
Human-human interactions
Audio recordings
To reference this document use:
http://resolver.tudelft.nl/uuid:3c7bbed0-7319-480a-a2ae-1f4b35506730
DOI
https://doi.org/10.1007/s12193-014-0150-7
TNO identifier
507095
Publisher
Springer Verlag
ISSN
1783-8738
Source
Journal on Multimodal User Interfaces, 8 (1), 29-41
Document type
article