Title: An audio-visual dataset of human-human interactions in stressful situations
Authors: Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.
Publication year: 2014

Abstract:
Stressful situations are likely to occur at human-operated service desks, as well as at human-computer interfaces used in the public domain. Automatic surveillance can help by notifying operators when extra assistance is needed. Human communication is inherently multimodal, e.g. speech, gestures, and facial expressions, so automatic surveillance systems are expected to benefit from exploiting multimodal information. This requires automatic fusion of modalities, which is still an unsolved problem. To support the development of such systems, we present and analyze audio-visual recordings of human-human interactions at a service desk. The corpus has a high degree of realism: all interactions are freely improvised by actors based on short scenarios in which only the sources of conflict were provided. The recordings can be considered a prototype for general stressful human-human interaction. They were annotated on a 5-point scale for degree of stress from the perspective of surveillance operators. The recordings are very rich in hand gestures, and we find that the more stressful the situation, the higher the proportion of speech that is accompanied by gestures. Understanding the function of gestures and their relation to speech is essential for good fusion strategies. Taking speech as the basic modality, one of our research questions was what role gestures play in addition to speech. Both speech and gestures can express emotion, in which case we say they have an emotional function; they can also express non-emotional information, in which case we say they have a semantic function. We find that when speech and gestures have the same function, they are usually congruent, but their intensities and clarity can vary.
Most gestures in this dataset convey emotion. We identify classes of gestures in our recordings and argue that some classes are clear indications of stressful situations. © 2014 OpenInterface Association.

Subject: Physics & Electronics; II - Intelligent Imaging; TS - Technical Sciences; Safety and Security; Safety; Defence, Safety and Security; Gestures; Multimodal communication; Multimodal fusion; Speech; Video recordings; Stressful situations dataset; Human-computer interaction; Semantics; Automatic surveillance systems; Human-human interactions; Audio recordings

To reference this document use: http://resolver.tudelft.nl/uuid:3c7bbed0-7319-480a-a2ae-1f4b35506730
DOI: https://doi.org/10.1007/s12193-014-0150-7
TNO identifier: 507095
Publisher: Springer Verlag
ISSN: 1783-8738
Source: Journal on Multimodal User Interfaces, 8 (1), 29-41
Document type: article
Files: To receive the publication files, please send an e-mail request to the TNO Library.