Title
Learning the Fusion of Audio and Video Aggression Assessment by Meta-Information from Human Annotations
Author
Lefter, I.
Burghouts, G.J.
Rothkrantz, L.J.M.
Publication year
2012
Abstract
The focus of this paper is finding a method to predict aggression with a multimodal system, given multiple unimodal features. The mechanism underlying multimodal sensor fusion is complex and not completely understood. We try to understand the fusion process and make it more transparent. As a case study we use a database of audio-visual recordings of aggressive behavior in trains. We have collected multimodal and unimodal assessments by humans, who have given aggression scores on a 3-point scale. No trivial fusion step predicts the multimodal labels from the unimodal labels. We therefore propose an intermediate step to discover the structure in the fusion process, based on what we call meta-features, and we find a set of five meta-features that have an impact on the fusion process. Using a propositional rule-based learner, we show the high positive impact of the meta-features on predicting the multimodal label for the complex situations in which the audio, video and multimodal labels do not reinforce each other. We then present an experiment that demonstrates the added value of this approach on the whole data set.
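This record does not list the five meta-features or name the exact rule learner. The sketch below therefore only illustrates the starting point the abstract describes. It scores a few trivial fusion rules that map (audio, video) aggression scores on the 3-point scale to a multimodal score, and it flags the cases where the three labels disagree. The rule names, the encoding of scores as the integers 1 to 3, and the `is_complex` definition ("not all three labels equal") are assumptions made for illustration. They are not the authors' definitions.

```python
from typing import Callable, Dict, List, Tuple

# Assumed encoding: aggression scores are the integers 1, 2, 3 (3-point scale).
# A sample is (audio_label, video_label, multimodal_label).
Sample = Tuple[int, int, int]
FusionRule = Callable[[int, int], int]

# Hypothetical "trivial" fusion rules; the paper's own baselines may differ.
TRIVIAL_RULES: Dict[str, FusionRule] = {
    "max": lambda audio, video: max(audio, video),
    "min": lambda audio, video: min(audio, video),
    "audio_only": lambda audio, video: audio,
    "video_only": lambda audio, video: video,
}


def rule_accuracy(samples: List[Sample], rule: FusionRule) -> float:
    """Fraction of samples whose multimodal label the rule reproduces exactly."""
    if not samples:
        return 0.0
    hits = sum(1 for a, v, m in samples if rule(a, v) == m)
    return hits / len(samples)


def is_complex(sample: Sample) -> bool:
    """Assumed reading of 'labels do not reinforce each other':
    the audio, video and multimodal labels are not all equal."""
    audio, video, multimodal = sample
    return not (audio == video == multimodal)


def evaluate(samples: List[Sample]) -> Dict[str, float]:
    """Accuracy of every trivial fusion rule on the given samples."""
    return {name: rule_accuracy(samples, rule) for name, rule in TRIVIAL_RULES.items()}
```

You can run `evaluate` on all annotated samples and then again on `[s for s in samples if is_complex(s)]`. Comparing the two results shows how much the trivial rules lose on the disagreeing cases. According to the abstract, this is the subset where adding meta-features to a rule-based learner helps most.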
Subject
Multimodal fusion
Aggression detection
Meta-features
Safety and Security
Defence, Safety and Security
Physics & Electronics
II - Intelligent Imaging
TS - Technical Sciences
To reference this document use:
http://resolver.tudelft.nl/uuid:daf6e991-2db5-4019-abd6-ba3bcd707d8c
TNO identifier
455443
Publisher
IEEE, Piscataway, NJ
Source
Proceedings of the 15th International Conference on Information Fusion - FUSION 2012, 9-12 July 2012, Singapore
Document type
conference paper