Title
Automatic audio-visual fusion for aggression detection using meta-information
Author
Lefter, I.
Burghouts, G.J.
Rothkrantz, L.J.M.
Publication year
2012
Abstract
We propose a new method for audio-visual sensor fusion and apply it to automatic aggression detection. While a variety of definitions of aggression exist, in this paper we treat it as any kind of behavior that has a disturbing effect on others. We collected multimodal and unimodal assessments from human annotators, who rated aggression on a 3-point scale. No trivial fusion algorithm predicts the multimodal labels from the unimodal labels. We propose an intermediate step, based on what we call meta-features, to discover the structure in the fusion process, and we find a set of five meta-features that have an impact on it. We use simple, state-of-the-art low-level audio and video features to predict the level of aggression in audio and video, and we also predict the three most feasible meta-features. We show that adding the meta-features has a significant positive impact on predicting the multimodal label, compared with standard fusion techniques such as feature-level and decision-level fusion. © 2012 IEEE.
Subject
Physics & Electronics
II - Intelligent Imaging
TS - Technical Sciences
Safety and Security
Defence, Safety and Security
Aggression detection
Audio-visual sensor fusion
High level fusion
Surveillance
To reference this document use:
http://resolver.tudelft.nl/uuid:59a03a34-d4e4-4f96-8363-656d2b0afc13
DOI
https://doi.org/10.1109/avss.2012.13
TNO identifier
465811
Publisher
IEEE, Piscataway, NJ
Source
2012 IEEE 9th International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2012, 18-21 September 2012, Beijing, China, pp. 19-24
Article number
6327978
Document type
conference paper