Automatic audio-visual fusion for aggression detection using meta-information