Recognition and localization of relevant human behavior in videos