Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos