Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos

article
This paper presents a system that detects 48 human actions in realistic videos, ranging from simple actions such as ‘walk’ to complex actions such as ‘exchange’. The proposed method substantially improves detection performance. The improvement stems from a different approach to three themes: sample selection, two-stage classification, and the combination of multiple features. First, we show that sampling can be improved by smart selection of the negatives. Second, we show that exploiting the posteriors of all 48 actions through two-stage classification greatly improves detection. Third, we show how low-level motion features and high-level object features should be combined. Together, these three yield a performance improvement of a factor of 2.37 for human action detection on the visint.org test set of 1,294 realistic videos. In addition, we demonstrate that selective sampling and the two-stage setup improve on standard bag-of-features methods on the UT-interaction dataset, and that our method outperforms the state of the art on the IXMAS dataset.
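The two-stage idea mentioned in the abstract can be illustrated with a minimal sketch: first-stage detectors produce one posterior per action, and a second-stage detector for each action takes the full vector of posteriors as its input. Everything below is an assumption made for illustration. The abstract does not name the classifiers used, so plain logistic regression stands in for them, and the function names, the three toy actions, and the 2-D features are hypothetical. The second stage here is trained on posteriors for the same samples that trained the first stage, which is a simplification and may differ from the paper's procedure.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a binary logistic regression by batch gradient descent.

    Stand-in classifier (assumption): the abstract does not say which
    classifier the paper uses.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    # Posterior estimate for a single sample.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_two_stage(X, Y):
    """X: feature vectors; Y: 0/1 label vectors with one entry per action."""
    n_actions = len(Y[0])
    # Stage 1: one detector per action on the raw features.
    stage1 = [train_logreg(X, [y[a] for y in Y]) for a in range(n_actions)]
    # Posteriors of all actions become the stage-2 input vector.
    P = [[predict_proba(m, x) for m in stage1] for x in X]
    # Stage 2: one detector per action on the full posterior vector.
    stage2 = [train_logreg(P, [y[a] for y in Y]) for a in range(n_actions)]
    return stage1, stage2


def predict_two_stage(models, x):
    stage1, stage2 = models
    p = [predict_proba(m, x) for m in stage1]
    return [predict_proba(m, p) for m in stage2]


# Hypothetical toy data: 3 actions, where action 2 occurs only when
# actions 0 and 1 both occur.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
Y = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
models = train_two_stage(X, Y)
scores = predict_two_stage(models, [1.0, 1.0])
```

In this sketch, `scores` holds one second-stage posterior per action, so each action's final score can draw on evidence from the other actions' first-stage outputs.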
TNO Identifier
472169
Source
Machine Vision and Applications (May)