Improved Action Recognition by Combining Multiple 2D Views in the Bag-of-Words Model
Conference paper
Action recognition is a hard problem due to the many degrees of freedom of the human body and the motion of its limbs. It is especially hard when only one camera viewpoint is available and when actions involve subtle movements. For instance, when viewed from the side, checking one’s watch may look very similar to crossing one’s arms. In this paper, we investigate how much recognition can be improved when multiple views are available. The novelty is that we explore various combination schemes within the robust and simple bag-of-words (BoW) framework, from early fusion of features to late fusion of multiple classifiers. In new experiments on the publicly available IXMAS dataset, we find that action recognition already improves significantly when only one additional viewpoint is added. We demonstrate that the state of the art on this dataset can be improved by 5%, reaching 96.4% accuracy, when multiple views are combined. Cross-view invariance of the BoW pipeline can be improved by 32% with intermediate-level fusion.
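To illustrate the two ends of the fusion spectrum mentioned in the abstract, the following is a minimal sketch, not the authors' implementation, of early versus late fusion of per-view bag-of-words histograms. It assumes synthetic data, two camera views, and a linear SVM from scikit-learn; names such as n_views and the averaging of decision scores are illustrative choices, not details taken from the paper.

    # Illustrative sketch: early vs. late fusion of per-view BoW histograms.
    # Data is synthetic, so the printed accuracies are not meaningful.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_clips, n_words, n_views, n_classes = 120, 50, 2, 4

    # One BoW histogram per clip per camera view (here: random placeholders).
    views = [rng.random((n_clips, n_words)) for _ in range(n_views)]
    labels = rng.integers(0, n_classes, size=n_clips)
    train, test = np.arange(0, 90), np.arange(90, n_clips)

    # Early fusion: concatenate the per-view histograms into one feature
    # vector and train a single classifier on the combined representation.
    early_X = np.hstack(views)
    early_clf = LinearSVC().fit(early_X[train], labels[train])
    early_pred = early_clf.predict(early_X[test])

    # Late fusion: train one classifier per view and combine their decision
    # scores (here by simple summation) before taking the argmax.
    scores = np.zeros((len(test), n_classes))
    for X in views:
        clf = LinearSVC().fit(X[train], labels[train])
        scores += clf.decision_function(X[test])
    late_pred = scores.argmax(axis=1)

    print("early-fusion accuracy:", (early_pred == labels[test]).mean())
    print("late-fusion accuracy:", (late_pred == labels[test]).mean())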
TNO Identifier
472964
Publisher
IEEE
Source title
10th IEEE International Conference on Advanced Video and Signal-Based Surveillance - AVSS 2013, 27-30 August 2013, Krakow, Poland
Place of publication
Piscataway, NJ
Pages
250-255