Multi-view 3D Human Pose Estimation in Complex Environment

article
We introduce a framework for unconstrained 3D human upper-body pose estimation from multiple camera views in complex environments. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration, and model texture adaptation. Single-frame pose recovery consists of two stages. In the hypothesis generation stage, candidate 3D poses are generated by probabilistic hierarchical shape matching in each camera view. In the subsequent hypothesis verification stage, the candidate 3D poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration consists of computing the K best trajectories, combining a motion model and observations in a Viterbi-style maximum-likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape likelihood measure used for pose recovery. The multiple trajectory hypotheses are also used to generate pose predictions, which augment the 3D pose candidates generated at the next time step. We demonstrate that our approach outperforms the state of the art in experiments on large, challenging real-world data from an outdoor setting.
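The temporal integration step can be illustrated with a generic K-best ("list") Viterbi search. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Each frame is assumed to hold a set of candidate poses, each with an observation log-likelihood (`obs_loglik`). A hypothetical `trans_loglik(t, i, j)` callback stands in for the motion model and scores the move from candidate `i` at frame `t-1` to candidate `j` at frame `t`. Every candidate keeps its K highest-scoring partial trajectories, so the K best full trajectories can be recovered by backtracking.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def k_best_trajectories(
    obs_loglik: Sequence[Sequence[float]],
    trans_loglik: Callable[[int, int, int], float],
    k: int,
) -> List[Tuple[float, List[int]]]:
    """Return up to k (score, path) pairs, best first (hypothetical helper).

    obs_loglik[t][j]: assumed observation log-likelihood of candidate j at frame t.
    trans_loglik(t, i, j): assumed motion-model log-likelihood of moving from
    candidate i at frame t-1 to candidate j at frame t.
    """
    T = len(obs_loglik)
    # best[t][j] holds up to k entries (score, prev_candidate, prev_rank),
    # sorted by descending score; frame 0 has no predecessor.
    best = [[[(o, -1, -1)] for o in obs_loglik[0]]]
    for t in range(1, T):
        layer = []
        for j, o in enumerate(obs_loglik[t]):
            cands = []
            for i, entries in enumerate(best[t - 1]):
                tr = trans_loglik(t, i, j)
                for r, (s, _, _) in enumerate(entries):
                    cands.append((s + tr + o, i, r))
            # Keeping only the k best partial paths per candidate is exact,
            # because the score is additive and the future depends only on j.
            layer.append(heapq.nlargest(k, cands))
        best.append(layer)

    # Pick the k best end points over all candidates in the last frame.
    finals = [
        (s, j, r)
        for j, entries in enumerate(best[-1])
        for r, (s, _, _) in enumerate(entries)
    ]
    results = []
    for s, j, r in heapq.nlargest(k, finals):
        path = [j]
        for t in range(T - 1, 0, -1):
            _, j, r = best[t][j][r]  # follow the back-pointer
            path.append(j)
        results.append((s, path[::-1]))
    return results


if __name__ == "__main__":
    # Toy example: 3 frames, 2 candidates each; switching candidates costs -1.0.
    obs = [[0.0, -1.5], [-0.5, 0.0], [0.0, -2.0]]
    motion = lambda t, i, j: 0.0 if i == j else -1.0
    for score, path in k_best_trajectories(obs, motion, 3):
        print(score, path)
    # prints:
    # -0.5 [0, 0, 0]
    # -2.0 [0, 1, 0]
    # -2.5 [1, 1, 0]
```

Working in log-likelihoods turns the product of motion and observation terms into a sum, which is what makes the per-candidate pruning to K entries exact. The same K trajectories could then seed pose predictions for the next frame, as the text describes. How the paper actually forms these predictions is not specified here.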
TNO Identifier
482075
Source
International Journal of Computer Vision, 96(1), pp. 103-124.