The recovery of 3D human pose is an important problem in computer vision with many potential applications in human-computer interfaces, motion analysis (e.g., sports, medical) and surveillance. 3D human pose also provides informative, view-invariant features for a subsequent activity recognition step. Most, if not all, previous work on 3D human pose recovery has assumed known human models, constrained motions, controlled environments, a large number of overlapping cameras, or a combination thereof. In this thesis, we introduce a novel framework for unconstrained 3D human upper body pose recovery in dynamic and cluttered environments, using a moderate number of overlapping camera views. It consists of three components: single-frame pose recovery, temporal integration and human model adaptation. Single-frame 3D pose recovery begins with a hypothesis generation stage, in which candidate poses are generated based on probabilistic hierarchical shape matching in each camera view. In a subsequent verification stage, the candidate poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration is based on computing K-best trajectories combining a motion model and observations in a Viterbi-style maximum-likelihood approach. The multiple trajectory hypotheses are used to generate pose predictions, augmenting the pose candidates generated at the next time step. Given that 3D pose recovery performance is closely linked to the quality of the underlying human model, we present approaches to automatically adapt the appearance and shape of a generic human model to the persons in the scene. Appearance adaptation is realized by learning a texture model from the aforementioned best pose trajectories; this texture model enriches the shape likelihood measure used for pose recovery. Shape adaptation is implemented as batch-mode optimization of the shape and pose parameters of a generic model.
An important step is the selection of the most informative frames upon which to base the optimization; these frames are determined using a criterion that takes both shape-texture likelihood and pose diversity into account. We demonstrate the effectiveness of our 3D human pose recovery framework using a novel and challenging data set, derived from three synchronized color CCD cameras overlooking a train station platform. In several sequences, various persons perform unscripted movements against a backdrop of moving people, moving trains, and changing lighting.
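
As an illustration of the temporal integration step, the following is a minimal sketch of a K-best Viterbi-style search over per-frame pose candidates. It is not the implementation used in this thesis. The function name `k_best_trajectories`, the Gaussian motion model with parameter `sigma`, and the representation of poses as tuples of floats are all assumptions made for the example; the observation log-likelihoods stand in for the multi-view likelihood measure.

```python
import heapq


def k_best_trajectories(candidates, obs_loglik, k=3, sigma=1.0):
    """Return the k highest-scoring trajectories as (log_score, index_path) pairs.

    candidates[t][j] is the j-th candidate pose at frame t (a tuple of floats);
    obs_loglik[t][j] is its observation log-likelihood. Both are hypothetical inputs.
    """

    def motion_loglik(prev_pose, pose):
        # Assumed motion model: isotropic Gaussian on the pose change
        # (unnormalized log-density).
        sq_dist = sum((a - b) ** 2 for a, b in zip(prev_pose, pose))
        return -sq_dist / (2.0 * sigma ** 2)

    # paths[j] holds up to k (score, index_path) pairs ending at candidate j
    # of the current frame.
    paths = [[(obs_loglik[0][j], (j,))] for j in range(len(candidates[0]))]

    for t in range(1, len(candidates)):
        new_paths = []
        for j, pose in enumerate(candidates[t]):
            # Extend every retained partial path by candidate j of frame t.
            extended = [
                (
                    score
                    + motion_loglik(candidates[t - 1][i], pose)
                    + obs_loglik[t][j],
                    path + (j,),
                )
                for i, partial in enumerate(paths)
                for score, path in partial
            ]
            # Keep only the k best partial paths ending at this candidate.
            new_paths.append(heapq.nlargest(k, extended))
        paths = new_paths

    # Merge the per-candidate lists and keep the k best overall.
    return heapq.nlargest(k, (p for partial in paths for p in partial))
```

Retaining the k best partial paths per candidate at every frame is sufficient to recover the k best complete trajectories. In the framework described above, the end poses of these trajectories would then be propagated through the motion model to produce predictions for the next time step; that step is omitted here.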