This paper presents a method for combining the shape and appearance feature types in a discriminative learning framework for human pose estimation. We first present a new appearance descriptor that is distinctive and resilient to noise for 3D human pose estimation. We then combine the proposed appearance descriptor with a shape descriptor computed from the silhouette of the human subject using discriminative learning. Our method, which we refer to as a localized decision level fusion technique, is based on clustering the output pose space into several partitions and learning a decision level fusion model for the shape and appearance descriptors in each region. The combined shape and appearance descriptor allows complementary information of the individual feature types to be exploited, leading to improved performance of the pose estimation system. We evaluate our proposed fusion method with feature level fusion and kernel level fusion methods using a synchronized video and 3D motion dataset. Our experimental results show that the proposed feature combination method gives more accurate pose estimation than the one obtained from each individual feature type. Among the three fusion methods, our localized decision level fusion method is demonstrated to perform the best for 3D pose estimation. © 2013 Elsevier Ltd.