This work introduces a novel two-step framework for multi-person pose estimation. Multi-person pose estimation in images in the wild is challenging because human detectors inevitably make errors in both localization and recognition, and these errors ultimately cause most CNN-based single-person pose estimators to fail. In this paper, we propose a regional multi-person pose estimation (RMPE) framework to support single-person pose estimation in the presence of an inaccurate human detector. The framework consists of three novel techniques: symmetric spatial transformer networks (SSTN), a deep proposals generator (DPG), and parametric pose non-maximum suppression (NMS). Extensive experiments demonstrate the validity and effectiveness of the approach. Compared with state-of-the-art approaches, it achieves a 16% relative increase in mAP and a 600-fold speedup on the MPII (multi-person) dataset.
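To make the two-step structure concrete, the following is a minimal sketch and not the released implementation. It assumes a person detector and a single-person pose estimator are supplied as plain callables (`detect` and `estimate` are hypothetical names), and it replaces the parametric pose NMS with a much simpler greedy rule based on mean joint distance. SSTN and DPG are not modeled here.

```python
from dataclasses import dataclass
from math import hypot
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Joint = Tuple[float, float]              # (x, y) in image coordinates


@dataclass
class Pose:
    joints: List[Joint]
    score: float


def mean_joint_distance(a: Pose, b: Pose) -> float:
    """Average Euclidean distance between corresponding joints."""
    total = sum(hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a.joints, b.joints))
    return total / len(a.joints)


def greedy_pose_nms(poses: List[Pose], dist_threshold: float) -> List[Pose]:
    """Simplified stand-in for pose NMS: keep the highest-scoring pose,
    drop any later pose that lies within dist_threshold of a kept one."""
    kept: List[Pose] = []
    for pose in sorted(poses, key=lambda p: p.score, reverse=True):
        if all(mean_joint_distance(pose, k) > dist_threshold for k in kept):
            kept.append(pose)
    return kept


def estimate_poses(
    image: Any,
    detect: Callable[[Any], List[Box]],
    estimate: Callable[[Any, Box], Pose],
    dist_threshold: float = 10.0,
) -> List[Pose]:
    """Step 1: detect person boxes. Step 2: estimate one pose per box.
    Redundant poses from overlapping boxes are then suppressed."""
    boxes = detect(image)
    candidates = [estimate(image, box) for box in boxes]
    return greedy_pose_nms(candidates, dist_threshold)
```

Because every detected box yields a pose, overlapping or duplicate detections produce near-identical poses, which is why a suppression step follows the per-box estimation.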


Our source code is available on GitHub, including:

  • Training/test code
  • Pretrained model
  • Evaluation code


Download the paper here.


  Title = {{RMPE}: Regional Multi-person Pose Estimation},
  Author = {Haoshu Fang and Shuqin Xie and Cewu Lu},
  Journal = {arXiv preprint arXiv:1612.00137},
  Year = {2016}

Example Results

What the model's predictions look like.
Browse more results in the supplementary material or by running our GitHub code.