Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, Tomas Simon, Shih-En Wei and Yaser Ajmal Sheikh
Tech. Report, CMU-RI-TR-17-18, Robotics Institute, Carnegie Mellon University, April, 2017




We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.

author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Ajmal Sheikh},
title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
year = {2017},
month = {April},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-18},
keywords = {Realtime, 2D human pose estimation, multiple people},
}