Convolutional Pose Machines: A Deep Architecture for Estimating Articulated Poses - Robotics Institute Carnegie Mellon University

Convolutional Pose Machines: A Deep Architecture for Estimating Articulated Poses

Master's Thesis, Tech. Report, CMU-RI-TR-16-22, Robotics Institute, Carnegie Mellon University, May, 2016

Abstract

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.

BibTeX

@mastersthesis{Wei-2016-5508,
author = {Shih-En Wei},
title = {Convolutional Pose Machines: A Deep Architecture for Estimating Articulated Poses},
year = {2016},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-16-22},
}