cuMASM: Realtime Automatic Facial Landmarking using Active Shape Models on Graphics Processor Units - Robotics Institute Carnegie Mellon University

Nicholas Alexander Vandal
Master's Thesis, Tech. Report, CMU-RI-TR-11-16, May, 2011

Abstract

Automatic, robust, and accurate landmarking of dense sets of facial features is a key
component in face-based biometric identification systems. Among other uses, dense
landmarking is used to normalize raw faces for scale, to perform facial expression analysis,
and is an essential component for generating 3D face models from a single 2D
image. Active shape models (ASMs), which incorporate constrained statistical models
of shape with local texture models of each landmark, have been applied successfully to
this problem as well as to landmarking tasks in other domains. Recent work has demonstrated
that Modified Active Shape Models (MASMs), which utilize improved subspace
models of 2D landmark neighborhoods, generalize better to unseen faces and to
real-world dynamic environments. This superior performance comes with a significant
computational cost, on the order of seconds per image to reach convergence. Compounded
with the time required for face detection on high-resolution images, robust
facial landmarking on the CPU is decidedly not realtime even for a well-optimized,
multithreaded C++ implementation. In this thesis, we demonstrate realtime MASM facial
landmarking by parallelizing the algorithm on Graphics Processing Units (GPUs)
using the CUDA programming platform. Our GPU-based implementation is designed
for integration into a larger face recognition routine and is able to accept updated model
parameters without recompilation or re-synthesis. Unlike previous GPU-based ASM
implementations, which parallelize the original ASM algorithm utilizing 1D profiles,
we implement the 2D subspace-modeled profile searching of the more robust MASM
technique. We report GPU speedups of 24X over single-threaded CPU implementations
of MASM and approximately 12X over an 8-threaded CPU implementation. By
leveraging this untapped source of computational power, we are able to achieve realtime
frame rates of approximately 20 FPS using a 79-point landmarking scheme. We
discuss parallelizing the facial landmarking fitting process, specific GPU implementation
details, GPU architecture-specific optimizations required to take advantage of the
underlying hardware, and general CUDA programming concepts.
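The constrained statistical shape model at the heart of an ASM can be illustrated with a minimal C++ sketch. It assumes the standard point-distribution model used in classic ASMs: a mean shape plus orthonormal PCA modes with one variance per mode. A candidate shape is projected onto the modes, each coefficient is clamped to ±3√λ (a common but assumed limit), and a plausible shape is reconstructed. The names `ShapeModel` and `constrain_shape` are hypothetical, and the similarity alignment a full ASM performs before projection is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point-distribution model. Shapes are flattened as
// (x0, y0, x1, y1, ...), so a shape with N landmarks has length 2N.
struct ShapeModel {
    std::vector<double> mean;                // mean shape, length 2N
    std::vector<std::vector<double>> modes;  // K orthonormal modes, each length 2N
    std::vector<double> eigenvalues;         // variance lambda_k of each mode
};

// Project a candidate shape onto the model, clamp each coefficient b_k to
// +/- limit * sqrt(lambda_k), and reconstruct xbar + sum_k b_k * P_k.
// Any component of the shape outside the model subspace is discarded.
std::vector<double> constrain_shape(const ShapeModel& m,
                                    const std::vector<double>& shape,
                                    double limit = 3.0) {
    assert(shape.size() == m.mean.size());
    assert(m.modes.size() == m.eigenvalues.size());
    const std::size_t n = m.mean.size();

    std::vector<double> diff(n);
    for (std::size_t i = 0; i < n; ++i) diff[i] = shape[i] - m.mean[i];

    std::vector<double> out = m.mean;
    for (std::size_t k = 0; k < m.modes.size(); ++k) {
        double b = 0.0;  // b_k = P_k^T (x - xbar)
        for (std::size_t i = 0; i < n; ++i) b += m.modes[k][i] * diff[i];
        const double bound = limit * std::sqrt(m.eigenvalues[k]);
        if (b > bound) b = bound;
        if (b < -bound) b = -bound;
        for (std::size_t i = 0; i < n; ++i) out[i] += b * m.modes[k][i];
    }
    return out;
}
```

Clamping the coefficients is what keeps the fitted landmarks face-like even when individual local searches are pulled toward clutter.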
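The abstract does not say exactly how MASM scores a 2D landmark neighborhood against its subspace model. One plausible choice, assumed here and not taken from the thesis, is the reconstruction residual (distance from feature space) of a flattened patch with respect to a PCA subspace with orthonormal basis vectors. A lower residual suggests the patch resembles the trained landmark appearance. `PatchSubspace` and `subspace_residual` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-landmark texture model: a mean patch and orthonormal
// basis vectors, all stored as flattened (row-major) 2D patches.
struct PatchSubspace {
    std::vector<double> mean;
    std::vector<std::vector<double>> basis;  // orthonormal rows, same length as mean
};

// Residual energy after projecting (g - mean) onto the subspace:
//   ||g - mean||^2 - sum_k (u_k . (g - mean))^2
// Lower values indicate a closer match to the landmark model.
double subspace_residual(const PatchSubspace& s, const std::vector<double>& patch) {
    assert(patch.size() == s.mean.size());
    std::vector<double> d(patch.size());
    double total = 0.0;
    for (std::size_t i = 0; i < patch.size(); ++i) {
        d[i] = patch[i] - s.mean[i];
        total += d[i] * d[i];
    }
    for (const auto& u : s.basis) {
        double c = 0.0;  // projection coefficient onto basis vector u
        for (std::size_t i = 0; i < d.size(); ++i) c += u[i] * d[i];
        total -= c * c;
    }
    return total;
}
```

Because each score is a handful of dot products over one patch, evaluating many candidate patches is regular, data-parallel work of the kind GPUs handle well.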
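The profile search parallelizes naturally because every (landmark, candidate offset) evaluation is independent. The C++ sketch below runs on the CPU, but it flattens the search space into a single index `tid`, which is one way the work could be mapped to one GPU thread per candidate, followed by a per-landmark argmin. The window radius `r`, the `cost` callback, `Offset`, and `best_offsets` are hypothetical and are not a description of cuMASM's actual kernel layout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

struct Offset { int dx; int dy; };

// Search a (2r+1) x (2r+1) window of candidate offsets for every landmark.
// cost(l, dx, dy) scores landmark l displaced by (dx, dy); lower is better.
std::vector<Offset> best_offsets(std::size_t num_landmarks, int r,
                                 const std::function<double(std::size_t, int, int)>& cost) {
    assert(r >= 0);
    const std::size_t w = static_cast<std::size_t>(2 * r + 1);
    const std::size_t per_landmark = w * w;
    std::vector<double> scores(num_landmarks * per_landmark);

    // Phase 1: independent evaluations. Each iteration corresponds to what
    // a single GPU thread with global index `tid` would compute.
    for (std::size_t tid = 0; tid < scores.size(); ++tid) {
        const std::size_t l = tid / per_landmark;
        const std::size_t local = tid % per_landmark;
        const int dy = static_cast<int>(local / w) - r;
        const int dx = static_cast<int>(local % w) - r;
        scores[tid] = cost(l, dx, dy);
    }

    // Phase 2: per-landmark argmin (a reduction step on a GPU).
    std::vector<Offset> best(num_landmarks, Offset{0, 0});
    for (std::size_t l = 0; l < num_landmarks; ++l) {
        double lowest = std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < per_landmark; ++j) {
            const double s = scores[l * per_landmark + j];
            if (s < lowest) {
                lowest = s;
                best[l] = Offset{static_cast<int>(j % w) - r, static_cast<int>(j / w) - r};
            }
        }
    }
    return best;
}
```

On a GPU, phase 1 would plausibly become a kernel launched over `num_landmarks * (2r+1)^2` threads and phase 2 a parallel reduction; storing all scores in one flat array keeps both steps regular enough to map onto the hardware.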
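The reported figures also imply how the two CPU baselines relate and what the frame budget is. Assuming both speedups are measured on the same workload and hardware, and writing T_1, T_8, and T_GPU for the single-threaded, 8-threaded, and GPU times per image:

```latex
\frac{T_1}{T_{\mathrm{GPU}}} \approx 24, \qquad \frac{T_8}{T_{\mathrm{GPU}}} \approx 12
\;\Longrightarrow\; \frac{T_1}{T_8} \approx \frac{24}{12} = 2,
\qquad \text{parallel efficiency of 8 threads} \approx \frac{2}{8} = 25\%

20\ \text{FPS} \;\Longrightarrow\; \frac{1000\ \text{ms}}{20} = 50\ \text{ms per frame}
```

Under that assumption, the multithreaded CPU version recovers only about a quarter of ideal 8-way scaling, which helps explain why the GPU offers a large additional margin.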

BibTeX

@mastersthesis{Vandal-2011-112696,
author = {Nicholas Alexander Vandal},
title = {cuMASM: Realtime Automatic Facial Landmarking using Active Shape Models on Graphics Processor Units},
year = {2011},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-11-16},
}