Carnegie Mellon University
Deep Region and Multi-label Learning for Facial Action Unit Detection

Kaili Zhao, Wen-Sheng Chu, and Honggang Zhang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June, 2016.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Region learning (RL) and multi-label learning (ML) have recently attracted increasing attention in the field of facial Action Unit (AU) detection. Knowing that AUs are active on sparse facial regions, RL aims to identify these regions for better specificity. On the other hand, strong statistical evidence of AU correlations suggests that ML is a natural way to model the detection task. In this paper, we propose Deep Region and Multi-label Learning (DRML), a unified deep network that simultaneously addresses these two problems. One crucial aspect of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face. Our region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across an entire image). Unlike previous studies that solve RL and ML alternately, DRML by construction addresses both problems, allowing the two seemingly irrelevant problems to interact more directly. The complete network is end-to-end trainable, and automatically learns representations robust to variations inherent within a local region. Experiments on the BP4D and DISFA benchmarks show that DRML achieves the highest average F1-score and AUC within and across datasets in comparison with alternative methods.
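
As an informal illustration of the idea that a region layer sits between locally connected and conventional convolution layers, the sketch below applies a separate kernel to each cell of a grid partition of a single-channel feature map, and pairs it with a multi-label sigmoid cross-entropy loss over AU labels. This is not the authors' implementation: the grid partition, the kernel shapes, and the names `conv2d_same`, `region_layer`, and `multilabel_bce` are all assumptions made for illustration, and the abstract does not specify these details.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive zero-padded 'same' cross-correlation of a 2-D map x with an odd-sized kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps the output size equal to x
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def region_layer(x, kernels):
    """Hypothetical region layer: one kernel per grid cell, shared only inside that cell.

    kernels has shape (grid_h, grid_w, kh, kw). A 1x1 grid reduces to an ordinary
    convolution; a grid as fine as the image approaches a locally connected layer.
    """
    gh, gw = kernels.shape[:2]
    h, w = x.shape
    if h % gh or w % gw:
        raise ValueError("feature map must divide evenly into the grid (assumption of this sketch)")
    ph, pw = h // gh, w // gw
    out = np.empty_like(x, dtype=float)
    for r in range(gh):
        for c in range(gw):
            rows = slice(r * ph, (r + 1) * ph)
            cols = slice(c * pw, (c + 1) * pw)
            # Each patch is filtered independently with its own weights.
            out[rows, cols] = conv2d_same(x[rows, cols], kernels[r, c])
    return out

def multilabel_bce(logits, targets):
    """Mean sigmoid cross-entropy, treating each AU as its own binary label."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))].
    loss = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(loss.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(8, 8))
    per_region_kernels = rng.normal(size=(2, 2, 3, 3))  # 2x2 grid, 3x3 kernel per cell
    print(region_layer(feature_map, per_region_kernels).shape)
    print(multilabel_bce(rng.normal(size=(4, 12)), rng.integers(0, 2, size=(4, 12))))
```

Setting the grid to 1x1 recovers a conventional convolution with one shared kernel, which is one way to see the region layer as an intermediate design; a trainable version would learn the per-region kernels jointly with the multi-label loss.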

Text Reference
Kaili Zhao, Wen-Sheng Chu, and Honggang Zhang, "Deep Region and Multi-label Learning for Facial Action Unit Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June, 2016.

BibTeX Reference
@inproceedings{Zhao_2016_8318,
   author = "Kaili Zhao and Wen-Sheng Chu and Honggang Zhang",
   title = "Deep Region and Multi-label Learning for Facial Action Unit Detection",
   booktitle = "IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
   month = "June",
   year = "2016",
}