Exploiting Space-Time Statistics of Videos for Face "Hallucination"

Goksel Dedeoglu
doctoral dissertation, tech. report CMU-RI-TR-07-05, Robotics Institute, Carnegie Mellon University, April, 2007

  • Adobe portable document format (pdf) (13MB)
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Face "Hallucination" aims to recover high-quality, high-resolution images of human faces from low-resolution, blurred, and degraded images or video. This thesis presents person-specific solutions to this problem through careful exploitation of space (image) and space-time (video) models. The results demonstrate accurate restoration of facial details, with resolution enhancements up to a scaling factor of 16.

The algorithms proposed in this thesis follow the analysis-by-synthesis paradigm; they explain the observed (low-resolution) data by fitting a (high-resolution) model. In this context, the first contribution is the discovery of a scaling-induced bias that plagues most model-to-image (or image-to-image) fitting algorithms. It was found that models and observations should be treated asymmetrically, both to formulate an unbiased objective function and to derive an accurate optimization algorithm. This asymmetry is most relevant to Face Hallucination: when applied to the popular Active Appearance Model, it leads to a novel face tracking and reconstruction algorithm that is significantly more accurate than state-of-the-art methods. The analysis also reveals the inherent trade-off between computational efficiency and estimation accuracy in low-resolution regimes.
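The asymmetry described above can be illustrated with a toy one-dimensional example. This is a minimal sketch under stated assumptions, not the thesis algorithm: the "model" is a high-resolution signal, image formation is assumed to be a box filter followed by decimation, the only unknown is an integer shift, and the search is brute force. All function names are hypothetical. The point it shows is that the model is pushed through the assumed formation process and compared with the observation on the observation's own low-resolution grid, rather than interpolating the observation up to the model's resolution.

```python
def downsample(signal, factor):
    """Assumed image formation: average each block of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def shift(signal, t):
    """Translate the signal right by integer t samples, replicating the edges."""
    n = len(signal)
    return [signal[min(max(i - t, 0), n - 1)] for i in range(n)]


def observation_space_error(model_hi, observed_lo, t, factor):
    """Sum of squared differences measured on the low-resolution grid.

    The high-resolution model is shifted and then degraded, so the
    comparison happens in the coordinate frame of the observation.
    """
    synthesized = downsample(shift(model_hi, t), factor)
    return sum((s - o) ** 2 for s, o in zip(synthesized, observed_lo))


def fit_shift(model_hi, observed_lo, factor, candidates):
    """Brute-force search over candidate shifts (hypothetical helper)."""
    return min(candidates,
               key=lambda t: observation_space_error(model_hi, observed_lo, t, factor))


# Toy data: a triangular bump observed at one quarter of the model resolution.
model = [0] * 8 + [1, 2, 3, 4, 3, 2, 1] + [0] * 9
observed = downsample(shift(model, 3), 4)
estimated_shift = fit_shift(model, observed, 4, range(-4, 5))
```

Because the shift is searched on the high-resolution grid, this formulation can resolve displacements smaller than one low-resolution pixel, at the cost of synthesizing and degrading the model for every candidate.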

The second contribution is a statistical generative model of face videos. By treating a video as a composition of space-time patches, this model efficiently encodes the temporal dynamics of complex visual phenomena such as eye-blinks and the occlusion or appearance of teeth. The same representation is also used to define a data-driven prior on a three-dimensional Markov Random Field in space and time. Experimental results demonstrate that temporal representation and reasoning about facial expressions improve robustness by regularizing the Face Hallucination problem.
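To make the patch-based representation concrete, the following sketch cuts a video into overlapping space-time patches and lists the lattice neighbors of each patch site. It is only an illustration: the patch size, the step, the nested-list video layout `video[t][y][x]`, and the 6-connected neighborhood are assumptions introduced here, and the function names are hypothetical. The abstract does not specify these details.

```python
def extract_patches(video, size, step):
    """Cut a video into space-time patches.

    video: nested lists indexed as video[t][y][x]
    size:  (dt, dy, dx) patch extent in frames, rows, and columns
    step:  (st, sy, sx) spacing between patch sites
    Returns a dict mapping each site (t, y, x) to a flattened patch.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    dt, dy, dx = size
    st, sy, sx = step
    patches = {}
    for t in range(0, T - dt + 1, st):
        for y in range(0, H - dy + 1, sy):
            for x in range(0, W - dx + 1, sx):
                patches[(t, y, x)] = [video[t + k][y + j][x + i]
                                      for k in range(dt)
                                      for j in range(dy)
                                      for i in range(dx)]
    return patches


def neighbors(site, step, patches):
    """Sites adjacent to `site` in time, row, or column (assumed 6-connectivity)."""
    t, y, x = site
    st, sy, sx = step
    candidates = [(t - st, y, x), (t + st, y, x),
                  (t, y - sy, x), (t, y + sy, x),
                  (t, y, x - sx), (t, y, x + sx)]
    return [c for c in candidates if c in patches]


# Toy video: 4 frames of 4x4 pixels, each pixel value encoding its position.
video = [[[16 * t + 4 * y + x for x in range(4)] for y in range(4)] for t in range(4)]
patches = extract_patches(video, size=(2, 2, 2), step=(1, 1, 1))
```

In a Markov Random Field built on such a lattice, each site would carry a patch-valued variable and the neighbor lists would define the pairwise compatibility terms in space and in time.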

The final contribution is an approximate compensation scheme against illumination effects. It is observed that distinct illumination subspaces of a face (each coming from a different pose and expression) still exhibit similar variation with respect to illumination. This motivates augmenting the video model with a low-dimensional illumination subspace, whose parameters are estimated jointly with high-resolution face details. Successful Face Hallucinations beyond the lighting conditions of the training videos are reported.
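The idea of estimating illumination parameters jointly with face details can be sketched as a single linear least-squares problem. The example below is a simplified illustration, not the estimation procedure of the thesis: it assumes the observation is a mean vector plus a linear combination of appearance basis vectors and illumination basis vectors, it works at a single resolution, and it solves the normal equations directly. The names `fit_jointly`, `solve`, and `dot` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_jointly(observed, mean, appearance_basis, illumination_basis):
    """Estimate appearance and illumination coefficients in one solve.

    Both bases are lists of vectors; stacking them means neither set of
    coefficients is fixed while the other is estimated.
    """
    basis = appearance_basis + illumination_basis
    residual = [o - m for o, m in zip(observed, mean)]
    gram = [[dot(u, v) for v in basis] for u in basis]
    rhs = [dot(u, residual) for u in basis]
    coeffs = solve(gram, rhs)
    k = len(appearance_basis)
    return coeffs[:k], coeffs[k:]


# Toy data: one appearance vector and a two-dimensional illumination subspace.
mean = [1.0, 1.0, 1.0, 1.0]
appearance = [[1.0, 0.0, -1.0, 0.0]]
illumination = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, -1.0]]
observed = [3.5, 0.5, -0.5, 2.5]
a, b = fit_jointly(observed, mean, appearance, illumination)
```

The sketch requires the stacked basis vectors to be linearly independent; otherwise the normal equations are singular and appearance changes cannot be separated from illumination changes.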

Associated Project(s): Face Video Hallucination
Number of pages: 140
Note: Video demonstrations available at http://www.cs.cmu.edu/~dedeoglu/thesis/

Text Reference
Goksel Dedeoglu, "Exploiting Space-Time Statistics of Videos for Face 'Hallucination'," doctoral dissertation, tech. report CMU-RI-TR-07-05, Robotics Institute, Carnegie Mellon University, April 2007

BibTeX Reference
@phdthesis{Dedeoglu2007,
   author = "Goksel Dedeoglu",
   title = "Exploiting Space-Time Statistics of Videos for Face ``Hallucination''",
   booktitle = "",
   school = "Robotics Institute, Carnegie Mellon University",
   month = "April",
   year = "2007",
   number = "CMU-RI-TR-07-05",
   address = "Pittsburgh, PA",
   note = "Video demonstrations available at http://www.cs.cmu.edu/~dedeoglu/thesis/"
}