Carnegie Mellon University
8:30 am to 9:30 am
Newell-Simon Hall 3305
Abstract:
Synthesizing photorealistic human faces from novel viewpoints using only a single frontal image remains a challenging problem in computer vision. Large viewpoint changes introduce geometric distortions, self-occlusions, and missing visual information, making identity preservation and high-frequency detail reconstruction particularly difficult. While recent generative approaches such as diffusion models and 3D-aware neural representations produce visually compelling results, they typically require expensive training and slow inference. In contrast, lightweight feed-forward models enable efficient inference but often fail to capture fine-grained details and complex appearance variations.
This thesis presents a geometry-guided framework that integrates explicit 3D structure with learned image refinement to achieve both efficiency and realism. The method first builds a geometrically consistent prior by fitting a 3D Morphable Model, estimating a texture map from the frontal image, augmenting it with a lightweight hair representation, and rendering the target viewpoint. A convolutional residual network then refines this prior by predicting residual corrections that restore fine details and enhance local realism while preserving geometric consistency. Adversarial supervision further improves perceptual quality, encouraging sharper textures and more natural appearance without increasing inference cost.
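The residual refinement step can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the random `prior` array stands in for the rendered 3D Morphable Model image, and a single fixed 3x3 filter stands in for the learned convolutional residual network. The names `conv3x3`, `refine`, and `scale` are hypothetical. It shows only the structural idea that the output is the geometric prior plus a predicted correction, so a zero correction leaves the prior untouched.

```python
import numpy as np

def conv3x3(image, kernel):
    """Same-padded 3x3 cross-correlation applied to each channel of an H x W x C image."""
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(image)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return out

def refine(prior, kernel, scale=1.0):
    """Residual refinement: output = prior + scale * predicted residual, clipped to [0, 1]."""
    residual = conv3x3(prior, kernel)  # stand-in for the learned residual network
    return np.clip(prior + scale * residual, 0.0, 1.0)

# Hypothetical stand-in for the rendered prior (values in [0, 1)).
rng = np.random.default_rng(0)
prior = rng.random((8, 8, 3))

# A zero filter predicts no correction, so the prior passes through unchanged.
zero_kernel = np.zeros((3, 3))
unchanged = refine(prior, zero_kernel)

# A Laplacian-like filter adds a high-frequency correction on top of the prior.
sharpen = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 4.0, -1.0],
                    [0.0, -1.0, 0.0]])
sharpened = refine(prior, sharpen, scale=0.1)
```

Because the network only predicts a correction, the rendered geometry remains the default output, which is one plausible reading of how refinement can add detail while preserving geometric consistency.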
We compare the proposed approach against Cap4D, a state-of-the-art method, in a single-image side-view synthesis setting. The proposed method is substantially more efficient, completing inference within seconds on a single GPU, while preserving identity better than Cap4D. These findings show that geometry-guided residual refinement offers a practical and scalable alternative to heavy 3D-aware generative pipelines for identity-consistent novel view synthesis.
