Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis - Robotics Institute Carnegie Mellon University

Master's Thesis, Tech. Report, CMU-RI-TR-25-32, May, 2025

Abstract

The synthesis of realistic and context-aware visual content is a core challenge in the application of generative AI to both creative media and e-commerce. This thesis explores two distinct but complementary directions in AI-driven scene generation: human-centric insertion and product-centric advertisement creation.

In the first part, we present Teleportraits, a training-free pipeline for realistic human insertion into diverse background scenes using pre-trained text-to-image diffusion models. By leveraging inversion techniques and classifier-free guidance, our method jointly addresses the problems of human placement and high-fidelity personalization without requiring additional training. A novel mask-guided self-attention mechanism further enhances identity preservation, capturing fine details such as clothing and body features from a single reference image. Our approach sets a new state of the art in seamless, high-quality human integration within composite scenes.
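For readers unfamiliar with the two building blocks named above, the following is a minimal, generic sketch and not the Teleportraits implementation. The first function shows the standard classifier-free guidance combination of an unconditional and a conditional noise prediction. The second shows plain scaled dot-product attention restricted by a boolean mask, which is only an assumed stand-in for the general idea of masking attention to a reference region. The function names, the list-based vectors, and the single-query setup are all hypothetical choices made for illustration.

```python
import math


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: start from the unconditional
    noise prediction and move toward the conditional one by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention for a single query vector.

    Positions where mask is False receive a score of -inf, so they get
    zero weight after the softmax. This is generic masked attention,
    assumed here for illustration; the thesis's mask-guided mechanism
    may differ.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if keep else -math.inf
        for key, keep in zip(keys, mask)
    ]
    top = max(scores)
    if top == -math.inf:
        raise ValueError("mask must keep at least one position")
    # Subtract the maximum for numerical stability; exp(-inf) evaluates to 0.0.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

With a guidance scale of 1 the combination reduces to the conditional prediction, and with a scale of 0 to the unconditional one; larger scales extrapolate past the conditional prediction. In the attention sketch, a mask that keeps only reference-subject positions means the output is built solely from those positions' values.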

In the second part, we introduce a scalable solution for automated lifestyle advertisement generation: Multi-Object Advertisement Creative Generation. Recognizing the limitations of current GenAI tools in generating realistic, brand-aligned ad content at scale, we design a modular system that independently addresses product pairing, layout composition, and background synthesis. The system includes a user-friendly interface supporting global batch generation and local control, enabling advertisers to efficiently produce high-quality, contextually rich images across extensive product catalogs. Comprehensive evaluations and user studies demonstrate the effectiveness of our pipeline in bridging creativity and scalability for real-world e-commerce applications.

Together, these works highlight the transformative potential of generative models in automating complex visual synthesis tasks while retaining personalization, realism, and user control.

BibTeX

@mastersthesis{Gao-2025-146393,
author = {Jialu Gao},
title = {Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis},
year = {2025},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-32},
keywords = {Generative AI, Image Synthesis, Diffusion Models},
}