
Generative 3D Garment Modeling with Sparse Visual Cues

Tech. Report CMU-RI-TR-25-39, May 2025

Abstract

Professional fashion designers rely on advanced software to create highly detailed 3D garments. However, as digital apparel becomes integral to virtual environments and personalized experiences, there is a growing need for intuitive tools that enable non-experts to design and interact with 3D garments. To broaden accessibility, these tools should function with minimal input, raising a key question: How can we enable high-quality 3D garment generation and manipulation using only sparse visual cues?

This thesis addresses this challenge by leveraging the strong priors of large pre-trained vision foundation models to tackle two core problems: (1) reconstructing and editing 3D garment assets from a single-view image and (2) transferring textures from an in-the-wild image to existing 3D garment models. To this end, we present two complementary systems: GarmentCrafter for 3D garment reconstruction and modification, and FabricDiffusion for texture transfer; together, they democratize 3D garment creation.

GarmentCrafter enables non-professional users to generate and modify 3D garments from a single image. Existing single-view reconstruction methods often rely on generative models to hallucinate novel views based on a reference image and camera pose but struggle with cross-view consistency. GarmentCrafter addresses this by integrating progressive depth prediction and image warping to approximate novel views, followed by a multi-view diffusion model that refines occluded and unknown clothing regions. By jointly inferring RGB and depth, it enforces cross-view coherence, reconstructing detailed and geometrically accurate garments.
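
To illustrate the warping step described above, the following is a minimal sketch, not the thesis implementation: pixels of the reference image are unprojected with a predicted depth map and reprojected into a novel camera, leaving holes in occluded regions that a multi-view diffusion model would subsequently refine. All names (K, R, t, warp_to_novel_view) are illustrative assumptions.

import numpy as np

def warp_to_novel_view(rgb, depth, K, R, t):
    """Forward-warp `rgb` (H, W, 3) using `depth` (H, W) into a novel camera
    related to the reference view by rotation R (3, 3) and translation t (3,)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW

    # Unproject reference pixels to 3D, then move them into the novel camera frame.
    pts = np.linalg.inv(K) @ (pix * depth.reshape(1, -1))
    pts_novel = R @ pts + t[:, None]

    # Project into the novel image plane.
    proj = K @ pts_novel
    z = proj[2]
    x = np.round(proj[0] / z).astype(int)
    y = np.round(proj[1] / z).astype(int)

    # Splat colors (no z-buffering, for brevity); unfilled pixels are the
    # unknown regions left for generative refinement.
    warped = np.zeros_like(rgb)
    valid = (z > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    warped[y[valid], x[valid]] = rgb.reshape(-1, 3)[valid]
    return warped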

Complementing this, FabricDiffusion transfers fabric textures from a single image onto 3D garments of arbitrary shapes. Inspired by the observation that, in the fashion industry, most garments are constructed by stitching sewing patterns with flat, repeatable textures, we recast texture transfer as extracting distortion-free, tileable textures that can be mapped onto a garment’s UV space. Building on this insight, we train a denoising diffusion model on a large-scale synthetic dataset to rectify distortions in the input texture image, producing a flat texture map that integrates seamlessly with Physically-Based Rendering (PBR) pipelines. This enables realistic relighting under various lighting conditions while preserving intricate texture details with high visual fidelity.
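
The sketch below (illustrative only, not FabricDiffusion's actual pipeline) shows why rectified, tileable textures are convenient: once a texture is distortion-free and seamless, applying it to a garment reduces to repeating it across the UV atlas, and the same tiling can be applied to each PBR map (albedo, normal, roughness, metallic) so the garment relights consistently. The function name, repeat count, and file name are hypothetical.

import numpy as np
from PIL import Image

def tile_texture_to_uv(texture, uv_size=2048, repeats=8):
    """Tile a square, seamless `texture` (H, W, C) into a uv_size x uv_size map."""
    tile_res = uv_size // repeats
    tile = np.array(Image.fromarray(texture).resize((tile_res, tile_res)))
    # Repeat the rectified texture over the full UV image.
    return np.tile(tile, (repeats, repeats, 1))

# Hypothetical usage with one rectified PBR map produced by the model.
albedo = np.array(Image.open("rectified_albedo.png").convert("RGB"))
albedo_uv = tile_texture_to_uv(albedo)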

Together, these systems form a unified framework for generative 3D garment modeling from sparse inputs. They significantly lower the barrier for 3D content creation by allowing users to work with minimal visual guidance while still achieving high levels of realism, detail, and geometric accuracy. By bridging the gap between limited visual input and high-quality 3D output, this thesis takes a step toward making accessible, scalable, and customizable garment modeling a reality.

BibTeX

@techreport{Wang-2025-146373,
author = {Yuanhao Wang},
title = {Generative 3D Garment Modeling with Sparse Visual Cues},
year = {2025},
month = {May},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-39},
keywords = {3D Computer Vision, Generative Models, 3D Garment, Fashion},
}