Abstract:
As digital apparel becomes increasingly vital to virtual environments and personalized experiences, there is a growing need for intuitive tools that enable non-experts to create and interact with 3D garments. To broaden accessibility, these tools must function effectively with minimal input – raising the key question: How can we achieve high-quality 3D garment modeling using only sparse visual cues?
This thesis addresses the challenge by leveraging the rich priors of large pre-trained vision models to tackle two core problems: (1) reconstructing and editing 3D garments from a single-view image, and (2) transferring textures from in-the-wild images to existing 3D models. We present two complementary systems for these tasks. GarmentCrafter enables single-view 3D garment reconstruction and editing by combining progressive depth prediction and image warping to approximate novel views, followed by a multi-view diffusion model that completes occluded regions. By jointly inferring RGB and depth, it enforces cross-view consistency, enabling 3D reconstruction and editing with rich geometric and texture detail. FabricDiffusion transfers fabric textures from a single image onto 3D garments of arbitrary shape. Inspired by the fashion industry's use of flat sewing patterns with repeatable textures, we reformulate texture transfer as generating distortion-free, tileable texture maps for UV mapping. We train a diffusion model on a large synthetic dataset to correct distortions in the input, producing realistic, relightable textures that integrate seamlessly with Physically-Based Rendering (PBR) pipelines.
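To make the novel-view approximation step concrete, the following minimal sketch (not the thesis implementation) forward-warps an RGB-D image into a nearby viewpoint under a pinhole camera model; the intrinsics K and relative pose (R, t) are hypothetical placeholders.

```python
# Minimal sketch, assuming a pinhole camera: forward-warp an RGB image with
# per-pixel depth into a novel view. K, R, and t are hypothetical inputs.
import numpy as np

def warp_to_novel_view(rgb, depth, K, R, t):
    """Reproject rgb (H, W, 3) using depth (H, W) into the view defined by
    rotation R (3, 3) and translation t (3,)."""
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)

    # Back-project to 3D in the source camera frame, then move to the target frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    pts_tgt = R @ pts + t.reshape(3, 1)

    # Project into the target image plane.
    proj = K @ pts_tgt
    z = proj[2]
    u_tgt = np.round(proj[0] / np.maximum(z, 1e-6)).astype(int)
    v_tgt = np.round(proj[1] / np.maximum(z, 1e-6)).astype(int)
    inside = (z > 1e-6) & (u_tgt >= 0) & (u_tgt < W) & (v_tgt >= 0) & (v_tgt < H)

    # Z-buffered scatter: nearer points overwrite farther ones.
    out = np.zeros_like(rgb)
    zbuf = np.full((H, W), np.inf)
    src_colors = rgb.reshape(-1, 3)
    for i in np.flatnonzero(inside):
        r, c = v_tgt[i], u_tgt[i]
        if z[i] < zbuf[r, c]:
            zbuf[r, c] = z[i]
            out[r, c] = src_colors[i]
    return out  # holes correspond to regions unseen in the input view
```

The holes left by such a warp are precisely the occluded regions that, in the pipeline described above, the multi-view diffusion model is responsible for completing.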
Together, these systems provide an accessible and robust framework for generative 3D garment modeling from sparse inputs. They lower the barrier to high-quality 3D content creation and open new possibilities for applications in fashion, gaming, and virtual experiences.
Committee:
Prof. Fernando De la Torre (Advisor)
Prof. Jun-Yan Zhu
Yehonathan Litman
