Learning to Create 3D Content: Geometry, Appearance, and Physics
Abstract
With the growing popularity of Virtual Reality (VR), Augmented Reality (AR), and other 3D applications, developing methods that let everyday users capture and create their own 3D content has become increasingly essential. Current 3D creation pipelines, however, often require tedious manual effort or specialized capture setups. Furthermore, the resulting assets frequently suffer from baked-in lighting, inconsistent representations, and a lack of physical plausibility, limiting their use in downstream applications.
This dissertation addresses these challenges by developing methods that leverage data-driven priors to significantly lower the barrier to 3D content creation. By drawing on information from other modalities, large datasets, and pre-trained generative models, the work presented here reduces the required user input to casually captured photos, simple sketches, and text prompts.
We first show how depth priors can enable users to digitize 3D scenes without dense data capture, and discuss how 2D user inputs such as sketches can enable interactive 3D editing and generation. We then propose an end-to-end text-to-3D generation pipeline that produces both the geometry and texture of 3D assets. For geometry generation, we propose an octree-based adaptive tokenization scheme that allocates representational capacity according to shape complexity, enabling higher-fidelity and more efficient reconstruction and generation of 3D shapes. For appearance modeling, we utilize data and diffusion-model priors to generate relightable textures on meshes from text input, ensuring that generated 3D objects are usable in downstream production workflows. Finally, to ground digital designs in reality, we introduce BrickGPT, which incorporates manufacturing and physics constraints to generate physically stable and buildable toy brick structures from text prompts.
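To make the adaptive-tokenization idea concrete, below is a minimal, hypothetical sketch (not the dissertation's implementation): an octree recursively subdivides cells that contain many surface points, and each resulting leaf cell is assigned one token, so geometrically complex regions receive more representational capacity. All names and parameters here (adaptive_tokenize, max_points, max_depth) are illustrative assumptions.

```python
# Hypothetical sketch: adaptively tokenize a 3D point set with an octree.
# Cells whose point count exceeds a budget are subdivided; each leaf
# becomes one token, so complex regions receive more tokens.
import numpy as np

def adaptive_tokenize(points, center, half, max_points=64, max_depth=6, depth=0):
    """Return a list of (center, half_size) leaf cells covering `points`."""
    if len(points) <= max_points or depth == max_depth:
        return [(center, half)]  # one token for this leaf cell
    tokens = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child_center = center + half * np.array([dx, dy, dz])
                # points falling inside the child cell (half extent = half / 2)
                mask = np.all(np.abs(points - child_center) <= half / 2, axis=1)
                if mask.any():
                    tokens += adaptive_tokenize(points[mask], child_center,
                                                half / 2, max_points,
                                                max_depth, depth + 1)
    return tokens

# Usage: more tokens are allocated where points are geometrically dense.
pts = np.random.rand(10000, 3) * 2 - 1  # points in [-1, 1]^3
leaves = adaptive_tokenize(pts, np.array([0.0, 0.0, 0.0]), 1.0)
print(f"{len(leaves)} tokens allocated")
```

In this sketch the number of tokens is data-dependent rather than fixed, which is the key property the abstract attributes to the proposed scheme; the dissertation's actual tokenizer is learned and operates within a generative pipeline.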
Collectively, these contributions bridge the gap between high-level user intent and the creation of editable, functional, and physically realizable 3D content by addressing the core challenges in geometry representation, appearance modeling, and physics-aware generation.
BibTeX
@phdthesis{Deng-2025-148415,
author = {Kangle Deng},
title = {Learning to Create 3D Content: Geometry, Appearance, and Physics},
year = {2025},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-85},
keywords = {3D Generation; 3D Reconstruction},
}