Towards Natural Language-Driven Shape Arrangement Synthesis with Semantically-Aware Geometric Constraint Systems

Master's Thesis, Tech. Report CMU-RI-TR-25-19, April 2025

Abstract

While diffusion-based models excel at generating photorealistic images from text, a more nuanced challenge emerges when generation is constrained to a fixed set of rigid shapes, akin to solving tangram puzzles or arranging real-world objects to match semantic descriptions. We formalize this problem as shape-based image generation, a new natural language-guided image-to-image translation task that requires rearranging the input set of rigid shapes into a non-overlapping configuration that visually communicates the target concept.

Unlike pixel-manipulation approaches, our method explicitly parameterizes each shape within a differentiable vector graphics pipeline and iteratively optimizes its placement and orientation through score distillation sampling from pretrained diffusion models. To preserve arrangement clarity, we introduce a semantically-aware collision resolution mechanism that applies minimal, contextually coherent adjustments when overlaps occur, ensuring smooth convergence toward physically valid configurations. By bridging diffusion-based semantic guidance with explicit geometric constraint systems, our approach yields interpretable compositions in which spatial relationships clearly embody the natural language prompt. Extensive experiments demonstrate compelling results across diverse scenarios, with quantitative and qualitative advantages over alternative techniques.
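
To make the geometric side of this description concrete, the following is a minimal illustrative sketch, not the thesis implementation. It assumes each rigid shape is a polygon posed by a translation (tx, ty) and a rotation theta, and it stands in for collision resolution with a purely geometric rule: overlapping bounding circles are pushed apart by the smallest displacement along the line joining their centers. The function names (place, bounding_radius, resolve_overlaps) are hypothetical, the bounding-circle test is a simplifying assumption, and the diffusion-guided optimization and the semantic aspect of the resolution step are omitted entirely.

```python
import math


def place(vertices, tx, ty, theta):
    """Apply a rigid transform (rotate by theta, then translate) to a polygon."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in vertices]


def bounding_radius(vertices):
    """Radius of a circle centered at the origin that contains every vertex."""
    return max(math.hypot(x, y) for x, y in vertices)


def resolve_overlaps(centers, radii, max_iters=100, eps=1e-9):
    """Push overlapping bounding circles apart by the smallest displacement.

    Assumption for illustration: each shape is approximated by a circle.
    Every overlapping pair is separated along the line joining its centers,
    with both shapes moving half the penetration depth.
    """
    centers = [list(c) for c in centers]
    for _ in range(max_iters):
        moved = False
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                dx = centers[j][0] - centers[i][0]
                dy = centers[j][1] - centers[i][1]
                dist = math.hypot(dx, dy)
                depth = radii[i] + radii[j] - dist
                if depth <= eps:
                    continue  # this pair does not overlap
                if dist < eps:
                    # Coincident centers: pick an arbitrary separation axis.
                    dx, dy, dist = 1.0, 0.0, 1.0
                ux, uy = dx / dist, dy / dist
                centers[i][0] -= 0.5 * depth * ux
                centers[i][1] -= 0.5 * depth * uy
                centers[j][0] += 0.5 * depth * ux
                centers[j][1] += 0.5 * depth * uy
                moved = True
        if not moved:
            break  # no pair overlapped during this pass
    return [tuple(c) for c in centers]
```

In a full pipeline, a step like resolve_overlaps would presumably be interleaved with the gradient updates on (tx, ty, theta), so that each optimization step ends in a valid arrangement. With three or more shapes, separating one pair can create a new overlap elsewhere, which is why the sketch repeats passes up to max_iters rather than assuming a single pass suffices.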

BibTeX

@mastersthesis{Misra-2025-146190,
author = {Vihaan Misra},
title = {Towards Natural Language-Driven Shape Arrangement Synthesis with Semantically-Aware Geometric Constraint Systems},
year = {2025},
month = {April},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-19},
keywords = {Differentiable Rendering, Semantic Arrangement, Geometric Constraints, Collision Resolution},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.