In this thesis, we present two works that aim to bridge this gap. In the first, TAX3D (Non-rigid Relative Placement through 3D Dense Diffusion), we extend the principles of object-centric reasoning methods to deformable transformations by modeling geometric relationships through dense diffusion. In the second, TAX3Dv2 (Object-Centric Point Diffusion for 3D Goal Prediction), we further generalize our method into a hierarchical goal-prediction framework for object placement. Specifically, we model global, scene-level placements with a novel Dense Gaussian Mixture Model (GMM) and local, object-level configurations with a disentangled diffusion objective. By decoupling placement prediction into these two stages, our method generalizes effectively to multi-modal scenes while supporting high-precision and non-rigid placements. We also show that our method can easily be incorporated into policy learning, substantially improving both performance and sample efficiency while removing the need for the task-specific primitives that object-centric goal-prediction methods typically require. We validate our approach across a suite of challenging tasks in simulation and the real world, demonstrating strong performance in both rigid and non-rigid settings.
