Learning Structured World Model for Deformable Object Manipulation

PhD Thesis, Tech. Report, CMU-RI-TR-22-48, Robotics Institute, Carnegie Mellon University, July, 2022


Abstract

Manipulation of deformable objects challenges common assumptions in robotic manipulation, such as low-dimensional state representations, known dynamics, and minimal occlusion. Deformable objects have high-dimensional intrinsic states, complex dynamics with many degrees of freedom, and severe self-occlusion. These properties make state estimation and planning difficult. In this thesis, we introduce benchmarks and methods for solving a variety of deformable object manipulation tasks, with the aim of relaxing these commonly made assumptions and building a more robust manipulation system.

We take the approach of learning dynamics models from data and using them for planning. Compared to analytical models, learned models are more flexible: we can train them to model the dynamics at different levels of detail. More specifically, we learn structured world models with spatial and temporal abstraction. At the most granular level, we can represent the physical world as atoms and predict their movement over infinitesimal time steps. Such a dynamics model is general and accurate because it involves no abstraction, but it is also expensive to compute, and its state is difficult to estimate. With spatial abstraction, we can reason at a higher level, for example by representing the world as objects and their interactions with each other. Spatial abstraction enables efficient learning, planning, and compositional generalization. With temporal abstraction, we model the dynamics to predict the future state over longer time steps, or even over the span of low-level skills. We can then plan with these models to solve long-horizon tasks.
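Planning with temporally abstract models can be illustrated with a toy example. The sketch below assumes each skill comes with a model that predicts the state after the entire skill has executed, and it searches over skill sequences for the one whose predicted outcome is closest to a goal. The skill names, the 2-D state, the squared-distance cost, and the exhaustive search are all hypothetical simplifications chosen for illustration, not the models or planner used in the thesis.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]
# Hypothetical skill model: maps the state before a skill to the predicted state
# after the whole skill has run (temporal abstraction). Learned in practice;
# hand-written stand-ins are used here.
SkillModel = Callable[[State], State]


def plan_skill_sequence(
    start: State,
    goal: State,
    skills: Dict[str, SkillModel],
    horizon: int,
) -> Tuple[List[str], float]:
    """Exhaustively search skill sequences of length 1..horizon and return the one
    whose predicted final state is closest to `goal` (squared Euclidean distance)."""

    def cost(state: State) -> float:
        return sum((a - b) ** 2 for a, b in zip(state, goal))

    best_seq: List[str] = []  # empty plan means "do nothing"
    best_cost = cost(start)
    for length in range(1, horizon + 1):
        for seq in product(skills, repeat=length):
            state = start
            for name in seq:
                # One model call per skill rather than per low-level time step.
                state = skills[name](state)
            c = cost(state)
            if c < best_cost:  # strict comparison keeps the shortest of tied plans
                best_seq, best_cost = list(seq), c
    return best_seq, best_cost


# Toy usage with two hypothetical skills acting on a 2-D state.
skills = {
    "push_right": lambda s: (s[0] + 1.0, s[1]),
    "lift": lambda s: (s[0], s[1] + 0.5),
}
seq, c = plan_skill_sequence((0.0, 0.0), (2.0, 0.5), skills, horizon=3)
print(seq, c)  # ['push_right', 'push_right', 'lift'] 0.0
```

Exhaustive search grows exponentially with the horizon, so it only suits a handful of skills and short horizons; the point of the sketch is that each skill costs a single model evaluation, however many low-level steps it spans.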

In this thesis, we begin by introducing the first benchmark for deformable object manipulation, covering the manipulation of fluid, cloth, and rope. Next, we present methods that learn dynamics models for cloth manipulation by representing the cloth as a graph, a form of spatial abstraction. We then propose a framework for learning skill dynamics models and using them to plan long-horizon skill sequences, and we apply this framework to manipulating elastoplastic objects with multiple tools. Finally, we show how to combine spatial and temporal abstraction to achieve long-horizon planning with compositional generalization for deformable object manipulation.
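To make the idea of representing cloth as a graph more concrete, here is a minimal sketch of one way such a graph could be built. It assumes the cloth is observed as a set of 3-D particles (for example, points from a depth camera) and that edges connect particles closer than a fixed radius. The radius rule, the edge features, and the function names are illustrative assumptions rather than the thesis's implementation; a learned dynamics model would process these nodes and edges with trained message-passing layers, which are omitted here.

```python
import numpy as np


def build_radius_graph(points: np.ndarray, radius: float):
    """Return directed edges (senders, receivers) between distinct points closer than `radius`."""
    diff = points[:, None, :] - points[None, :, :]               # (N, N, 3) pairwise displacements
    dist = np.linalg.norm(diff, axis=-1)                          # (N, N) pairwise distances
    mask = (dist < radius) & ~np.eye(len(points), dtype=bool)     # drop self-loops
    senders, receivers = np.nonzero(mask)
    return senders, receivers


def edge_features(points: np.ndarray, senders: np.ndarray, receivers: np.ndarray) -> np.ndarray:
    """Per-edge features: relative displacement (3 values) plus its length (1 value)."""
    rel = points[senders] - points[receivers]
    length = np.linalg.norm(rel, axis=-1, keepdims=True)
    return np.concatenate([rel, length], axis=-1)                 # (E, 4)


def aggregate(messages: np.ndarray, receivers: np.ndarray, num_nodes: int) -> np.ndarray:
    """Sum incoming messages at each receiving node (the aggregation step of message passing)."""
    out = np.zeros((num_nodes, messages.shape[1]))
    np.add.at(out, receivers, messages)                           # unbuffered scatter-add
    return out


# Toy usage: a flat 3x3 "cloth" of particles spaced 1.0 apart in the z = 0 plane.
xs, ys = np.meshgrid(np.arange(3.0), np.arange(3.0))
points = np.stack([xs.ravel(), ys.ravel(), np.zeros(9)], axis=1)
senders, receivers = build_radius_graph(points, radius=1.1)      # links 4-neighbours only
feats = edge_features(points, senders, receivers)
degree = aggregate(np.ones((len(receivers), 1)), receivers, len(points))
# Expected: 24 directed edges; degree 2 at corners, 3 on sides, 4 at the centre.
```

The dense pairwise-distance matrix keeps the sketch short but costs O(N²) memory; for realistic particle counts a spatial index such as a k-d tree would be the usual replacement.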

BibTeX

@phdthesis{Lin-2022-132558,
author = {Xingyu Lin},
title = {Learning Structured World Model for Deformable Object Manipulation},
year = {2022},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-22-48},
keywords = {Deformable Object Manipulation; Structured World Model},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.