
Redefining the Perception-Action Interface: Visual Action Representations for Contact-Centric Manipulation

PhD Thesis, Tech. Report CMU-RI-TR-23-73, September 2023

Abstract

In robotics, understanding the link between perception and action is pivotal. Typically, perception systems process sensory data into state representations such as segmentations and bounding boxes, which a planner then uses to select actions. However, these state estimation approaches can fail under partial observability or with challenging object properties such as transparency and deformability. Alternatively, sensorimotor policies convert raw sensor input directly into actions, but the actions they produce are not grounded in contact, and they perform poorly in unseen task configurations.

To address these shortcomings, we delve into visual action representations, a class of approaches in which the perception system conveys information to the planner about potential actions. Visual action representations do not require full state estimation, generalize well to unseen task configurations, and output object-centric actions, reasoning about where to make contact with an object, how to approach contact locations, and how to manipulate the object once contact is made. Reformulating the role of perception to include action reasoning simplifies downstream planning.
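
To make the interface concrete, here is a minimal, hypothetical sketch (in Python, not code from the thesis) of the kind of object-centric action a perception module could hand to a planner under this view; ContactAction, predict_actions, and select_action are illustrative names, and the fields are assumptions about what such a representation might carry.

from dataclasses import dataclass
import numpy as np


@dataclass
class ContactAction:
    """Hypothetical object-centric action proposed directly by perception."""
    contact_point: np.ndarray        # (3,) where to make contact, camera frame
    approach_dir: np.ndarray         # (3,) unit vector for approaching the contact
    post_contact_motion: np.ndarray  # (6,) twist to apply once contact is made
    score: float                     # predicted likelihood of success


def predict_actions(rgb: np.ndarray, depth: np.ndarray) -> list:
    """Stand-in for a learned model mapping raw RGB-D input to candidate actions."""
    raise NotImplementedError


def select_action(candidates: list) -> ContactAction:
    # With action reasoning folded into perception, downstream planning can
    # reduce to scoring and choosing among the proposed contacts.
    return max(candidates, key=lambda a: a.score)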

This thesis presents visual action representations for addressing visual and geometric challenges in manipulation. We devise a transfer learning method for grasping transparent and specular objects, and present Neural Grasp Distance Fields for 6-DOF grasping and motion planning. We then introduce algorithms for cloth manipulation, starting by adapting semantic segmentation to the task of grasping the edges and corners of cloth. Next, we develop a tactile sensing-based closed-loop policy for manipulating stacked cloth layers. Finally, we present FabricFlowNet, a policy that learns optical flow-based correspondences for goal-conditioned, bimanual cloth folding.
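
As a loose illustration of the flow-based correspondence idea, the sketch below (hypothetical; the flow-prediction network is abstracted into a precomputed flow field) turns a per-pixel flow between the current and goal cloth images into a single pick-and-place action. The function name and the convention that the flow stores (dx, dy) displacements are assumptions for illustration.

import numpy as np


def pick_and_place_from_flow(flow: np.ndarray, cloth_mask: np.ndarray):
    """flow: (H, W, 2) displacement from the current image to the goal image.
    cloth_mask: (H, W) boolean mask of cloth pixels in the current image."""
    # Pick the cloth pixel that must move the farthest to reach the goal.
    magnitude = np.linalg.norm(flow, axis=-1) * cloth_mask
    pick_y, pick_x = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    # Place it where the predicted flow says that pixel should end up.
    dx, dy = flow[pick_y, pick_x]
    return (int(pick_x), int(pick_y)), (float(pick_x + dx), float(pick_y + dy))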

BibTeX

@phdthesis{Weng-2023-138447,
author = {Thomas Weng},
title = {Redefining the Perception-Action Interface: Visual Action Representations for Contact-Centric Manipulation},
year = {2023},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-23-73},
keywords = {Learning for Manipulation, Representation Learning, Deformable Object Manipulation, Grasping, Motion Planning, Computer Vision},
}