Perception amidst interaction: spatial AI with vision and touch for robot manipulation

PhD Thesis, Tech. Report CMU-RI-TR-24-06, February 2024

Abstract

Robots currently lack the cognition to replicate even a fraction of the tasks humans perform, an observation summarized by Moravec's paradox. Humans effortlessly combine their senses for everyday interactions: we can rummage through our pockets in search of our keys, and deftly insert them to unlock our front door. Before robots can demonstrate such dexterity, they must first exhibit spatial awareness of the objects they manipulate. Specifically, object pose and shape are important quantities for downstream planning and control. The status quo for in-hand perception is restricted to the narrow scope of tracking known objects, with vision as the dominant modality. As robots move out of instrumented labs and factories to cohabit our spaces, it is clear that a missing piece is generalizable spatial AI.

Often overlooked is tactile sensing, which provides a direct window into robot-object interaction, free from occlusion and aliasing. With hardware advances like vision-based touch, we now have situated yet detailed information to complement cameras. However, interactive perception is intrusive—the act of sensing itself perturbs the object. Can we robustly estimate object shape and pose online from a stream of multimodal robot manipulation data?

In this thesis, I study the intersection of simultaneous localization and mapping (SLAM) and robot manipulation. More specifically, I look at: (1) spatial representations for object-centric SLAM, (2) tactile perception and simulation, and (3) combining learned models with online optimization. First, I show how factor graphs fuse touch with physics-based constraints for SLAM in planar manipulation. Next, I present a schema for online shape learning from visuo-tactile sensing. I then demonstrate a learned tactile representation for global localization via touch. Drawing upon the above efforts, I conclude by unifying vision, touch, and proprioception into a neural representation for SLAM during in-hand manipulation.
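
To ground the factor-graph formulation, below is a minimal sketch of fusing per-step physics predictions with tactile pose evidence for a planar object, assuming the GTSAM Python bindings. The standard Pose2 factor types, noise sigmas, and measurement values are illustrative placeholders, not the custom tactile and quasi-static physics factors developed in the thesis.

import numpy as np
import gtsam
from gtsam.symbol_shorthand import X  # X(t): key for the object pose at step t

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# Weak prior anchors the first object pose (x, y, theta).
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.1]))
graph.add(gtsam.PriorFactorPose2(X(0), gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))
initial.insert(X(0), gtsam.Pose2(0.0, 0.0, 0.0))

physics_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01, 0.01, 0.02]))
tactile_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.05]))

for t in range(1, 10):
    # Hypothetical measurement streams: relative motion predicted by a pushing
    # model, and absolute pose evidence recovered from contact geometry.
    delta = gtsam.Pose2(0.01, 0.0, 0.005)
    touched = gtsam.Pose2(0.01 * t, 0.0, 0.005 * t)

    # Physics-based constraint: consecutive poses obey the motion model.
    graph.add(gtsam.BetweenFactorPose2(X(t - 1), X(t), delta, physics_noise))
    # Tactile measurement: unary evidence on the current pose.
    graph.add(gtsam.PriorFactorPose2(X(t), touched, tactile_noise))
    initial.insert(X(t), touched)

# Batch solve for clarity; an incremental solver (gtsam.ISAM2) fits the
# online setting the thesis targets.
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(X(9)))

Because both constraint types enter as factors, inconsistent evidence (e.g., a slipping contact) degrades gracefully: the optimizer trades off the physics and touch terms according to their noise models rather than trusting either outright.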

BibTeX

@phdthesis{Suresh-2024-139984,
author = {Sudharshan Suresh},
title = {Perception amidst interaction: spatial AI with vision and touch for robot manipulation},
year = {2024},
month = {February},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-06},
keywords = {robotics; robot manipulation; tactile sensing; SLAM; learning for perception},
}