
Self supervised tactile perception for robot dexterity

PhD Thesis, Tech. Report, CMU-RI-TR-26-05, January 2026

Abstract

Humans are incredibly dexterous. We interact with and manipulate tools effortlessly, leveraging touch without a second thought. Yet replicating this level of dexterity in robots remains a major challenge. Recognizing the importance of touch in fine manipulation, the robotics community has developed a wide variety of tactile sensors, but how best to leverage these sensors for perception and manipulation is still unclear. In this thesis, we address how to efficiently integrate tactile sensing into robot perception and dexterous manipulation.

Specifically, we turn to self-supervised learning (SSL) to train tactile representations that generalize across sensors, standardize usage across downstream tactile tasks, and alleviate the need for labeled task data, which is often impractical to collect for tasks such as uncalibrated force field estimation. To this end, we discuss Sparsh and Sparsh-skin, a family of SSL models for vision-based and magnetic-skin-based tactile sensors, respectively. Both are trained via self-distillation on unlabeled tactile data, with Sparsh-skin covering full-hand tactile sensing, and are then applied to downstream tasks. We find that Sparsh and Sparsh-skin not only outperform task- and sensor-specific end-to-end models by a large margin, but are also data efficient for downstream task training.
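
As a rough illustration of the self-distillation recipe, the following PyTorch sketch trains a student encoder to match an exponential-moving-average (EMA) teacher across two augmented views of the same unlabeled tactile frame. The encoder architecture, temperatures, EMA rate, and augmentation are illustrative assumptions, not the configuration used by Sparsh or Sparsh-skin, which likely include additional components omitted here.

# Minimal self-distillation sketch (PyTorch). TactileEncoder, the EMA
# rate, the temperatures, and the augmentation are all assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    """Toy encoder mapping a tactile image to an embedding (hypothetical)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

student = TactileEncoder()
teacher = copy.deepcopy(student)           # teacher starts as a copy...
for p in teacher.parameters():
    p.requires_grad_(False)                # ...and is updated only by EMA

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
ema, t_student, t_teacher = 0.996, 0.1, 0.04

def augment(x):
    # Placeholder augmentation: noisy "views" of the same touch frame.
    return x + 0.05 * torch.randn_like(x)

for step in range(100):                    # stand-in for a real data loader
    x = torch.randn(16, 3, 64, 64)         # batch of unlabeled tactile frames
    with torch.no_grad():
        target = F.softmax(teacher(augment(x)) / t_teacher, dim=-1)
    logits = student(augment(x)) / t_student
    loss = -(target * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                  # EMA update of the teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1.0 - ema)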

Second, we note that existing work often overlooks the multimodal aspects of human touch, such as vibration and heat sensing. We discuss Sparsh-X, a compact tactile representation fusing image, pressure, audio, and inertial measurements from the DIGIT360 sensor. With Sparsh-X, we demonstrate that multimodal sensing improves both passive perception tasks and dexterous manipulation tasks such as in-hand rotation.
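
To make the fusion idea concrete, here is a hypothetical PyTorch sketch that projects each modality into a shared token space and fuses the tokens with a small transformer into one compact embedding. The module names, input sizes, and transformer configuration are assumptions for illustration, not the Sparsh-X architecture.

# Hypothetical sketch of fusing DIGIT360-style modalities (image,
# pressure, audio, inertial) into one compact embedding.
import torch
import torch.nn as nn

class MultimodalTouchEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # One small projection per modality; input sizes are assumptions.
        self.image = nn.Linear(3 * 32 * 32, dim)    # flattened tactile image
        self.pressure = nn.Linear(64, dim)          # pressure taxel vector
        self.audio = nn.Linear(256, dim)            # audio spectrogram slice
        self.imu = nn.Linear(6, dim)                # accel + gyro sample
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img, prs, aud, imu):
        tokens = torch.stack([
            self.image(img.flatten(1)),
            self.pressure(prs),
            self.audio(aud),
            self.imu(imu),
        ], dim=1)                                   # (B, 4, dim)
        cls = self.cls.expand(img.size(0), -1, -1)
        out = self.fuse(torch.cat([cls, tokens], dim=1))
        return out[:, 0]                            # compact fused embedding

enc = MultimodalTouchEncoder()
z = enc(torch.randn(2, 3, 32, 32), torch.randn(2, 64),
        torch.randn(2, 256), torch.randn(2, 6))
print(z.shape)  # torch.Size([2, 128])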

Finally, we present privileged tactile latent distillation (PTLD), a novel method for imbuing dexterous manipulation policies trained via reinforcement learning (RL) with tactile sensing. PTLD avoids simulating tactile sensors, instead using privileged sensing to bridge the sim-to-real gap. With PTLD, we first show that existing RL-trained policies, such as those for in-hand rotation, can be improved, and then that it enables learning more challenging tasks such as in-hand reorientation.
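
The following sketch illustrates the generic privileged latent distillation pattern the name suggests: a frozen teacher encodes privileged signals (e.g., contact states available in simulation), and a student encoder is regressed onto the teacher's latent so that it can drive the same policy head from tactile input at deployment. All dimensions, modules, and the paired training data are placeholder assumptions; this is not the thesis's exact pipeline.

# Hedged sketch of privileged latent distillation (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

priv_dim, tactile_dim, latent_dim, act_dim = 32, 96, 64, 16

# Teacher encoder over privileged signals; assumed pre-trained with RL
# together with the shared policy head (RL training omitted here).
teacher_enc = nn.Sequential(nn.Linear(priv_dim, 128), nn.ReLU(),
                            nn.Linear(128, latent_dim))
student_enc = nn.Sequential(nn.Linear(tactile_dim, 128), nn.ReLU(),
                            nn.Linear(128, latent_dim))
policy = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                       nn.Linear(128, act_dim))

for p in teacher_enc.parameters():         # freeze the teacher
    p.requires_grad_(False)
opt = torch.optim.Adam(student_enc.parameters(), lr=3e-4)

for step in range(100):
    # Placeholder stand-ins for paired (privileged, tactile) observations;
    # how such pairs are obtained is a design choice of the method itself
    # and is not reproduced here.
    priv = torch.randn(32, priv_dim)
    tactile = torch.randn(32, tactile_dim)
    with torch.no_grad():
        z_teacher = teacher_enc(priv)
    loss = F.mse_loss(student_enc(tactile), z_teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At deployment, the student latent drives the same policy head.
action = policy(student_enc(torch.randn(1, tactile_dim)))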

Jointly, these contributions provide a path to leveraging tactile sensing in both imitation- and reinforcement-learning-based robot manipulation.

BibTeX

@phdthesis{Sharma-2026-150233,
author = {Akash Sharma},
title = {Self supervised tactile perception for robot dexterity},
year = {2026},
month = {January},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-05},
keywords = {Tactile perception, Self supervised learning, Representation Learning, Reinforcement learning, Dexterous Manipulation, Multi-fingered hands},
}