Specifically, we turn to self-supervised learning (SSL) to train tactile representations that can generalize across sensors, standardize usage across downstream tactile tasks, and further alleviate the need for labeled task data, which is often impractical to collect for tasks such as uncalibrated force field estimation. To this end, we discuss Sparsh and Sparsh-skin, a family of SSL models for vision-based and magnetic-skin-based tactile sensors, respectively. Both models are trained via self-distillation and evaluated on a suite of downstream tactile tasks, including those involving full-hand tactile sensing. We find that Sparsh and Sparsh-skin not only outperform task- and sensor-specific end-to-end models by a large margin, but are also data efficient for downstream task training.
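To make the self-distillation idea concrete, below is a minimal sketch of a DINO-style student/teacher update on two augmented views of an unlabeled tactile frame. The module name TactileEncoder, the augmentations, and all hyperparameters are illustrative assumptions, not the exact training recipe used by Sparsh or Sparsh-skin.

```python
# Hypothetical self-distillation step for tactile frames (illustrative only).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    """Stand-in backbone mapping a tactile image to an embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

student = TactileEncoder()
teacher = copy.deepcopy(student)          # teacher is an EMA copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distill_step(view1, view2, tau_s=0.1, tau_t=0.04, ema=0.996):
    """One self-distillation update on two augmented views of the same frame."""
    with torch.no_grad():
        t = F.softmax(teacher(view1) / tau_t, dim=-1)   # teacher targets
    s = F.log_softmax(student(view2) / tau_s, dim=-1)   # student predictions
    loss = -(t * s).sum(dim=-1).mean()                  # cross-entropy
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                # EMA teacher update
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()

# Usage: two perturbed views of the same unlabeled tactile image.
frame = torch.rand(8, 3, 64, 64)
loss = distill_step(frame + 0.01 * torch.randn_like(frame), frame)
```

No task labels appear anywhere in this loop, which is what lets such pretraining scale to large unlabeled tactile datasets before any downstream task training.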
Second, we note that existing work often overlooks the multimodal aspects of human touch, such as vibration and heat sensing. We discuss Sparsh-X, a compact tactile representation that fuses image, pressure, audio, and inertial measurements from the DIGIT360 sensor. With Sparsh-X, we demonstrate that multimodal sensing improves both passive perception tasks and dexterous manipulation tasks such as in-hand rotation.
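The sketch below illustrates one common way such a fusion could be structured: project each modality into a shared embedding space and let a small transformer attend across the resulting tokens. The class name, per-modality input shapes, and layer sizes are illustrative assumptions and are not Sparsh-X's actual architecture.

```python
# Hypothetical fusion of four tactile modalities into one compact representation.
import torch
import torch.nn as nn

class MultimodalTactileFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Per-modality projections into a common embedding space (shapes assumed).
        self.image_proj = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 32 * 32, dim))
        self.pressure_proj = nn.Linear(8, dim)    # e.g. 8 pressure channels
        self.audio_proj = nn.Linear(400, dim)     # e.g. a short audio window
        self.imu_proj = nn.Linear(6, dim)         # accelerometer + gyroscope
        self.mod_emb = nn.Parameter(torch.zeros(4, dim))  # modality embeddings
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image, pressure, audio, imu):
        tokens = torch.stack([
            self.image_proj(image),
            self.pressure_proj(pressure),
            self.audio_proj(audio),
            self.imu_proj(imu),
        ], dim=1) + self.mod_emb                  # (B, 4, dim)
        fused = self.fusion(tokens)               # cross-modal attention
        return fused.mean(dim=1)                  # compact tactile representation

model = MultimodalTactileFusion()
z = model(torch.rand(2, 3, 32, 32), torch.rand(2, 8),
          torch.rand(2, 400), torch.rand(2, 6))
print(z.shape)  # torch.Size([2, 128])
```

The design choice being illustrated is that downstream policies and perception heads consume a single fused vector, so they need not know which modality carried the relevant signal (e.g., slip may show up in audio and inertial data before it is visible in the image).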
Finally, we present privileged tactile latent distillation (PTLD), a novel method for incorporating tactile sensing into dexterous manipulation policies trained via reinforcement learning. PTLD avoids simulating tactile sensors, instead relying on privileged sensors to bridge the sim-to-real gap. With PTLD, we first show that tactile sensing can improve existing RL-trained policies on tasks such as in-hand rotation, and then that it enables learning more challenging tasks such as in-hand reorientation.
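As a rough illustration of the privileged-distillation idea, a tactile encoder can be trained to reproduce the latent produced from simulator-only privileged observations, so that the deployed policy runs on tactile signals alone. This is a minimal sketch under those assumptions; the module names, dimensions, and regression loss are hypothetical and should not be read as the exact PTLD procedure.

```python
# Hypothetical privileged-latent distillation into a tactile encoder.
import torch
import torch.nn as nn

priv_dim, tactile_dim, latent_dim, act_dim = 16, 64, 32, 12

priv_encoder = nn.Sequential(nn.Linear(priv_dim, 64), nn.ReLU(),
                             nn.Linear(64, latent_dim))       # simulator-only inputs
tactile_encoder = nn.Sequential(nn.Linear(tactile_dim, 64), nn.ReLU(),
                                nn.Linear(64, latent_dim))    # real-world tactile inputs
policy_head = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                            nn.Linear(64, act_dim))

opt = torch.optim.Adam(tactile_encoder.parameters(), lr=3e-4)

def distill_batch(priv_obs, tactile_obs):
    """Regress the tactile latent onto the frozen privileged latent."""
    with torch.no_grad():
        target = priv_encoder(priv_obs)           # privileged teacher latent
    pred = tactile_encoder(tactile_obs)           # student latent from touch
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After distillation, the deployed policy only needs tactile observations:
action = policy_head(tactile_encoder(torch.rand(1, tactile_dim)))
```

Because only privileged quantities are used in simulation, no tactile sensor model is needed during RL training, which is the gap-bridging property the method exploits.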
Jointly, these contributions provide a path to leveraging tactile sensing in both imitation- and reinforcement-learning-based robot manipulation.
Thesis Committee Members:
Michael Kaess, chair
Shubham Tulsiani
Guanya Shi
Mustafa Mukadam, Amazon Robotics
Jitendra Malik, UC Berkeley & Amazon FAR
