Camera-to-Robot Pose Estimation from a Single Image

Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, and Stan Birchfield

Conference Paper, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 9426-9432, May 2020


Abstract

We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and joint configuration of the robot manipulator are known. Unlike classic hand-eye calibration systems, our method does not require an off-line calibration step. Rather, it is capable of computing the camera extrinsics from a single frame, thus opening the possibility of on-line calibration. We show experimental results for three different robots and camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is comparable to that of classic off-line hand-eye calibration using multiple frames. With additional frames from a static pose, accuracy improves even further. Code, datasets, and pretrained models for three widely-used robot manipulators are made available.
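The PnP step described in the abstract can be illustrated with a small, self-contained sketch. It makes the following assumptions:

- The 2D keypoint detections are noise-free.
- The camera intrinsics `K` are known.
- The 3D keypoint positions are already expressed in the robot base frame. Given the known joint configuration, these would come from forward kinematics.

The sketch uses the textbook Direct Linear Transform (DLT) and then projects the result onto the nearest rotation matrix. This generic solver was chosen for illustration. It is not necessarily the solver the authors use. The names `project`, `pnp_dlt`, and `rotation_zyx` are hypothetical, and so are all the keypoint coordinates and camera parameters.

```python
import numpy as np


def project(points_robot, K, R, t):
    """Project Nx3 robot-frame points to Nx2 pixel coordinates.

    (R, t) are the camera extrinsics: X_cam = R @ X_robot + t.
    """
    cam = points_robot @ R.T + t          # robot base frame -> camera frame
    uvw = cam @ K.T                        # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]        # perspective divide


def pnp_dlt(points_robot, pixels, K):
    """Estimate (R, t) from N >= 6 non-coplanar 2D-3D correspondences via DLT."""
    n = points_robot.shape[0]
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    # Remove the known intrinsics: normalized coords = K^-1 [u, v, 1]^T.
    homog = np.hstack([pixels, np.ones((n, 1))])
    normed = np.linalg.solve(K, homog.T).T
    # Each correspondence yields two linear equations in the 12 entries of [R | t].
    A = np.zeros((2 * n, 12))
    for i in range(n):
        X = np.append(points_robot[i], 1.0)
        x, y = normed[i, 0], normed[i, 1]
        A[2 * i, 0:4] = X
        A[2 * i, 8:12] = -x * X
        A[2 * i + 1, 4:8] = X
        A[2 * i + 1, 8:12] = -y * X
    # Null vector of A (smallest singular value) equals lam * [R | t].
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Nearest orthogonal matrix to the left 3x3 block; its singular values give |lam|.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    lam = S.mean()
    if np.linalg.det(R) < 0:  # SVD fixed the overall sign arbitrarily
        R, lam = -R, -lam
    t = P[:, 3] / lam
    return R, t


def rotation_zyx(yaw, pitch, roll):
    """Hypothetical helper: rotation matrix from Z-Y-X Euler angles (radians)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx


# Hypothetical keypoints (meters, robot base frame), standing in for joint
# locations that forward kinematics would give for a known configuration.
keypoints = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.33],
    [0.1, 0.0, 0.6],
    [0.3, 0.1, 0.7],
    [0.4, -0.1, 0.5],
    [0.5, 0.0, 0.3],
    [0.45, 0.15, 0.2],
])
K = np.array([[615.0, 0.0, 320.0], [0.0, 615.0, 240.0], [0.0, 0.0, 1.0]])
R_true = rotation_zyx(0.3, -0.2, 0.1)
t_true = np.array([-0.2, 0.1, 1.5])

pixels = project(keypoints, K, R_true, t_true)  # stand-in for network detections
R_est, t_est = pnp_dlt(keypoints, pixels, K)
print("max rotation error:", np.abs(R_est - R_true).max())
print("max translation error:", np.abs(t_est - t_true).max())
```

Real detections are noisy, and some keypoints may be missing or wrong. A practical pipeline would therefore typically drop undetected keypoints, use a robust estimator such as RANSAC, and refine the estimate by minimizing reprojection error. OpenCV's `cv2.solvePnP` and `cv2.solvePnPRansac` provide such solvers. When the robot stays in a static pose, one simple way to use additional frames is to stack the keypoints detected in every frame into a single solve. This is an assumption; the abstract does not state that the authors aggregate frames this way.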

Notes
Work was completed while the first author was an intern at NVIDIA.

BibTeX

@conference{Lee-2020-119511,
author = {Timothy E. Lee and Jonathan Tremblay and Thang To and Jia Cheng and Terry Mosier and Oliver Kroemer and Dieter Fox and Stan Birchfield},
title = {Camera-to-Robot Pose Estimation from a Single Image},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year = {2020},
month = {May},
pages = {9426--9432},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
keywords = {computer vision, deep learning, keypoint detection, manipulation, pose estimation, camera calibration, sim2real, DREAM},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.