3D Ego-Pose Estimation via Imitation Learning

Ye Yuan and Kris M. Kitani
Conference Paper, European Conference on Computer Vision (ECCV), September, 2018


Ego-pose estimation, i.e., estimating a person’s 3D pose with a single wearable camera, has many potential applications in activity monitoring. For these applications, both accurate and physically plausible estimates are desired, with the latter often overlooked by existing work. Traditional computer vision-based approaches using temporal smoothing only take into account the kinematics of the motion without considering the physics that underlies the dynamics of motion, which leads to pose estimates that are physically invalid. Motivated by this, we propose a novel control-based approach to model human motion with physics simulation and use imitation learning to learn a video-conditioned control policy for ego-pose estimation. Our imitation learning framework allows us to perform domain adaptation to transfer our policy trained on simulation data to real-world data. Our experiments with real egocentric videos show that our method can estimate both accurate and physically plausible 3D ego-pose sequences without observing the camera wearer’s body.

author = {Ye Yuan and Kris M. Kitani},
title = {3D Ego-Pose Estimation via Imitation Learning},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2018},
month = {September},
keywords = {first-person vision, pose estimation, imitation learning},
}