
PhD Thesis Proposal

Ye Yuan
Robotics Institute, Carnegie Mellon University
Wednesday, October 27
9:00 am to 10:00 am
Simulation, Perception, and Generation of Human Behavior

Abstract:
Understanding and modeling human behavior is fundamental to almost any computer vision or robotics application that involves humans. In this thesis, we take a holistic approach to human behavior modeling and tackle its three essential aspects: simulation, perception, and generation. Throughout this thesis, we show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the others. Our path to uniting the three aspects proceeds in the following steps. First, as humans live in a physical world, we treat physics simulation as the foundation of our approach and seek the right way to represent human behavior in simulation. Second, we investigate how physics simulation and optimal control can improve the perception of human behavior. Next, we develop deep generative models in tandem with physics simulation for better human behavior generation. Finally, we propose to leverage behavior generation models to help perceive humans even when they are occluded or otherwise not visible, and we also aim to improve the simulation model of humans to further benefit its downstream tasks, including perception and generation.

In the first part of this thesis, we start by developing a foundational framework for representing human behavior in physics simulation. In particular, we model a human using a proxy humanoid agent inside a physics simulator and treat human behavior as the result of an optimal control policy for the humanoid. This framework allows us to formulate human behavior modeling as policy learning, which can be solved with reinforcement learning (RL).
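For readers less familiar with RL, the policy-learning formulation above can be written with the generic discounted-return objective below. This is the textbook form, not necessarily the exact objective used in the thesis; the symbols (humanoid state $s_t$, action $a_t$, per-step reward $r$, discount $\gamma$, horizon $T$) are assumptions introduced here for illustration.

```latex
\pi^{*} = \arg\max_{\pi} \;
  \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{T-1} \gamma^{t} \, r(s_t, a_t) \right],
\qquad \tau = (s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1})
```

Here the trajectory $\tau$ is produced by rolling out the policy $\pi$ in the physics simulator, so the simulator's dynamics determine which behaviors are reachable.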

We utilize this simulation-based framework for the perception task of human pose estimation, where we learn a video-conditioned policy with RL using a reward function based on how well the policy-generated pose aligns with the ground-truth pose. For both first-person and third-person human pose estimation, our approach significantly outperforms kinematics-based methods in terms of pose accuracy and physical plausibility. The improvement is especially evident in the challenging first-person setting, where the front-facing camera cannot see the person.
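As an illustration of what a pose-alignment reward can look like, the sketch below scores a policy-generated pose against the ground truth with an exponentiated negative mean squared joint-position error, a form common in motion-imitation work. It is a minimal sketch under stated assumptions, not the reward used in the thesis: the function name, the joint-position representation, and the `scale` value are all hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical pose representation: one (x, y, z) position per joint.
Vec3 = Tuple[float, float, float]


def pose_alignment_reward(pred: Sequence[Vec3], gt: Sequence[Vec3], scale: float = 2.0) -> float:
    """Hypothetical reward: exp(-scale * mean squared joint-position error).

    Returns 1.0 when the policy-generated pose matches the ground truth
    exactly and decays toward 0.0 as the two poses drift apart.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty with the same number of joints")
    sq_err = 0.0
    for p, g in zip(pred, gt):
        # Squared Euclidean distance between corresponding joints.
        sq_err += sum((pi - gi) ** 2 for pi, gi in zip(p, g))
    mean_sq_err = sq_err / len(pred)
    return math.exp(-scale * mean_sq_err)


gt_pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
off_pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
print(pose_alignment_reward(gt_pose, gt_pose))   # perfect match
print(pose_alignment_reward(off_pose, gt_pose))  # second joint off by 0.5
```

The exponential keeps the reward bounded in (0, 1], which tends to make per-step rewards easier to balance when several terms are summed.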

In the second part, we turn from perception to generation and investigate how physics simulation combined with deep generative models can improve the generation of human behavior. We first present a simulation-based generation approach that can generate a single future motion of a person from a first-person video. To address the uncertainty in future human behavior, we develop two deep generative models that can generate diverse future human motions using determinantal point processes (DPPs) and latent normalizing flows, respectively. As deep generative models can produce physically implausible motions, we further propose an autoregressive motion generation framework that tightly integrates physics simulation and deep generative models to produce diverse yet physically plausible human motions.
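The role a DPP plays in encouraging diversity can be illustrated with its log-determinant: a set of samples that are similar to one another yields a near-singular similarity kernel and a very low log-determinant, while a spread-out set scores higher. The sketch below shows this generic DPP idea, plus a greedy selection of a diverse subset from candidate motions; it is not the thesis's model. The RBF kernel, the `bandwidth` and `jitter` values, the representation of each motion as a flattened vector, and all function names are assumptions.

```python
from typing import List

import numpy as np


def rbf_kernel(samples: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Similarity kernel L[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    diffs = samples[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def dpp_log_det(samples: np.ndarray, bandwidth: float = 1.0, jitter: float = 1e-6) -> float:
    """Log-determinant of the DPP kernel over a set; higher means more diverse.

    A small jitter on the diagonal keeps the value finite for near-duplicates.
    """
    kernel = rbf_kernel(samples, bandwidth) + jitter * np.eye(len(samples))
    _, logdet = np.linalg.slogdet(kernel)
    return float(logdet)


def greedy_diverse_subset(candidates: np.ndarray, k: int, bandwidth: float = 1.0) -> List[int]:
    """Greedily add the candidate that maximizes the subset's DPP log-determinant."""
    chosen: List[int] = []
    for _ in range(k):
        remaining = [i for i in range(len(candidates)) if i not in chosen]
        best = max(remaining, key=lambda i: dpp_log_det(candidates[chosen + [i]], bandwidth))
        chosen.append(best)
    return chosen


# Hypothetical candidates: three near-identical motion vectors and one distinct one.
candidates = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [3.0, 3.0]])
print(greedy_diverse_subset(candidates, 2))
```

With these candidates, the greedy step skips the two near-duplicates of the first motion and picks the distinct one, which is the behavior a DPP-based diversity term is meant to reward.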

We propose two new directions to further explore the synergy between simulation, perception, and generation of human behavior. First, we propose to leverage human behavior generation models for global human pose estimation with dynamic cameras and heavy occlusions. We aim to use deep generative motion and trajectory models to hallucinate poses for occluded frames and to generate global trajectories from the estimated body poses. Second, we propose to improve the simulation of human behavior by optimizing the design of the humanoid agent in the physics simulator. An optimized agent with enhanced motion imitation ability could further benefit simulation-based methods for the perception and generation of human behavior.

More Information

Thesis Committee Members:
Kris Kitani, Chair
Jessica Hodgins
David Held
Josh Merel, Facebook Reality Labs
Sanja Fidler, University of Toronto