Vision-based Human Motion Modeling and Analysis - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Jinkun Cao
PhD Student, Robotics Institute, Carnegie Mellon University
Tuesday, July 15
10:00 am to 12:00 pm
NSH 4305
Vision-based Human Motion Modeling and Analysis
Abstract:
Modern computer vision has achieved remarkable success in tasks such as detecting, segmenting, and estimating human pose in images and videos—often reaching or even surpassing human-level performance. However, significant challenges remain in predicting and analyzing future human motion. This thesis explores how vision-based methods can improve the fidelity and accuracy of human motion modeling and analysis.

We begin by studying multi-object tracking, which links static human localization results with temporal data. By examining correlations between human detections over time, using both motion and appearance cues for matching, we find that while learning-based methods dominate appearance matching, classical linear filtering performs exceptionally well in motion-based matching. Our proposed methods offer new insights into human motion tracking and establish strong baselines, underscoring the continued value of filtering-based approaches alongside modern learning-based techniques.
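To make the filtering idea concrete, the sketch below shows the classical pattern behind motion-based matching in tracking-by-detection: a constant-velocity Kalman filter predicts where each track should be in the next frame, and predictions are associated with new detections by distance. This is a generic illustration, not the specific method defended in the thesis; the noise scales and the greedy matcher (a stand-in for Hungarian assignment) are illustrative assumptions.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for a 2D point track.

    State is [x, y, vx, vy]. Noise magnitudes are illustrative,
    not tuned values from any particular tracker.
    """

    def __init__(self, x, y, q=1.0, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])                     # state mean
        self.P = np.eye(4) * 10.0                               # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0                       # transition (dt = 1)
        self.H = np.eye(2, 4)                                   # observe position only
        self.Q = np.eye(4) * q                                  # process noise
        self.R = np.eye(2) * r                                  # measurement noise

    def predict(self):
        """Propagate the state one frame forward; return predicted position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        """Correct the state with an observed position z = (x, y)."""
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.s                                 # innovation
        S = self.H @ self.P @ self.H.T + self.R                 # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)                # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


def greedy_match(predictions, detections, max_dist=5.0):
    """Greedily pair predicted track positions with detections by
    Euclidean distance (a simple stand-in for Hungarian matching)."""
    pairs, used = [], set()
    for i, p in enumerate(predictions):
        dists = [(np.linalg.norm(np.array(p) - np.array(d)), j)
                 for j, d in enumerate(detections) if j not in used]
        if dists:
            dist, j = min(dists)
            if dist <= max_dist:
                pairs.append((i, j))
                used.add(j)
    return pairs
```

For a target moving right at one unit per frame, a few predict/update cycles are enough for the velocity estimate to carry the prediction ahead of the last observation, which is exactly what makes linear filtering competitive for motion-based association.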

Building on our tracking work, we then approach human motion from a probabilistic perspective. We propose a novel method for reversible distribution transformation in human trajectory forecasting. Addressing the limitations of conventional symmetric unimodal Gaussian assumptions, we introduce an adaptive construction of mixed Gaussian distributions to better model asymmetric and imbalanced trajectory data. This approach significantly improves controllability, diversity, and accuracy in future trajectory modeling.
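The motivation for a mixture model can be seen in a small sketch: a pedestrian approaching an intersection may turn left or right, so the distribution over future positions has two modes, and a single symmetric Gaussian would put its mean in the empty space between them. The sampler below is an illustrative fixed-parameter mixture, not the adaptive construction proposed in the thesis.

```python
import numpy as np

def sample_gmm(means, covs, weights, n, rng):
    """Draw n samples from a 2D Gaussian mixture.

    A mixture places probability mass at several modes (e.g. "turn left"
    vs. "turn right" futures). The parameters are chosen by hand here
    purely for illustration.
    """
    comps = rng.choice(len(weights), size=n, p=weights)     # pick a mode per sample
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])

# Two future modes, e.g. endpoints of a left turn and a right turn.
rng = np.random.default_rng(0)
means = [np.array([-5.0, 0.0]), np.array([5.0, 0.0])]
covs = [np.eye(2) * 0.25, np.eye(2) * 0.25]
samples = sample_gmm(means, covs, [0.5, 0.5], 1000, rng)
```

Fitting a single Gaussian to these samples would center it near the origin, a region where the true distribution has almost no mass; the mixture keeps the two plausible futures distinct, which is what enables the controllability and diversity gains described above.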

Finally, moving from coarse-grained representations of human position to fine-grained articulation and deformation, we investigate the generation and reconstruction of full-body human motion from images or videos. By leveraging generative human motion priors to constrain vision-based estimation, we improve accuracy and robustness to occlusion and blur. We propose a unified generative model for whole-body motion generation and reconstruction, advancing the understanding and synthesis of complex human motion from multi-modal conditions.
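The effect of a prior on an unreliable observation can be illustrated with a one-dimensional toy: fusing a noisy measurement with a Gaussian prior by maximum a posteriori estimation, which has a closed form. This is a deliberately simplified stand-in for constraining pose estimates with a learned motion prior; the thesis's priors are generative models, not fixed Gaussians.

```python
def map_estimate(obs, obs_var, prior_mean, prior_var):
    """MAP fusion of a scalar observation with a Gaussian prior.

    When the observation is reliable (small obs_var), the estimate
    follows it; when it is unreliable (large obs_var, e.g. the joint
    is occluded or blurred), the estimate falls back toward the prior.
    Toy illustration only, not the thesis's generative prior.
    """
    w = prior_var / (prior_var + obs_var)   # weight on the observation
    return prior_mean + w * (obs - prior_mean)
```

For example, with a prior centered at 0: a confident observation of 2.0 yields an estimate near 2.0, while the same observation under heavy occlusion (large variance) yields an estimate near 0, mirroring how a motion prior keeps occluded frames plausible.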

Thesis Committee Members:
Kris Kitani (Chair)
Deva Ramanan
Shubham Tulsiani
Siyu Tang (ETH Zurich)