Vision-based Human Motion Modeling and Analysis - Robotics Institute Carnegie Mellon University

PhD Thesis Defense

Jinkun Cao
PhD Student, Robotics Institute, Carnegie Mellon University
Tuesday, July 15
10:00 am to 12:00 pm
NSH 4305
Vision-based Human Motion Modeling and Analysis
Abstract:
Modern computer vision has achieved remarkable success in tasks such as detecting, segmenting, and estimating human pose in images and videos—often reaching or even surpassing human-level performance. However, significant challenges remain in predicting and analyzing future human motion. This thesis explores how vision-based methods can improve the fidelity and accuracy of human motion modeling and analysis.

We begin by studying multi-object tracking, which links static human localization results with temporal data. By examining correlations between human detections over time, using both motion and appearance cues for matching, we find that while learning-based methods dominate appearance matching, classical linear filtering performs exceptionally well in motion-based matching. Our proposed methods offer new insights into human motion tracking and establish strong baselines, underscoring the continued value of filtering-based approaches alongside modern learning-based techniques.
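To make the filtering idea concrete, the sketch below shows the classical pattern behind motion-based matching in tracking-by-detection: a constant-velocity Kalman filter predicts where each track should be in the next frame, and predictions are associated with new detections by distance. This is a generic illustration, not the specific method defended in the thesis; the noise scales and the greedy matcher (a stand-in for Hungarian assignment) are illustrative assumptions.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for a 2D point track.

    State is [x, y, vx, vy]. Noise magnitudes are illustrative,
    not tuned values from any particular tracker.
    """

    def __init__(self, x, y, q=1.0, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])                     # state mean
        self.P = np.eye(4) * 10.0                               # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0                       # transition (dt = 1)
        self.H = np.eye(2, 4)                                   # observe position only
        self.Q = np.eye(4) * q                                  # process noise
        self.R = np.eye(2) * r                                  # measurement noise

    def predict(self):
        """Propagate the state one frame forward; return predicted position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        """Correct the state with an observed position z = (x, y)."""
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.s                                 # innovation
        S = self.H @ self.P @ self.H.T + self.R                 # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)                # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


def greedy_match(predictions, detections, max_dist=5.0):
    """Greedily pair predicted track positions with detections by
    Euclidean distance (a simple stand-in for Hungarian matching)."""
    pairs, used = [], set()
    for i, p in enumerate(predictions):
        dists = [(np.linalg.norm(np.array(p) - np.array(d)), j)
                 for j, d in enumerate(detections) if j not in used]
        if dists:
            dist, j = min(dists)
            if dist <= max_dist:
                pairs.append((i, j))
                used.add(j)
    return pairs
```

For a target moving right at one unit per frame, a few predict/update cycles are enough for the velocity estimate to carry the prediction ahead of the last observation, which is exactly what makes linear filtering competitive for motion-based association.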

Building on our tracking work, we then approach human motion from a probabilistic perspective. We propose a novel method for reversible distribution transformation in human trajectory forecasting. Addressing the limitations of conventional symmetric unimodal Gaussian assumptions, we introduce an adaptive construction of mixed Gaussian distributions to better model asymmetric and imbalanced trajectory data. This approach significantly improves controllability, diversity, and accuracy in future trajectory modeling.
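The motivation for a mixture model can be seen in a small sketch: a pedestrian approaching an intersection may turn left or right, so the distribution over future positions has two modes, and a single symmetric Gaussian would put its mean in the empty space between them. The sampler below is an illustrative fixed-parameter mixture, not the adaptive construction proposed in the thesis.

```python
import numpy as np

def sample_gmm(means, covs, weights, n, rng):
    """Draw n samples from a 2D Gaussian mixture.

    A mixture places probability mass at several modes (e.g. "turn left"
    vs. "turn right" futures). The parameters are chosen by hand here
    purely for illustration.
    """
    comps = rng.choice(len(weights), size=n, p=weights)     # pick a mode per sample
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])

# Two future modes, e.g. endpoints of a left turn and a right turn.
rng = np.random.default_rng(0)
means = [np.array([-5.0, 0.0]), np.array([5.0, 0.0])]
covs = [np.eye(2) * 0.25, np.eye(2) * 0.25]
samples = sample_gmm(means, covs, [0.5, 0.5], 1000, rng)
```

Fitting a single Gaussian to these samples would center it near the origin, a region where the true distribution has almost no mass; the mixture keeps the two plausible futures distinct, which is what enables the controllability and diversity gains described above.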

Finally, moving from coarse-grained representations of human position to fine-grained articulation and deformation, we investigate the generation and reconstruction of full-body human motion from images or videos. By leveraging generative human motion priors to constrain vision-based estimation, we improve accuracy and robustness to occlusion and blur. We propose a unified generative model for whole-body motion generation and reconstruction, advancing the understanding and synthesis of complex human motion from multi-modal conditions.
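The effect of a prior on an unreliable observation can be illustrated with a one-dimensional toy: fusing a noisy measurement with a Gaussian prior by maximum a posteriori estimation, which has a closed form. This is a deliberately simplified stand-in for constraining pose estimates with a learned motion prior; the thesis's priors are generative models, not fixed Gaussians.

```python
def map_estimate(obs, obs_var, prior_mean, prior_var):
    """MAP fusion of a scalar observation with a Gaussian prior.

    When the observation is reliable (small obs_var), the estimate
    follows it; when it is unreliable (large obs_var, e.g. the joint
    is occluded or blurred), the estimate falls back toward the prior.
    Toy illustration only, not the thesis's generative prior.
    """
    w = prior_var / (prior_var + obs_var)   # weight on the observation
    return prior_mean + w * (obs - prior_mean)
```

For example, with a prior centered at 0: a confident observation of 2.0 yields an estimate near 2.0, while the same observation under heavy occlusion (large variance) yields an estimate near 0, mirroring how a motion prior keeps occluded frames plausible.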

Thesis Committee Members:
Kris Kitani (Chair)
Deva Ramanan
Shubham Tulsiani
Siyu Tang (ETH Zurich)