We begin by studying multi-object tracking, linking static human localization results with temporal data. By examining correlations between human detections over time, using both motion and appearance matching, we found that while learning-based methods dominate appearance matching, classical linear filtering methods perform exceptionally well in motion-based matching. Our proposed methods offer new insights into human motion tracking, establish strong baselines, and underscore the continued value of filtering-based approaches alongside modern learning-based techniques.
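The motion-matching side of such a pipeline can be illustrated with a minimal constant-velocity Kalman filter, the canonical classical linear filter. This is a sketch under assumptions: the state layout, noise levels, and cost function below are invented for illustration, not the configuration used in our experiments.

```python
import numpy as np

class KalmanTracker:
    """Minimal constant-velocity Kalman filter over the state [x, y, vx, vy]."""

    def __init__(self, x, y, dt=1.0):
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # initial state uncertainty (assumed)
        self.F = np.eye(4)                        # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                     # we observe position only
        self.Q = np.eye(4) * 0.01                 # process noise (assumed)
        self.R = np.eye(2)                        # measurement noise (assumed)

    def predicted_position(self):
        """Position the motion model expects at the next frame (no state change)."""
        return (self.F @ self.state)[:2]

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        z = np.asarray(z, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R   # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.state = self.state + K @ (z - self.H @ self.state)
        self.P = (np.eye(4) - K @ self.H) @ self.P

def motion_cost(tracker, detection):
    """Motion-matching cost: distance from predicted to detected position."""
    return float(np.linalg.norm(tracker.predicted_position() - np.asarray(detection)))
```

In a full tracker, costs like this would feed an assignment step (e.g., the Hungarian algorithm) alongside appearance similarities to link detections across frames.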
Building on our tracking work, we then approach human motion from a probabilistic perspective. We propose a novel method for reversible distribution transformation in human trajectory forecasting. Addressing the limitations of the conventional symmetric, unimodal Gaussian assumption, we introduce an adaptive construction of Gaussian mixture distributions to better model asymmetric and imbalanced trajectory data. This approach significantly improves controllability, diversity, and accuracy in future trajectory modeling.
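The gap between a unimodal Gaussian and a mixture is easy to see on a toy example. The two-component mixture below is entirely invented for illustration (weights, means, and variance are assumptions, not the distributions learned by our adaptive construction): it places mass on two distinct futures, whereas a single symmetric Gaussian would center on their implausible average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two plausible futures for a pedestrian: continue straight or turn right.
weights = np.array([0.6, 0.4])
means = np.array([[5.0, 0.0],     # continue straight
                  [0.0, 5.0]])    # turn right
var = 0.25                        # shared isotropic variance, for simplicity

def sample(n):
    """Draw endpoints: pick a mode by weight, then add Gaussian noise around it."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + rng.normal(scale=np.sqrt(var), size=(n, 2))

def log_density(x):
    """log p(x) = log sum_k w_k N(x; mu_k, var * I) for a 2-D point x."""
    sq = np.sum((x[None, :] - means) ** 2, axis=1) / var
    log_norm = -np.log(2.0 * np.pi * var)
    return float(np.log(np.sum(weights * np.exp(log_norm - 0.5 * sq))))
```

The unimodal mean, 0.6 * [5, 0] + 0.4 * [0, 5] = [3, 2], falls between the modes, in a region where the mixture assigns almost no probability; a single Gaussian fit would rate both true futures unlikely.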
Finally, moving from coarse-grained representations of human position to fine-grained articulation and deformation, we investigate the generation and reconstruction of full-body human motion from images and videos. By leveraging generative human motion priors to constrain vision-based estimation, we enhance accuracy and robustness to occlusion and motion blur. We propose a unified generative model for whole-body motion generation and reconstruction, advancing the understanding and synthesis of complex human motion under multi-modal conditioning.
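How a generative prior constrains estimation under occlusion can be sketched with a deliberately simple stand-in: a linear "decoder" from a low-dimensional latent to pose parameters, with a Gaussian latent prior. Everything here (the decoder `D`, the latent size, the weight `lam`) is a toy assumption, not our whole-body model; the point is only how the prior term fills in unobserved dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(6, 3))   # toy "decoder": 3-D latent -> 6 pose parameters

def fit_with_prior(observed, mask, lam=0.1):
    """MAP/ridge estimate: minimize ||mask * (D z - observed)||^2 + lam * ||z||^2.

    The data term only touches visible dimensions (mask == 1); the Gaussian
    prior term keeps the latent, and hence the occluded dimensions, plausible.
    """
    Dv = D * mask[:, None]                        # zero out occluded rows
    A = Dv.T @ Dv + lam * np.eye(D.shape[1])
    return np.linalg.solve(A, Dv.T @ (mask * observed))

# Simulate an observation where half the pose parameters are occluded.
z_true = np.array([0.5, -1.0, 0.8])
pose_true = D @ z_true
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # last three dims unobserved
z_hat = fit_with_prior(pose_true, mask)
pose_hat = D @ z_hat                              # full pose, occlusions filled in
```

In the actual setting the decoder is a learned generative motion prior and the data term a reprojection error over image evidence, but the structure of the objective, data fidelity plus prior likelihood, is the same.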
