Abstract:
Building large-scale human datasets from multi-view videos is essential for advancing research in human behavior understanding, virtual reality, animation, and robotics. Unlike traditional motion capture systems that rely on physical markers to track motion, vision-based reconstruction enables the capture of human motion in unconstrained environments without altering human appearance with markers. However, existing multi-view reconstruction methods often fail when humans interact closely with other people or with objects, due to the severe occlusions and truncations introduced by such complex activities. In this thesis, we develop a markerless capture system capable of handling close human interactions and dexterous hand-object manipulations. Using this system, we construct two large-scale human datasets, Harmony4D and Contact4D, which serve as benchmarks for advancing fundamental research in human-centric AI, including human pose estimation, contact estimation, and motion generation.
Committee:
Kris Kitani, Chair
Fernando De La Torre
Shubham Tulsiani
Erica Weng
