MBW: Multi-view Bootstrapping in the Wild

Mosam Dabhi, Chaoyang Wang, Tim Clifford, Laszlo Attila Jeni, Ian R. Fasel, and Simon Lucey

Conference Paper, Proceedings of (NeurIPS) Neural Information Processing Systems, December, 2022

View Publication

Abstract

Labeling articulated objects in unconstrained settings has a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates from videos with only two or three uncalibrated, handheld cameras. With just a few annotations (representing 1−2 % of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo. We release the codebase for MBW as well as this challenging zoo dataset consisting of image frames of tail-end distribution categories with their corresponding 2D and 3D labels generated from minimal human intervention.

BibTeX

@conference{Dabhi-2022-136125,
author = {Mosam Dabhi and Chaoyang Wang and Tim Clifford and Laszlo Attila Jeni and Ian R. Fasel and Simon Lucey},
title = {MBW: Multi-view Bootstrapping in the Wild},
booktitle = {Proceedings of (NeurIPS) Neural Information Processing Systems},
year = {2022},
month = {December},
publisher = {Conference on Neural Information Processing Systems},
keywords = {Self-supervision; Auto labeling; Multi-view 2D-3D; Keypoint Detection},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.