Open World Object Detection and Tracking

PhD Thesis, Tech. Report CMU-RI-TR-21-17, Robotics Institute, Carnegie Mellon University, May 2021

Abstract

Computer vision today excels at recognizing narrow slices of the real world: our models seem to accurately detect objects like cats, cars, or chairs in benchmark datasets. However, deploying models requires that they work in the open world, which includes arbitrary objects in diverse settings. Current methods fall short on both axes: they recognize only a few classes and struggle in settings that differ from the training distribution. A model that addresses these challenges can serve as a fundamental building block for downstream applications, including recognizing actions, manipulating objects, and navigating around obstacles. This thesis presents our work on building robust models for detecting and tracking any object, especially ones with few or even no training examples.

We start by exploring how traditional models, which recognize only a small set of object classes, generalize to the real world. We show that current methods are extremely sensitive: even subtle changes in the input image or test distribution can lead to drops in accuracy. Our systematic evaluations show that models — even ones trained for robustness to adversarial or synthetic corruptions — often correctly classify one frame of a video, but fail on a perceptually similar nearby frame. A similar phenomenon applies even to small distribution shifts arising from natural variation between datasets. Finally, we present an approach for addressing an extreme form of generalization to object appearance: detecting fully occluded objects.

Next, we explore generalization to large or infinite vocabularies, which contain rare and never-before-seen classes. Since current datasets are largely limited to a small, closed-world set of objects, we first present a large vocabulary benchmark for measuring progress in detection and tracking. We show that current evaluations do not suffice for large vocabulary benchmarks, and present alternative metrics that appropriately evaluate progress in this setting. Finally, we present approaches which leverage advances in closed-world recognition to build accurate, generic detectors and trackers for any object.

BibTeX

@phdthesis{Dave-2021-127393,
author = {Achal Dave},
title = {Open World Object Detection and Tracking},
year = {2021},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-17},
keywords = {open world, detection, tracking, long tail, robustness, distribution shift},
}