PhD Thesis Defense

Achal Dave
Robotics Institute,
Carnegie Mellon University
Wednesday, March 31
10:00 am to 11:00 am
Open-world Object Detection and Tracking

Abstract:
Computer vision today excels at recognizing narrow slices of the real world: our models seem to accurately detect objects like cats, cars, or chairs in benchmark datasets. However, deploying models requires that they work in the open world, which includes arbitrary objects in diverse settings. Current methods struggle on both axes: they recognize only a few classes, and struggle in settings that differ from the training distribution. A model that addresses these challenges can serve as a fundamental building block for downstream applications, including recognizing actions, manipulating objects, and navigating around obstacles. This thesis presents our work in building robust models for detecting and tracking any object, especially ones with few or even no training examples.

We start by exploring how traditional models, which recognize only a small set of object classes, generalize to the real world. We show that current methods are extremely sensitive: even subtle changes in the input image or test distribution can lead to drops in accuracy. Our systematic evaluations show that models, even ones trained for robustness to adversarial or synthetic corruptions, often correctly classify one frame of a video, but fail on a perceptually similar nearby frame. A similar phenomenon applies even to small distribution shifts arising from natural variation between datasets. Finally, we present an approach for addressing an extreme form of generalization to object appearance: detecting fully occluded objects.

Next, we explore generalization to large or infinite vocabularies, which contain rare and never-before-seen classes. Since current datasets are largely limited to a small, closed-world set of objects, we first present a large vocabulary benchmark for measuring progress in detection and tracking. We show that current evaluations do not suffice for large vocabulary benchmarks, and present alternative metrics that appropriately evaluate progress in this setting. Finally, we present approaches which leverage advances in closed-world recognition to build accurate, generic detectors and trackers for any object.

Thesis Committee Members:
Deva Ramanan, Chair
Katerina Fragkiadaki
Kris Kitani
Cordelia Schmid, INRIA
Ross Girshick, Facebook AI Research