Synthesizing Scenes for Instance Detection

Debidatta Dwibedi
Master's Thesis, Tech. Report, CMU-RI-TR-17-21, Robotics Institute, Carnegie Mellon University, May, 2017

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.


Object detection models have made significant progress in recent years. A major impediment to rapidly deploying these models for instance detection is the lack of large annotated datasets. For example, a large labeled dataset containing the instances found in a particular kitchen is unlikely to exist. Brute-force data collection would require substantial manual effort for each new environment with new instances. In this thesis, we explore three methods to tackle this problem. First, we show how object tracking in videos can be used to propagate bounding box annotations from one frame to subsequent frames. Next, we show how 3D reconstruction can be used to produce annotations for object detection and pose estimation. Finally, we present a novel approach for generating synthetic scenes with annotations for instance detection. Our key insight is that ensuring only patch-level realism provides enough training signal for current object detection models. A naive way of doing this produces pixel artifacts that lead to poor performance in the trained models. We show how to make detectors ignore these artifacts during training and how to generate data that gives performance competitive with real data. Our results show that we outperform existing synthesis approaches and that, when combined with real data, the complementary information in our synthetic data improves performance by more than 10 AP points on benchmark datasets.
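
The abstract does not spell out the synthesis procedure, so the following is only a minimal sketch of the general idea of pasting an object patch into a background scene and obtaining its bounding box annotation for free. It is not the thesis's implementation: the function name `paste_object`, its parameters, and the use of a soft mask in [0, 1] as a simple blend are all assumptions made for illustration, and the patch is assumed to fit entirely inside the background.

```python
import numpy as np

def paste_object(background, patch, mask, top, left):
    """Paste `patch` into a copy of `background` at (top, left).

    `mask` has shape (h, w) with values in [0, 1]; 1 keeps the patch pixel,
    0 keeps the background, and values in between blend the two.
    Returns the composite scene and the box (x_min, y_min, x_max, y_max)
    covering every pixel where the mask is nonzero.
    Hypothetical helper: assumes the patch lies fully inside the background.
    """
    h, w = mask.shape
    scene = background.copy()
    region = scene[top:top + h, left:left + w]  # view into `scene`
    alpha = mask.astype(np.float32)[..., None]  # (h, w, 1) for broadcasting
    blended = alpha * patch + (1.0 - alpha) * region
    region[:] = blended.astype(scene.dtype)     # writes through to `scene`

    # The annotation comes directly from the mask, with no manual labeling.
    ys, xs = np.nonzero(mask)
    box = (int(left + xs.min()), int(top + ys.min()),
           int(left + xs.max()), int(top + ys.max()))
    return scene, box

# Toy usage: a plus-shaped object pasted into a black 8x8 scene.
background = np.zeros((8, 8, 3), dtype=np.uint8)
patch = np.full((3, 3, 3), 200, dtype=np.uint8)
mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=np.float32)
scene, box = paste_object(background, patch, mask, top=2, left=4)
```

Because the mask may hold fractional values, the same function can be called with a hard (binary) mask or a softened one, which is one simple way to vary how the pasted boundary looks across training images.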

BibTeX Reference
author = {Debidatta Dwibedi},
title = {Synthesizing Scenes for Instance Detection},
year = {2017},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-21},
keywords = {synthetic data, object detection, deep learning, instance detection, computer vision},