Deliberative Perception

PhD Thesis, Tech. Report CMU-RI-TR-17-67, Robotics Institute, Carnegie Mellon University, August 2017


Abstract

A recurrent and elementary robot perception task is to identify and localize objects of interest in the physical world. In many real-world settings, such as automated warehouses and assembly lines, this task entails localizing specific object instances with known 3D models. Most modern-day methods for the 3D multi-object localization task employ scene-to-model feature matching, or regression/classification by learners trained on synthetic or real scenes. While these methods typically produce a result quickly, they are often brittle, sensitive to occlusions, and dependent on the right choice of features and/or training data.

This thesis introduces and advocates a deliberative approach, where the multi-object localization task is framed as an optimization over the space of hypothesized scenes. We demonstrate that deliberative reasoning, such as understanding inter-object occlusions, is essential to robust perception, and that discriminative techniques can effectively guide such reasoning. The contributions of this thesis fall broadly into three parts:

The first part, PErception via SeaRCH (PERCH) and its extension C-PERCH, formulates Deliberative Perception as an optimization over hypothesized scenes, and develops an efficient tree search algorithm to solve it.
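
To make the idea of optimizing over hypothesized scenes concrete, here is a minimal toy sketch. It is not the PERCH algorithm: it assumes a 1D "depth image", objects that are simple intervals at known depths, a pixel-mismatch cost, and a brute-force depth-first enumeration of the placement tree. All names (render, scene_cost, search) are hypothetical and chosen for illustration only.

```python
INF = float("inf")  # depth value for a cell where no object is seen


def render(placements, widths, depths, n_cells):
    """Render a 1D depth image: each cell shows the nearest object covering it."""
    image = [INF] * n_cells
    for obj, start in placements.items():
        for cell in range(start, min(start + widths[obj], n_cells)):
            # Nearer objects occlude farther ones.
            image[cell] = min(image[cell], depths[obj])
    return image


def scene_cost(rendered, observed):
    """Count cells where the hypothesized scene disagrees with the observation."""
    return sum(1 for r, o in zip(rendered, observed) if r != o)


def search(observed, widths, depths):
    """Enumerate the tree of placements (one object per level), keep the best leaf."""
    n_cells = len(observed)
    objects = sorted(widths)
    best_cost, best_scene = INF, None

    def expand(level, placements):
        nonlocal best_cost, best_scene
        if level == len(objects):
            # Leaf: every object is placed, so score the full hypothesized scene.
            c = scene_cost(render(placements, widths, depths, n_cells), observed)
            if c < best_cost:
                best_cost, best_scene = c, dict(placements)
            return
        obj = objects[level]
        for start in range(n_cells - widths[obj] + 1):
            placements[obj] = start
            expand(level + 1, placements)
            del placements[obj]

    expand(0, {})
    return best_cost, best_scene


if __name__ == "__main__":
    widths = {"box": 3, "can": 2}
    depths = {"box": 2.0, "can": 1.0}  # the can is nearer and partly hides the box
    observed = render({"box": 1, "can": 3}, widths, depths, 6)
    print(search(observed, widths, depths))
```

Because the cost is computed on the rendered scene, the partly hidden box is still localized correctly: only a placement that accounts for the occluding can explains every cell. The sketch scores complete scenes at the leaves and visits every leaf, which is the part an efficient tree search would need to improve on.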

The second part focuses on accelerating global search through statistical learners, both in the form of search heuristics (Discriminatively-guided Deliberative Perception) and by modulating the search space (RANSAC-Trees).

The final part introduces general-purpose graph search algorithms that bridge statistical learning and search. Of these, the first is an anytime algorithm for leveraging edge validity priors to accelerate graph search, and the second, Improved Multi-Heuristic A*, permits the use of multiple, inadmissible heuristics that might arise from learning.

Experimental validation on multiple robots and real-world datasets, one of which we introduce, indicates that we can leverage the complementary strengths of fast learning-based methods and deliberative classical search to handle both "hard" (severely occluded) and "easy" portions of a scene by automatically adjusting the amount of deliberation required.

BibTeX

@phdthesis{Narayanan-2017-27584,
author = {Venkatraman Narayanan},
title = {Deliberative Perception},
year = {2017},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-67},
}
