Visual Assessment for Non-Disruptive Object Extraction

Master's Thesis, Tech. Report, CMU-RI-TR-20-28, Robotics Institute, Carnegie Mellon University, August, 2020


Abstract

Robots operating in human environments need to perform a variety of dexterous manipulation tasks on object arrangements that have complex physical support relationships, e.g., procuring utensils from a large pile of dishes, grabbing a bottle from a stuffed fridge, or fetching a book from a loaded shelf. The cost of a misjudged extraction in these situations can be very high (e.g., other objects falling), so robots must be careful not to disturb other objects when executing manipulation skills. This requires robots to reason about the effect of their manipulation choices by accounting for the support relationships among objects in the scene. Humans do this in part by visually assessing the scene and using physics intuition to infer how likely it is that a particular object can be safely moved. Inspired by this human capability, we explore how robots can emulate similar vision-based physics intuition using data-driven models based on deep learning.

We formulate our research problem as a scene understanding task: visually assessing the feasibility of extracting an object from an arrangement of objects. We focus on data-driven approaches that assess possible object interactions from only a few glimpses of the scene. Recent work has shown that deep convolutional neural networks can learn intuitive physics from images generated in simulation and determine the stability of an arrangement of objects in the real world. We extend these physics intuition models to the task of assessing safe object extraction by conditioning the input images on specific objects in the scene using object masks. Our method identifies which objects can be safely extracted, from which direction to extract them, and the potential impact such extraction will have on nearby objects. Our results, in both simulation and real-world settings, show that physics intuition models using our proposed method can successfully inform a robot's actions during object extraction. We compare the performance of our method against simulation-based and geometry-based assessment methods and highlight the strengths and weaknesses of each for assessing safe object extraction. Furthermore, we show that by combining multiple views with aggregation techniques, we can obtain a unified visual assessment that improves the model's predictive performance.
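The abstract describes two mechanisms: conditioning an image on one target object via an object mask, and combining several views into a single assessment. The following is a minimal sketch under assumptions that the abstract does not state: the mask is appended to the RGB image as a fourth input channel, the model emits one safe-extraction probability per candidate direction, and views are fused by averaging (or, more conservatively, by taking the minimum). The function names, direction labels, threshold, and fusion rules are hypothetical, and the trained network itself is not shown.

```python
from statistics import mean


def condition_on_object(rgb, mask):
    """Append a binary object mask as a fourth channel (assumed conditioning scheme).

    rgb:  H x W nested list of [r, g, b] pixels
    mask: H x W nested list of 0/1 values marking the target object
    Returns an H x W nested list of [r, g, b, m] pixels.
    """
    if len(rgb) != len(mask) or any(len(r) != len(m) for r, m in zip(rgb, mask)):
        raise ValueError("image and mask must have the same height and width")
    return [
        [list(pixel) + [float(m)] for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(rgb, mask)
    ]


def aggregate_views(view_probs, reduce="mean"):
    """Fuse per-view, per-direction safe-extraction probabilities (hypothetical rule).

    view_probs: one list per camera view, each holding a probability for every
    candidate extraction direction, in the same direction order for all views.
    reduce: "mean" averages across views; "min" keeps the most pessimistic view.
    """
    reducer = {"mean": mean, "min": min}[reduce]
    # zip(*view_probs) groups the scores for the same direction across views.
    return [reducer(per_direction) for per_direction in zip(*view_probs)]


def best_direction(directions, fused, threshold=0.5):
    """Return the highest-scoring direction, or None if no direction clears the threshold."""
    i = max(range(len(fused)), key=fused.__getitem__)
    return directions[i] if fused[i] >= threshold else None


if __name__ == "__main__":
    image = [[[0.2, 0.3, 0.4], [0.5, 0.5, 0.5]],
             [[0.1, 0.1, 0.1], [0.9, 0.8, 0.7]]]
    mask = [[0, 1], [0, 1]]  # target object occupies the right column
    conditioned = condition_on_object(image, mask)  # 2 x 2 pixels, 4 channels each

    directions = ["up", "left", "right", "forward"]  # hypothetical direction set
    views = [[0.9, 0.2, 0.4, 0.7],  # model output for view 1 (made-up numbers)
             [0.7, 0.4, 0.2, 0.5]]  # model output for view 2 (made-up numbers)
    fused = aggregate_views(views)  # roughly [0.8, 0.3, 0.3, 0.6]
    print(best_direction(directions, fused))  # up
```

Averaging treats every view as equally informative; the "min" option is a conservative alternative that rejects an extraction if any single view looks risky. Which fusion rule the thesis actually uses is not specified here.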

BibTeX

@mastersthesis{Ahuja-2020-123582,
author = {Sarthak Ahuja},
title = {Visual Assessment for Non-Disruptive Object Extraction},
year = {2020},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-20-28},
keywords = {Intuitive Physics, Visual Assessment, Non-Disruptive Object Extraction},
}
