
Visual Assessment for Non-Disruptive Object Extraction

Master's Thesis, Tech. Report CMU-RI-TR-20-28, Robotics Institute, Carnegie Mellon University, August 2020


Robots operating in human environments need to perform a variety of dexterous manipulation tasks on object arrangements that have complex physical support relationships, e.g., procuring utensils from a large pile of dishes, grabbing a bottle from a stuffed fridge, or fetching a book from a loaded shelf. The cost of a misjudged extraction in these situations can be very high (e.g., other objects falling), so robots must be careful not to disturb other objects when executing manipulation skills. This requires robots to reason about the effect of their manipulation choices by accounting for the support relationships among objects in the scene. Humans do this in part by visually assessing the scene and using physics intuition to infer how likely it is that a particular object can be safely moved. Inspired by this human capability, we explore how robots can emulate similar vision-based physics intuition using data-driven deep learning models.

We formulate our research problem as a scene understanding task: visually assessing the feasibility of extracting an object from an arrangement. We focus on data-driven approaches that assess possible object interactions with only a few glimpses of the scene. Recent work has shown that deep convolutional neural networks can learn intuitive physics over images generated in simulation and determine the stability of an arrangement of objects in the real world. We extend these physics intuition models to the task of assessing safe object extraction by conditioning the visual images on specific objects in the scene using object masks. Our method identifies which objects can be safely extracted, from which direction to extract them, and the potential impact such extraction will have on nearby objects. Our results, in both simulation and real-world settings, show that physics intuition models using our proposed method can successfully inform a robot's actions during object extraction. We compare the performance of our method against simulation-based and geometry-based assessment methods and highlight their respective pros and cons for the task of assessing safe object extraction. Furthermore, we show that, by using aggregation techniques to combine multiple views, we can obtain a unified visual assessment that improves the model's predictive performance.
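The two key ideas above, conditioning the network's input image on a target object via its mask, and aggregating per-view predictions into a unified assessment, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the choice of a fourth input channel for the mask, and mean-pooling as the aggregation rule are illustrative assumptions; the actual network architecture and aggregation scheme are described in the thesis.

```python
import numpy as np

def mask_conditioned_input(rgb, obj_mask):
    """Stack an RGB image with a binary mask of the target object as a
    fourth channel, so the network's prediction is conditioned on that
    object (channel stacking is one common conditioning choice)."""
    assert rgb.shape[:2] == obj_mask.shape, "image and mask sizes must match"
    return np.concatenate(
        [rgb, obj_mask[..., None].astype(rgb.dtype)], axis=-1
    )

def aggregate_views(per_view_probs):
    """Combine per-view extraction-safety scores (one score per candidate
    extraction direction) into a unified assessment by averaging,
    a simple aggregation baseline."""
    return np.mean(np.stack(per_view_probs), axis=0)

# Toy usage: a 64x64 scene image and the mask of one candidate object.
rgb = np.random.rand(64, 64, 3).astype(np.float32)
mask = np.zeros((64, 64), dtype=np.float32)
mask[20:40, 20:40] = 1.0  # hypothetical segmentation of the target object
x = mask_conditioned_input(rgb, mask)  # shape (64, 64, 4), fed to the CNN

# Hypothetical safety scores for 2 extraction directions from 2 views.
unified = aggregate_views([np.array([0.2, 0.8]), np.array([0.4, 0.6])])
```

In this sketch, `x` would be the input to a convolutional physics-intuition model, and `unified` ranks extraction directions after pooling evidence from multiple viewpoints.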


@mastersthesis{ahuja2020visual,
  author   = {Sarthak Ahuja},
  title    = {Visual Assessment for Non-Disruptive Object Extraction},
  year     = {2020},
  month    = {August},
  school   = {Carnegie Mellon University},
  address  = {Pittsburgh, PA},
  number   = {CMU-RI-TR-20-28},
  keywords = {Intuitive Physics, Visual Assessment, Non-Disruptive Object Extraction},
}