Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations - Robotics Institute Carnegie Mellon University

Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations

Master's Thesis, Tech. Report, CMU-RI-TR-23-87, December 2023

Abstract

Integrating a robot into the kitchen that can act as a chef remains a sought-after goal in robotics. Current robotic systems, mostly programmed for specific tasks, lack the versatility and adaptability that a diverse culinary environment demands. While robot learning has made significant progress, with advances in behavior cloning and reinforcement learning and recent strides in diffusion policies and transformers, the challenge remains to develop a robot that matches human capabilities in learning and generalizing across tasks, particularly in complex, unstructured real-world scenarios.

In this thesis, we focus on enabling robots to learn manipulation tasks from a single human demonstration, using predefined primitives that generalize across similar objects and environments. We developed a system that processes RGBD video demonstrations to identify task-relevant key poses and frames using Segment Anything. We then addressed challenges that arise when robots replicate human actions, such as collisions and robot configuration limitations. To validate the approach, we conducted experiments on manual dishwashing. After learning from one human demonstration in a lab kitchen, the method was tested under varied conditions in a standard home kitchen that differed in geometry and appearance from the learning environment.

Further, we broadened the scope of learning to more general data sources, focusing in particular on videos from unstructured environments such as YouTube. To use unseen videos as a source for specific robot learning tasks, we translated visual elements into physical constraints and goals in simulation, inferring the physics of the tasks. We demonstrated that this learning method transfers to real-world scenarios with actual robots on tasks including fruit cutting, dough manipulation, and pouring liquids.

BibTeX

@mastersthesis{Guo-2023-139203,
author = {Dingkun Guo},
title = {Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations},
year = {2023},
month = {December},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-23-87},
keywords = {Robot Learning, Manipulation, Learning from Demonstration},
}