Abstract:
Humans possess an extraordinary ability to manipulate objects, discerning position, shape, and other properties with just a glance. How can robots be endowed with similar perceptual and dexterous manipulation capabilities? In this talk, I will present a method that combines the sample efficiency of traditional model-based approaches with the high generalizability of deep learning methods to tackle dexterous manipulation tasks. I will begin with a brief introduction to the model-based framework grounded in Koopman Operator Theory, highlighting its reliance on ground-truth (GT) object states, which are difficult to obtain in real-world applications.
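As background, the standard discrete-time Koopman formulation lifts nonlinear dynamics x_{t+1} = f(x_t) into a space of observables where the evolution is approximately linear; the sketch below uses a generic lifting function g, since the abstract does not specify the observables used in the talk:

    g(x_{t+1}) = \mathcal{K}\, g(x_t)                                    % Koopman operator acting on observables g
    z_t := g(x_t), \quad z_{t+1} \approx K z_t                           % finite-dimensional approximation with matrix K
    K = \arg\min_{A} \sum_t \lVert g(x_{t+1}) - A\, g(x_t) \rVert^2      % least-squares fit from demonstration data

With K fixed, predictions are produced by rolling z_t forward linearly, which is why the quality of the lifted state matters so much in the approach described next.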
To address this limitation, we propose Koopman Operator Rollout for Object Feature Learning (KOROL)—an approach that removes the dependency on GT states in model-based manipulation learning. KOROL learns visual object features that support accurate robot state prediction throughout dynamics model rollouts. Unlike prior approaches that learn implicit visual features for direct image-to-action policies, KOROL explicitly trains object-centric visual representations that encode essential scene information, improving robot state predictions during autoregressive rollouts. This establishes a synergistic relationship between the learned object features and the Koopman operator.
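To make the rollout-based feature learning concrete, here is a minimal sketch of the training idea described above, under assumed details: an image encoder produces an object feature, the feature is concatenated with the robot state to form the lifted state, a fixed Koopman matrix rolls that state forward autoregressively, and the loss is taken on the predicted robot states. All names and shapes (ObjectFeatureNet, koopman_K, the data layout) are illustrative assumptions, not the talk's actual implementation.

    # Illustrative sketch only; names, shapes, and architecture are assumptions.
    import torch
    import torch.nn as nn

    class ObjectFeatureNet(nn.Module):
        """Maps an image to a fixed-size object feature vector."""
        def __init__(self, feat_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
        def forward(self, img):
            return self.encoder(img)

    def rollout_loss(feature_net, koopman_K, images, robot_states, horizon):
        """Roll the lifted state forward autoregressively with a fixed Koopman
        matrix and penalize errors on the predicted robot states.
        images: [T, B, 3, H, W]; robot_states: [T, B, d_r]; koopman_K: [d_z, d_z]."""
        obj_feat = feature_net(images[0])                   # object feature from the first frame
        z = torch.cat([robot_states[0], obj_feat], dim=-1)  # lifted state = [robot state; object feature]
        loss = 0.0
        for t in range(1, horizon):
            z = z @ koopman_K.T                              # linear rollout in the lifted space
            pred_robot = z[..., : robot_states.shape[-1]]    # robot-state block of the lifted state
            loss = loss + torch.mean((pred_robot - robot_states[t]) ** 2)
        return loss / (horizon - 1)

Because the loss depends only on robot states observable from demonstrations, the encoder can be trained without GT object states, which is the dependency the method removes.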
Our experiments demonstrate that KOROL: (i) improves performance across various simulated manipulation tasks compared to Koopman operators using GT object states and other baselines, (ii) extends Koopman-based methods to vision-based real-world tasks, and (iii) enables multitasking through dimensionally aligned object features.
Committee:
Prof. Jeffrey Ichnowski
Prof. Guanya Shi
Prof. Oliver Kroemer
Bardienus Duisterhof
