Deep Abstraction Learning for Neuro-Symbolic World Modeling - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Bowen Li
PhD Student, Robotics Institute, Carnegie Mellon University
Monday, May 18
10:00 am to 11:30 am
Newell-Simon Hall 4305
Deep Abstraction Learning for Neuro-Symbolic World Modeling
Abstract: Modern foundation models have achieved remarkable progress by learning broad physical and semantic common sense from large-scale data. However, robots operating in open-ended environments require more than general knowledge: they must continually specialize in new tasks, environments, and experiences encountered during deployment. Given only limited deployment-time data, how can robots learn to solve substantially more, and conceptually harder, problems than those seen during training?

This thesis addresses this challenge through deep abstraction learning, where robots discover relational abstractions grounded by deep neural networks to construct neuro-symbolic world models from high-dimensional and noisy observations. By ignoring task-irrelevant details, abstractions can be learned efficiently from limited experience while enabling abstract planning for long-horizon decision-making problems involving many interacting objects. The central hypothesis is that abstractions and world models should co-evolve: abstractions enable efficient reasoning and planning, while planning structure and execution failures drive the discovery of richer abstractions.

This framework consists of three key components. Deep state abstractions, represented as relational predicates, map high-dimensional observations into symbolic concepts that support reasoning and planning under noisy sensory inputs. Deep action abstractions, represented as relational option policies, capture reusable behaviors that enable robots to recover from failures and progressively acquire new abstractions from interaction. Neuro-symbolic world models describe how action abstractions transform state abstractions, enabling abstract planning that generalizes to unseen long-horizon tasks. To study these challenges, this thesis introduces benchmark suites that reveal the limitations of purely neural approaches and motivate abstraction-based world modeling.
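
As a rough illustration of how these three components fit together, the following Python sketch grounds relational predicates from continuous block poses, describes two actions as STRIPS-style operators over those predicates, and plans in the abstract space with breadth-first search. It is not the method proposed in the thesis: the threshold-based `abstract_state` function is a hand-written stand-in for learned neural predicate classifiers, the operators are hand-specified rather than learned, and every name here (`abstract_state`, `Operator`, `ground_operators`, `plan`, the block-stacking domain) is a hypothetical choice made for this example.

```python
from collections import deque
from itertools import permutations
from typing import NamedTuple


def abstract_state(poses, tol=0.05):
    """State abstraction: map continuous poses to a set of ground atoms.

    Assumption: poses maps block name -> (x, z), blocks have unit height,
    and the table is at z = 0. Thresholds stand in for learned classifiers.
    """
    atoms = set()
    for name, (_, z) in poses.items():
        if abs(z) < tol:
            atoms.add(("OnTable", name))
    for a, b in permutations(poses, 2):
        (xa, za), (xb, zb) = poses[a], poses[b]
        if abs(xa - xb) < tol and abs(za - zb - 1.0) < tol:
            atoms.add(("On", a, b))
    for name in poses:
        covered = any(atom[0] == "On" and atom[2] == name for atom in atoms)
        if not covered:
            atoms.add(("Clear", name))
    return frozenset(atoms)


class Operator(NamedTuple):
    """Symbolic model of one action abstraction (hand-specified here)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def ground_operators(blocks):
    """Enumerate ground Stack/Unstack operators for the given blocks."""
    ops = []
    for a, b in permutations(blocks, 2):
        ops.append(Operator(
            f"Stack({a},{b})",
            frozenset({("OnTable", a), ("Clear", a), ("Clear", b)}),
            frozenset({("On", a, b)}),
            frozenset({("OnTable", a), ("Clear", b)}),
        ))
        ops.append(Operator(
            f"Unstack({a},{b})",
            frozenset({("On", a, b), ("Clear", a)}),
            frozenset({("OnTable", a), ("Clear", b)}),
            frozenset({("On", a, b)}),
        ))
    return ops


def plan(init, goal, ops):
    """Breadth-first search over abstract states; returns operator names."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for op in ops:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [op.name]))
    return None


# Block b sits on block a; block c stands alone on the table.
poses = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (2.0, 0.0)}
init = abstract_state(poses)
goal = frozenset({("On", "a", "b"), ("On", "b", "c")})
steps = plan(init, goal, ground_operators(sorted(poses)))
print(steps)
```

The planner never sees the raw poses, only the atoms, which is why the same operators would apply unchanged to a scene with more blocks. In the setting the abstract describes, both the predicate classifiers and the operators would be learned from interaction rather than written by hand.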

Building on these foundations, this dissertation proposes two future directions. The first studies how pre-trained coding agents can synthesize expressive programmatic world models that support planning with recomposable tools such as abstractions and perception models. The second, RoboSymphony, studies multi-agent decision-making in which robots learn abstractions over the intentions and behaviors of other agents and humans, enabling coordination in collaborative tasks.

Together, these contributions advance a unified neuro-symbolic approach for robots that learn and plan with deep abstractions, enabling efficient specialization and decision-making in open-ended real-world environments.

Thesis Committee: 
Sebastian Scherer (Chair)
Maxim Likhachev
Katia Sycara
Tom Silver, Princeton University
Leslie P. Kaelbling, Massachusetts Institute of Technology