Modeling What Matters: Emergent Abstraction in Reinforcement Learning

PhD Thesis Defense

Benjamin (Ben) Freed, PhD Student, Robotics Institute, Carnegie Mellon University
Friday, December 12
3:00 pm to 4:30 pm
Newell-Simon Hall 4305
Modeling What Matters: Emergent Abstraction in Reinforcement Learning
Abstract: Real-world decision-making is rife with partial observability, long horizons, and complex multi-agent interactions. This thesis argues that abstraction—forming simplified representations of the task that retain relevant information—offers a unifying principle for tackling these challenges across model-free and model-based reinforcement learning (RL). We develop methods in which abstractions are not hand-designed but emerge from learning objectives, yielding representations that improve an agent’s ability to cope with high-dimensional observations, extended temporal dependencies, and inter-agent coupling.


On the model-free, multi-agent side, we introduce Partial Reward Decoupling (PRD), a game-abstraction mechanism that dynamically decomposes teams into subgroups, simplifying cross-agent credit assignment and accelerating cooperative learning. We also study discrete communication learning under bandwidth constraints, where agents learn what information to transmit, to whom, and how to encode it—linking communication learning to representation learning and generative modeling.

Turning to model-based RL, we show how abstraction mitigates the misalignment between model-learning and task objectives typical of model-based methods. By focusing limited model capacity on task-relevant factors and operating at an appropriate temporal scale, abstraction improves the utility of world models for decision-making. Toward this end, we explore the use of variational inference (VI) to learn both state and temporal abstractions. We demonstrate a state-abstraction method that ignores distracting details while retaining task-relevant features, attaining strong results on distraction-rich control benchmarks without relying on data-augmentation heuristics. We also propose a latent-variable approach to temporal abstraction that extracts skills and learns a temporally abstract dynamics model from offline data, enabling effective long-horizon prediction and planning for downstream tasks.

Finally, we present Unified RL, which blends model-based and model-free updates by detecting when a learned model ceases to be useful for policy improvement and reverting to model-free updates. Empirically, Unified RL retains the data efficiency of model-based methods while achieving asymptotic performance comparable to model-free RL.

Committee Members: 

Howie Choset, chair
Jeff Schneider, co-chair
Ruslan Salakhutdinov
Roberto Calandra, TU Dresden
Link to thesis