On the model-free, multi-agent side, we introduce Partial Reward Decoupling (PRD), a game-abstraction mechanism that dynamically decomposes teams into subgroups, simplifying cross-agent credit assignment and accelerating cooperative learning. We also study discrete communication learning under bandwidth constraints, where agents learn what information to transmit, to whom, and how to encode it—linking communication learning to representation learning and generative modeling.
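To make the decoupling idea concrete, here is a purely illustrative sketch, not the thesis's PRD algorithm: each agent's learning signal sums rewards only over its own subgroup rather than over the whole team. The function name and the fixed groups are hypothetical stand-ins for the dynamic, learned decomposition described above.

```python
# Illustrative-only sketch of decoupled credit assignment: each agent's
# credit sums rewards over its subgroup, ignoring unrelated agents.
# The fixed groups stand in for PRD's dynamically inferred subgroups.

def decoupled_returns(rewards, groups):
    """rewards: per-agent scalar rewards; groups: list of index sets.
    Returns, for each agent, the reward summed over its subgroup."""
    credit = [0.0] * len(rewards)
    for group in groups:
        group_reward = sum(rewards[i] for i in group)
        for i in group:
            credit[i] = group_reward
    return credit

# Four agents split into two subgroups of two: each agent's signal
# ignores rewards earned by the other subgroup.
print(decoupled_returns([1.0, 0.5, -0.25, 2.0], [{0, 1}, {2, 3}]))
# -> [1.5, 1.5, 1.75, 1.75]
```

With full coupling, every agent's signal would include all four rewards; restricting credit to a subgroup shrinks the number of agents each update must reason about.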
We then show how abstraction mitigates the misalignment between model-learning and task objectives that typically arises in model-based RL. By focusing limited model capacity on task-relevant factors and operating at an appropriate temporal scale, abstraction improves the utility of world models for decision-making. Toward this end, we explore the use of variational inference (VI) to learn both state and temporal abstractions. We demonstrate a state-abstraction method that ignores distracting details while retaining task-relevant features, attaining strong results on distraction-rich control benchmarks without relying on data-augmentation heuristics. We further propose a latent-variable approach to temporal abstraction that extracts skills and learns a temporally abstract dynamics model from offline data, enabling effective long-horizon prediction and planning for downstream tasks.
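The goal of the state abstraction above can be pictured with a toy example. This is not the thesis's VI method; it is a crude covariance filter, with all names hypothetical, showing the intended behavior: keep observation dimensions that influence reward and discard distractors.

```python
# Toy stand-in for task-relevant state abstraction: retain observation
# dimensions that co-vary with reward, discard distractors that do not.
# (A crude filter for illustration, not the VI method in the thesis.)

def task_relevant_dims(observations, rewards, tol=1e-8):
    """Return indices of dimensions whose variation co-varies with reward."""
    n, d = len(observations), len(observations[0])
    mean_r = sum(rewards) / n
    relevant = []
    for j in range(d):
        mean_x = sum(o[j] for o in observations) / n
        cov = sum((o[j] - mean_x) * (r - mean_r)
                  for o, r in zip(observations, rewards)) / n
        if abs(cov) > tol:
            relevant.append(j)
    return relevant

# Dim 0 drives reward; dim 1 varies but is uncorrelated (a "distractor").
obs = [[0.0, 5.0], [1.0, -3.0], [2.0, -3.0], [3.0, 5.0]]
rew = [0.0, 1.0, 2.0, 3.0]
print(task_relevant_dims(obs, rew))  # -> [0]
```

A learned abstraction plays the same role for high-dimensional, nonlinear observations, where no such simple statistic suffices.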
Finally, we present Unified RL, which blends model-based and model-free updates by detecting when a learned model ceases to be useful for policy improvement and falling back to model-free learning at that point. Empirically, Unified RL retains the data efficiency of model-based methods while achieving asymptotic performance comparable to model-free RL.
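The switching rule described above can be sketched as follows. This is a minimal, hypothetical illustration, not the criterion used in the thesis: it assumes model usefulness is judged by average one-step prediction error on recent real transitions against a fixed threshold, and all names are invented.

```python
# Hypothetical sketch of blending model-based and model-free updates:
# use model-based updates while the learned model's recent one-step
# prediction error stays low; otherwise fall back to model-free updates.
# (Illustrative threshold rule only, not the thesis's actual criterion.)

def choose_update(model_errors, threshold=0.1):
    """Return 'model-based' while the learned model stays useful,
    else 'model-free'."""
    if not model_errors:
        return "model-based"  # no evidence yet; trust the model
    avg_error = sum(model_errors) / len(model_errors)
    return "model-based" if avg_error < threshold else "model-free"

# Early training: the model fits recent data well -> use model rollouts.
print(choose_update([0.02, 0.05, 0.03]))  # -> model-based
# Later: model error grows -> fall back to model-free updates.
print(choose_update([0.2, 0.4, 0.3]))     # -> model-free
```

The appeal of such a switch is that it bounds the damage from model bias: once the model stops helping, learning degrades gracefully to ordinary model-free RL rather than exploiting model errors.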
Committee Members:
Jeff Schneider, co-chair
Ruslan Salakhutdinov
Roberto Calandra, TU Dresden
