Towards Efficient Multi-Agent and Temporal Credit Assignment in Reinforcement Learning

Master's Thesis, Tech. Report CMU-RI-TR-25-56, June 2025

Abstract

This thesis tackles the core challenge of credit assignment in reinforcement learning (RL): determining which actions, or which agents, deserve credit for an observed outcome in a complex environment. Standard RL struggles when rewards are sparse, delayed, or shared among multiple agents, as is common in real-world tasks such as robotics and game AI. To address this, the thesis introduces two approaches: one for multi-agent credit assignment and one for temporal credit assignment.

The first contribution, ME-IGM, addresses a failure mode in cooperative multi-agent RL. Value-decomposition methods such as QMIX factor a global action value into per-agent utilities, but combining them with maximum entropy RL, which encourages exploration, can break the Individual-Global-Max (IGM) condition: the actions agents prefer locally may no longer compose into the team's optimal joint action. ME-IGM restores this consistency with an order-preserving transformation that keeps each agent's action ranking aligned with the joint optimum, so decentralized policies still concentrate on the globally best joint action. Experiments on the SMAC-v2 benchmark show ME-IGM outperforming prior methods, even matching imitation learning approaches without needing expert demonstrations.
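
To make the IGM condition concrete, here is a minimal numerical sketch, not the thesis's actual architecture: the toy Q-values, positive mixing weights, and affine transform parameters are illustrative assumptions. It shows that a positively weighted mixture keeps the joint greedy action equal to the tuple of local greedy actions, and that an order-preserving (strictly increasing) transform leaves each agent's softmax policy peaked on that same action.

import numpy as np

def order_preserving_transform(q_local, scale, shift):
    # Strictly increasing affine map: preserves the agent's action ranking.
    assert scale > 0
    return scale * q_local + shift

def softmax(x, temperature=1.0):
    z = x / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy setting: 2 agents, 3 discrete actions each (illustrative values only).
q1 = np.array([0.2, 1.5, -0.3])
q2 = np.array([0.9, 0.1, 0.4])

# QMIX-style monotonic mixing with positive weights: the mixed value's
# argmax decomposes into each agent's local argmax (the IGM condition).
w = np.array([0.7, 1.3])
q_tot = w[0] * q1[:, None] + w[1] * q2[None, :]
joint_greedy = np.unravel_index(q_tot.argmax(), q_tot.shape)
local_greedy = (q1.argmax(), q2.argmax())
assert tuple(joint_greedy) == local_greedy

# Maximum-entropy policies sample from a softmax over utilities. Because the
# transform is order-preserving, each agent's most probable action is still
# its greedy one, keeping decentralized sampling aligned with the joint optimum.
pi1 = softmax(order_preserving_transform(q1, scale=2.0, shift=0.5))
pi2 = softmax(order_preserving_transform(q2, scale=0.8, shift=-1.0))
assert (pi1.argmax(), pi2.argmax()) == local_greedy
print("joint greedy action:", tuple(joint_greedy))
print("agent policies:", pi1.round(3), pi2.round(3))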

The second contribution, RICOL, rethinks temporal credit assignment by using large language models (LLMs) to evaluate past decisions. Instead of training a critic network from scratch, RICOL prompts an LLM to retrospectively assess each action's contribution to the outcome, converting a sparse episodic reward into dense per-step learning signals; this retrospective evaluation is substantially more sample-efficient than Monte Carlo return estimation. RICOL then fine-tunes the LLM policy through iterative updates on these signals, achieving large gains over PPO on tasks such as BabyAI while remaining robust to noisy feedback.
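
Below is a minimal sketch of the retrospective reward-redistribution idea, with loudly hypothetical names: llm_credit_fn stands in for an actual LLM call, and normalizing the scores so the dense rewards sum to the original return is an assumed design choice, not necessarily RICOL's exact scheme.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Step:
    observation: str
    action: str

def retrospective_dense_rewards(
    trajectory: List[Step],
    terminal_reward: float,
    llm_credit_fn: Callable[[List[Step], float], List[float]],
) -> List[Tuple[Step, float]]:
    # Ask the (hypothetical) LLM hook for per-step contribution scores, then
    # redistribute the sparse terminal reward in proportion to those scores,
    # so the dense rewards sum back to the original return.
    scores = llm_credit_fn(trajectory, terminal_reward)
    total = sum(scores) or 1.0  # guard against all-zero scores
    dense = [terminal_reward * s / total for s in scores]
    return list(zip(trajectory, dense))

# Stand-in for the LLM call: a real system would prompt the model with the
# full trajectory and parse per-step contribution scores from its reply.
def llm_credit_fn(trajectory: List[Step], terminal_reward: float) -> List[float]:
    return [1.0 if "key" in s.action else 0.1 for s in trajectory]

traj = [
    Step("room A", "go left"),
    Step("room A", "pick up key"),
    Step("locked door", "open door"),
]
for step, r in retrospective_dense_rewards(traj, 1.0, llm_credit_fn):
    print(f"{step.action!r:>16} -> dense reward {r:.3f}")

The dense signals produced this way can drive any standard policy update; the iterative loop then alternates between collecting trajectories, redistributing rewards, and fine-tuning the policy.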

Together, these advances make RL more practical for real-world applications by improving how agents learn from limited feedback. ME-IGM enables better teamwork in decentralized systems, while RICOL accelerates learning in long-horizon, sparse-reward tasks. The results suggest promising directions for future work, such as adapting these methods to continuous control or integrating them with larger-scale AI systems. By refining credit assignment at both the multi-agent and temporal levels, this research helps bridge the gap between theoretical RL and deployable solutions.

BibTeX

@mastersthesis{Chen-2025-147290,
author = {Wen-Tse Chen},
title = {Towards Efficient Multi-Agent and Temporal Credit Assignment in Reinforcement Learning},
year = {2025},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-56},
keywords = {Credit Assignment, Reinforcement Learning, Multi-Agent Reinforcement Learning, Large Language Model},
}