Towards Socially Intelligent Multi-Agent Systems: Zero-Shot MARL Coordination and Theory-of-Mind Benchmarking of LLM Agents for Strategic Deception - Robotics Institute Carnegie Mellon University

MSR Thesis Presentation

Karan Mirakhor, MSR Student, Robotics Institute, Carnegie Mellon University
Wednesday, June 24
2:00 pm to 3:00 pm
Newell-Simon Hall 4305

Abstract:
An agent that performs well on its own may still struggle when working with others. In multi-agent environments, success depends not only on understanding the world but also on understanding what other agents know, intend, and conceal. Cooperative partners follow hidden conventions, while adversarial opponents deceive. This work argues that robust multi-agent behavior requires explicit reasoning about these hidden mental states, and that this reasoning must be measured directly rather than inferred from task outcomes alone.

These ideas are developed through two complementary projects. The first, BEACON, addresses the zero-shot coordination problem: how can an agent coordinate effectively with partners it has never trained with? Agents that learn from offline data often lock into dataset-specific conventions that work well with familiar partners but fail with new ones. BEACON is an offline-to-online learning framework that clusters offline trajectories by convention, trains a diverse specialist for each convention, and adapts online using belief-conditioned counterfactual rollouts. On 2- and 3-player Hanabi, BEACON achieves state-of-the-art zero-shot coordination performance while using up to five times fewer training frames than strong online baselines. It also coordinates with human partners comparably to a leading online method.

The second project, AmongUs-X, asks whether large language model (LLM) agents genuinely deceive or merely win through other means. Built on the social-deduction game Among Us and spanning 21 model families across more than 8,700 games, the benchmark elicits agents' beliefs at fixed points during meetings, yielding eight Theory-of-Mind metrics that measure detection, deception, influence, and grounding. Ratings derived from win rates track crewmate detection but miss impostor deception entirely. The elicited beliefs, by contrast, remain well-calibrated, which enables direct, mechanism-level evaluation.
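The idea of grouping offline trajectories by convention can be pictured with a small sketch. The abstract does not say which features or clustering algorithm BEACON uses, so the code below assumes, purely for illustration, that each trajectory is summarized by its normalized action-frequency histogram and grouped with plain k-means (Lloyd's algorithm with deterministic farthest-first initialization). The names `action_histogram`, `kmeans`, and `group_by_convention` are hypothetical, not part of BEACON.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def action_histogram(trajectory: Sequence[int], num_actions: int) -> Vector:
    """Hypothetical convention feature: normalized frequency of each discrete action."""
    counts = Counter(trajectory)
    total = max(len(trajectory), 1)
    return [counts.get(a, 0) / total for a in range(num_actions)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first init: start from the first point, then repeatedly add
    # the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_by_convention(
    trajectories: List[Sequence[int]], num_actions: int, k: int
) -> Dict[int, List[int]]:
    """Map each cluster id to the indices of the trajectories assigned to it."""
    features = [action_histogram(t, num_actions) for t in trajectories]
    groups: Dict[int, List[int]] = {}
    for idx, label in enumerate(kmeans(features, k)):
        groups.setdefault(label, []).append(idx)
    return groups


if __name__ == "__main__":
    # Toy data: two trajectories favor actions 0/1, two favor actions 2/3.
    data = [[0, 0, 0, 1], [0, 0, 1, 0], [2, 2, 2, 3], [3, 2, 2, 2]]
    print(group_by_convention(data, num_actions=4, k=2))  # {0: [0, 1], 1: [2, 3]}
```

In a pipeline shaped like the one described, each resulting group would serve as the training data for one specialist; specialist training and the belief-conditioned counterfactual rollouts are not sketched here.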
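What "well-calibrated" elicited beliefs means can be made concrete with two standard scoring rules. The abstract does not specify how AmongUs-X measures calibration, so the sketch below assumes each elicited belief is a probability that a given player is an impostor, paired with a 0/1 outcome for whether that player actually was one, and scores them with the Brier score and equal-width-bin expected calibration error (ECE). The function names and the ten-bin default are illustrative choices, not the benchmark's.

```python
from typing import List, Sequence, Tuple


def _check(probs: Sequence[float], outcomes: Sequence[int]) -> None:
    """Reject empty or mismatched inputs."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty probs and outcomes")


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    _check(probs, outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], num_bins: int = 10
) -> float:
    """Sample-weighted gap between mean confidence and observed frequency per bin."""
    _check(probs, outcomes)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for p, o in zip(probs, outcomes):
        # Equal-width bins over [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, o))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        frequency = sum(o for _, o in members) / len(members)
        ece += len(members) / n * abs(confidence - frequency)
    return ece


if __name__ == "__main__":
    # Hypothetical beliefs that a player is the impostor, with ground truth.
    beliefs = [0.9, 0.9, 0.1, 0.1]
    truth = [1, 1, 0, 0]
    print(round(brier_score(beliefs, truth), 4))  # 0.01
    print(round(expected_calibration_error(beliefs, truth), 4))  # 0.1
```

A win-rate rating, by contrast, only sees game outcomes; scores like these look at the stated beliefs themselves, which is the kind of mechanism-level signal the abstract refers to.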

Both projects arrive at the same conclusion: high self-play scores can hide poor coordination, and high win rates can hide absent deception. Modeling other agents’ hidden information and measuring that modeling explicitly is essential for building socially intelligent multi-agent systems and evaluating them reliably.

Committee:
Dr. Katia Sycara (chair)
Dr. Jiaoyang Li
Renos Zabounidis