Enhancing Concept-Based Decision Making in AI Models with Disentanglement - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Renos Zabounidis, PhD Student, Robotics Institute, Carnegie Mellon University
Friday, May 16
4:00 pm to 5:00 pm
NSH 4305
Enhancing Concept-Based Decision Making in AI Models with Disentanglement
Abstract: Deploying AI in high-stakes settings requires models that are not only accurate but also interpretable and amenable to human oversight. Concept Bottleneck Models (CBMs) support these goals by structuring predictions around human-understandable concepts, enabling interpretability and post-hoc human intervenability. However, CBMs rely on a ‘complete’ concept set, requiring practitioners to define and label enough concepts to match the predictive power of black-box models. To relax this requirement, prior work introduced residual connections that bypass the concept layer and recover information missing from an incomplete concept set. While effective in bridging the performance gap, these residuals can redundantly encode concept information, a phenomenon we term concept-residual overlap.
In this work, we investigate the effects of concept-residual overlap and evaluate strategies to mitigate it. We (1) define metrics to quantify the extent of concept-residual overlap in Concept-Residual Models (CRMs); (2) introduce metrics to evaluate how this overlap impacts interpretability, concept importance, and the effectiveness of concept-based interventions; and (3) present Disentangled Concept-Residual Models (D-CRMs), a general class of CRMs designed to mitigate this issue. Within this class, we propose a novel disentanglement approach based on minimizing mutual information (MI). Using CelebA, CIFAR100, AA2, CUB, and OAI, we show that standard CRMs exhibit significant concept-residual overlap, and that reducing this overlap with MI-based D-CRMs restores key properties of CBMs, including interpretability, functional reliance on concepts, and intervention robustness, without sacrificing predictive performance.
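To make the idea of concept-residual overlap concrete, here is a minimal toy sketch, not the thesis's actual metrics or models. It assumes binary concepts and a discretized residual, and uses a plug-in mutual information estimate as a stand-in overlap measure. The names `mutual_information` and `predict`, the weights, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


# Toy data (assumed): one binary concept and two candidate residuals.
concepts = [0, 1, 0, 1, 0, 1, 0, 1]
leaky_residual = list(concepts)                  # redundantly encodes the concept
independent_residual = [0, 0, 1, 1, 0, 0, 1, 1]  # carries no concept information

overlap_leaky = mutual_information(concepts, leaky_residual)
overlap_independent = mutual_information(concepts, independent_residual)


def predict(concept, residual, w_concept, w_residual):
    """Hypothetical linear label head over one concept and one residual unit."""
    return 1 if w_concept * concept + w_residual * residual > 0.5 else 0


# The concept predictor was wrong (predicted 0, truth is 1) and the leaky
# residual copied the same wrong value. A human intervenes and sets concept = 1.
residual_value = 0
# Head that leans on the residual: the intervention has no effect.
after_intervention_residual_head = predict(1, residual_value, w_concept=0.0, w_residual=1.0)
# Head that leans on the concept: the intervention corrects the output.
after_intervention_concept_head = predict(1, residual_value, w_concept=1.0, w_residual=0.0)
```

In this toy setting, the leaky residual shares a full bit of information with the concept while the independent one shares none. A label head that reads the concept from the residual ignores the human correction, which is the kind of failure that reducing overlap is meant to prevent.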
Committee:
Prof. Katia Sycara (advisor)
Prof. Jun-Yan Zhu
Prof. Zachary Lipton
Mohamad Qadri