
Improving the Transparency of Agent Decision Making to Humans Using Demonstrations

PhD Thesis, Tech. Report CMU-RI-TR-24-05, February 2024

Abstract

For intelligent agents (e.g., robots) to be seamlessly integrated into human society, humans must be able to understand their decision making. For example, the decision making of autonomous cars must be clear to the engineers certifying their safety, the passengers riding in them, and the nearby drivers sharing the road with them. Because an agent's decision making depends largely on its reward function, we focus on teaching agent reward functions to humans. Through reasoning that resembles inverse reinforcement learning (IRL), humans naturally infer the reward functions that underlie demonstrations of decision making. Agents can thus teach their reward functions through demonstrations that are informative for IRL. However, we critically note that IRL does not consider how difficult each demonstration is for a human to learn from. This thesis therefore proposes to augment teaching for IRL with principles from the education literature, providing demonstrations that fall within a human's zone of proximal development (ZPD), or "Goldilocks" zone, i.e., demonstrations that are neither too easy nor too difficult given their current beliefs. This thesis provides contributions in the following three areas.

We first consider the problem of teaching reward functions through select demonstrations. Drawing on the ZPD, we use scaffolding to convey demonstrations that gradually increase in information gain and difficulty, easing the human into learning. Importantly, we argue that a demonstration's information gain is not intrinsic to the demonstration itself but must be conditioned on the human's current beliefs. An informative demonstration is accordingly one that meaningfully differs from the human's expectations (i.e., counterfactuals) of what the agent will do given their current understanding of the agent's decision making.
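As a rough illustration of belief-conditioned information gain, the following sketch represents the human's beliefs as a discrete set of candidate reward weights and scores a demonstration by how far it would shift those beliefs; a scaffolded curriculum then picks demonstrations whose gain is closest to a gradually increasing target. The helper demo_likelihood (the probability that a given reward hypothesis would produce the demonstration) is a hypothetical stand-in, not the thesis's actual model.

import numpy as np

def info_gain(demo, reward_candidates, prior, demo_likelihood):
    # KL divergence from the human's prior over candidate reward weights to
    # the posterior after seeing `demo` -- an information gain conditioned on
    # current beliefs rather than intrinsic to the demonstration.
    likelihoods = np.array([demo_likelihood(demo, w) for w in reward_candidates])
    posterior = prior * likelihoods
    posterior /= posterior.sum()
    return float(np.sum(posterior * np.log((posterior + 1e-12) / (prior + 1e-12))))

def next_scaffolded_demo(demos, reward_candidates, prior, demo_likelihood, target_gain):
    # Scaffolding: choose the unshown demonstration whose belief-conditioned
    # gain is closest to a target that is raised over the course of teaching.
    gains = [info_gain(d, reward_candidates, prior, demo_likelihood) for d in demos]
    return demos[int(np.argmin([abs(g - target_gain) for g in gains]))]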

Second, we consider the problem of testing how much the human has learned from the demonstrations, by asking them to predict the agent's actions in new environments. We present two ways of measuring the difficulty of a test for a human. The first is a coarse measure that correlates a test's difficulty with the information gain of its answer at revealing the agent's reward function. The second is a more tailored measure that conditions a test's difficulty on the human's current beliefs about the reward function, estimating difficulty as the proportion of the human's beliefs that would yield the correct answer.
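A minimal sketch of the tailored measure, under the assumption that the human's beliefs are a set of candidate reward weights and that a hypothetical predicted_action(env, w) returns the action an agent with reward weights w would take in a test environment:

def tailored_difficulty(test_env, correct_action, belief_particles, predicted_action):
    # Proportion of the human's belief particles that would yield the correct
    # answer on this test; the higher this proportion, the easier the test is
    # expected to be for this particular human.
    n_correct = sum(
        predicted_action(test_env, w) == correct_action for w in belief_particles
    )
    return n_correct / len(belief_particles)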

Finally, we introduce a closed-loop teaching framework that brings together teaching and testing. While informative teaching demonstrations may be selected a priori, a human's learning may deviate from the preselected curriculum in situ. Our teaching framework thus provides intermittent tests and feedback between groups of related demonstrations to support tailored instruction in two ways. First, we maintain a novel particle filter model of human beliefs and provide demonstrations targeted to the human's current understanding. Second, we leverage tests not only as a tool for assessment but also for teaching, in line with the testing effect from the education literature.
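The sketch below shows one way such a particle filter over human beliefs could be structured: particles are candidate reward weights, reweighted by how well each explains the human's test responses and resampled when the effective sample size collapses. The reweighting and resampling choices here are illustrative assumptions, not the thesis's exact formulation.

import numpy as np

class BeliefParticleFilter:
    def __init__(self, particles):
        self.particles = np.asarray(particles)  # candidate reward weights
        self.weights = np.full(len(particles), 1.0 / len(particles))

    def update(self, response, response_likelihood):
        # Reweight each particle by the likelihood of the human's test
        # response under that reward hypothesis.
        lik = np.array([response_likelihood(response, w) for w in self.particles])
        self.weights *= lik
        self.weights /= self.weights.sum()
        # Resample if the effective sample size drops too low.
        ess = 1.0 / np.sum(self.weights ** 2)
        if ess < 0.5 * len(self.particles):
            idx = np.random.choice(len(self.particles), size=len(self.particles),
                                   p=self.weights)
            self.particles = self.particles[idx]
            self.weights = np.full(len(self.particles), 1.0 / len(self.particles))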

BibTeX

@phdthesis{Lee-2024-139987,
author = {Michael S. Lee},
title = {Improving the Transparency of Agent Decision Making to Humans Using Demonstrations},
year = {2024},
month = {February},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-05},
keywords = {Explainable AI, Transparency, Inverse Reinforcement Learning, Human-Robot Interaction},
}