Abstract: Imitation learning has achieved strong performance in sequential decision-making tasks, but it typically requires large numbers of expert demonstrations, generalizes poorly to unseen scenarios, and is difficult for laypeople without technical backgrounds to use. This thesis introduces structured policies, a framework that integrates human domain knowledge into imitation learning by using large language models (LLMs) to generate semantically meaningful policy structures from natural language instructions while learning continuous parameters from demonstrations. By explicitly encoding task-relevant latent variables and their dependencies, structured policies capture the essential causal structure of the expert policy, improving sample efficiency, robustness, and interpretability. We first present Knowledge Informed Models (KIM), which integrate expert domain knowledge and demonstrations in a straightforward way, and we demonstrate their sample efficiency and robustness in continuous control domains such as Lunar Lander and Car Racing. We then present Interactive Policy Restructuring and Training (InterPReT), an interactive learning paradigm that allows end-users to iteratively provide instructions and demonstrations to refine the policy, and we show through a user study that it can learn dependable policies from laypeople. Together, the two projects show that structured policies are a promising way to integrate symbolic knowledge and continuous demonstrations for learning from human teachers.
Committee:
Prof. Reid Simmons (co-advisor)
Prof. Jean Oh (co-advisor)
Prof. Yonatan Bisk
Bowen Li
