A-EXP4: Online Social Policy Learning for Adaptive Robot-Pedestrian Interaction - Robotics Institute Carnegie Mellon University

Pengju Jin, Eshed Ohn-Bar, Kris Kitani, and Chieko Asakawa
Conference Paper, Proceedings of (IROS) IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5086-5093, November 2019

Abstract

We study self-supervised adaptation of a robot's policy for social interaction, i.e., a policy for active communication with surrounding pedestrians through audio or visual signals. Inspired by the observation that humans continually adapt their behavior when interacting under varying social contexts, we propose Adaptive EXP4 (A-EXP4), a novel online learning algorithm for adapting the robot-pedestrian interaction policy. To address the limitations of bandit algorithms in adapting to unseen and highly dynamic scenarios, we employ a mixture model over the policy parameter space. Specifically, a Dirichlet Process Gaussian Mixture Model (DPMM) is used to cluster the parameters of sampled policies and maintain a mixture model over the clusters, thereby discovering policies suited to the current environmental context in an unsupervised manner. Our simulated and real-world experiments demonstrate the feasibility of A-EXP4 in accommodating interaction with different types of pedestrians while jointly minimizing social disruption during the adaptation process. While the A-EXP4 formulation is kept general for application in a variety of domains requiring continual adaptation of a robot's policy, we specifically evaluate the performance of our algorithm using a suitcase-inspired assistive robotic platform. In this concrete assistive scenario, the algorithm observes how audio signals produced by the navigational system affect the behavior of pedestrians and adapts accordingly. We find that A-EXP4 effectively adapts the interaction policy for gently clearing a navigation path in crowded settings, resulting in a significant reduction in empirical regret compared to the EXP4 baseline.
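For readers unfamiliar with the EXP4 baseline named in the abstract, the following is a minimal sketch of the textbook EXP4 update (exponential weights over experts with importance-weighted reward estimates). It is not the authors' A-EXP4 and omits the DPMM clustering step entirely. The function names, the fixed advice vectors, and the reward function are hypothetical and chosen only for illustration; one could read the experts as candidate interaction policies and the actions as audio signals, but that mapping is an assumption here.

```python
import math
import random


def exp4_action_probs(weights, advice, gamma):
    """Mix expert advice into one action distribution with uniform exploration.

    weights: one positive weight per expert.
    advice:  one probability vector over actions per expert.
    gamma:   exploration rate in (0, 1].
    """
    n_actions = len(advice[0])
    total = sum(weights)
    probs = []
    for j in range(n_actions):
        mixed = sum(w * xi[j] for w, xi in zip(weights, advice)) / total
        probs.append((1.0 - gamma) * mixed + gamma / n_actions)
    return probs


def exp4_update(weights, advice, probs, action, reward, gamma):
    """Return new expert weights after observing `reward` in [0, 1]."""
    n_actions = len(probs)
    # Importance-weighted estimate: nonzero only for the action actually taken.
    est = reward / probs[action]
    return [
        w * math.exp(gamma * xi[action] * est / n_actions)
        for w, xi in zip(weights, advice)
    ]


def run_exp4(advice, reward_fn, rounds, gamma=0.1, seed=0):
    """Run EXP4 with fixed (hypothetical) advice and return normalized weights."""
    rng = random.Random(seed)
    weights = [1.0] * len(advice)
    for _ in range(rounds):
        probs = exp4_action_probs(weights, advice, gamma)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_fn(action)
        weights = exp4_update(weights, advice, probs, action, reward, gamma)
        # Rescaling leaves the action distribution unchanged and avoids overflow.
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Illustrative setup: expert 0 always recommends action 0, expert 1 action 1,
# and only action 0 is rewarded.
toy_advice = [[1.0, 0.0], [0.0, 1.0]]
final_weights = run_exp4(toy_advice, lambda a: 1.0 if a == 0 else 0.0, rounds=200)
```

In this toy setup the weight of the expert recommending the rewarded action grows relative to the other. The abstract's point is that such a fixed expert set adapts poorly to unseen contexts, which is what the mixture model over policy parameters is meant to address.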

BibTeX

@conference{Jin-2019-121339,
author = {Pengju Jin and Eshed Ohn-Bar and Kris Kitani and Chieko Asakawa},
title = {A-EXP4: Online Social Policy Learning for Adaptive Robot-Pedestrian Interaction},
booktitle = {Proceedings of (IROS) IEEE/RSJ International Conference on Intelligent Robots and Systems},
year = {2019},
month = {November},
pages = {5086--5093},
}