Jointly Forecasting and Controlling Behavior by Learning from High-Dimensional Data - Robotics Institute Carnegie Mellon University

PhD Thesis, Tech. Report CMU-RI-TR-19-76, Robotics Institute, Carnegie Mellon University, September 2019

Abstract

Achieving a precise predictive understanding of the future is difficult, yet widely studied in the natural sciences. Significant research activity has been dedicated to building testable models of cause and effect. From a certain view, the ability to forecast the universe is the "holy grail", the ultimate goal of science. If we had it, we could anticipate, and therefore (at least implicitly) understand, all observable phenomena. The human capability to forecast offers complementary motivation. Critical to our intelligence is our ability to plan behaviors by considering how our actions are likely to result in future payoff, especially in the presence of other collaborative and competitive agents. In this work, we seek to computationally model the future in the presence of agent behavior given rich observations of the environment. The bulk of our focus is on reasoning about what agents could do, rather than on other sources of stochasticity. This focus on future agent behavior allows us to tightly couple and jointly perform forecasting and control.

The field of Computer Vision (CV) is focused on designing algorithms to automatically understand images, videos, and other perceptual data. However, the field's effort to date focuses on non-interactive, present-focused tasks. Most CV contributions are algorithms to answer questions like "what is that" and "what happened", rather than "what could happen" or "how could I achieve X". Computer Vision has under-explored reasoning about the interactive and decision-based nature of the world. In contrast, Reinforcement Learning (RL) prioritizes modeling interactions and decisions by focusing on how to design algorithms to evoke behavior that maximizes a scalar reward signal. The resulting learning agents, in order to perform well, must have an understanding of how their current behaviors will affect their prospects of future reward. However, in the dominant paradigm of model-free RL, agents reason only implicitly about the future. Model-based RL, by comparison, learns one-step dynamics to estimate "what could happen in the near future". Yet model-based RL primarily focuses on control, rather than on explicitly forecasting a single agent (let alone multiple agents).

In this thesis, we consider the problem of designing algorithms to enable computational systems to (1) forecast future behavior of intelligent agents given rich observations of their environments, as well as to (2) use this reasoning for control. We believe these two problems should be tightly integrated and jointly considered, and use them to structure this thesis. We define forecasting to be the problem of estimating the set of possible outcomes of a system, whereas control is the problem of producing actions that generate a single outcome of a system. We often use Imitation Learning and Reinforcement Learning to formulate and situate our work.

We contribute forecasting and control approaches that excel in diverse, realistic single-agent and multi-agent domains. The first part of the thesis focuses on progressively designing more capable forecasting models. We proceed through approaches to (1) forecast single actions of daily behavior by developing matrix factorization models, (2) forecast goal-driven action trajectories of daily behavior by developing Online Inverse Reinforcement Learning models, and (3) forecast motion trajectories of vehicles by developing deep reversible generative models. The second part of the thesis focuses on progressively designing more capable models that tightly couple forecasting and control. We discuss (4) forecasting as auxiliary supervision for implicitly-planned control, (5) generating and executing forecasting and explicitly planning with the same model, and (6) forecasting and planning future interactions of multiple agents.
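
As a minimal illustration of the distinction drawn above between forecasting (estimating the set of possible outcomes) and control (producing actions that generate a single outcome), consider the following toy sketch. It is not the thesis's method: the thesis learns models from high-dimensional observations, whereas this example assumes a hand-written, deterministic one-dimensional system. The action set and the names `step`, `rollout`, `forecast`, and `control` are hypothetical and chosen only for illustration.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def step(state: int, action: int) -> int:
    """Toy deterministic dynamics (an assumption): add the action to the position."""
    return state + action


def rollout(state: int, actions) -> int:
    """Apply a sequence of actions and return the final state."""
    for action in actions:
        state = step(state, action)
    return state


def forecast(state: int, horizon: int) -> set:
    """Forecasting: the set of final states reachable under any action sequence."""
    return {rollout(state, seq) for seq in product(ACTIONS, repeat=horizon)}


def control(state: int, goal: int, horizon: int) -> list:
    """Control: choose one action sequence whose outcome lands closest to the goal."""
    best = min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: abs(rollout(state, seq) - goal),
    )
    return list(best)


outcomes = forecast(0, horizon=2)      # every outcome the agent could produce
plan = control(0, goal=2, horizon=2)   # one action sequence, hence one outcome
final_state = rollout(0, plan)
```

Here `forecast` enumerates outcomes by brute force and `control` reuses the same `rollout` model to select actions, which loosely mirrors the idea of using one model for both tasks. A learned forecasting model would instead represent a distribution over outcomes.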

BibTeX

@phdthesis{Rhinehart-2019-117954,
author = {Nicholas Rhinehart},
title = {Jointly Forecasting and Controlling Behavior by Learning from High-Dimensional Data},
year = {2019},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-76},
keywords = {Forecasting, Imitation Learning, Reinforcement Learning, Computer Vision, Machine Learning, Robotics},
}