
PhD Thesis Defense

Jacob Walker
Robotics Institute, Carnegie Mellon University
Friday, April 13
2:30 pm to 3:30 pm
NSH 3305
Data-Driven Visual Forecasting

Abstract:
Understanding the temporal dimension of images is a fundamental part of computer vision. Humans are able to interpret how the entities in an image will change over time. However, only relatively recently have researchers focused on visual forecasting: getting machines to anticipate events in the visual world before they actually happen. This aspect of vision has many practical implications in tasks ranging from human-computer interaction to anomaly detection. In addition, temporal prediction can serve as a task for representation learning, producing features useful for other recognition problems.

In this thesis, we focus on visual forecasting that is data-driven and self-supervised and that relies on little to no explicit semantic information. Towards this goal, we explore prediction at different timescales. We first consider predicting instantaneous pixel motion, i.e., optical flow, applying convolutional neural networks to predict optical flow from static images. We then extend this idea to a longer timeframe, generalizing to pixel trajectory prediction in space-time. We incorporate models such as Variational Autoencoders to generate possible future motions in the scene. After this, we consider a mid-level element approach to forecasting. By combining a Markovian reasoning framework with an intermediate representation, we are able to forecast events over longer timescales.
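The Markovian reasoning mentioned above can be illustrated generically: given a one-step transition model over a small set of discrete states, longer-horizon forecasts come from chaining transitions. This is a hypothetical toy sketch, not the thesis's model; the state names, the transition probabilities, and the helper names `step` and `forecast` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical discrete "mid-level" states (illustrative only).
STATES = ["idle", "walking", "turning"]

# Hypothetical one-step transition matrix:
# TRANSITION[i][j] = P(next state = j | current state = i); each row sums to 1.
TRANSITION: List[List[float]] = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]


def step(p: List[float], transition: List[List[float]]) -> List[float]:
    """Propagate a state distribution one step: p_next[j] = sum_i p[i] * T[i][j]."""
    n_next = len(transition[0])
    return [sum(p[i] * transition[i][j] for i in range(len(p))) for j in range(n_next)]


def forecast(p0: List[float], transition: List[List[float]], steps: int) -> List[float]:
    """Chain `steps` one-step transitions to get a longer-horizon forecast."""
    p = list(p0)
    for _ in range(steps):
        p = step(p, transition)
    return p


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # assume we observe the "idle" state with certainty
    for horizon in (1, 5, 50):
        dist = forecast(start, TRANSITION, horizon)
        print(horizon, {s: round(x, 3) for s, x in zip(STATES, dist)})
```

Because every entry of this toy matrix is positive, the forecast converges to a stationary distribution as the horizon grows, so very long-horizon forecasts carry less and less information about the starting state.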

We then build on these ideas to develop structured representations for visual forecasting. Specifically, we aim to reason about the future of images in a structured state space. Instead of directly predicting events in a low-level feature space such as pixels or motion, we forecast events in a higher-level representation that is still visually meaningful. This approach confers several advantages: it is not restricted to explicit timescales as motion-based approaches are, and, unlike direct pixel-based approaches, its predictions are less likely to “fall off” the manifold of the true visual world.


Thesis Committee Members:
Martial Hebert, Co-chair
Abhinav Gupta, Co-chair
Ruslan Salakhutdinov
David Forsyth, University of Illinois at Urbana-Champaign