Features in Extra Dimensions: Spatial and Temporal Scene Representations

Master's Thesis, Tech. Report, CMU-RI-TR-22-39, Robotics Institute, Carnegie Mellon University, August, 2022

Abstract

Computer vision models have made great progress in featurizing the pixels of images. However, an image is only a projection of the actual 3D scene, subject to occlusions and perspective distortion. To arrive at a better representation of the scene itself, extra dimensions are needed to learn spatial or temporal priors.

In this thesis, we propose two methods that introduce extra dimensions for modelling scene space and time. The first method lifts features from the image plane onto the bird's eye view (BEV) plane for perception in autonomous driving. Features defined over the scene space enable our models to handle occlusion better, producing accurate BEV semantic representations. The second method introduces extra dimensions for modelling time, enabling better geometry-free point tracking. We track points through partial or full occlusions, using components that drive the current state of the art in flow and object tracking, such as learned temporal priors, iterative optimization, and appearance updates. Features allocated over timesteps enable our models to track over long horizons and through occlusions, outperforming previous feature-matching and optical flow methods.
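
As a rough illustration of the first method, the sketch below lifts per-pixel image features onto a BEV grid by unprojecting pixels into world space and splatting the features into grid cells. It assumes known per-pixel depth purely for illustration; the thesis instead learns the lifting from data, and the function name `lift_to_bev` and its parameters are hypothetical.

```python
import numpy as np

def lift_to_bev(feats, depth, K, cam_to_world,
                grid_min=-50.0, grid_res=0.5, n_cells=200):
    """Splat per-pixel image features onto an n_cells x n_cells BEV grid.

    feats: (H, W, C) image features; depth: (H, W) per-pixel depth
    (assumed known here for simplicity); K: (3, 3) camera intrinsics;
    cam_to_world: (4, 4) camera-to-world extrinsics.
    """
    H, W, C = feats.shape
    # Homogeneous pixel coordinates [u, v, 1] for every pixel.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # Unproject: X_cam = depth * K^{-1} [u, v, 1]^T, then move to the world frame.
    pts_cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    pts_h = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=1)
    pts_world = (pts_h @ cam_to_world.T)[:, :3]
    # Bin world x/y coordinates into BEV cells and average the features.
    ij = np.floor((pts_world[:, :2] - grid_min) / grid_res).astype(int)
    keep = np.all((ij >= 0) & (ij < n_cells), axis=1)
    bev = np.zeros((n_cells, n_cells, C))
    count = np.zeros((n_cells, n_cells, 1))
    np.add.at(bev, (ij[keep, 0], ij[keep, 1]), feats.reshape(-1, C)[keep])
    np.add.at(count, (ij[keep, 0], ij[keep, 1]), 1.0)
    return bev / np.maximum(count, 1.0)  # mean feature per occupied cell
```

Because all pixels that observe the same ground location land in the same cell, downstream reasoning happens in scene space rather than on the distorted image plane, which is what lets a BEV model handle occlusion more gracefully.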

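The second method's core loop can be sketched in a similar spirit: allocate features per timestep and refine the whole trajectory jointly, so frames where the point is occluded are constrained by their neighbors. In this sketch the learned temporal prior, iterative optimizer, and appearance updates are replaced by hand-coded local correlation and smoothing, and the names `refine_track` and `bilinear_sample` are hypothetical.

```python
import numpy as np

def bilinear_sample(fmap, xy):
    """Bilinearly sample a (H, W, C) feature map at a float (x, y) location."""
    H, W, _ = fmap.shape
    x = float(np.clip(xy[0], 0, W - 1.001))
    y = float(np.clip(xy[1], 0, H - 1.001))
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * fmap[y0, x0] + dx * (1 - dy) * fmap[y0, x0 + 1]
            + (1 - dx) * dy * fmap[y0 + 1, x0] + dx * dy * fmap[y0 + 1, x0 + 1])

def refine_track(fmaps, traj, query_feat, radius=3, iters=4):
    """Jointly refine a point trajectory over all timesteps.

    fmaps: (T, H, W, C) per-frame features; traj: (T, 2) initial (x, y)
    estimates; query_feat: (C,) appearance of the point at the query frame.
    """
    traj = traj.astype(float).copy()
    offsets = [(dx, dy) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    for _ in range(iters):
        for t, fmap in enumerate(fmaps):
            # Move each timestep toward its best local appearance match.
            scores = [query_feat @ bilinear_sample(fmap, traj[t] + np.array(off))
                      for off in offsets]
            traj[t] += 0.5 * np.array(offsets[int(np.argmax(scores))])
        # Stand-in for the learned temporal prior: smooth across time, so
        # occluded frames with unreliable matches are pulled by their neighbors.
        traj[1:-1] = 0.5 * traj[1:-1] + 0.25 * (traj[:-2] + traj[2:])
    return traj
```

In the actual model these hand-coded steps are learned end to end, which is what allows it to track over long horizons and outperform feature-matching and optical flow baselines.
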
BibTeX

@mastersthesis{Fang-2022-133140,
author = {Zhaoyuan Fang},
title = {Features in Extra Dimensions: Spatial and Temporal Scene Representations},
year = {2022},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-22-39},
keywords = {3D Vision; BEV Perception; Tracking},
}