Spatial, Temporal and Spatio-temporal Correspondence for Computer Vision Problems

PhD Thesis, Tech. Report, CMU-RI-TR-14-18, Robotics Institute, Carnegie Mellon University, September, 2014

Abstract

Many computer vision problems, such as object classification, motion estimation, and shape registration, rely on solving the correspondence problem. Spatial and temporal correspondence problems are usually NP-hard and difficult to approximate, and existing algorithms lack flexible models and mechanisms for feature weighting. This thesis addresses the correspondence problem in computer vision: it proposes two new spatio-temporal correspondence problems and three algorithms to solve spatial, temporal, and spatio-temporal matching between video and other sources. The main contributions of the thesis are:

(1) Factorial graph matching (FGM). FGM extends existing work on graph matching (GM) by finding an exact factorization of the affinity matrix. Four benefits follow from this factorization: (a) there is no need to compute the pairwise affinity matrix, which is costly in both space and time; (b) it provides a unified framework that reveals commonalities and differences between GM methods, as well as a clean connection with other matching algorithms such as iterative closest point; (c) it allows the use of a path-following optimization algorithm, which leads to improved optimization strategies and matching performance; (d) it makes it straightforward to incorporate rigid and non-rigid geometric transformations into the GM problem.

(2) Canonical time warping (CTW). CTW is a technique to temporally align multiple multi-dimensional and multi-modal time series. CTW extends dynamic time warping (DTW) by incorporating a feature-weighting layer to adapt different modalities and by allowing a more flexible warping as a combination of monotonic functions. It has linear complexity, unlike DTW, which has quadratic complexity. We applied CTW to align human motion captured with different sensors (e.g., audio, video, accelerometers).

(3) Spatio-temporal matching (STM). Given a video and a 3D motion capture model, STM finds the correspondence between subsets of video trajectories and the motion capture model. STM is solved efficiently and robustly using linear programming. We illustrate the performance of STM on the problem of human detection in video, and show how STM achieves state-of-the-art performance.
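
For readers unfamiliar with the DTW baseline that the abstract contrasts with CTW, the following is a minimal sketch of classic DTW. It is not the thesis's CTW algorithm: it has no feature weighting and no flexible warping basis. The function name `dtw` and the squared Euclidean frame cost are assumptions chosen for illustration. The n-by-m table it fills is the source of the quadratic cost the abstract mentions.

```python
import numpy as np


def dtw(x, y):
    """Classic dynamic time warping (illustrative baseline, not CTW).

    x: sequence of n frames, shape (n,) or (n, d)
    y: sequence of m frames, shape (m,) or (m, d)
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n frames of dimension 1
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise squared Euclidean distances between frames (assumed cost).
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # acc[i, j] = cost of the best alignment of x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance x only
                acc[i, j - 1],      # advance y only
            )

    # Backtrack from the end to recover the monotonic warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path
```

Calling `dtw([0, 1, 2], [0, 1, 1, 2])` aligns the repeated `1` in the second sequence to the single `1` in the first. The same features are compared directly in both sequences, which is one reason a plain DTW of this kind does not suit sequences from different sensors without some feature-weighting step.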

BibTeX

@phdthesis{Zhou-2014-7930,
author = {Feng Zhou},
title = {Spatial, Temporal and Spatio-temporal Correspondence for Computer Vision Problems},
year = {2014},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-14-18},
keywords = {Correspondence Problem, Temporal Alignment, Graph Matching, Dynamic Time},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.