Leveraging Structure for Generalization and Prediction in Visual System

Yufei Ye
Master's Thesis, Tech. Report, CMU-RI-TR-19-70, June, 2019




The world around us is highly structured. Humans have a great capacity for understanding and leveraging that structure to generalize to novel scenarios and to predict the future. This thesis studies how computer vision systems can benefit from a similar process: leveraging the inherent structure in data to improve generalization and prediction.

It focuses on two specific aspects: zero-shot recognition using categorical structure that is explicitly specified by knowledge graphs, and video prediction that leverages the implicit physical structure among entities. Both methods build on graph neural networks, a scalable machine learning framework, to learn structure directly from large-scale data. In zero-shot recognition, we show that accuracy improves significantly and becomes more robust thanks to the external knowledge in the knowledge graph. In video prediction, we find that long-term predictions are significantly sharper when the structure among entities is factored in.

This work serves as the master's thesis of Yufei Ye.

author = {Yufei Ye},
title = {Leveraging Structure for Generalization and Prediction in Visual System},
year = {2019},
month = {June},
school = {},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-70},
}