
Vision Model Diagnosis: A Generative Perspective

Master's Thesis, Tech. Report CMU-RI-TR-23-57, August 2023

Abstract

In the evolving landscape of computer vision, deep learning has emerged as a transformative force, reshaping many facets of society. Deploying these models demands rigorous evaluation and analysis, particularly when their outcomes carry significant societal implications, such as their differential impact across ethnicities and genders. This imperative forms the nucleus of trustworthy deep learning, which aims to equip scientists and engineers with an understanding of the robustness, interpretability, fairness, safety, and tractability of these models.

Traditionally, Model Diagnostics refers to assessing the validity of a regression model, including checks of its assumptions and examination of its structure. As we transition into the era of deep learning for computer vision, we reinterpret Vision Model Diagnosis (VMD) as the systematic analysis and evaluation of deep vision models. As we increasingly delegate decision-making power to deep learning vision systems, their outputs can significantly impact individuals and society. Hence VMD, which has attracted increasing attention from the research community, enables us to understand a deep vision model's behavior, interpret its performance, and fix potential shortcomings and biases.

The main goal of this thesis is to provide a thorough understanding from a generative perspective: how generative models can help diagnose a model's decision-making process, its fairness, and its robustness under various conditions. Generative models of different paradigms, including conditional VAEs and CLIP-guided StyleGANs, equip VMD with rich semantic spaces in which attribute-level fairness can be analyzed and failure modes can be visualized. We hope that this thesis provides valuable insights into how a diagnostic process should be constructed and draws the research community's attention to issues of model trustworthiness and alignment. Accurately uncovering a model's potential limitations and weaknesses is essential for releasing deep learning models safely, and we expect this theme to grow rapidly in importance over the coming decades.
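To make the generative diagnosis idea concrete, the sketch below illustrates one possible loop in the spirit of the abstract: edit a generator's latent code along a CLIP-guided attribute direction, re-run the target model on the edited images, and measure how often its predictions flip. This is a minimal, hypothetical illustration, not the thesis implementation; the `generator`, `clip_image_encoder`, `target_model`, and `attribute_direction` objects are placeholder stand-ins for a pretrained StyleGAN, a CLIP encoder, the model under diagnosis, and a text-derived attribute direction.

```python
import torch
import torch.nn.functional as F

# --- Hypothetical stand-ins (not the thesis code) ---
# In a real pipeline these would be a pretrained StyleGAN generator,
# a CLIP image encoder, and the target vision model under diagnosis.
latent_dim, image_dim, embed_dim = 64, 3 * 32 * 32, 128
generator = torch.nn.Linear(latent_dim, image_dim)          # stand-in for G(w)
clip_image_encoder = torch.nn.Linear(image_dim, embed_dim)  # stand-in for CLIP image encoder
target_model = torch.nn.Linear(image_dim, 2)                # stand-in for the diagnosed classifier

# Placeholder attribute direction in CLIP space (e.g. "a face with eyeglasses");
# in practice this would come from the CLIP text encoder.
attribute_direction = F.normalize(torch.randn(embed_dim), dim=0)

def edit_latent_along_attribute(w, steps=50, lr=0.05, strength=1.0):
    """Push latent codes toward the attribute direction while staying near the originals."""
    w_edit = w.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w_edit], lr=lr)
    for _ in range(steps):
        image = generator(w_edit)
        image_embed = F.normalize(clip_image_encoder(image), dim=-1)
        # CLIP guidance: increase alignment with the attribute text direction.
        clip_loss = -strength * (image_embed * attribute_direction).sum(dim=-1).mean()
        # Regularizer: keep the edited latent close to the original.
        reg_loss = F.mse_loss(w_edit, w)
        loss = clip_loss + reg_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return w_edit.detach()

# Diagnosis loop: sample latents, apply the attribute edit, and flag prediction flips.
w_batch = torch.randn(16, latent_dim)
original_pred = target_model(generator(w_batch)).argmax(dim=-1)
edited_pred = target_model(generator(edit_latent_along_attribute(w_batch))).argmax(dim=-1)
sensitivity = (original_pred != edited_pred).float().mean()
print(f"Fraction of predictions flipped by the attribute edit: {sensitivity:.2%}")
```

A high flip rate for a semantically irrelevant attribute would indicate a potential fairness or robustness issue worth inspecting visually through the generated counterfactual images.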

BibTeX

@mastersthesis{Luo-2023-137520,
author = {Jinqi Luo},
title = {Vision Model Diagnosis: A Generative Perspective},
year = {2023},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-23-57},
keywords = {Generative Model, Trustworthy Computer Vision, Multimodal Machine Learning},
}