Vision Model Diagnosis and Improvement Via Large Pretrained Models - Robotics Institute Carnegie Mellon University

Master's Thesis, Tech. Report, CMU-RI-TR-24-22, May, 2024

Abstract

Recent years have witnessed a rapid evolution in the field of artificial intelligence (AI). As machine learning models are deployed in a growing range of real-world applications, critical challenges in model robustness, fairness, and performance have come to the fore. Despite significant advances, existing models often exhibit biases, fail to generalize across diverse data distributions, and struggle with unexpected input variations, leading to suboptimal or even discriminatory outcomes.

This thesis addresses these pressing challenges by harnessing the power of large pretrained models, especially vision generative models. In particular, two key problems are studied: (1) the identification of model biases and vulnerabilities, and (2) the utilization of synthetic data generation to improve model generalizability and performance. Along these lines, this thesis introduces two frameworks: Unsupervised Model Diagnosis (UMO) and Domain Gap Embeddings for Generative Dataset Augmentation (DoGE), which together offer a comprehensive and accessible solution to the challenges of model bias and distribution shifts in data.

UMO enables diagnosing model vulnerabilities in an unsupervised manner by employing generative models to produce semantic counterfactual explanations without the need for extensively annotated datasets or explicit user input. This framework facilitates the identification of sensitive semantic directions and spurious correlations within models, highlighting potential failure modes and biases without human intervention.

Complementing UMO, DoGE introduces a diffusion-based data augmentation technique that efficiently bridges cross-distribution gaps between training and target datasets. By capturing and embedding distribution differences in a latent form, DoGE enables the generation of synthetic datasets that closely align with target distributions, significantly improving model performance across various tasks.
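As a rough illustration of what "capturing distribution differences in a latent form" could look like, the sketch below computes a mean-difference vector between two sets of embeddings and uses it to shift a source embedding toward the target distribution. This is a minimal sketch under stated assumptions, not the thesis's implementation: the mean-difference formulation, the function names (`mean_embedding`, `domain_gap`, `shift_toward_target`), the `strength` parameter, and the toy 2-D vectors are all hypothetical. In practice the embeddings would come from a pretrained image encoder, and the shifted embedding would condition a diffusion model.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average a set of equal-length embedding vectors, dimension by dimension."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    return [fmean(dim) for dim in zip(*embeddings)]


def domain_gap(source: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical gap vector: mean target embedding minus mean source embedding."""
    src_mean = mean_embedding(source)
    tgt_mean = mean_embedding(target)
    return [t - s for s, t in zip(src_mean, tgt_mean)]


def shift_toward_target(embedding: Sequence[float],
                        gap: Sequence[float],
                        strength: float = 1.0) -> Vector:
    """Move one source embedding along the gap vector, scaled by `strength`."""
    return [e + strength * g for e, g in zip(embedding, gap)]


# Toy 2-D embeddings standing in for encoder outputs (illustrative values only).
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[4.0, 1.0], [6.0, 5.0]]

gap = domain_gap(source, target)
shifted = shift_toward_target(source[0], gap)
```

A `strength` below 1.0 would move the embedding only part of the way toward the target distribution, which is one simple way to trade off fidelity to the source image against alignment with the target domain.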

Together, UMO's annotation-free diagnosis of model vulnerabilities and DoGE's ability to shift training data toward target or underrepresented distributions form a methodology for improving model fairness, robustness, and performance. Through these works, this thesis aims to foster the development of more reliable and equitable AI systems.

BibTeX

@mastersthesis{Wang-2024-140637,
author = {Yinong Wang},
title = {Vision Model Diagnosis and Improvement Via Large Pretrained Models},
year = {2024},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-22},
keywords = {Model Diagnosis, Bias and Fairness, Generative Model, Synthetic Dataset},
}