Learned Metrics-Aware Covariance for Visual-Inertial Fusion
Abstract:
Visual-inertial (VI) state estimation fuses cameras and inertial measurement units (IMUs) to provide accurate, metric-scale state estimates for autonomous systems. The covariance matrices associated with visual and inertial measurements determine how the estimator weights each sensing modality, making correct covariance modeling critical for fusion accuracy and consistency. However, most existing VI state estimators rely on constant or heuristically tuned covariance parameters that fail to capture the observation-dependent nature of real sensor uncertainty. As a result, they require tedious manual calibration for each new platform and environment and often yield suboptimal performance.
This thesis extends and improves learned metrics-aware covariance models for both modalities, enabling principled, tuning-free visual-inertial fusion. These models are observation-dependent, and their predicted uncertainty tracks the actual error at the same physical scale. We present two systems, MAC-I² and MAC-VIO, that leverage these covariance models.
MAC-I² performs visual-inertial initialization and extrinsic calibration by fusing visual pose covariance with learned inertial covariance in a multi-frame optimization. MAC-VIO applies metrics-aware covariance to continuous visual-inertial odometry, performing two-frame optimization with feature-level visual residuals and IMU preintegration constraints. Experiments show that our systems achieve robust and accurate state estimation, even in challenging scenarios involving illumination changes, dynamic objects, and occlusions.
Committee Members:
Prof. Howie Choset (advisor)
Prof. Sebastian Scherer
Shibo Zhao
