Efficient Warping for Visual Perception - Robotics Institute Carnegie Mellon University

PhD Speaking Qualifier

Shen Zheng, PhD Student, Robotics Institute,
Carnegie Mellon University
Tuesday, April 28
3:00 pm to 4:00 pm
Newell-Simon Hall 4305
Efficient Warping for Visual Perception

Abstract:

Salient foreground regions (e.g., vehicles, faces) occupy only ~10% of pixels, while less informative backgrounds (e.g., sky, trees) take up the remaining ~90%. This imbalance fundamentally limits visual perception in three settings:

(1) 2D discriminative tasks (e.g., domain-adaptive detection and segmentation) rely heavily on large background regions with high cross-domain variation, making domain adaptation difficult.

(2) 2D generative tasks (e.g., image-to-image translation) compress images into a latent space where small foreground regions receive even fewer spatial positions, making it difficult to preserve and reconstruct fine-grained details.

(3) 3D tasks (e.g., occupancy prediction) allocate substantial computation to less informative background regions, leading to significant latency and memory overhead.

In this work, we propose an efficient image warping framework, guided by instance-level saliency, that oversamples salient foregrounds and undersamples less informative backgrounds. Our method is model-agnostic, works with arbitrary saliency priors, requires no architecture modification, and introduces negligible computational, memory, and latency overhead. Experiments on domain-adaptive object detection and semantic segmentation, image-to-image translation (human and driving-scene relighting, and driving-scene translation), and 3D occupancy prediction demonstrate improved accuracy on discriminative and 3D tasks and enhanced fidelity and realism on generative tasks, while maintaining high efficiency.
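As an illustration only, the snippet below sketches one generic way to oversample salient regions: a separable warp that turns a saliency map into per-axis sampling densities and inverts their cumulative distributions, so rows and columns containing salient content receive more output pixels at a fixed output resolution. This is not the speaker's implementation. The function names (`saliency_warp_coords`, `warp_nearest`), the `uniform_weight` mixing parameter, and the use of a binary box mask as a stand-in for "instance-level saliency" are hypothetical choices made for this sketch.

```python
import numpy as np


def _axis_density(marginal: np.ndarray, uniform_weight: float) -> np.ndarray:
    """Turn a 1D saliency marginal into a sampling density mixed with a uniform floor."""
    n = marginal.size
    total = marginal.sum()
    p = marginal / total if total > 0 else np.full(n, 1.0 / n)
    # The uniform term keeps every row/column reachable, so background is undersampled, not dropped.
    return (1.0 - uniform_weight) * p + uniform_weight / n


def _inverse_cdf(density: np.ndarray, n_out: int) -> np.ndarray:
    """Map n_out evenly spaced output samples to continuous source coordinates in [0, n]."""
    cdf = np.concatenate(([0.0], np.cumsum(density)))
    cdf /= cdf[-1]
    targets = (np.arange(n_out) + 0.5) / n_out  # output pixel centers in [0, 1]
    # Dense regions (steep CDF) receive many output samples; sparse regions receive few.
    return np.interp(targets, cdf, np.arange(density.size + 1, dtype=float))


def saliency_warp_coords(saliency, out_h, out_w, uniform_weight=0.5):
    """Hypothetical separable saliency-guided warp: source (row, col) coordinate per output row/col."""
    if not 0.0 < uniform_weight <= 1.0:
        raise ValueError("uniform_weight must be in (0, 1] so the CDF is strictly increasing")
    s = np.asarray(saliency, dtype=float)
    ys = _inverse_cdf(_axis_density(s.sum(axis=1), uniform_weight), out_h)
    xs = _inverse_cdf(_axis_density(s.sum(axis=0), uniform_weight), out_w)
    return ys, xs


def warp_nearest(img, ys, xs):
    """Resample img (H x W or H x W x C) at the warped coordinates using nearest-neighbor lookup."""
    h, w = img.shape[:2]
    rows = np.clip(np.floor(ys).astype(int), 0, h - 1)
    cols = np.clip(np.floor(xs).astype(int), 0, w - 1)
    return img[rows[:, None], cols[None, :]]


# Toy example: a box "instance" covering ~10% of a 100 x 100 image.
mask = np.zeros((100, 100))
mask[40:72, 40:72] = 1.0
ys, xs = saliency_warp_coords(mask, 100, 100)
warped = warp_nearest(mask, ys, xs)
print(f"foreground fraction before: {mask.mean():.3f}, after warping: {warped.mean():.3f}")
```

With this toy mask (a 32 x 32 box, about 10% of a 100 x 100 image), the fraction of output pixels drawn from the box rises to over 40% at the same output resolution. Mixing in a uniform density keeps background rows and columns undersampled rather than discarded, and `uniform_weight=1.0` reduces the warp to the identity. For discriminative tasks, predictions made in warped space would still need to be mapped back through the inverse of this coordinate mapping.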

Committee:

Prof. Srinivasa Narasimhan

Prof. Deva Ramanan

Prof. Shubham Tulsiani

Gaurav Parmar