
MSR Speaking Qualifier

December

6
Tue
Mayank Singh
Robotics Institute, Carnegie Mellon University
Tuesday, December 6
2:00 pm to 3:00 pm
3305 Newell-Simon Hall
MSR Thesis Talk: Mayank Singh
Title: Analogical Networks: Memory-Modulated In-Context 3D Parsing
Abstract:
Deep neural networks have recently achieved excellent performance on numerous visual perception tasks. However, this performance generally depends on access to large amounts of training data, so few-shot learning remains a persistent challenge. Most existing works train a separate parametric neural model to parse each semantic category, which hinders knowledge sharing across objects and few-shot generalization to novel categories.

In this thesis, we present Analogical Networks, a model that casts fine-grained 3D visual parsing as analogical inference. Instead of mapping input scenes directly to part labels, which is hard to adapt to novel inputs in a few-shot manner, our model retrieves related scenes from memory together with their part structures, and predicts analogous part structures in the input scene via an end-to-end learnable modulation mechanism. By conditioning on more than one memory and using these memories as in-context information, the model predicts compositions of structures that mix and match parts from different visual experiences. This memory-inspired learning framework for perception parsing tasks encodes domain knowledge explicitly in a vast collection of memories at different levels of abstraction, in addition to the knowledge encoded implicitly in the model parameters. We show that Analogical Networks excel at few-shot 3D parsing: instances of novel object categories are parsed successfully simply by expanding the model's memory, without any weight updates. Analogical Networks outperform existing state-of-the-art detection transformer models at part segmentation, as well as meta-learning and few-shot learning paradigms. We also show that part correspondences emerge across memory and input scenes as a byproduct of the analogical inductive bias, simply from training with a label-free segmentation objective.

Committee:

Prof. Katerina Fragkiadaki (advisor)
Prof. Shubham Tulsiani
Gengshan Yang