MSR Speaking Qualifier

Yubo Zhang
Robotics Institute, Carnegie Mellon University
Monday, June 3
3:30 pm to 5:00 pm
NSH 4305
Yubo Zhang – MSR Thesis Talk

Title: A structured model for action detection

Abstract:

A dominant paradigm for learning-based approaches in computer vision is to train generic models, such as ResNet for image recognition or I3D for video understanding, on large datasets and let them discover the optimal representation for the problem at hand. While this approach is attractive, it is not applicable in all scenarios. Action detection is one such challenging case: the models that need to be trained are large, and labeled data is expensive to obtain. To address this limitation, in this talk I will introduce a novel method that incorporates domain knowledge into the structure of the model to simplify optimization. In particular, a standard I3D network is augmented with a tracking module that aggregates long-term motion patterns, and a graph convolutional network is used to reason about interactions between actors and objects. Evaluated on the challenging AVA dataset, the proposed approach improves over the I3D baseline by 5.5% mAP and over the state of the art by 4.8% mAP.

Committee:

Martial Hebert (advisor)

Abhinav Gupta

Achal Dave