3:00 pm - 4:00 pm
In this talk, I will present two recent pieces of work on leveraging temporal information and synthetic data to enhance video and image understanding. In the first part, I will introduce Spatio-TEmporal Progressive (STEP), a progressive learning framework for action detection in videos. STEP makes more effective use of longer temporal information and performs detection from only a handful of initial proposals, whereas other methods rely on thousands of densely sampled anchors or an extra person detector. In the second part, I will talk about DG-Net, a joint discriminative and generative learning framework for person re-identification that couples re-id learning and image synthesis end to end in a unified network. An online interactive loop between the discriminative and generative modules in DG-Net lets the two tasks benefit each other.
Xiaodong Yang recently joined QCraft to build and lead a perception and learning team for autonomous driving. Before that, he was a Senior Research Scientist at NVIDIA Research. His general research interests are computer vision and machine learning. He has worked on large-scale image and video understanding, human activity and hand gesture recognition, dynamic facial analytics, target re-identification, deep generative models, multimedia search, assistive technology, etc. He received his B.Eng. degree from Huazhong University of Science and Technology in 2009 and his Ph.D. degree from the City University of New York in 2015. He is a recipient of the best paper award from the Journal of Visual Communication and Image Representation in 2015. He and his collaborators won first place in the optical flow competition of the Robust Vision Challenge at CVPR 2018. He co-organized tutorials and workshops at GTC 2019 and CVPR 2019.