Carnegie Mellon University

VASC Seminar: (Talk 1) Kris Kitani (Talk 2) Abhinav Shrivastava (Talk 3) David Fouhey
(Talk 1) Activity Forecasting (Talk 2) Constrained Semi-Supervised Learning using Attributes and Comparative Attributes (Talk 3) People Watching: Human Actions as a Cue for Single View Geometry

October 01, 2012, 3pm - 4pm, NSH 1305

(Talk 1) We address the task of inferring the future actions of people from noisy visual input, a task we call activity forecasting. To forecast activity accurately, our approach models the effect of the physical environment on the choice of human actions. This is accomplished by combining state-of-the-art semantic scene understanding with ideas from optimal control theory.
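One way to read the optimal-control connection: if semantic scene understanding assigns each location a cost of occupancy (high on obstacles, low on sidewalks), plausible future trajectories can be forecast by computing a value function over the scene. A minimal value-iteration sketch on a toy grid — the grid, costs, and goal below are illustrative assumptions, not the talk's actual model:

```python
import numpy as np

def value_iteration(cost, goal, n_iters=200):
    """Compute a value function V over a grid, where cost[y, x] is the
    per-step cost of occupying cell (y, x). V[y, x] approximates the
    cheapest total cost of reaching `goal` from (y, x) via
    4-connected moves; low-V corridors are likely future paths."""
    H, W = cost.shape
    V = np.full((H, W), np.inf)
    V[goal] = 0.0
    for _ in range(n_iters):
        best = V.copy()
        # Consider stepping to each 4-connected neighbor.
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            shifted = np.full((H, W), np.inf)
            ys = slice(max(0, -dy), H - max(0, dy))
            xs = slice(max(0, -dx), W - max(0, dx))
            ys2 = slice(max(0, dy), H - max(0, -dy))
            xs2 = slice(max(0, dx), W - max(0, -dx))
            shifted[ys, xs] = V[ys2, xs2]   # shifted[y, x] = V[y+dy, x+dx]
            best = np.minimum(best, shifted + cost)
        best[goal] = 0.0
        V = best
    return V

# Toy scene: a high-cost wall (e.g. from semantic labeling) splits the
# grid, so the forecast path must route around it.
cost = np.ones((5, 5))
cost[1:4, 2] = 100.0
V = value_iteration(cost, goal=(2, 4))
```

In the toy scene, the start cell behind the wall gets a value of 8.0 (a detour of eight unit-cost steps) rather than the 103.0 of the straight-through route, which is the sense in which the environment shapes the forecast.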

(Talk 2) We consider the problem of semi-supervised bootstrap learning for scene categorization. Existing semi-supervised approaches are typically unreliable and suffer from semantic drift because the learning task is under-constrained. This is primarily because they ignore the strong interactions that often exist between scene categories, such as the attributes shared across categories and the attributes that distinguish one scene from another. The goal of this work is to exploit these relationships to constrain the semi-supervised learning problem. For example, the knowledge that an image is an auditorium can improve the labeling of amphitheaters by enforcing the constraint that an amphitheater image should have more circular structures than an auditorium image. We propose constraints based on mutual exclusion, binary attributes and comparative attributes, and show that they constrain the learning problem and avoid semantic drift. We demonstrate the effectiveness of our approach through extensive experiments, including results on a very large dataset of one million images.
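The flavor of these constraints can be sketched as a filter on bootstrap pseudo-labels: an unlabeled image is only added to the training set when its top category wins by a clear margin (mutual exclusion) and the label is consistent with a comparative-attribute check. The scores, attribute responses, and thresholds below are illustrative assumptions, not the paper's actual formulation:

```python
def accept_pseudo_label(scores, attributes, margin=0.2):
    """Decide whether a bootstrap round may add an unlabeled image
    to the training set, and with which label.

    scores:     dict mapping category -> classifier confidence
    attributes: dict mapping attribute -> detector response for the image
    Returns the accepted category, or None to skip the image.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # Mutual exclusion: a scene image has exactly one category, so only
    # accept it when the top category clearly beats the runner-up.
    if scores[best] - scores[runner_up] < margin:
        return None

    # Comparative attribute (illustrative): the abstract's example
    # "an amphitheater has MORE circular structure than an auditorium"
    # becomes a check on the image's attribute response before it may
    # be labeled amphitheater. The 0.5 threshold is an assumption.
    comparative = {
        "amphitheater": ("circular_structure", 0.5),
    }
    if best in comparative:
        attr, thresh = comparative[best]
        if attributes.get(attr, 0.0) < thresh:
            return None
    return best
```

Rejected images simply stay unlabeled for later rounds, which is how this kind of filter limits semantic drift: a confident-but-inconsistent prediction never contaminates the training set.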

(Talk 3) We present an approach which exploits the coupling between human actions and scene geometry. We investigate the use of human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints about the scene. These constraints are then used to improve state-of-the-art single-view 3D scene understanding approaches. The proposed method is validated on a collection of monocular time lapse sequences collected from YouTube and a dataset of still images of indoor scenes. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.
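One concrete example of the kind of geometric constraint a person can provide: under a pinhole camera model, a standing person of roughly known height fixes the depth of the ground they stand on. A minimal sketch — the focal length, person height, and pinhole model here are illustrative assumptions, not the talk's actual method:

```python
def depth_from_person(focal_px, person_height_m, person_height_px):
    """Pinhole-camera constraint: a person of known physical height H
    who appears h pixels tall stands at depth Z = f * H / h. Observing
    people at several image locations thus constrains the ground plane
    and the scale of the scene."""
    return focal_px * person_height_m / person_height_px

# A person assumed 1.7 m tall, imaged 100 px tall with an assumed
# focal length of 1000 px, stands about 17 m from the camera.
z = depth_from_person(1000.0, 1.7, 100.0)
```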

Speaker Biography

(Talk 1) Kris Kitani is a postdoctoral research fellow at the Robotics Institute. He specializes in the area of vision-based human activity analysis.

(Talk 2) Abhinav Shrivastava is a PhD student at the Robotics Institute (CMU), working with Alyosha Efros, Abhinav Gupta and Martial Hebert. Before starting his PhD, he finished his Master's at the Robotics Institute in December 2011. His research interests include scene understanding, object recognition, computer graphics, and large-scale machine learning techniques applied to such problems.

(Talk 3) David Fouhey is a second-year Ph.D. student in the Robotics Institute, supervised by Abhinav Gupta and Martial Hebert. He holds an A.B. in Computer Science from Middlebury College. His research focuses on computer vision and machine learning, and he is particularly interested in single-view scene understanding problems.