Scene semantics from long-term observation of people

Vincent Delaitre, David Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros
European Conference on Computer Vision (ECCV), October, 2012.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim to recognize objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

Notes
Associated Center(s) / Consortia: Vision and Autonomous Systems Center

Text Reference
Vincent Delaitre, David Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros, "Scene semantics from long-term observation of people," European Conference on Computer Vision (ECCV), October, 2012.

BibTeX Reference
@inproceedings{Fouhey_2012_7249,
   author = "Vincent Delaitre and David Fouhey and Ivan Laptev and Josef Sivic and Abhinav Gupta and Alexei A. Efros",
   title = "Scene semantics from long-term observation of people",
   booktitle = "European Conference on Computer Vision (ECCV)",
   month = "October",
   year = "2012",
}