Carnegie Mellon University Robotics Institute Homepage

Carnegie Mellon University Robotics Institute Research Guide

Computer Vision

The core vision faculty includes seven researchers. Additions since 2005 include Yaser Sheikh, whose recruitment strengthened the areas related to 3D geometry and motion as well as computer graphics, and Fernando De la Torre, whose research addresses behavior and face analysis and is closely aligned with work in the machine learning group. Researchers address all of the main areas of computer vision: object recognition and scene analysis (Efros, Hebert, Kanade); 3D geometry and reconstruction (Kanade, Sheikh); motion analysis and tracking (De la Torre, Hebert, Kanade, Sheikh); physics-based vision (Narasimhan); and analysis of 3D data (Hebert, Huber).

The pace of progress in the areas of object recognition and scene analysis has increased over the past five years in the computer vision community, owing in part to more sophisticated use of the tools from machine learning and the availability of large data sets. The CMU group has greatly expanded its activities in these areas through close collaborations with the machine learning area, as documented in the Machine Learning section. In particular, the vision group is a leading contributor to the transformation of the field in the last five years from pattern classification of image pixels to deep understanding of scenes, including geometry, context, and other physical world constraints (Efros, Hebert, Kanade). This research area has considerable potential given the unique collaborations with researchers in Machine Learning and AI. Another important development in the last five years is the access to huge amounts of visual data (images and videos), for example from web repositories, which raises new research questions and offers new opportunities for computer vision algorithms. We (Efros) are at the forefront of this transformation through major contributions to defining new ways of addressing computer vision tasks that exploit the richness of the data sets.

At the other end of the spectrum, recent years have seen a resurgence of interest in recognition tasks in which the goal is to recognize instances of a specific object rather than broad categories of objects. Interest in this problem is based on renewed interest in classic robotics problems, such as bin-picking, but also in human-centric vision tasks in which one goal is to understand a person's environment. New opportunities in this area are developing, through industry funding (Kanade) and through the QoLT ERC (Hebert), respectively, and we expect this area to grow in the future.

In the past three decades, the RI vision group has made major contributions to the problem of reconstructing the 3D geometry of a scene from multiple views. While this problem is now well addressed in the case of static environments, 3D understanding of dynamic environments poses difficult challenges, which are being addressed (Kanade, Sheikh) through an ambitious research program. This includes investigating the use of a large number of imaging sensors, based on the concept that complex sensing tasks, such as dynamic 3D scene understanding, should be solved by a large number of parallel but simple perceptual processes. As cameras proliferate in society through hand-held devices, this work will open the way to large-scale sensing challenges.

All aspects of human sensing, such as detecting, tracking, and understanding people's faces, bodies, and activities, are addressed by the vision group (Cohn, De la Torre, Kanade, Lucey, Sheikh). This research has led to important developments in both the theoretical and the algorithmic aspects, including facial expression analysis (Cohn, De la Torre) and body posture recovery and tracking (Kanade, Sheikh). Work on recognizing actions and activities is expanding through new approaches for recognition in videos (Hebert, Sheikh). In addition to the research products, the vision group has had major impact on the field through several databases and benchmarks, e.g., the face databases and the Grand Challenge database of human activities generated in the QoLT ERC.

The area of physics-based vision has been led since Fall 2004 by Narasimhan, who has developed it into three key areas: the mathematical modeling of the interactions of light with materials and the atmosphere; the design of novel cameras with higher resolution in space, color, and intensity; and the development of algorithms for rendering and interpreting scene appearance. This work contributed fundamental tools toward modeling and understanding light transport and reflection, and it generated new applications in a number of fields including robotics, digital entertainment, remote sensing, and underwater imaging. The most recent work includes the development of new solutions for structured light sensing and new display technology. This activity is central to the unique strength of RI in research activities combining computer vision and computer graphics.

As computer vision research matures, opportunities for applications and collaborations have increased. Accordingly, in addition to the basic research lines, the computer vision group collaborates with, and contributes to, a large number of other areas and projects both within RI and with other parts of the University.

Video interpretation tasks, e.g., for surveillance, are traditional settings for tracking and motion analysis research. One important direction for this area is toward systems that incorporate higher levels of reasoning. The vision faculty is well positioned for this research direction, given the strong ties to colleagues in other parts of SCS. For example, the most recent effort in this area involves a collaboration between researchers in vision (Hebert, Huber, Kanade, Sheikh), machine learning (Bagnell), and AI/cognitive psychology (Lebiere).

The vision faculty continued to develop opportunities for computer vision applications in the area of mobile systems, e.g., unmanned ground vehicles and intelligent cars. For example, motion analysis and tracking techniques are instrumental in systems for intelligent driving (Kanade, Sheikh), and 3D scene interpretation (Hebert, Huber) is central to the development of unmanned ground vehicles (UGVs). Existing and recent activities include industry-funded projects and large DoD efforts in the area of UGVs, in collaboration with NREC researchers.

The vision faculty has built on new opportunities in the general area of using computer vision to enhance communication and interaction. For example, owing to progress in face analysis, new modes of interaction combining face analysis and synthesis are being investigated (Cohn, Kanade, Sheikh), at the boundary between computer vision and computer graphics. In the HRI area, research involves the use of sensor fusion and activity recognition to optimize the efficiency of industrial workcells, so that people and intelligent, dexterous machines can work together safely (Huber, Rybski).

The QoLT ERC provided a new set of challenging vision tasks in recognition and behavior understanding within the general area of assistive technologies. This area provided new opportunities for collaboration with practitioners in rehabilitation science, aging, nursing, and related fields at the University of Pittsburgh and at CMU, giving the vision researchers a better understanding of real-world applications and access to data and user studies. In the last two years, in particular, the impact of this work was evidenced by a range of programs addressing human-centric applications, e.g., participation in an NSF Expeditions project for research on autism (Kanade), computer vision for sensory substitution (Sheikh), and behavior and face analysis (De la Torre).

Exciting new opportunities are being pursued in biological engineering, where the expertise gained in motion analysis and tracking led to the development of fully automated, computer vision-based cell tracking algorithms that can track whole populations of cells on a test chip in real time (Kanade). This approach has the potential to transform key aspects of biological engineering by reducing the high cost and long timelines for gathering and interpreting experimental data.

All of these activities are supported by strong ties to other units, which we continuously strengthen. Given the importance of machine learning in computer vision, strong ties have been established with faculty in this area in RI, CSD, and MLD. For example, many of the current vision projects are joint with machine learning faculty (Bagnell, Guestrin, Gordon). In human-centric and medical activities, the involvement of colleagues in psychology, biology, rehabilitation, and related disciplines (University of Pittsburgh) is critical. In mobile systems, projects are collaborative with researchers at NREC (Stentz, Kelly). 3D mapping and inspection projects are based on long-standing collaborations with CEE (Akinci, Bielak, Garrett). In the area of computer graphics, the Institute had identified the importance of the synergy between computer graphics and computer vision, which had led by 2005 to the recruitment of Efros and Narasimhan and more recently of Sheikh. Since then, the graphics/vision effort has been extremely successful, as evidenced by the publication record at SIGGRAPH, which is the venue of record for computer graphics publications, by the volume of media coverage, and by the students graduated in this area. Finally, collaboration with external organizations through adjunct appointments continued to be an important component of the computer vision work, for example, in face and behavior analysis (Cohn, Univ. of Pittsburgh; Lucey, CSIRO) and in motion and object recognition (Sukthankar and Chen, Intel).


  1. Fernando De la Torre

  2. Alexei Efros

  3. Martial Hebert

  4. Takeo Kanade

  5. Daniel Huber

  6. Yaser Sheikh

  7. Srinivasa Narasimhan

Project Images

  • Facial Feature Detection

  • Data-Driven Scene Completion

  • Spatial and Temporal Features for Action Recognition

  • Automated Reverse Engineering of Buildings

  • Tracking a Large Number of Migrating and Proliferating Cells in Time-Lapse Microscopy Imagery

  • A Motion-aware Camera

  • Synthesizing Hidden Views of Moving Objects