The Robotics Institute

Takeo Kanade
U.A. and Helen Whitaker University Prof., RI/CS

Associated centers: VASC and MRTC

Email address: tk@cs.cmu.edu
Office: NSH 4119
Phone: (412) 268-3016
Fax: 412-268-5570

Mailing address:
Carnegie Mellon University
Robotics Institute
5000 Forbes Avenue
Pittsburgh, PA 15213

Director, Quality of Life Technology (QoLT) Center

Face Alignment Demo - submit a picture containing faces, and the program will locate the faces and their parts.




Virtualized Reality™ at SuperBowl XXXV!


For appointments, please contact:
Suzette A. Mongell, (412) 268 7967 (smongell@cs.cmu.edu)

For more information, see my personal homepage.


Biography


TK60 - Celebrating Kanade's Vision
Pittsburgh Post-Gazette article
Pittsburgh Tribune-Review article

Takeo Kanade is the U. A. and Helen Whitaker University Professor of Computer Science and Robotics, and the Director of the Quality of Life Technology Engineering Research Center at Carnegie Mellon University. He received his Doctoral degree in Electrical Engineering from Kyoto University, Japan, in 1974. After holding a faculty position in the Department of Information Science, Kyoto University, he joined Carnegie Mellon University in 1980, where he was the Director of the Robotics Institute from 1992 to 2001.

Dr. Kanade works in multiple areas of robotics: computer vision, multi-media, manipulators, autonomous mobile robots, and sensors. He has written more than 250 technical papers and reports in these areas, and holds more than 15 patents. He has been the principal investigator of more than a dozen major vision and robotics projects at Carnegie Mellon.

Dr. Kanade has been elected to the National Academy of Engineering and the American Academy of Arts and Sciences. He is a Fellow of the IEEE, a Fellow of the ACM, a Founding Fellow of the American Association for Artificial Intelligence (AAAI), and the founding and former editor of the International Journal of Computer Vision. The awards he has received include the C&C Award, the Okawa Prize, the IEEE PAMI-TC A. Rosenfeld Lifetime Achievement Award, the Joseph Engelberger Award, the IEEE Robotics and Automation Society Pioneer Award, the FIT Funai Accomplishment Award, the Allen Newell Research Excellence Award, the JARA Award, the IEEE Computer Vision Longuet-Higgins Prize, and the Marr Prize Award. Dr. Kanade has served on government, industry, and university advisory or consultant committees, including the Aeronautics and Space Engineering Board (ASEB) of the National Research Council, NASA's Advanced Technology Advisory Committee, the PITAC Panel for Transforming Healthcare, and the Advisory Board of the Canadian Institute for Advanced Research.

Research interests

My research interests are in the areas of computer vision, visual and multi-media technology, and robotics. Two themes that my students and I emphasize in our research are the formulation of sound theories that use the physical, geometrical, and semantic properties involved in perceptual and control processes to create intelligent machines, and the demonstration of working systems based on these theories.

My current projects include basic research and system development in computer vision (motion, stereo and object recognition), recognition of facial expressions, virtual(ized) reality, content-based video and image retrieval, VLSI-based computational sensors, medical robotics, and an autonomous helicopter.

Computer vision

Within the Image Understanding (IU) project, my students and I are conducting basic research in interpretation and sensing for computer vision. My major thrust is the "science of computer vision." Traditionally, many computer vision algorithms were derived heuristically, either by introspection or by biological analogy. In contrast, my approach to vision is to transform the physical, geometrical, optical, and statistical processes that underlie vision into mathematical and computational models. This approach results in algorithms that are far more powerful and revealing than traditional ad hoc methods based solely on heuristic knowledge. With this approach we have developed a new class of algorithms for color, stereo, motion, and texture.

The two most successful examples of this approach are the factorization method and the multi-baseline stereo method. The factorization method robustly recovers shape and motion from an image sequence. Based on this theory we have been developing a system for "modeling by video taping": a user takes a video tape of a scene or an object by either moving a camera or moving the object, and then a three-dimensional model of the scene or the object is created from the video. The multi-baseline stereo method, the second example, is a new stereo theory that uses multi-image fusion to create a dense depth map of a natural scene. Based on this theory, a video-rate stereo machine has been developed, which can produce a 200x200 depth image at 30 frames/sec, aligned with an intensity image; in other words, a real 3D camera!
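As a rough illustration of why an image sequence can be factored into shape and motion at all, the sketch below checks the rank property that factorization approaches to structure from motion typically rest on: under an assumed orthographic camera, the matrix of tracked image coordinates, with each row's mean subtracted, has rank at most 3. This is not the lab's implementation. The function names (`rotation`, `measurement_matrix`, `numerical_rank`), the synthetic cube, and the camera angles are all hypothetical, and a practical system would go on to factor the matrix (for example with a singular value decomposition) and handle noise and missing tracks.

```python
import math


def rotation(yaw, pitch):
    """3x3 rotation: yaw about the y-axis, then pitch about the x-axis."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def measurement_matrix(points, rotations):
    """Stack orthographic image coordinates of P points seen in F frames
    into a 2F x P matrix, subtracting each row's mean (registration)."""
    rows = []
    for rot in rotations:
        for axis in (rot[0], rot[1]):  # assumed camera x- and y-axes
            row = [sum(a * c for a, c in zip(axis, p)) for p in points]
            mean = sum(row) / len(row)
            rows.append([value - mean for value in row])
    return rows


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column depends on the ones already processed
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Synthetic scene: the eight corners of a cube, viewed in four frames.
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
views = [rotation(0.0, 0.0), rotation(0.2, 0.1),
         rotation(0.4, -0.1), rotation(0.6, 0.3)]
W = measurement_matrix(cube, views)
print(len(W), "x", len(W[0]), "measurement matrix, rank", numerical_rank(W))
```

Because the registered matrix is the product of a 2F x 3 motion matrix and a 3 x P shape matrix, its rank cannot exceed 3 however many frames or points are added; a planar point set lowers it further.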

Currently, we are working on a rapidly trainable object recognition method, a system for modeling-by-video-taping, and a multi-camera 3D object copying/reconstruction method.

Visual media technology for human-computer interaction

A combination of computer vision and computer graphics technology presents an opportunity for an exciting new visual medium. We have been developing such a medium, named "virtualized reality." In existing visual media, the view of the scene is determined at transcription time, independent of the viewer. In contrast, virtualized reality delays the selection of the viewing angle until view time, using techniques from computer vision and computer graphics. The visual event is captured using many cameras that cover the action from all sides. The 3D structure of the event, aligned with the pixels of the image, is computed for a few selected directions using the multi-baseline stereo technique. Triangulation and texture mapping enable the placement of a soft-camera to reconstruct the event from any new viewpoint. The viewer, wearing a stereo-viewing system, can freely move about in the world and observe it from a viewpoint chosen dynamically at view time. We have built a 3D Virtualized Studio using a hemispherical dome, 5 meters in diameter, currently with 51 cameras attached at its nodes.
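The depth computation step relies on multi-baseline stereo, which fuses evidence from several camera pairs. One common way to do that, sketched here under strong simplifying assumptions, is to index matching costs by inverse depth instead of disparity, so that the sum-of-squared-differences (SSD) curves from different baselines line up and can be added. The example uses a single cyclic scanline, integer disparities equal to baseline times inverse depth, and noise-free synthetic views. All names (`synthesize_views`, `sssd`, `estimate_inverse_depth`) are hypothetical and do not describe the studio's actual software.

```python
import random


def synthesize_views(reference, baselines, true_zeta):
    """Create one shifted copy of the reference scanline per baseline.
    Assumed model: disparity = baseline * inverse depth (zeta)."""
    n = len(reference)
    return [[reference[(i - b * true_zeta) % n] for i in range(n)]
            for b in baselines]


def sssd(reference, views, baselines, x, zeta, half_window=3):
    """Sum over all baselines of the windowed SSD at pixel x,
    evaluated for one candidate inverse depth."""
    n = len(reference)
    total = 0.0
    for view, b in zip(views, baselines):
        d = b * zeta  # candidate disparity for this baseline
        for dx in range(-half_window, half_window + 1):
            diff = reference[(x + dx) % n] - view[(x + dx + d) % n]
            total += diff * diff
    return total


def estimate_inverse_depth(reference, views, baselines, x, candidates):
    """Pick the candidate inverse depth with the smallest summed SSD."""
    return min(candidates,
               key=lambda zeta: sssd(reference, views, baselines, x, zeta))


rng = random.Random(0)
reference = [rng.random() for _ in range(64)]  # textured scanline
baselines = [1, 2, 3]                          # relative baseline lengths
views = synthesize_views(reference, baselines, true_zeta=5)
print(estimate_inverse_depth(reference, views, baselines, x=30,
                             candidates=range(8)))
```

Summing over baselines matters most on repetitive texture: a single short baseline is less precise, a single long baseline is more prone to false matches, and the summed curve tends to keep one clear minimum.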

There are many applications of virtualized reality. Virtualized reality starts with a real world, rather than creating an artificial model of it. So, training can become safer, more real and more effective. A surgery, recorded in a virtualized reality studio, could be revisited by medical students repeatedly, viewing it from positions of their choice. Or, an entirely new generation of entertainment media can be developed - "Let's watch NBA in the court": basketball enthusiasts could watch a game from inside the court, from a referee's point of view, or even from the "ball's eye" point of view.

A Virtualized Reality application, CBS's EyeVision, was demonstrated during SuperBowl XXXV.

I am also interested in, and currently working on, vision techniques for recognizing facial expression, gaze, and hand-finger gestures. Such techniques will provide a natural, non-intrusive means of human-computer interaction, replacing current clumsy mechanical devices such as datagloves.

Informedia Project

With the growth and popularity of multimedia computing technologies, video is gaining importance and broadening its uses in libraries. Digital video libraries open up great potential for education, training, and entertainment; but to achieve this potential, the information embedded within the digital video library must be easy to locate, manage, and use. Searches within a large data set or lengthy video would take a user through vast amounts of material irrelevant to the search topic. The typical database, which searches by keywords (e.g. title) and in which images are only referenced and not directly searched, is not appropriate or useful for the digital video library, since it gives the user no way to know the contents of an image short of viewing it. New techniques are needed to organize these vast video collections so that users can effectively retrieve and browse their holdings based on their content. The Informedia Digital Video Library, funded by NSF, ARPA, and NASA, is developing intelligent, automatic mechanisms to populate the video library and allow for full-content, knowledge-based search, retrieval, and presentation of video. The distinguishing feature of Informedia's approach is the integrated application of speech, language, and image understanding technologies.

Computational Sensor

While significant advancements have been made over the last 30 years of computer vision research, the consistent paradigm has been that a "camera" sees the world and a computer "algorithm" recognizes the object. I have been undertaking a project with Dr. Vladimir Brajovic that breaks away from this traditional paradigm by integrating sensing and processing into a single VLSI chip: a computational sensor. The first successful example was an ultra-fast range sensor that can produce approximately 1000 frames of range images per second, an improvement of two orders of magnitude over the state of the art. A few new sensors are being developed, including a sorting sensor chip, a 2D salient feature detector (2D winner-take-all circuits), and others.

Medical Robotics and Computer Assisted Surgery

The emerging field of Medical Robotics and Computer Assisted Surgery strives to develop smart tools to perform medical procedures better than either a physician or machine could alone. Robotic and computer-based systems are now being applied in specialties that range from neurosurgery and laparoscopy to ophthalmology and family practice. Robots are able to perform precise and repeatable tasks that would be impossible for any human. The physician provides these systems with the decision-making skills and adaptable dexterity that are well beyond current technology. The potential combination of robots and physicians has created a new worldwide interest in the area of medical robotics.

We have developed a new computer-assisted surgical system for total hip replacement. The work is based on biomechanics-based surgical simulations and on less invasive and more accurate vision-based techniques for determining the position of the patient anatomy during robotic surgery. The developed system, HipNav, has already been test-used in a clinical setting.

Vision-based Autonomous Helicopter

An unmanned helicopter can take maximum advantage of the high maneuverability of helicopters in dangerous support tasks, such as search and rescue and fire fighting, since it does not place a human pilot in danger. The CMU Vision-Guided Helicopter Project (with Dr. Omead Amidi) has been developing the basic technologies for an unmanned autonomous helicopter, including robust control methods, vision algorithms for real-time object detection and tracking, integration of GPS, motion sensors, and vision output for robust positioning, and high-speed real-time hardware. After testing various control algorithms and real-time vision algorithms using an electric helicopter on an indoor test stand, we developed a computer-controlled helicopter (4 m long), which carries two CCD cameras, GPS, gyros, and accelerometers together with a multiprocessor computing system. Autonomous outdoor free flight has been demonstrated with such capabilities as following a prescribed trajectory, detecting an object, and tracking it or picking it up from the air.
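The estimator used to integrate GPS, motion sensors, and vision output is not described here, so the following is only a generic illustration of the idea of combining position sources for robustness. It fuses independent estimates of one coordinate by inverse-variance weighting, which assumes unbiased sensors with independent Gaussian errors. The function name `fuse_estimates` and the numbers are hypothetical.

```python
def fuse_estimates(estimates):
    """Fuse (value, variance) pairs by inverse-variance weighting.
    Assumes independent, unbiased estimates with positive variance.
    Returns the fused value and its variance."""
    total_weight = sum(1.0 / variance for _, variance in estimates)
    value = sum(v / variance for v, variance in estimates) / total_weight
    return value, 1.0 / total_weight


# Hypothetical altitude readings in metres: (estimate, variance).
gps = (10.0, 4.0)     # coarse but drift-free
vision = (12.0, 1.0)  # more precise at short range
altitude, variance = fuse_estimates([gps, vision])
print(altitude, variance)
```

The fused variance is never larger than the smallest input variance, which is the sense in which adding a sensor can only help under these assumptions. A flight system would more likely run a recursive filter that also propagates the state with gyro and accelerometer data between measurements.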

Research interest keywords

computational sensors, computer vision, human-computer interaction, medical applications, mobile robots, quality-of-life technology, and stereo vision

Current Labs & Groups [Past labs]

Biomedical Image Analysis - We are an interdisciplinary team interested in developing algorithms for biomedical image analysis under an image feature-based machine learning framework.
Computational Sensor Laboratory - We are developing specialty imaging sensors for improving robustness and capabilities of robot vision systems.
Face Group - Robust detection, recognition, and tracking of human faces with automated analysis of expressions
Helicopter Lab - A vision-guided robot helicopter which can function in any weather conditions using only on-board intelligence and computing power.
Human Identification at a Distance - We are developing and evaluating human identification technologies as part of the Defense Advanced Research Projects Agency (DARPA) sponsored program in Human Identification at a Distance (HumanID).
Human Sensing - The goal of the Human Sensing Lab is to develop new machine learning algorithms to model and understand human behavior from sensory data.
Medical Robotics and Computer Assisted Surgery - Researching planning (medical image computing, simulation) and execution (intraoperative sensing and actuation) technologies for computer-assisted surgery.
People Image Analysis Consortium - The People Image Analysis (PIA) Consortium develops and distributes technologies that process images and videos to detect, track, and understand people's faces, bodies, and activities.
Virtualized Reality™ - Construct views of real events from nearly any viewpoint
 

Current Projects [Past projects]

3D Head Motion Recovery in Real Time - A cylindrical model-based algorithm recovers the full motion (3D rotations and 3D translations) of the head in real time.
3D Image Overlay - X-ray vision has always been the dream of surgeons; Image Overlay is the next best thing.
A Statistical Quantification of Human Brain Asymmetry - Constructing image index features to retrieve medically similar cases from a multimedia medical database.
Accurate Camera Calibration from Planar Patterns - A novel camera calibration method that improves the accuracy of both intrinsic camera parameters and stereo camera calibration by using a single framework for square, circle, and ring planar calibration patterns.
Autonomous Helicopter - Develop a vision-guided robot helicopter
Cell Tracking - We are developing fully automated, computer vision-based cell tracking algorithms and a system that automatically determines the spatiotemporal history of dense populations of cells over extended periods of time.
Cohn-Kanade AU-Coded Facial Expression Database - An AU-coded database of over 2000 video sequences of over 200 subjects displaying various facial expressions.
Component Analysis for Data Analysis - Component analysis (CA) is a set of techniques to decompose a signal (e.g. audio, video) into interesting components useful for classification, clustering, modeling or visualization. This project extends traditional CA techniques and unifies them, providing a cleaner theoretical framework for its analysis.
Computer Assisted Medical Instrument Navigation - We are developing a system to help clinicians to precisely navigate various catheters inside human hearts.
Coplanar Shadowgrams for Acquiring Visual Hulls of Intricate Objects - We present a practical approach to shape-from-silhouettes using a novel technique called coplanar shadowgram imaging that allows us to use dozens to even hundreds of views for visual hull reconstruction.
Deception Detection - Learning facial indicators of deception
Dynamic Conformal Radiotherapy
EyeVision
Face Detection - We are developing computer methods to automatically locate human faces in photos and video.
Face Detection Databases - A collection of databases for training and testing face detectors.
Face Recognition Across Illumination - Recognizing people from faces: video and still images.
Face Video Hallucination - A learning-based approach to super-resolve human face videos.
Facial Expression Analysis - Automatic facial expression encoding, extraction and recognition, and expression intensity estimation for MPEG-4 applications: teleconferencing, human-computer interaction/interface.
Feature-based 3D Head Tracking - A feature-based head tracking algorithm that can handle occlusions and fast motion of the face.
Frontal Face Alignment - This face alignment method detects generic frontal faces with large appearance variations and 2D pose changes and identifies detailed facial structures in images.
GPU-accelerated Computer Vision - We are exploiting programmable graphics hardware to improve existing vision algorithms and enable novel approaches to robot perception.
Hand Tracking and 3-D Pose Estimation - A 2-D and 3-D model-based tracking method that can track a rapidly moving, deforming human hand against complicated backgrounds and recover its 3-D pose parameters.
Human Kinematic Modeling and Motion Capture - We are developing a system for building 3D kinematic models of humans and then using the models to track the person in new video sequences.
Human Motion Transfer - We are developing a system for capturing the motion of one person and rendering a different person performing the same motion.
Informedia Digital Video Library - Informedia is pioneering new approaches for automated video and audio indexing, navigation, visualization, summarization, search, and retrieval, and embedding them in systems for use in education, health care, defense intelligence, and understanding of human activity.
Knee Surgery Simulation - Haptic interface for simulated knee surgery and interaction with volumetric data.
Modeling by Videotape - Factorization method of solving the structure-from-motion problem
Multi-People Tracking - Our multi-people tracking method can automatically initialize and terminate paths of people and follow a varying number of people in cluttered scenes over long time intervals.
Multi-view Car Detection and Registration - This method can detect cars with occlusions and varying viewpoints from a single still image by using a multi-class boosting algorithm.
Object Recognition Using Statistical Modeling - Automobile and human face detection via statistical modeling.
Perception for Humanoid Robots - Real-time perception algorithms for autonomous humanoid navigation, manipulation and interaction.
Precision Freehand Sculpting - We are developing a handheld tool to accurately cut bone for joint replacement surgery.
Quality of Life Technology Center - QoLT is a unique partnership between Carnegie Mellon and the University of Pittsburgh that brings together a cross-disciplinary team of technologists, clinicians, industry partners, end users, and other stakeholders to create revolutionary technologies that will improve and sustain the quality of life for all people.
Real-time Face Detection - A face detection system that achieves accurate detection and real-time performance by using an ensemble of weak classifiers.
Reconfigurable Vision Machine - Developing new hardware and software for high performance computer vision.
Soft Tissue Simulation for Plastic Surgery
Spatio-Temporal Facial Expression Segmentation - A two-step approach that temporally segments facial gestures from video sequences. It can register the rigid and non-rigid motion of the face.
Temporal Shape-From-Silhouette - We are developing algorithms for the computation of 3D shape from multiple silhouette images captured across time.

Selected publications [View all publications]


The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.