The Robotics Institute

Robotics Institute Seminar, April 25

Vision is a chicken-egg problem*: From visual illusions to harmonic computational geometry

Yiannis Aloimonos
University of Maryland

Time and Place

1305 Newell-Simon Hall
Refreshments 3:15 pm
Talk 3:30 pm

Seminar Abstract

State-of-the-art theories of Computational Vision call for a set of specific, well-defined steps, executed in sequence, to obtain 3D representations of the visual world. For example, to build 3D models of the world from multiple views of it, we first match the images (the correspondence problem), then use the matches to solve for the camera geometry (where the cameras are), and finally build a 3D model by triangulation. Recognition is characterized by similar steps. Despite tremendous progress, we are still far from general tools that automatically produce 3D models of space and space-time (e.g., action) from images. One reason is that the important problems of visual recovery are chicken-egg problems or, more formally, compositional problems. In the example above, finding the camera geometry requires correspondence, yet correspondence would be easier to solve if we knew something about the scene, and the very reason we want the camera geometry is to obtain a 3D model of the scene. Thus the quantities involved in building 3D models are best obtained by a set of synergistically operating processes; in formal terms, there is feedback at all levels of the system. In our example, we will have matched the images only when we have computed almost every aspect of the scene's 3D model.
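To make the conventional sequence concrete, here is a minimal sketch of its last step for the simplest possible camera geometry. It assumes a rectified stereo pair with a known focal length `f` (in pixels), a known baseline (in metres) and the principal point at the image origin, and it assumes correspondence has already been solved; the function name `triangulate_rectified` and the numbers are illustrative, not taken from the talk. Depth follows from disparity as Z = fB/d, so any error in the match feeds straight into the 3D model.

```python
from typing import Tuple


def triangulate_rectified(
    x_left: float, y_left: float, x_right: float, f: float, baseline: float
) -> Tuple[float, float, float]:
    """Recover a 3D point from one match in a rectified stereo pair.

    Assumes both cameras share focal length f (pixels), the principal point
    is at the image origin, and the right camera is displaced by `baseline`
    along the x axis, so matched points lie on the same image row.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: the match cannot be triangulated")
    z = f * baseline / disparity  # depth from similar triangles
    x = x_left * z / f            # back-project through the left camera
    y = y_left * z / f
    return x, y, z


# Step 1 (correspondence) is assumed done: (x_left, y_left) matches x_right.
# Step 2 (camera geometry) is assumed known: f = 500 px, baseline = 0.1 m.
point = triangulate_rectified(x_left=60.0, y_left=20.0, x_right=50.0, f=500.0, baseline=0.1)
print(point)
```

Shifting `x_right` by a single pixel in this example moves the recovered depth by more than half a metre, which is one way to see why each step in the sequence depends so heavily on the quality of the one before it.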

This talk describes the new framework of feedback loops for the Vision problem and consists of three parts:

(a) A new uncertainty principle in visual processing is introduced. Based on statistics, this principle is so powerful that it explains (predicts) hundreds of visual illusions and demonstrates that illusions are not just a human phenomenon (any system will experience them). The uncertainty principle is one of the reasons why feedback is needed.

(b) A new matching algorithm is proposed within the feedback loop framework; while it matches, it also performs a segmentation in 3D. It is demonstrated on motion segmentation (finding all independently moving objects and the background in video from a moving camera), and the nature of several of the loops is explained further.

(c) Feedback comes down to the level of the images. But how do we bridge the gap between signals and geometry? We need new mathematics: new equations relating images seen from different viewpoints to 3D models. New constraints are introduced (the harmonic epipolar, the harmonic trifocal, and others) that relate frequencies in image patches of different views to 3D motion and structure, rather than relating points and lines as is usually done. The ultimate goal of this Harmonic Computational Geometry is to obtain 3D models using as input the output of filters applied to the images, and it will be an exciting field of inquiry for many years to come.
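Part (a) does not spell out the uncertainty principle, so the following is only an illustration of the general kind of statistical effect involved, not the speaker's formulation: when the quantities a system measures are themselves noisy, an ordinary least-squares estimate becomes systematically biased, however much data is collected. The sketch assumes a hypothetical linear relation y = a·x in which x is observed with Gaussian noise; the function names and parameter values are illustrative. The fitted slope shrinks toward zero by roughly var(x) / (var(x) + var(noise)), a consistent misperception rather than random scatter.

```python
import random


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the origin, y ~ a * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def biased_slope_demo(true_slope=2.0, signal_sd=1.0, noise_sd=1.0, n=100_000, seed=0):
    """Compare slope estimates from clean and noisy inputs (hypothetical setup)."""
    rng = random.Random(seed)
    xs_true = [rng.gauss(0.0, signal_sd) for _ in range(n)]
    ys = [true_slope * x for x in xs_true]                     # exact outputs
    xs_seen = [x + rng.gauss(0.0, noise_sd) for x in xs_true]  # noisy inputs
    clean = least_squares_slope(xs_true, ys)
    noisy = least_squares_slope(xs_seen, ys)
    # Large-sample prediction of the attenuated slope for zero-mean inputs.
    predicted = true_slope * signal_sd**2 / (signal_sd**2 + noise_sd**2)
    return clean, noisy, predicted


clean, noisy, predicted = biased_slope_demo()
print(f"clean fit: {clean:.3f}  noisy fit: {noisy:.3f}  predicted: {predicted:.3f}")
```

Increasing `n` does not remove the gap between the clean and noisy fits; only reducing `noise_sd` does, which is the sense in which any estimator of this kind, biological or artificial, would share the error.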
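The chicken-egg structure behind part (b) can be seen in a toy alternating scheme: knowing which pixels belong to which object would make each object's motion easy to estimate, and knowing the motions would make the assignment easy, so the two are refined in a loop. The sketch below is a generic k-means-style alternation on 2D image displacements, assuming that each object and the background moves by a pure image translation; it is not the matching algorithm from the talk, and `segment_by_motion` and the sample data are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def segment_by_motion(flows: List[Vec], init_models: List[Vec], iters: int = 10):
    """Alternate between labelling displacements and re-estimating motions.

    Each motion model is a single 2D translation; a displacement is assigned
    to the model it is closest to (squared Euclidean distance).
    """
    models = list(init_models)
    labels = [0] * len(flows)
    for _ in range(iters):
        # Segmentation step: given the motions, label each displacement.
        labels = [
            min(range(len(models)),
                key=lambda k: (u - models[k][0]) ** 2 + (v - models[k][1]) ** 2)
            for u, v in flows
        ]
        # Estimation step: given the labels, re-fit each motion as a mean.
        for k in range(len(models)):
            members = [fl for fl, lab in zip(flows, labels) if lab == k]
            if members:  # keep the old model if a segment becomes empty
                models[k] = (sum(u for u, _ in members) / len(members),
                             sum(v for _, v in members) / len(members))
    return labels, models


# Hypothetical displacements: background drifting right, one object moving up.
flows = [(1.0, 0.1), (0.9, -0.1), (1.1, 0.0), (0.0, 2.0), (0.1, 2.1), (-0.1, 1.9)]
labels, models = segment_by_motion(flows, init_models=[(0.0, 0.0), (1.0, 1.0)])
print(labels, models)
```

With this starting guess, the first pass mislabels most displacements, and the second pass corrects them once the motion estimates have improved; like any such alternation, it can also settle on a poor segmentation if started badly, since the point here is only to expose the loop.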
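For readers less familiar with the "points and lines" constraints that part (c) contrasts with, the standard textbook relations are shown below; they are conventional multiple-view geometry, not the new harmonic constraints, which the abstract does not state. The first relates matched homogeneous points x and x' in two views through the fundamental matrix F; the second relates corresponding lines l, l', l'' in three views through the trifocal tensor T (equality up to scale). The last line is the standard Fourier shift theorem, included only as a hint of why frequencies can carry geometric information: if a patch moves rigidly by a displacement d, its spectrum changes only in phase.

```latex
\mathbf{x}'^{\top} F\,\mathbf{x} = 0
\qquad \text{(points, two views)}

l_i \simeq l'_j \, l''_k \, T_i^{jk}
\qquad \text{(lines, three views)}

I'(\mathbf{x}) = I(\mathbf{x} - \mathbf{d})
\;\Longrightarrow\;
\hat{I}'(\boldsymbol{\omega}) = e^{-i\,\boldsymbol{\omega}\cdot\mathbf{d}}\,\hat{I}(\boldsymbol{\omega})
```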

The feedback loop theory suggests that models of space and action are within our reach, thus calling for the formulation of a geometry of the mind. As time permits, the architecture of an intelligent system that uses action as a primitive will be sketched. Within that system, Vision operates as a controlled hallucination process.

*: Joint work with Cornelia Fermuller and our students Patrick Baker, Ji Hui, Jan Neumann and Abhijit Ogale.

Speaker Biography

Yiannis Aloimonos studied Mathematics in Athens, Greece (Dipl. 1982) and Computer Science at the University of Rochester, NY (PhD 1987). He is currently the Director of the Computer Vision Laboratory at the University of Maryland and a Professor of Computational Vision in the Department of Computer Science. His major interest is the relationship of action to intelligence. He is known for his work in Active Vision and Motion Analysis (trifocal constraints). He has authored and coauthored several books on Computer Vision and Visual Navigation, as well as one on Artificial Intelligence with Tom Dean and James Allen. The address contains a description of his research, including a Socratic dialogue written at the level of a Scientific American article. The site describes in detail the first part of the talk, i.e. why many illusions happen. The address has pointers to a dialogue (A VISUAL DIALOGUE) among Socrates, Archimedes and Euclid, which introduces the new field of Harmonic Computational Geometry, another topic of the talk.

Speaker Appointments

For appointments, please contact Sanjiv Singh (

The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.