Two important tasks in many computer vision applications are motion estimation and the tracking of objects in video streams. These tasks are particularly difficult when the motion is fast, noise levels are high, and the computation needs to happen in real time. Mobile robotics is one such domain. In particular, three mobile robot scenarios under investigation at CMU each display typical challenges. Indoor robots move relatively slowly, but operate in changing and noisy environments. Autonomous vehicles operate at high speeds, and although other cars are more predictable than people in a building, perceiving and avoiding them presents significant perceptual challenges. Finally, an autonomous helicopter has perhaps a more predictable environment, but it must operate at high speed and cope with high noise levels.
State of the Art:
Image-based approaches to motion estimation use all the information available in a single image, which feature-based approaches do not. However, they do not employ recursive estimation techniques to integrate those measurements over time. Presumably, it is deemed infeasible to formulate a state-space representation that can accurately predict the images, and it is unclear how such a state would be updated and maintained over time.
The method I propose, Super-Resolved Texture Tracking [1, 2], is an attempt at using all information available in the video stream, both in space and in time, to improve accuracy and robustness. As with the current state of the art in feature-based motion estimation, a Kalman filter is used to formalize the problem as a recursive state estimation problem. However, to be able to use the whole image as our measurement vector, we incorporate a texture map into the system state, modeling the texture present on the surfaces that we are tracking (see Figure 1). As the measurement model, we use texture mapping, a technique from computer graphics that is normally used to render realistic-looking surfaces.
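To make the idea concrete, the following is a minimal one-dimensional sketch and not the implementation described in [1, 2]. It assumes that the camera motion is already known, so that only the texture map is in the state and the measurement model is linear; the actual method also estimates motion. The names `sample_weights` and `scalar_update`, the wrap-around texture, the noise variance, and the half-texel motion per frame are all illustrative assumptions. Each image pixel is predicted by linearly interpolating the texture (a simple form of texture mapping), and each pixel is folded in with a scalar Kalman measurement update.

```python
import math
import random

def sample_weights(n_tex, coord):
    """Linear-interpolation weights for sampling a 1-D texture at a real-valued
    texel coordinate. The texture wraps around (an assumption of this sketch).
    Returns a sparse measurement row as {texel_index: weight}."""
    base = math.floor(coord)
    frac = coord - base
    weights = {base % n_tex: 1.0 - frac}
    if frac > 0.0:
        weights[(base + 1) % n_tex] = frac
    return weights

def scalar_update(x, P, z, weights, r):
    """Kalman measurement update for one pixel z = h.x + noise (variance r).
    x (texture estimate) and P (its covariance) are updated in place."""
    n = len(x)
    # P h^T, using the sparsity of the measurement row h
    Ph = [sum(P[i][j] * w for j, w in weights.items()) for i in range(n)]
    s = sum(w * Ph[j] for j, w in weights.items()) + r  # innovation variance
    innovation = z - sum(w * x[j] for j, w in weights.items())
    for i in range(n):
        gain = Ph[i] / s
        x[i] += gain * innovation
        for j in range(n):
            P[i][j] -= gain * Ph[j]  # P <- P - (P h^T)(h P) / s
    return innovation

random.seed(0)
n_tex, n_pix = 16, 8            # texture kept at twice the image resolution
stride = n_tex / n_pix          # texels per image pixel
truth = [random.random() for _ in range(n_tex)]  # hypothetical true texture
x = [0.5] * n_tex               # initial texture estimate: uniform grey
P = [[1.0 if i == j else 0.0 for j in range(n_tex)] for i in range(n_tex)]
r = 1e-4                        # assumed pixel noise variance

def rms_error(estimate):
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / n_tex) ** 0.5

initial_error = rms_error(x)
for frame in range(40):
    shift = 0.5 * frame         # assumed-known motion, in texels
    for p in range(n_pix):
        weights = sample_weights(n_tex, p * stride + shift)
        z = sum(w * truth[j] for j, w in weights.items())
        z += random.gauss(0.0, math.sqrt(r))   # simulated noisy pixel
        scalar_update(x, P, z, weights, r)
final_error = rms_error(x)
print(initial_error, final_error)
```

Because the texture has twice as many texels as each image has pixels, no single frame determines it; the estimate sharpens only as differently shifted frames are integrated over time. Processing pixels one at a time is equivalent to a batch update when the pixel noise is assumed independent, and it avoids a matrix inversion.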
The novel combination of a Kalman filter with texture mapping yields some unique advantages. In particular, the estimated texture map can be kept at an arbitrary resolution. Thus, if we keep it at a higher resolution than the source images themselves, our method can produce super-resolved texture estimates as more image measurements are taken. However, the texture map can also be kept at a lower resolution while still maintaining accurate tracking. In addition, since we can predict entire images, deviations from the prediction enable us to identify which objects are incompatible with the expectations formed using our internal model. As an example, this could allow us to detect independently moving objects such as cars or people in a known environment.
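The use of prediction deviations can be illustrated with a small sketch, again under stated assumptions rather than as the method of [1, 2]. It assumes a predicted image is already available, that pixel noise has a known standard deviation `sigma`, and that a fixed threshold of three standard deviations is adequate; the function name `flag_unexpected` and the synthetic data are hypothetical. In a full filter the threshold would more naturally come from the innovation covariance.

```python
import math

def flag_unexpected(predicted, observed, sigma, threshold=3.0):
    """Return a boolean mask marking pixels whose residual (observed minus
    predicted) exceeds threshold * sigma in magnitude."""
    return [[abs(o - p) > threshold * sigma for p, o in zip(prow, orow)]
            for prow, orow in zip(predicted, observed)]

rows, cols, sigma = 6, 8, 0.01
# Predicted image from the internal model: uniform grey in this toy example.
predicted = [[0.5] * cols for _ in range(rows)]
# Observed image: the prediction plus a small deterministic perturbation
# that stays within one sigma, standing in for sensor noise.
observed = [[0.5 + sigma * math.sin(r * cols + c) for c in range(cols)]
            for r in range(rows)]
# A hypothetical independently moving object brightens a 2 x 3 patch.
for r in range(2, 4):
    for c in range(3, 6):
        observed[r][c] += 0.4

mask = flag_unexpected(predicted, observed, sigma)
flagged = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
print(flagged)
```

Only the six pixels covered by the inserted patch are flagged, since every other residual stays below the three-sigma threshold.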
The Robotics Institute is part of the School of Computer Science, Carnegie Mellon University.