An outdoor mobile robot, such as the Navlab, needs not only information derived from appearance (e.g., road location in a color image, or terrain type), but also shape information. In some tasks, such as cross-country navigation, the three-dimensional geometry of the environment is the most important source of information. To build three-dimensional representations of the environment, we use an imaging laser range finder. 3-D vision for mobile robots has two objectives: obstacle detection and terrain analysis. Obstacle detection allows the system to locally steer the vehicle on a safe path. Terrain analysis provides a more detailed description of the environment, which can be used for cross-country navigation or for object recognition.
Objects are detected from a range image by extracting the surface patches that face the vehicle. Neighboring patches are grouped into three-dimensional objects. The objects detected over many frames as the vehicle navigates can be combined into an object map, which can then be used for navigating through the same region. Matching objects between observations is inexpensive in our case for two reasons. First, there are only a few objects to match in each frame. Second, INS or dead reckoning gives a reasonable estimate of the displacement between frames, so the locations of the objects detected in one image can be easily predicted in the next. The algorithm for building object maps includes provisions for removing spurious objects and for the optimal estimation of object locations.
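The prediction-and-matching step described above can be illustrated with a minimal sketch. It is not the Navlab implementation: it assumes a planar world, objects reduced to 2-D points in the vehicle frame (x forward, y left), and a greedy nearest-neighbor rule. The names `predict`, `match_objects`, and the `gate` distance are hypothetical.

```python
import math

def predict(obj_xy, dx, dy, dtheta):
    """Express a point seen in the previous vehicle frame in the current one.

    (dx, dy, dtheta) is the assumed vehicle displacement between frames,
    measured in the previous frame (e.g., from INS or dead reckoning).
    """
    x, y = obj_xy
    tx, ty = x - dx, y - dy                    # remove the translation
    c, s = math.cos(dtheta), math.sin(dtheta)
    return (c * tx + s * ty, -s * tx + c * ty)  # rotate by -dtheta

def match_objects(prev_objs, curr_objs, motion, gate=1.0):
    """Greedy matching: pair each predicted object with the closest
    unused current object lying within `gate` (an assumed threshold)."""
    matches, used = [], set()
    for i, p in enumerate(prev_objs):
        px, py = predict(p, *motion)
        best, best_d = None, gate
        for j, (cx, cy) in enumerate(curr_objs):
            if j in used:
                continue
            d = math.hypot(cx - px, cy - py)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches

prev_objs = [(5.0, 0.0), (10.0, 2.0)]
curr_objs = [(9.1, 2.0), (4.05, 0.1)]
pairs = match_objects(prev_objs, curr_objs, motion=(1.0, 0.0, 0.0))
```

Because only a few objects appear in each frame, the quadratic cost of this search is negligible. Objects left unmatched over several frames would be natural candidates for removal as spurious, although the sketch does not implement that bookkeeping.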
Object maps are not sufficient for detailed analysis. For greater accuracy, we need more careful terrain analysis, and we need to combine sequences of images corresponding to overlapping parts of the environment into an extended terrain map. The terrain analysis algorithm first attempts to find groups of points that belong to the same surface and then uses these groups as seeds for a region-growing phase, in which each group is expanded into a smooth connected surface patch. Surface discontinuities are used to limit the region growing. This terrain representation is used in a cross-country navigation system for the Navlab.
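A simplified sketch of seeded region growing bounded by discontinuities is given below. It swaps in a much cruder smoothness test than fitting surfaces to range data: the terrain is assumed to be a regular grid of heights, and a neighbor joins the region only if its height differs from the current cell by at most `max_step`. The function name `grow_region` and the threshold value are hypothetical.

```python
from collections import deque

def grow_region(height, seed, max_step=0.2):
    """Grow a 4-connected region from `seed` over a 2-D height grid.

    A height jump larger than `max_step` between adjacent cells is
    treated as a surface discontinuity and stops the growth there.
    """
    rows, cols = len(height), len(height[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in region:
                continue
            if abs(height[nr][nc] - height[r][c]) <= max_step:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Two flat areas separated by a step of about one unit.
height = [
    [0.0, 0.1, 1.0],
    [0.0, 0.1, 1.1],
    [0.1, 0.2, 1.2],
]
low = grow_region(height, seed=(0, 0))
high = grow_region(height, seed=(0, 2))
```

In this toy grid the step between the second and third columns acts as the discontinuity, so the two seeds grow into disjoint patches.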
As in the case of object descriptions, composite maps can be built from terrain descriptions. The basic problem is to match terrain features between successive images and to compute the transformation between features. In this case the features are the polygons that describe the terrain, each parameterized by its area, the equation of its underlying surface, its center, and its main directions. If objects are detected, they are also used in the matching. Finally, if the vehicle is traveling on a road, the edges of the road can also be used for the matching. As in the case of object matching, an initial estimate of the displacement between successive frames is used to predict the matching features. A search procedure is used to find the most consistent set of matches. Once a set of consistent matches is found, the transformation between frames is recomputed and the common features are merged.
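The final step, recomputing the transformation from a set of consistent matches, can be sketched under strong simplifying assumptions: only the region centers are used, the problem is reduced to 2-D, and the transformation is the least-squares rigid motion between matched points. The functions `estimate_rigid_2d` and `apply_rigid_2d` are hypothetical names, and the full method would also exploit surface equations and region directions in 3-D.

```python
import math

def apply_rigid_2d(point, theta, tx, ty):
    """Rotate `point` by theta, then translate by (tx, ty)."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)

def estimate_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        ax, ay = x - sx, y - sy        # centered source point
        bx, by = u - dx, v - dy        # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation aligns the rotated source centroid with the destination centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty

# Matched region centers in the previous frame and in the current frame.
centers_prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
centers_curr = [apply_rigid_2d(p, math.pi / 6, 2.0, -1.0) for p in centers_prev]
theta, tx, ty = estimate_rigid_2d(centers_prev, centers_curr)
```

With noise-free matches the estimate reproduces the motion used to generate the data; with real data it would refine the initial displacement estimate before the common features are merged.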