Semantic Mapping for Autonomous Navigation and Exploration - The Robotics Institute Carnegie Mellon University

PhD Thesis, Tech. Report CMU-RI-TR-21-55, August 2021


The last two decades have seen enormous progress in the sensors and algorithms for 3D perception, giving robots the means to build accurate spatial maps and localize themselves within them in real time. The geometric information in these maps is invaluable for navigating while avoiding obstacles, but insufficient on its own for robots to robustly perform tasks according to human goals and preferences. Semantic mapping is a promising framework to provide robots with a richer representation of their environment by augmenting spatial maps with semantic labels -- in other words, a map of what is where. However, for semantic maps to fulfill their potential to improve robotic capabilities, we need systems capable of building and continuously updating these maps from noisy and ambiguous sensor streams with acceptable levels of accuracy and latency. In this thesis, we make several contributions to address these challenges and demonstrate their benefits in real-world scenarios.

First, we introduce a system for real-time semantic mapping from low-altitude aerial lidar that explicitly models the ground surface to extract more robust point cloud features. We show this approach improves the classification accuracy of relevant categories for safe navigation of an autonomous helicopter in human-populated environments.
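One intuition for why an explicit ground model yields more robust features is the height-above-ground signal: subtracting a locally estimated ground surface from each point's elevation makes features invariant to terrain slope. The sketch below uses a per-cell minimum as a crude ground estimate; this is an illustrative simplification, not the thesis's actual ground-modeling pipeline.

```python
import numpy as np

def height_above_ground(points, cell=1.0):
    """Estimate a coarse ground surface as the per-cell minimum z of a
    lidar point cloud (Nx3 array), then return each point's height above
    that surface. A simplified stand-in for explicit ground modeling;
    real systems use smoother surface fits and outlier rejection."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                      # shift grid indices to start at 0
    ground = np.full(ij.max(axis=0) + 1, np.inf)
    np.minimum.at(ground, (ij[:, 0], ij[:, 1]), points[:, 2])  # per-cell min z
    return points[:, 2] - ground[ij[:, 0], ij[:, 1]]
```

A point two meters above the local minimum gets the same feature value on a hillside as on flat ground, which is what makes the feature useful for classifying obstacles versus terrain.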

Second, we advance the state of the art in point cloud classification by moving away from hand-engineered features with VoxNet, a novel deep learning architecture based on 3D Convolutional Neural Networks (CNNs) that learns features and classifiers directly from a volumetric representation. VoxNet outperforms various baselines for the task of mapping safe landing zones for a helicopter in cluttered terrain, and sets the state of the art in 3D object recognition benchmarks from three different domains.
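As an illustration of the volumetric input a VoxNet-style 3D CNN consumes, the sketch below rasterizes a point cloud into a binary occupancy grid. The grid shape and voxel size are placeholder parameters, and binary hits are only the simplest of several possible occupancy models.

```python
import numpy as np

def voxelize(points, origin, voxel=0.2, shape=(32, 32, 32)):
    """Convert an Nx3 point cloud into a binary occupancy volume suitable
    as input to a volumetric 3D CNN. Points outside the grid are dropped.
    Parameters here are illustrative assumptions, not the thesis's setup."""
    idx = np.floor((points - origin) / voxel).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    grid = np.zeros(shape, dtype=np.float32)
    grid[tuple(idx[keep].T)] = 1.0            # mark each occupied voxel
    return grid
```

The appeal of this representation is that, once the cloud is on a regular grid, standard 3D convolutions can learn features directly, with no hand-engineering.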

Third, we develop two systems for multimodal semantic mapping with camera imagery and lidar point clouds. The first system implements a fast decoupled strategy, where image and lidar are used to infer semantic labels and elevation maps, respectively. The second system learns to fuse both modalities with a novel joint 2D/3D CNN architecture for semantic segmentation. We apply these systems to the task of off-road navigation with an autonomous all-terrain vehicle, allowing it to traverse cluttered and narrow trails in off-road environments.
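To make the decoupled strategy concrete, the sketch below projects lidar points (already transformed into the camera frame) through a pinhole model and reads off per-point labels from a semantic image. The intrinsics, label image, and function name are illustrative placeholders, not the systems described above.

```python
import numpy as np

def label_points_from_image(points_cam, sem_image, fx, fy, cx, cy):
    """Decoupled camera/lidar fusion sketch: project Nx3 lidar points
    through a pinhole model (fx, fy, cx, cy) and look up the per-pixel
    semantic label predicted by an image classifier."""
    z = points_cam[:, 2]
    u = np.round(fx * points_cam[:, 0] / z + cx).astype(int)
    v = np.round(fy * points_cam[:, 1] / z + cy).astype(int)
    h, w = sem_image.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(len(points_cam), -1)     # -1 = point falls outside image
    labels[valid] = sem_image[v[valid], u[valid]]
    return labels
```

Because the two sensors are processed independently, this route is fast; the joint 2D/3D architecture instead lets the network exploit both modalities when predicting each label.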

Finally, we develop a lightweight semantic mapping system for Micro-Aerial Vehicles (MAVs) with payload constraints that preclude lidar and high-powered computing platforms. We propose a novel 2.5D mapping system that takes advantage of publicly available digital elevation maps and priors of object height to achieve real-time mapping of distant objects using camera imagery. We show this system enables significant time savings in the task of autonomously gathering information for semantic classes of interest with MAVs.
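The core geometric idea behind mapping distant objects from a single camera can be sketched as a ray-plane intersection: given a camera ray, the ground elevation from a DEM, and a height prior for the object class, one image observation yields a 3D position. The flat-plane assumption and variable names below are simplifications of the 2.5D system described above.

```python
import numpy as np

def locate_object(cam_pos, ray_dir, ground_elev, height_prior):
    """Intersect a camera ray with a horizontal plane at the DEM ground
    elevation plus a class height prior, localizing a distant object from
    one image. Assumes the ray points toward that plane; a real system
    would intersect the full DEM surface rather than a single plane."""
    plane_z = ground_elev + height_prior
    t = (plane_z - cam_pos[2]) / ray_dir[2]   # ray parameter at the plane
    if t <= 0:
        return None                           # plane is behind the camera
    return cam_pos + t * np.asarray(ray_dir)
```

Because it needs only a camera and precomputed elevation data, this kind of geometry fits the payload and compute budget of a small MAV.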

Overall, our work -- which spans three robotic platforms, four different tasks, and a wide range of sensing and computing capabilities -- shows that semantic mapping is a versatile and pragmatic framework to extend and improve robotic abilities.


@phdthesis{maturana2021semantic,
  author   = {Daniel Maturana},
  title    = {Semantic Mapping for Autonomous Navigation and Exploration},
  year     = {2021},
  month    = {August},
  school   = {Carnegie Mellon University},
  address  = {Pittsburgh, PA},
  number   = {CMU-RI-TR-21-55},
  keywords = {Robotics, deep learning, computer vision},
}