Season-Invariant Semantic Segmentation with A Deep Multimodal Network

Dong-Ki Kim, Daniel Maturana, Masashi Uenoyama, and Sebastian Scherer
Conference Paper, Proceedings of the 11th International Conference on Field and Service Robotics (FSR '17), pp. 255-270, September 2017

Abstract

Semantic scene understanding is a useful capability for autonomous vehicles operating off-road. While cameras are the most common sensor used for semantic classification, the performance of camera-based methods may suffer when there is significant variation between the training and testing sets caused by illumination, weather, and seasonal changes. On the other hand, 3D information from active sensors such as LiDAR is comparatively invariant to these factors, which motivates us to investigate whether it can be used to improve performance in this scenario. In this paper, we propose a novel multimodal Convolutional Neural Network (CNN) architecture consisting of two streams, 2D and 3D, which are fused by projecting 3D features into image space to achieve robust pixelwise semantic segmentation. We evaluate the proposed method on a novel off-road terrain classification benchmark and show a 25% improvement in mean Intersection over Union (IoU) on navigation-related semantic classes, relative to an image-only baseline.
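
The two-stream fusion described in the abstract can be sketched in a few lines of PyTorch. The paper's actual layer configuration, feature widths, fusion operator, and projection details are not given on this page, so the small encoders below, the scatter-based projection of voxel features onto pixel coordinates, and concatenation as the fusion step are illustrative assumptions rather than the authors' implementation.

# Minimal sketch of a 2D/3D two-stream segmentation network where 3D (LiDAR)
# features are projected into image space and fused with 2D (camera) features.
# All sizes and the fusion operator are assumptions for illustration.
import torch
import torch.nn as nn

class TwoStreamSegNet(nn.Module):
    def __init__(self, num_classes=5, feat_2d=32, feat_3d=16):
        super().__init__()
        # 2D stream: small convolutional encoder over the RGB image.
        self.stream_2d = nn.Sequential(
            nn.Conv2d(3, feat_2d, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_2d, feat_2d, 3, padding=1), nn.ReLU(),
        )
        # 3D stream: 3D convolutions over a voxelized LiDAR occupancy grid.
        self.stream_3d = nn.Sequential(
            nn.Conv3d(1, feat_3d, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_3d, feat_3d, 3, padding=1), nn.ReLU(),
        )
        # Classifier over the fused (concatenated) per-pixel features.
        self.head = nn.Conv2d(feat_2d + feat_3d, num_classes, 1)

    def project_3d_to_image(self, vox_feat, pixel_uv, h, w):
        """Scatter per-voxel features onto the image plane.

        vox_feat: (B, C, D, H, W) voxel features from the 3D stream.
        pixel_uv: (B, N, 2) integer pixel coordinates for the N voxels,
                  assumed to come from projecting each voxel center with the
                  camera intrinsics/extrinsics.
        Returns a (B, C, h, w) feature map in image space.
        """
        b, c = vox_feat.shape[:2]
        flat = vox_feat.reshape(b, c, -1)               # (B, C, N) voxel features
        out = vox_feat.new_zeros(b, c, h * w)
        idx = (pixel_uv[..., 1] * w + pixel_uv[..., 0]).clamp(0, h * w - 1)
        out.scatter_(2, idx.unsqueeze(1).expand(-1, c, -1), flat)
        return out.reshape(b, c, h, w)

    def forward(self, image, voxels, pixel_uv):
        h, w = image.shape[-2:]
        f2d = self.stream_2d(image)                     # image-space features
        f3d = self.stream_3d(voxels)                    # voxel-space features
        f3d_img = self.project_3d_to_image(f3d, pixel_uv, h, w)
        fused = torch.cat([f2d, f3d_img], dim=1)        # fuse by concatenation
        return self.head(fused)                         # (B, num_classes, H, W) logits

# Toy forward pass with random data (all shapes are assumptions).
net = TwoStreamSegNet(num_classes=4)
img = torch.rand(1, 3, 64, 96)
vox = torch.rand(1, 1, 8, 16, 16)                       # (B, 1, D, H, W) occupancy grid
uv = torch.stack([torch.randint(0, 96, (1, 8 * 16 * 16)),
                  torch.randint(0, 64, (1, 8 * 16 * 16))], dim=-1)
logits = net(img, vox, uv)                              # (1, 4, 64, 96)

Concatenation is only one possible fusion operator; element-wise addition or a learned gating of the projected 3D features would slot into the same place in this sketch.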

BibTeX

@conference{Scherer-2017-101807,
author = {Dong-Ki Kim and Daniel Maturana and Masashi Uenoyama and Sebastian Scherer},
title = {Season-Invariant Semantic Segmentation with A Deep Multimodal Network},
booktitle = {Proceedings of 11th International Conference on Field and Service Robotics (FSR '17)},
year = {2017},
month = {September},
pages = {255--270},
}