
Designing Convolutional Neural Networks for Urban Scene Understanding

Master's Thesis, Tech. Report, CMU-RI-TR-17-06, Robotics Institute, Carnegie Mellon University, May, 2017

Abstract

Semantic segmentation is one of the essential and fundamental problems in the computer vision community. The task is particularly challenging for urban street scenes, where object scales vary significantly. Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. In this work, we show how to improve semantic understanding of urban street scenes by manipulating convolution-related operations that are better for practical use. First, we implement dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields of the network to aggregate global information; 2) alleviates what we call the "gridding" issue caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset and achieve a new state-of-the-art result of 80.1% mIoU on the test set. We also achieve state-of-the-art performance on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task.
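At its core, DUC replaces bilinear upsampling with a learned channel-to-space rearrangement: the network predicts a feature map with `d*d*C` channels at the downsampled resolution, which is then reshaped into a `C`-channel map at `d` times the spatial resolution (a pixel-shuffle-style operation). The following is a minimal numpy sketch of that rearrangement only, not the thesis implementation; the function name `duc_reshape` is hypothetical.

```python
import numpy as np

def duc_reshape(feat, d):
    """Rearrange a (d*d*C, H, W) feature map into a (C, H*d, W*d) map.

    This is the decoding step of DUC: each group of d*d channels at a
    coarse location (h, w) fills in the d-by-d block of fine-resolution
    pixels it covers. (Hypothetical helper name, illustration only.)
    """
    c2, H, W = feat.shape
    C = c2 // (d * d)
    x = feat.reshape(C, d, d, H, W)   # split channels into C groups of d*d
    x = x.transpose(0, 3, 1, 4, 2)    # -> (C, H, d, W, d)
    return x.reshape(C, H * d, W * d)

# One coarse pixel with d=2 and C=1: its 4 channels become a 2x2 block.
feat = np.arange(4, dtype=float).reshape(4, 1, 1)
out = duc_reshape(feat, 2)            # shape (1, 2, 2)
```

Because every fine-resolution pixel is predicted from its own set of channels, the decoder can recover detail that a fixed bilinear kernel would smooth away.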
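The gridding issue can be made concrete with a small experiment: stacking 3-tap dilated convolutions with a constant rate lets each output unit see only a sparse, checkerboard-like subset of its receptive field, whereas cycling through mixed rates (the idea behind HDC) covers it without holes. The sketch below tracks which 1-D input positions can influence the centre output; `dilated_coverage` is a hypothetical helper for illustration, not code from the thesis.

```python
import numpy as np

def dilated_coverage(length, rates):
    """Return a boolean mask of input positions that can influence the
    centre output unit after a stack of 3-tap dilated convolutions
    (kernel offsets -r, 0, +r) applied with the given rates in order.
    (Hypothetical helper name, illustration only.)"""
    reach = np.zeros(length, dtype=bool)
    reach[length // 2] = True
    for r in rates:
        nxt = np.zeros_like(reach)
        for i in np.flatnonzero(reach):
            for off in (-r, 0, r):
                if 0 <= i + off < length:
                    nxt[i + off] = True
        reach = nxt
    return reach

# Constant rate (2, 2, 2): only every other position is reachable -- gridding.
grid = dilated_coverage(15, (2, 2, 2))
# Mixed rates (1, 2, 3), HDC-style: the receptive field is covered with no holes.
full = dilated_coverage(15, (1, 2, 3))
```

Both rate schedules yield the same receptive-field radius, but only the mixed schedule uses all of the information inside it, which is why HDC swaps the constant dilation rate for a cycle of different rates.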

BibTeX

@mastersthesis{Yuan-2017-22829,
author = {Ye Yuan},
title = {Designing Convolutional Neural Networks for Urban Scene Understanding},
year = {2017},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-06},
keywords = {Semantic Segmentation, Urban Scene Understanding, Hybrid Dilated Convolution, Dense Upsampling Convolution},
}