MSE-Optimal Neural Network Initialization via Layer Fusion

Ramina Ghods, Andrew S. Lan, Tom Goldstein, and Christoph Studer

Conference Paper, Proceedings of 54th Annual Conference on Information Sciences and Systems (CISS '20), March 2020

Abstract

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation have been proposed in the past. In this paper, we propose FuseInit, a novel method to initialize shallower networks by fusing neighboring layers of deeper networks that are trained with random initialization. We develop theoretical results and efficient algorithms for mean-square error (MSE)-optimal fusion of neighboring dense-dense, convolutional-dense, and convolutional-convolutional layers. We present experiments on a range of classification and regression datasets, which suggest that deeper neural networks are less sensitive to initialization and that shallower networks can perform better (sometimes as well as their deeper counterparts) if initialized with FuseInit.
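As a rough illustration of what MSE-optimal fusion of a dense-dense pair means, the sketch below replaces a two-layer block x -> W2 ReLU(W1 x + b1) + b2 with a single affine layer x -> W x + b that minimizes the mean-square error over a set of sample inputs. This is an assumption-laden stand-in, not FuseInit itself: it estimates the MSE empirically with ordinary least squares (numpy.linalg.lstsq) instead of using the paper's theoretical results, and the function name fuse_dense_dense, the Gaussian sample inputs, and all dimensions are hypothetical.

```python
import numpy as np


def fuse_dense_dense(W1, b1, W2, b2, X):
    """Fit one affine layer W x + b to the block x -> W2 @ relu(W1 @ x + b1) + b2.

    Hypothetical illustration: the MSE is minimized empirically over the
    sample inputs in the rows of X (shape (n, d_in)), not derived analytically.
    """
    # Outputs of the deeper two-layer block for every sample.
    H = np.maximum(X @ W1.T + b1, 0.0)   # (n, d_hidden)
    Y = H @ W2.T + b2                    # (n, d_out)
    # Append a column of ones so the bias is fitted jointly with the weights.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # Least squares minimizes ||X_aug @ theta - Y||^2, i.e. the empirical MSE.
    theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    W = theta[:-1].T   # (d_out, d_in)
    b = theta[-1]      # (d_out,)
    return W, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_h, d_out, n = 8, 32, 4, 2000
    W1 = rng.standard_normal((d_h, d_in)) / np.sqrt(d_in)
    b1 = 0.1 * rng.standard_normal(d_h)
    W2 = rng.standard_normal((d_out, d_h)) / np.sqrt(d_h)
    b2 = 0.1 * rng.standard_normal(d_out)
    X = rng.standard_normal((n, d_in))   # Gaussian inputs (an assumption)

    W, b = fuse_dense_dense(W1, b1, W2, b2, X)
    Y_deep = np.maximum(X @ W1.T + b1, 0.0) @ W2.T + b2
    mse_fused = np.mean((X @ W.T + b - Y_deep) ** 2)
    # Naive fusion that simply drops the ReLU, for comparison.
    mse_naive = np.mean((X @ (W2 @ W1).T + (W2 @ b1 + b2) - Y_deep) ** 2)
    print(f"fused MSE: {mse_fused:.4f}, naive MSE: {mse_naive:.4f}")
```

Because the least-squares fit is optimal among all affine maps on the same samples, its MSE can never exceed that of the naive product W2 W1 that ignores the nonlinearity; when every ReLU is active on the samples, the block is itself affine and the fit recovers W2 W1 and W2 b1 + b2 exactly.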

BibTeX

@conference{Ghods-2020-122436,
author = {Ramina Ghods and Andrew S. Lan and Tom Goldstein and Christoph Studer},
title = {MSE-Optimal Neural Network Initialization via Layer Fusion},
booktitle = {Proceedings of 54th Annual Conference on Information Sciences and Systems (CISS '20)},
year = {2020},
month = {March},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.