Loss Landscapes and Generalization in Neural Networks: Theory and Applications

dc.contributor.advisor: Tran, Trac D
dc.contributor.committeeMember: Patel, Vishal
dc.contributor.committeeMember: Vidal, Rene
dc.creator: Rangamani, Akshay
dc.creator.orcid: 0000-0002-1961-0445
dc.date.accessioned: 2020-06-21T20:04:24Z
dc.date.available: 2020-06-21T20:04:24Z
dc.date.created: 2020-05
dc.date.issued: 2020-01-21
dc.date.submitted: May 2020
dc.date.updated: 2020-06-21T20:04:24Z
dc.description.abstract: In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neural networks have driven significant improvements in computer vision, machine translation, and speech recognition. These powerful empirical results, however, leave a wide gap between our current theoretical understanding of neural networks and their practical performance. The theoretical questions in deep learning can be grouped under three broad but interrelated themes: 1) Architecture/Representation, 2) Optimization, and 3) Generalization. In this dissertation, we study the landscapes of different deep learning problems to answer questions in the above themes. First, in order to understand what representations can be learned by neural networks, we study simple autoencoder networks with one hidden layer of rectified linear units. We connect autoencoders to sparse coding, a well-known problem in signal processing, and show that the squared reconstruction error loss function has a critical point at the ground truth dictionary under an appropriate generative model. Next, we turn our attention to a problem at the intersection of optimization and generalization. Training deep networks through empirical risk minimization is a non-convex problem with many local minima in the loss landscape. A number of empirical studies have observed that "flat minima" of neural networks tend to generalize better than sharper minima. However, quantifying the flatness or sharpness of minima is complicated by the possible rescaling of weights in neural networks with positively homogeneous activations. We use ideas from Riemannian geometry to define a new measure of flatness that is invariant to rescaling, and we test the hypothesis that flatter minima generalize better through a number of different experiments on deep networks. Finally, we apply deep networks to computer vision problems with compressed measurements of natural images and videos. We train deep networks to perform object detection and classification directly on these compressive measurements, without trying to reconstruct the scene first, and conduct experiments to characterize the situations in which these networks fail and those in which they succeed. These experiments are conducted on public datasets as well as datasets specific to a sponsor of our research.
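To make the rescaling issue mentioned in the abstract concrete, the minimal NumPy sketch below (illustrative only, not code from the dissertation; all names and shapes are assumed) shows that scaling the first layer of a two-layer ReLU network by a positive constant and the second layer by its inverse leaves the network function unchanged, while changing the weight norms on which naive, parameter-space sharpness measures depend.

import numpy as np

# Sketch with assumed names: a two-layer ReLU network f(x) = W2 @ relu(W1 @ x).
# ReLU is positively homogeneous, relu(a * z) = a * relu(z) for a > 0, so the
# rescaling (W1, W2) -> (a * W1, W2 / a) does not change the function computed.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((1, 16))
x = rng.standard_normal(8)

def forward(W1, W2, x):
    # Forward pass of the two-layer ReLU network.
    return W2 @ np.maximum(W1 @ x, 0.0)

a = 10.0
# The network output is identical before and after rescaling.
assert np.allclose(forward(W1, W2, x), forward(a * W1, W2 / a, x))

# The rescaling does change the scale of the weights, so any sharpness measure
# built from raw parameter-space curvature can be made arbitrarily large or
# small without changing the function -- the motivation for a rescaling-invariant,
# Riemannian notion of flatness.
print(np.linalg.norm(W1), np.linalg.norm(a * W1))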
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/62471
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: Signal Processing
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Deep Neural Networks
dc.subject: Autoencoders
dc.subject: Representation Learning
dc.subject: Dictionary Learning
dc.subject: Flat Minima
dc.subject: Riemannian Geometry
dc.title: Loss Landscapes and Generalization in Neural Networks: Theory and Applications
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
Files
Original bundle
Name: RANGAMANI-DISSERTATION-2020.pdf
Size: 3.74 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.67 KB
Format: Plain Text