Loss Landscapes and Generalization in Neural Networks: Theory and Applications

dc.contributor.advisor: Tran, Trac D
dc.contributor.committeeMember: Patel, Vishal
dc.contributor.committeeMember: Vidal, Rene
dc.creator: Rangamani, Akshay
dc.creator.orcid: 0000-0002-1961-0445
dc.date.accessioned: 2020-06-21T20:04:24Z
dc.date.available: 2020-06-21T20:04:24Z
dc.date.created: 2020-05
dc.date.issued: 2020-01-21
dc.date.submitted: May 2020
dc.date.updated: 2020-06-21T20:04:24Z
dc.description.abstract: In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neural networks have driven significant improvements in computer vision, machine translation, and speech recognition. These powerful empirical results, however, leave a wide gap between our current theoretical understanding of neural networks and their practical performance. The theoretical questions in deep learning can be grouped under three broad but interrelated themes: 1) Architecture/Representation, 2) Optimization, and 3) Generalization. In this dissertation, we study the landscapes of different deep learning problems to answer questions in the above themes. First, in order to understand what representations can be learned by neural networks, we study simple autoencoder networks with one hidden layer of rectified linear units. We connect autoencoders to sparse coding, a well-known problem in signal processing, and show that the squared reconstruction error loss function has a critical point at the ground truth dictionary under an appropriate generative model. Next, we turn our attention to a problem at the intersection of optimization and generalization. Training deep networks through empirical risk minimization is a non-convex problem with many local minima in the loss landscape. A number of empirical studies have observed that "flat minima" of neural networks tend to generalize better than sharper minima. However, quantifying the flatness or sharpness of minima is complicated by the possible rescaling of weights in neural networks with positively homogeneous activations. We use ideas from Riemannian geometry to define a new measure of flatness that is invariant to rescaling, and we test the hypothesis that flatter minima generalize better through a number of different experiments on deep networks. Finally, we apply deep networks to computer vision problems with compressed measurements of natural images and videos. We train deep networks to perform object detection and classification directly on these compressive measurements, without trying to reconstruct the scene first, and conduct experiments to characterize the situations in which these networks fail and those in which they succeed. These experiments are conducted on public datasets as well as datasets specific to a sponsor of our research.
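To make the rescaling issue mentioned in the abstract concrete, the minimal NumPy sketch below (illustrative only, not code from the dissertation; all names and shapes are assumed) shows that scaling the first layer of a two-layer ReLU network by a positive constant and the second layer by its inverse leaves the network function unchanged, while changing the weight norms on which naive, parameter-space sharpness measures depend.

import numpy as np

# Sketch with assumed names: a two-layer ReLU network f(x) = W2 @ relu(W1 @ x).
# ReLU is positively homogeneous, relu(a * z) = a * relu(z) for a > 0, so the
# rescaling (W1, W2) -> (a * W1, W2 / a) does not change the function computed.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((1, 16))
x = rng.standard_normal(8)

def forward(W1, W2, x):
    # Forward pass of the two-layer ReLU network.
    return W2 @ np.maximum(W1 @ x, 0.0)

a = 10.0
# The network output is identical before and after rescaling.
assert np.allclose(forward(W1, W2, x), forward(a * W1, W2 / a, x))

# The rescaling does change the scale of the weights, so any sharpness measure
# built from raw parameter-space curvature can be made arbitrarily large or
# small without changing the function -- the motivation for a rescaling-invariant,
# Riemannian notion of flatness.
print(np.linalg.norm(W1), np.linalg.norm(a * W1))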
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/62471
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: Signal Processing
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Deep Neural Networks
dc.subject: Autoencoders
dc.subject: Representation Learning
dc.subject: Dictionary Learning
dc.subject: Flat Minima
dc.subject: Riemannian Geometry
dc.title: Loss Landscapes and Generalization in Neural Networks: Theory and Applications
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
Files
Original bundle
Name: RANGAMANI-DISSERTATION-2020.pdf
Size: 3.74 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.67 KB
Format: Plain Text