Matching and Inference for Multiple Correlated Data Sets

dc.contributor.advisor: Fishkind, Donniell E.
dc.contributor.committeeMember: Priebe, Carey E.
dc.contributor.committeeMember: Tang, Minh
dc.contributor.committeeMember: Vogelstein, Joshua T.
dc.contributor.committeeMember: Lyzinski, Vince
dc.creator: Shen, Cencheng
dc.date.accessioned: 2016-12-15T06:52:28Z
dc.date.available: 2016-12-15T06:52:28Z
dc.date.created: 2015-05
dc.date.issued: 2015-03-17
dc.date.submitted: May 2015
dc.date.updated: 2016-12-15T06:52:28Z
dc.description.abstract: Given multiple correlated data sets, an important question is how to make use of them to benefit later statistical inference. This setting is increasingly common as more and more related data sets are collected, such as images and their descriptions, articles in multiple languages, and actors in multiple social networks. Real data are also often multivariate or high-dimensional, so dimension reduction is necessary before any inference. In this dissertation, I consider three dimension reduction and matching methods, namely principal component analysis followed by Procrustes matching, canonical correlation analysis, and nonlinear matching using shortest-path distance and joint neighborhood. I investigate their theoretical properties and their impact on later inference using the Procrustes fitting error, classification error, and hypothesis testing, respectively. The main conclusion of this dissertation is that, given a particular inference task for multiple correlated data sets, we may significantly improve the inference performance by joint matching and projection, compared to separate projection or omitting modalities. Numerical experiments on simulated and real data illustrate the theorems and the methodology.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/39366
dc.language: en
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: Dimension reduction
dc.subject: Machine learning
dc.subject: Data matching
dc.subject: Statistical inference
dc.title: Matching and Inference for Multiple Correlated Data Sets
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Applied Mathematics and Statistics
thesis.degree.discipline: Applied Mathematics & Statistics
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
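
The first method named in the abstract, principal component analysis followed by Procrustes matching, can be illustrated with a minimal sketch. This is not the dissertation's code: the simulated data, the dimensions, the noise level, and the helper names pca_project and procrustes_rotation are all assumptions made for illustration. It projects two correlated data sets separately with PCA, then finds the orthogonal rotation that best aligns one projection to the other, and compares the Procrustes fitting error before and after alignment.

```python
import numpy as np


def pca_project(data, dim):
    """Project centered data onto its top `dim` principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def procrustes_rotation(source, target):
    """Orthogonal matrix Q minimizing the Frobenius norm of source @ Q - target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Hypothetical simulated data: two views generated from one shared latent signal.
rng = np.random.default_rng(0)
n, d = 200, 3
latent = rng.normal(size=(n, d))
view_a = latent @ rng.normal(size=(d, 10)) + 0.05 * rng.normal(size=(n, 10))
view_b = latent @ rng.normal(size=(d, 15)) + 0.05 * rng.normal(size=(n, 15))

# Separate projection of each view, followed by Procrustes matching.
proj_a = pca_project(view_a, d)
proj_b = pca_project(view_b, d)
rotation = procrustes_rotation(proj_a, proj_b)

# Procrustes fitting error (Frobenius norm) without and with the alignment.
error_before = np.linalg.norm(proj_a - proj_b)
error_after = np.linalg.norm(proj_a @ rotation - proj_b)
print(f"fitting error before matching: {error_before:.3f}")
print(f"fitting error after matching:  {error_after:.3f}")
```

Because the identity is itself an orthogonal matrix, the error after matching can never exceed the error before it; how much it drops depends on how strongly the two views share structure.
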
Files
Original bundle
Name: SHEN-DISSERTATION-2015.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format