On Model-Based Semi-Supervised Clustering

dc.contributor.advisor: Park, Youngser
dc.contributor.committeeMember: Priebe, Carey E.
dc.contributor.committeeMember: Lyzinski, Vince
dc.contributor.committeeMember: Tang, Minh
dc.creator: Yoder, Jordan
dc.date.accessioned: 2017-07-26T17:52:03Z
dc.date.available: 2017-07-26T17:52:03Z
dc.date.created: 2016-05
dc.date.issued: 2016-03-16
dc.date.submitted: May 2016
dc.date.updated: 2017-07-26T17:52:04Z
dc.description.abstract: We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting. We then use the BIC to select the number of clusters and the variables useful for clustering. We discuss some considerations for $O(1)$ terms in information criteria when performing model-based clustering. Next, we explore a novel method for the initialization of the EM algorithm for the semi-supervised case using modifications to the k-means++ algorithm to account for the labels. Then, we derive an improved theoretical bound on expected cost and observe improved performance in simulated and real data examples. This analysis provides theoretical justification for a typically linear-time semi-supervised clustering algorithm. We show how this algorithm outperforms related semi-supervised k-means-style algorithms on several datasets. Finally, we demonstrate semi-supervised model-based clustering with our improved k-means++ initialization on two applications. First, we identify behaviotypes in a fly larva dataset. Next, we nominate interesting vertices in graphs using two types of supervision.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/40715
dc.language: en
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: Semi-supervised clustering (en_US)
dc.subject: k-means (en_US)
dc.subject: clustering (en_US)
dc.subject: k-means++ (en_US)
dc.subject: approximation algorithm (en_US)
dc.subject: GMM (en_US)
dc.title: On Model-Based Semi-Supervised Clustering
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Applied Mathematics and Statistics
thesis.degree.discipline: Applied Mathematics & Statistics
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
Files
Original bundle
Name: YODER-DISSERTATION-2016.pdf
Size: 11.4 MB
Format: Adobe Portable Document Format