Pattern Recognition on Random Graphs

Embargo until
Date
2015-03-17
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Johns Hopkins University
Abstract
The field of pattern recognition developed significantly in the 1960s, and the field of random graph inference has enjoyed much recent progress in both theory and application. This dissertation focuses on pattern recognition in the context of a particular family of random graphs, namely the stochastic blockmodels, from the two main perspectives of single graph inference and joint graph inference. Single graph inference is the performance of statistical inference on one single observed graph. Given a single graph realized from a stochastic blockmodel, we here consider the specific exploitation tasks of vertex classification, clustering, and nomination. Given an observed graph, vertex classification is the identification of the block labels of test vertices after learning from the training vertices. We propose a robust vertex classifier, which utilizes a representation of a test vertex as a sparse combination of the training vertices. Our proposed classifier is demonstrated to be robust against data contamination, and has superior performance over classical spectral-embedding classifiers in simulation and real data experiments. Vertex clustering groups vertices based on their similarities. We present a model-based clustering algorithm for graphs drawn from a stochastic blockmodel, and illustrate its usefulness on a case study in online advertising. We demonstrate the practical value of our vertex clustering method for efficiently delivering internet advertisements. Under the stochastic blockmodel framework, suppose one block is of particular interest. The task of vertex nomination is to create a nomination list so that vertices from the group of interest are concentrated abundantly near the top of the list. We present several vertex nomination schemes, and propose a vertex nomination scheme that is scalable for large graphs. We demonstrate the effectiveness of our methodology on simulation and real datasets. Next, we consider joint graph inference, which involves the joint space of multiple graphs; in this dissertation, we specifically consider joint graph inference on two graphs. Given two graphs, we consider the tasks of seeded graph matching for large graphs and joint vertex classification. Graph matching is the task of aligning two graphs so as to minimize the number of edge disagreements between them. We propose a scalable graph matching algorithm, which uses a divide-and-conquer approach to scale the state-of-the-art seeded graph matching algorithm to big graph data. We prove theoretical performance guarantees, and demonstrate desired properties such as scalability, robustness, accuracy and runtime in both simulated data and human brain connectome data. Within the joint graph inference framework, we present a case study on the paired chemical and electrical Caenorhabditis elegans neural connectomes. Motivated by the success of seeded graph matching on the paired connectomes, we propose joint vertex classification on the paired connectomes. We demonstrate that joint inference on the paired connectomes yields more accurate results than single inference on either connectome. This serves as a first step for providing a methodological and quantitative approach for understanding the coexistent significance of the chemical and electrical connectomes.
Description
Keywords
Pattern recognition, random graph, stochastic blockmodel, vertex classification, vertex clustering, vertex nomination, graph matching, joint vertex classification, single graph inference, joint graph inference.
Citation