Correcting for the Dependence Structure in Social Networks

Date
2014-04-25
Publisher
Johns Hopkins University
Abstract
Social network data have become increasingly prevalent in social science research and in clinical fields. While some researchers deliberately exploit social network structures to maximize response rates, reach hidden populations, or learn about the transmission of information or diseases from one person to another through network connections, others unintentionally sample observations from connected networks within the overall target population; the latter is especially common when samples are collected from contiguous geographic areas or similar institutions. Statistical inference from observations sampled from social networks is problematic because the observations are often inherently correlated, yet this dependence is rarely adequately accounted for. Failing to account for the dependence between network observations has unfavorable, sometimes dangerous, consequences for inference: underestimated standard errors, inflated statistical significance, and high type I error rates. Throughout this work, we demonstrate the gravity of these repercussions through simulations, which entail constructing network structures resembling realistic social networks, associating independent outcomes with each subject, and generating various levels of dependence in the sample. We sample the generated outcome data to draw inferences about the population mean, incorrectly assuming independence between observations. We find that ignoring network dependence has devastating consequences for the validity of inference, and that these consequences become more severe with increasing correlation: estimated coverage of 95% confidence intervals dropped as low as 33% when the sample exhibited high dependence. We suggest informal methods for quantifying and accounting for dependence in various research settings, each with the objective of drawing valid inferences for a population mean.
We demonstrate the efficacy of these methods by implementing them in all simulated dependence settings. By employing these methods, we attain valid, or nearly valid, inference, as assessed through estimated coverage. An important objective of future work in this area is to extend these methods to allow for more general applications.
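The kind of simulation the abstract describes can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the ring-lattice network, the moving-average dependence model `y = mu + (I + rho * W) @ eps`, the normal-approximation interval, and the names `ring_lattice_weights`, `naive_coverage`, `rho`, and `k` are all hypothetical. The sketch estimates how often a naive 95% confidence interval for the mean, computed as if observations were independent, covers the true mean.

```python
import numpy as np


def ring_lattice_weights(n, k):
    """Row-normalized adjacency matrix of a ring in which each node is
    linked to its k nearest neighbors on each side (assumed toy network)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    for step in range(1, k + 1):
        A[idx, (idx + step) % n] = 1.0
        A[idx, (idx - step) % n] = 1.0
    return A / A.sum(axis=1, keepdims=True)


def naive_coverage(rho, n=200, k=2, n_sims=2000, mu=0.0, seed=0):
    """Estimate coverage of the naive 95% CI for the mean when outcomes
    are correlated along network ties.

    Assumed dependence model: y = mu + (I + rho * W) @ eps, so rho = 0
    gives independent outcomes and larger rho gives stronger dependence.
    """
    rng = np.random.default_rng(seed)
    W = ring_lattice_weights(n, k)
    M = np.eye(n) + rho * W
    covered = 0
    for _ in range(n_sims):
        eps = rng.standard_normal(n)      # independent noise per subject
        y = mu + M @ eps                  # outcomes share neighbors' noise
        se = y.std(ddof=1) / np.sqrt(n)   # standard error assuming independence
        if abs(y.mean() - mu) <= 1.96 * se:
            covered += 1
    return covered / n_sims
```

Calling `naive_coverage(0.0)` should give an estimate close to the nominal 95%, while larger values of `rho` should push the estimate well below it, because the naive standard error ignores the positive correlation between connected subjects. The specific coverage values depend entirely on the assumed toy model and are not comparable to the figures reported in the abstract.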
Keywords
social networks, network, dependence, statistical inference