Stream sketches, sampling, and sabotage

Date
2015-02-04
Publisher
Johns Hopkins University
Abstract
Exact solutions are unattainable for many important problems; the calculations are limited by the memory of our computers and by how long we can wait for an answer. The field of approximation algorithms has grown to address this problem, and it is both practically important and theoretically fascinating. We address three questions along these lines. What are the limits of streaming computation? Can we efficiently compute the likelihood of a given network of relationships? How robust are the solutions to combinatorial optimization problems?

High-speed network monitoring and the rapid acquisition of scientific data require space-efficient algorithms. In these settings it is impractical or impossible to store all of the data; nonetheless, the need to analyze it persists. Typically, the goal is to compute some simple statistics on the input using sublinear, or even polylogarithmic, space. Our main contributions here are complete classifications of the space necessary for several types of statistics. Our sharpest results characterize the complexity in terms of the domain size and the stream length. Furthermore, our algorithms are universal for their respective classes of statistics.

A network of relationships, for example friendships or species-habitat pairings, can often be represented as a binary contingency table, which is a {0,1}-matrix with given row and column sums. A natural null model for hypothesis testing here is the uniform distribution on the set of binary contingency tables with the same line sums as the observation. However, exact calculation, asymptotic approximation, and even Monte Carlo approximation of p-values remain practically unattainable for many interesting examples. This thesis presents two new algorithms for sampling contingency tables. One is a hybrid algorithm that combines elements of two previously known algorithms; it is intended to exploit certain properties of the margins observed in some data sets. Our other algorithm samples from a larger set of tables, but it has the advantage of being fast.

The robustness of a system can be assessed from optimal attack strategies. Interdiction problems ask about the worst-case impact of a limited change to an underlying optimization problem. Most interdiction problems are NP-hard, and, furthermore, even designing efficient approximation algorithms that estimate the order of magnitude of the worst-case impact has turned out to be very difficult. We suggest a general method for obtaining pseudoapproximations for many interdiction problems.
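To make the streaming setting concrete, here is a minimal Count-Min sketch in Python, a standard polylogarithmic-space data structure for approximate frequency counts. It is only an illustration of the kind of small-space statistic discussed above, not one of the universal algorithms developed in the thesis; the parameters eps and delta and the hashing scheme are illustrative choices.

```python
# A minimal Count-Min sketch: a standard polylogarithmic-space streaming
# structure for approximate frequency counts. Illustrative only; not the
# thesis's universal algorithms. eps/delta and the hashing are assumptions.
import math
import random


class CountMinSketch:
    def __init__(self, eps=0.01, delta=0.01, seed=0):
        self.width = math.ceil(math.e / eps)         # columns: controls additive error
        self.depth = math.ceil(math.log(1 / delta))  # rows: controls failure probability
        rng = random.Random(seed)
        self.seeds = [rng.randrange(2**61) for _ in range(self.depth)]
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.stream_length = 0

    def _cell(self, row, item):
        return hash((self.seeds[row], item)) % self.width

    def update(self, item, count=1):
        self.stream_length += count
        for r in range(self.depth):
            self.table[r][self._cell(r, item)] += count

    def estimate(self, item):
        # Overestimates the true frequency by at most eps * stream_length,
        # with probability at least 1 - delta.
        return min(self.table[r][self._cell(r, item)] for r in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(eps=0.01, delta=0.01)
    for x in "abracadabra" * 1000:
        cms.update(x)
    print(cms.estimate("a"), cms.estimate("z"))
```

The space used is O((1/eps) * log(1/delta)) counters, independent of the stream length, which is the sense in which such statistics can be computed in sublinear space.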
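For the contingency-table problem, the following sketch shows the classical swap (checkerboard) Markov chain, which walks over {0,1}-matrices while preserving row and column sums. It is a well-known baseline shown only to make the sampling problem concrete; it is not the hybrid or fast sampler described above, and its mixing time can be prohibitively slow on some margins.

```python
# A minimal swap-chain (checkerboard) MCMC sampler for binary contingency
# tables with fixed row and column sums. A classic baseline, not the thesis's
# hybrid or fast sampler; mixing can be slow and is not analyzed here.
import random


def swap_chain_step(table, rng):
    """Attempt one margin-preserving 2x2 checkerboard swap in place."""
    m, n = len(table), len(table[0])
    i, j = rng.sample(range(m), 2)   # two distinct rows
    k, l = rng.sample(range(n), 2)   # two distinct columns
    a, b = table[i][k], table[i][l]
    c, d = table[j][k], table[j][l]
    # A swap is legal only when the 2x2 submatrix is a checkerboard pattern.
    if a == d and b == c and a != b:
        table[i][k], table[i][l] = b, a
        table[j][k], table[j][l] = d, c


def sample_table(table, steps=100000, seed=0):
    """Run the chain for a fixed number of steps from a valid starting table."""
    rng = random.Random(seed)
    table = [row[:] for row in table]
    for _ in range(steps):
        swap_chain_step(table, rng)
    return table


if __name__ == "__main__":
    # A small observed 0/1 table; row and column sums are preserved by the chain.
    observed = [[1, 1, 0],
                [1, 0, 1],
                [0, 1, 1]]
    print(sample_table(observed))
```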
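Finally, a toy brute-force computation illustrates what an interdiction problem asks: remove at most a budgeted number of edges from a small graph so as to maximize the s-t shortest-path length. Real instances are NP-hard and far beyond enumeration; the graph, budget, and function names here are hypothetical, and the code does not reflect the pseudoapproximation method proposed in the thesis.

```python
# Brute-force shortest-path interdiction on a toy graph: remove at most
# `budget` edges to maximize the s-t distance. Purely illustrative of the
# "worst-case impact of a limited change"; not the thesis's method.
import heapq
from itertools import combinations


def shortest_path(n, edges, s, t):
    """Dijkstra on an undirected weighted graph given as (u, v, w) triples."""
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist.get(t, float("inf"))


def interdict(n, edges, s, t, budget):
    """Try every set of at most `budget` edge removals; return the worst case."""
    best_value, best_removal = shortest_path(n, edges, s, t), ()
    for k in range(1, budget + 1):
        for removal in combinations(range(len(edges)), k):
            remaining = [e for i, e in enumerate(edges) if i not in removal]
            value = shortest_path(n, remaining, s, t)
            if value > best_value:
                best_value, best_removal = value, removal
    return best_value, best_removal


if __name__ == "__main__":
    edges = [(0, 1, 1), (1, 3, 1), (0, 2, 2), (2, 3, 2), (0, 3, 5)]
    print(interdict(4, edges, s=0, t=3, budget=1))
```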
Keywords
approximation algorithms, streaming, sketching, interdiction, sampling contingency tables, combinatorial optimization, matroids