# Statistics Seminar

## October 7 @ 4:00 pm - 5:00 pm

Title: Randomly Collected, Worst Case Data

Presenter: Professor Gregory Valiant

Abstract: I’ll discuss a new framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, consider a population of elements 1, ..., n with corresponding data values x1, ..., xn. After observing the values indexed by a sampled subset of indices, A, the goal will be to estimate some statistic of the entire set x1, ..., xn. We make no assumptions on the values x1, ..., xn, and instead assume that the sample indices are drawn according to a known distribution, P, over subsets of {1, ..., n}. How can the distribution P be leveraged to minimize the worst-case expected error of the estimator, where the expectation is with respect to P and the worst case is with respect to the data values x1, ..., xn? For which distributions P is this error small? Within this general framework we give an efficient near-optimal algorithm for mean estimation, leveraging a surprising connection to the Grothendieck problem. We also discuss this framework in several specific settings where membership in a sample may be correlated with data values, such as when probabilities of sampling vary as in “importance sampling”, when individuals are recruited into a sample through their social networks as in “snowball/chain sampling”, or when samples have chronological structure. This talk is based on joint work with Mingda Qiao, and with Justin Chen and Paul Valiant.
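To make the quantity in the abstract concrete, here is a minimal Python sketch of the worst-case expected error for a tiny population. It is not the algorithm from the talk. It rests on several assumptions that the abstract does not state: data values are bounded in [-1, 1], error is measured as squared error, and the estimator is the naive sample mean of the observed values. The function names and the two example sampling distributions are hypothetical. Because the expected squared error of this estimator is a convex function of the data values, its maximum over the cube [-1, 1]^n is attained at a vertex, so a brute-force search over vectors with entries ±1 suffices for small n.

```python
from itertools import combinations, product


def sample_mean(A, x):
    """Naive estimator (an assumption, not the talk's method): mean of observed values."""
    return sum(x[i] for i in A) / len(A)


def expected_sq_error(x, P, estimator):
    """Expected squared error over the sampling distribution P for fixed data x."""
    true_mean = sum(x) / len(x)
    return sum(p * (estimator(A, x) - true_mean) ** 2 for A, p in P.items())


def worst_case_error(n, P, estimator):
    """Brute-force maximum over all x in {-1, +1}^n (exponential in n; small n only)."""
    return max(
        expected_sq_error(x, P, estimator)
        for x in product((-1.0, 1.0), repeat=n)
    )


n = 4

# Hypothetical design 1: every pair of indices is equally likely to be sampled.
uniform_pairs = {frozenset(c): 1 / 6 for c in combinations(range(n), 2)}

# Hypothetical design 2: the sample is always one of two fixed disjoint pairs.
skewed = {frozenset({0, 1}): 0.9, frozenset({2, 3}): 0.1}

worst_uniform = worst_case_error(n, uniform_pairs, sample_mean)
worst_skewed = worst_case_error(n, skewed, sample_mean)

print(f"uniform pairs: {worst_uniform:.4f}")
print(f"skewed design: {worst_skewed:.4f}")
```

In this toy setting the uniform design has a smaller worst-case error than the skewed one, which illustrates how the answer to "for which distributions P is this error small?" depends on the structure of P even when nothing is assumed about the data values.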